[7.x][DOCS] Moves data frame analytics job resource definitions into APIs (#50165)
* [7.x][DOCS] Moves data frame analytics job resource definitions into APIs.

parent ac83e45a6b
commit 7611b3c9be
@ -0,0 +1,216 @@
[role="xpack"]
[testenv="platinum"]
[[ml-dfa-analysis-objects]]
=== Analysis configuration objects

{dfanalytics-cap} resources contain `analysis` objects. For example, when you
create a {dfanalytics-job}, you must define the type of analysis it performs.
This page lists all the available parameters that you can use in the `analysis`
object, grouped by {dfanalytics} type.


[discrete]
[[oldetection-resources]]
==== {oldetection-cap} configuration objects

An `outlier_detection` configuration object has the following properties:

`compute_feature_influence`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]

`feature_influence_threshold`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]

`method`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=method]

`n_neighbors`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]

`outlier_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]

`standardization_enabled`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
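
For instance, the following sketch shows how these options might be combined in
a {dfanalytics-job} that performs {oldetection}. The index names and parameter
values are hypothetical and only illustrate the shape of the `analysis` object:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/ecommerce_outliers
{
  "source": {
    "index": "ecommerce_orders"
  },
  "dest": {
    "index": "ecommerce_orders_outliers"
  },
  "analysis": {
    "outlier_detection": {
      "n_neighbors": 20,
      "method": "distance_knn",
      "feature_influence_threshold": 0.1,
      "outlier_fraction": 0.05
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]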


[discrete]
[[regression-resources]]
==== {regression-cap} configuration objects

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs" <1>
  },
  "dest": {
    "index": "house_price_predictions" <2>
  },
  "analysis":
    {
      "regression": { <3>
        "dependent_variable": "price" <4>
      }
    }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Training data is taken from source index `houses_sold_last_10_yrs`.
<2> Analysis results will be output to destination index
`house_price_predictions`.
<3> The regression analysis configuration object.
<4> Regression analysis will use the `price` field to train on. As no other
parameters have been specified, it will train on 100% of the eligible data,
store its prediction in the destination index field `price_prediction`, and use
built-in hyperparameter optimization to give minimum validation error.


[float]
[[regression-resources-standard]]
===== Standard parameters

`dependent_variable`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
--
The data type of the field must be numeric.
--

`prediction_field_name`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`training_percent`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]

`randomize_seed`::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]


[float]
[[regression-resources-advanced]]
===== Advanced parameters

Advanced parameters are for fine-tuning {reganalysis}. If you do not supply
them, they are set automatically by
<<ml-hyperparameter-optimization,hyperparameter optimization>> to give minimum
validation error. It is highly recommended to use the default values unless you
fully understand the function of these parameters.

`eta`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`maximum_number_trees`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]

`gamma`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]


[discrete]
[[classification-resources]]
==== {classification-cap} configuration objects
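
As with {regression}, a {classanalysis} is configured inside the `analysis`
object of a {dfanalytics-job}. The following sketch is illustrative only; the
index names and the `defaulted` dependent variable are hypothetical:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification
{
  "source": {
    "index": "loan-applicants"
  },
  "dest": {
    "index": "loan-applicants-classified"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "defaulted",
      "num_top_classes": 2,
      "training_percent": 75
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]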

[float]
[[classification-resources-standard]]
===== Standard parameters

`dependent_variable`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
--
The data type of the field must be numeric or boolean.
--

`num_top_classes`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]

`prediction_field_name`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`training_percent`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]

`randomize_seed`::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]


[float]
[[classification-resources-advanced]]
===== Advanced parameters

Advanced parameters are for fine-tuning {classanalysis}. If you do not supply
them, they are set automatically by
<<ml-hyperparameter-optimization,hyperparameter optimization>> to give minimum
validation error. It is highly recommended to use the default values unless you
fully understand the function of these parameters.

`eta`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`maximum_number_trees`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]

`gamma`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]


[discrete]
[[ml-hyperparameter-optimization]]
==== Hyperparameter optimization

If you don't supply {regression} or {classification} parameters, hyperparameter
optimization is performed by default to set a value for the undefined
parameters. The starting point is calculated for data-dependent parameters by
examining the loss on the training data. Subject to the size constraint, this
operation provides an upper bound on the improvement in validation loss.

A fixed number of optimization rounds is used; the number depends on how many
parameters are being optimized. The optimization starts with a random search,
then Bayesian optimization is performed, targeting maximum expected improvement.
If you override any parameters, the optimization calculates the value of the
remaining parameters accordingly and uses the value you provided for the
overridden parameter; the number of rounds is reduced accordingly. The
validation error is estimated in each round by using 4-fold cross-validation.
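
For example, the following sketch (hypothetical job and index names) overrides
`eta` and `maximum_number_trees` for a {reganalysis}; the remaining advanced
parameters (`gamma`, `lambda`, `feature_bag_fraction`) are still tuned by
hyperparameter optimization, over a correspondingly reduced number of rounds:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_tuned
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions_tuned"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "eta": 0.05,
      "maximum_number_trees": 500
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]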


@ -11,22 +11,27 @@ Deletes an existing {dfanalytics-job}.

experimental[]


[[ml-delete-dfanalytics-request]]
==== {api-request-title}

`DELETE _ml/data_frame/analytics/<data_frame_analytics_id>`


[[ml-delete-dfanalytics-prereq]]
==== {api-prereq-title}

* You must have the `machine_learning_admin` built-in role to use this API. For
more information, see <<security-privileges>> and <<built-in-roles>>.


[[ml-delete-dfanalytics-path-params]]
==== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Required, string) Identifier for the {dfanalytics-job} you want to delete.
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]


[[ml-delete-dfanalytics-example]]
==== {api-examples-title}
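
The full example body lies outside this hunk of the diff; a minimal sketch,
assuming a {dfanalytics-job} named `loganalytics` already exists, looks like
this:

[source,console]
--------------------------------------------------
DELETE _ml/data_frame/analytics/loganalytics
--------------------------------------------------
// TEST[skip:TBD]

When the job is deleted, the API returns an acknowledgement of the request.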
@ -1,298 +0,0 @@
|
|||
[role="xpack"]
|
||||
[testenv="platinum"]
|
||||
[[ml-dfanalytics-resources]]
|
||||
=== {dfanalytics-cap} job resources
|
||||
|
||||
{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
|
||||
<<get-dfanalytics>>.
|
||||
|
||||
[discrete]
|
||||
[[ml-dfanalytics-properties]]
|
||||
==== {api-definitions-title}
|
||||
|
||||
`analysis`::
|
||||
(object) The type of analysis that is performed on the `source`. For example:
|
||||
`outlier_detection` or `regression`. For more information, see
|
||||
<<dfanalytics-types>>.
|
||||
|
||||
`analyzed_fields`::
|
||||
(Optional, object) Specify `includes` and/or `excludes` patterns to select
|
||||
which fields will be included in the analysis. If `analyzed_fields` is not set,
|
||||
only the relevant fields will be included. For example, all the numeric fields
|
||||
for {oldetection}. For the supported field types, see <<ml-put-dfanalytics-supported-fields>>.
|
||||
Also see the <<explain-dfanalytics>> which helps understand field selection.
|
||||
|
||||
`includes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be included in
|
||||
the analysis.
|
||||
|
||||
`excludes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be excluded
|
||||
from the analysis.
|
||||
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT _ml/data_frame/analytics/loganalytics
|
||||
{
|
||||
"source": {
|
||||
"index": "logdata"
|
||||
},
|
||||
"dest": {
|
||||
"index": "logdata_out"
|
||||
},
|
||||
"analysis": {
|
||||
"outlier_detection": {
|
||||
}
|
||||
},
|
||||
"analyzed_fields": {
|
||||
"includes": [ "request.bytes", "response.counts.error" ],
|
||||
"excludes": [ "source.geo" ]
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:setup_logdata]
|
||||
|
||||
`description`::
|
||||
(Optional, string) A description of the job.
|
||||
|
||||
`dest`::
|
||||
(object) The destination configuration of the analysis.
|
||||
|
||||
`index`:::
|
||||
(Required, string) Defines the _destination index_ to store the results of
|
||||
the {dfanalytics-job}.
|
||||
|
||||
`results_field`:::
|
||||
(Optional, string) Defines the name of the field in which to store the
|
||||
results of the analysis. Default to `ml`.
|
||||
|
||||
`id`::
|
||||
(string) The unique identifier for the {dfanalytics-job}. This identifier can
|
||||
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
|
||||
underscores. It must start and end with alphanumeric characters. This property
|
||||
is informational; you cannot change the identifier for existing jobs.
|
||||
|
||||
`model_memory_limit`::
|
||||
(string) The approximate maximum amount of memory resources that are
|
||||
permitted for analytical processing. The default value for {dfanalytics-jobs}
|
||||
is `1gb`. If your `elasticsearch.yml` file contains an
|
||||
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
|
||||
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
|
||||
that setting. For more information, see <<ml-settings>>.
|
||||
|
||||
`source`::
|
||||
(object) The configuration of how to source the analysis data. It requires an `index`.
|
||||
Optionally, `query` and `_source` may be specified.
|
||||
|
||||
`index`:::
|
||||
(Required, string or array) Index or indices on which to perform the
|
||||
analysis. It can be a single index or index pattern as well as an array of
|
||||
indices or patterns.
|
||||
|
||||
`query`:::
|
||||
(Optional, object) The {es} query domain-specific language
|
||||
(<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
|
||||
search POST body. All the options that are supported by {es} can be used,
|
||||
as this object is passed verbatim to {es}. By default, this property has
|
||||
the following value: `{"match_all": {}}`.
|
||||
|
||||
`_source`:::
|
||||
(Optional, object) Specify `includes` and/or `excludes` patterns to select
|
||||
which fields will be present in the destination. Fields that are excluded
|
||||
cannot be included in the analysis.
|
||||
|
||||
`includes`::::
|
||||
(array) An array of strings that defines the fields that will be included in
|
||||
the destination.
|
||||
|
||||
`excludes`::::
|
||||
(array) An array of strings that defines the fields that will be excluded
|
||||
from the destination.
|
||||
|
||||
[[dfanalytics-types]]
|
||||
==== Analysis objects
|
||||
|
||||
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
|
||||
create a {dfanalytics-job}, you must define the type of analysis it performs.
|
||||
|
||||
[discrete]
|
||||
[[oldetection-resources]]
|
||||
==== {oldetection-cap} configuration objects
|
||||
|
||||
An `outlier_detection` configuration object has the following properties:
|
||||
|
||||
`compute_feature_influence`::
|
||||
(boolean) If `true`, the feature influence calculation is enabled. Defaults to
|
||||
`true`.
|
||||
|
||||
`feature_influence_threshold`::
|
||||
(double) The minimum {olscore} that a document needs to have in order to
|
||||
calculate its {fiscore}. Value range: 0-1 (`0.1` by default).
|
||||
|
||||
`method`::
|
||||
(string) Sets the method that {oldetection} uses. If the method is not set
|
||||
{oldetection} uses an ensemble of different methods and normalises and
|
||||
combines their individual {olscores} to obtain the overall {olscore}. We
|
||||
recommend to use the ensemble method. Available methods are `lof`, `ldof`,
|
||||
`distance_kth_nn`, `distance_knn`.
|
||||
|
||||
`n_neighbors`::
|
||||
(integer) Defines the value for how many nearest neighbors each method of
|
||||
{oldetection} will use to calculate its {olscore}. When the value is not set,
|
||||
different values will be used for different ensemble members. This helps
|
||||
improve diversity in the ensemble. Therefore, only override this if you are
|
||||
confident that the value you choose is appropriate for the data set.
|
||||
|
||||
`outlier_fraction`::
|
||||
(double) Sets the proportion of the data set that is assumed to be outlying prior to
|
||||
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real outliers
|
||||
and 95% are inliers.
|
||||
|
||||
`standardization_enabled`::
|
||||
(boolean) If `true`, then the following operation is performed on the columns
|
||||
before computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to
|
||||
`true`. For more information, see
|
||||
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
|
||||
|
||||
|
||||
[discrete]
|
||||
[[regression-resources]]
|
||||
==== {regression-cap} configuration objects
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT _ml/data_frame/analytics/house_price_regression_analysis
|
||||
{
|
||||
"source": {
|
||||
"index": "houses_sold_last_10_yrs" <1>
|
||||
},
|
||||
"dest": {
|
||||
"index": "house_price_predictions" <2>
|
||||
},
|
||||
"analysis":
|
||||
{
|
||||
"regression": { <3>
|
||||
"dependent_variable": "price" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[skip:TBD]
|
||||
|
||||
<1> Training data is taken from source index `houses_sold_last_10_yrs`.
|
||||
<2> Analysis results will be output to destination index
|
||||
`house_price_predictions`.
|
||||
<3> The regression analysis configuration object.
|
||||
<4> Regression analysis will use field `price` to train on. As no other
|
||||
parameters have been specified it will train on 100% of eligible data, store its
|
||||
prediction in destination index field `price_prediction` and use in-built
|
||||
hyperparameter optimization to give minimum validation errors.
|
||||
|
||||
|
||||
[float]
|
||||
[[regression-resources-standard]]
|
||||
===== Standard parameters
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent_variable]
|
||||
+
|
||||
--
|
||||
The data type of the field must be numeric.
|
||||
--
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction_field_name]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training_percent]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize_seed]
|
||||
|
||||
|
||||
[float]
|
||||
[[regression-resources-advanced]]
|
||||
===== Advanced parameters
|
||||
|
||||
Advanced parameters are for fine-tuning {reganalysis}. They are set
|
||||
automatically by <<ml-hyperparameter-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters. If these
|
||||
parameters are not supplied, their values are automatically tuned to give
|
||||
minimum validation error.
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature_bag_fraction]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum_number_trees]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
|
||||
[discrete]
|
||||
[[classification-resources]]
|
||||
==== {classification-cap} configuration objects
|
||||
|
||||
|
||||
[float]
|
||||
[[classification-resources-standard]]
|
||||
===== Standard parameters
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent_variable]
|
||||
+
|
||||
--
|
||||
The data type of the field must be numeric or boolean.
|
||||
--
|
||||
|
||||
`num_top_classes`::
|
||||
(Optional, integer) Defines the number of categories for which the predicted
|
||||
probabilities are reported. It must be non-negative. If it is greater than the
|
||||
total number of categories (in the {version} version of the {stack}, it's two)
|
||||
to predict then we will report all category probabilities. Defaults to 2.
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction_field_name]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training_percent]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize_seed]
|
||||
|
||||
|
||||
[float]
|
||||
[[classification-resources-advanced]]
|
||||
===== Advanced parameters
|
||||
|
||||
Advanced parameters are for fine-tuning {classanalysis}. They are set
|
||||
automatically by <<ml-hyperparameter-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters. If these
|
||||
parameters are not supplied, their values are automatically tuned to give
|
||||
minimum validation error.
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature_bag_fraction]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum_number_trees]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
|
||||
[[ml-hyperparameter-optimization]]
|
||||
===== Hyperparameter optimization
|
||||
|
||||
If you don't supply {regression} or {classification} parameters, hyperparameter
|
||||
optimization will be performed by default to set a value for the undefined
|
||||
parameters. The starting point is calculated for data dependent parameters by
|
||||
examining the loss on the training data. Subject to the size constraint, this
|
||||
operation provides an upper bound on the improvement in validation loss.
|
||||
|
||||
A fixed number of rounds is used for optimization which depends on the number of
|
||||
parameters being optimized. The optimization starts with random search, then
|
||||
Bayesian optimization is performed that is targeting maximum expected
|
||||
improvement. If you override any parameters, then the optimization will
|
||||
calculate the value of the remaining parameters accordingly and use the value
|
||||
you provided for the overridden parameter. The number of rounds are reduced
|
||||
respectively. The validation error is estimated in each round by using 4-fold
|
||||
cross validation.
|
|
@ -12,6 +12,7 @@ Evaluates the {dfanalytics} for an annotated index.

experimental[]


[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}


@ -37,26 +38,113 @@ result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform. See
<<ml-evaluate-dfanalytics-resources>>.
(Required, object) Defines the type of evaluation you want to perform. The
value of this object can be different depending on the type of evaluation you
want to perform. See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `binary_soft_classification`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

[[ml-evaluate-dfanalytics-resources]]
==== {dfanalytics-cap} evaluation resources

[[binary-sc-resources]]
===== Binary soft classification configuration objects

Binary soft classification evaluates the results of an analysis which outputs
the probability that each document belongs to a certain class. For example, in
the context of {oldetection}, the analysis outputs the probability of whether
each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
{"includes_curve": false}.

`precision`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

`recall`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

`confusion_matrix`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
{"at": [0.25, 0.50, 0.75]}.

[[regression-evaluation-resources]]
===== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
Available metrics are `r_squared` and `mean_squared_error`.
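
A sketch of a {regression} evaluation request, reusing the hypothetical house
price job from earlier on this page; the field names under the `ml` results
field and the `POST _ml/data_frame/_evaluate` path are assumptions:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions",
  "evaluation": {
    "regression": {
      "actual_field": "price",
      "predicted_field": "ml.price_prediction",
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]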

[[classification-evaluation-resources]]
===== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be keyword.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
The available metric is `multiclass_confusion_matrix`.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {classanalysis}. The data type of this field
is string. You need to add `.keyword` to the predicted field name (the name
you put in the {classanalysis} object as `prediction_field_name`, or the
default value of the same field if you didn't specify it explicitly). For
example, `predicted_field` : `ml.animal_class_prediction.keyword`.
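
For instance, assuming a hypothetical destination index and class field, and
reusing the `ml.animal_class_prediction.keyword` field from the example above
(the `POST _ml/data_frame/_evaluate` path is likewise an assumption):

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification_dest",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "predicted_field": "ml.animal_class_prediction.keyword",
      "metrics": {
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]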
////
|
||||
[[ml-evaluate-dfanalytics-results]]
|
||||
|
@ -75,6 +163,7 @@ Available evaluation types:
|
|||
`recall`::: TBD
|
||||
////
|
||||
|
||||
|
||||
[[ml-evaluate-dfanalytics-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
|
|
|
@ -1,128 +0,0 @@
|
|||
[role="xpack"]
|
||||
[testenv="platinum"]
|
||||
[[ml-evaluate-dfanalytics-resources]]
|
||||
=== {dfanalytics-cap} evaluation resources
|
||||
|
||||
Evaluation configuration objects relate to the <<evaluate-dfanalytics>>.
|
||||
|
||||
[discrete]
|
||||
[[ml-evaluate-dfanalytics-properties]]
|
||||
==== {api-definitions-title}
|
||||
|
||||
`evaluation`::
|
||||
(object) Defines the type of evaluation you want to perform. The value of this
|
||||
object can be different depending on the type of evaluation you want to
|
||||
perform.
|
||||
+
|
||||
--
|
||||
Available evaluation types:
|
||||
* `binary_soft_classification`
|
||||
* `regression`
|
||||
* `classification`
|
||||
--
|
||||
|
||||
`query`::
|
||||
(object) A query clause that retrieves a subset of data from the source index.
|
||||
See <<query-dsl>>. The evaluation only applies to those documents of the index
|
||||
that match the query.
|
||||
|
||||
|
||||
[[binary-sc-resources]]
|
||||
==== Binary soft classification configuration objects
|
||||
|
||||
Binary soft classification evaluates the results of an analysis which outputs
|
||||
the probability that each document belongs to a certain class. For
|
||||
example, in the context of outlier detection, the analysis outputs the
|
||||
probability whether each document is an outlier.
|
||||
|
||||
[discrete]
|
||||
[[binary-sc-resources-properties]]
|
||||
===== {api-definitions-title}
|
||||
|
||||
`actual_field`::
|
||||
(string) The field of the `index` which contains the `ground truth`.
|
||||
The data type of this field can be boolean or integer. If the data type is
|
||||
integer, the value has to be either `0` (false) or `1` (true).
|
||||
|
||||
`predicted_probability_field`::
|
||||
(string) The field of the `index` that defines the probability of
|
||||
whether the item belongs to the class in question or not. It's the field that
|
||||
contains the results of the analysis.
|
||||
|
||||
`metrics`::
|
||||
(object) Specifies the metrics that are used for the evaluation.
|
||||
Available metrics:
|
||||
|
||||
`auc_roc`::
|
||||
(object) The AUC ROC (area under the curve of the receiver operating
|
||||
characteristic) score and optionally the curve.
|
||||
Default value is {"includes_curve": false}.
|
||||
|
||||
`precision`::
|
||||
(object) Sets the different thresholds of the {olscore} at which the metric
|
||||
is calculated.
|
||||
Default value is {"at": [0.25, 0.50, 0.75]}.
|
||||
|
||||
`recall`::
|
||||
(object) Sets the different thresholds of the {olscore} at which the metric
|
||||
is calculated.
|
||||
Default value is {"at": [0.25, 0.50, 0.75]}.
|
||||
|
||||
`confusion_matrix`::
|
||||
(object) Sets the different thresholds of the {olscore} at which the metrics
|
||||
(`tp` - true positive, `fp` - false positive, `tn` - true negative, `fn` -
|
||||
false negative) are calculated.
|
||||
Default value is {"at": [0.25, 0.50, 0.75]}.
|
||||
|
||||
|
||||
[[regression-evaluation-resources]]
|
||||
==== {regression-cap} evaluation objects
|
||||
|
||||
{regression-cap} evaluation evaluates the results of a {regression} analysis
|
||||
which outputs a prediction of values.
|
||||
|
||||
|
||||
[discrete]
|
||||
[[regression-evaluation-resources-properties]]
|
||||
===== {api-definitions-title}
|
||||
|
||||
`actual_field`::
|
||||
(string) The field of the `index` which contains the `ground truth`. The data
|
||||
type of this field must be numerical.
|
||||
|
||||
`predicted_field`::
|
||||
(string) The field in the `index` that contains the predicted value,
|
||||
in other words the results of the {regression} analysis.
|
||||
|
||||
`metrics`::
|
||||
(object) Specifies the metrics that are used for the evaluation. Available
|
||||
metrics are `r_squared` and `mean_squared_error`.
|
||||
|
||||
|
||||
[[classification-evaluation-resources]]
|
||||
==== {classification-cap} evaluation objects
|
||||
|
||||
{classification-cap} evaluation evaluates the results of a {classanalysis} which
|
||||
outputs a prediction that identifies to which of the classes each document
|
||||
belongs.
|
||||
|
||||
|
||||
[discrete]
|
||||
[[classification-evaluation-resources-properties]]
|
||||
===== {api-definitions-title}
|
||||
|
||||
`actual_field`::
|
||||
(string) The field of the `index` which contains the ground truth. The data
|
||||
type of this field must be keyword.
|
||||
|
||||
`metrics`::
|
||||
(object) Specifies the metrics that are used for the evaluation. Available
|
||||
metric is `multiclass_confusion_matrix`.
|
||||
|
||||
`predicted_field`::
|
||||
(string) The field in the `index` that contains the predicted value, in other
|
||||
words the results of the {classanalysis}. The data type of this field is
|
||||
string. You need to add `.keyword` to the predicted field name (the name you
|
||||
put in the {classanalysis} object as `prediction_field_name` or the default
|
||||
value of the same field if you didn't specify it explicitly). For example,
|
||||
`predicted_field` : `ml.animal_class_prediction.keyword`.
|
|
@ -12,6 +12,7 @@ Explains a {dataframe-analytics-config}.
|
|||
|
||||
experimental[]
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
|
@ -23,38 +24,43 @@ experimental[]
|
|||
|
||||
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain`
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-prereq]]
|
||||
==== {api-prereq-title}
|
||||
|
||||
* You must have the `monitor_ml` privilege to use this API. For more
|
||||
information, see <<security-privileges>> and <<built-in-roles>>.
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
This API provides explanations for a {dataframe-analytics-config} that either exists already or one that has not been created yet.
|
||||
This API provides explanations for a {dataframe-analytics-config} that either
|
||||
exists already or one that has not been created yet.
|
||||
The following explanations are provided:
|
||||
|
||||
* which fields are included or not in the analysis and why
|
||||
* how much memory is estimated to be required. The estimate can be used when deciding the appropriate value for `model_memory_limit` setting later on.
|
||||
* which fields are included or not in the analysis and why
* how much memory is estimated to be required. The estimate can be used when
deciding the appropriate value for the `model_memory_limit` setting later on.
|
||||
|
||||
about either an existing {dfanalytics-job} or one that has not been created yet.
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Optional, string) A numerical character string that uniquely identifies the existing
|
||||
{dfanalytics-job} to explain. This identifier can contain lowercase alphanumeric
|
||||
characters (a-z and 0-9), hyphens, and underscores. It must start and end with
|
||||
alphanumeric characters.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`data_frame_analytics_config`::
|
||||
(Optional, object) Intended configuration of {dfanalytics-job}. For more information, see
|
||||
<<ml-dfanalytics-resources>>.
|
||||
Note that `id` and `dest` don't need to be provided in the context of this API.
|
||||
(Optional, object) Intended configuration of {dfanalytics-job}. Note that `id`
|
||||
and `dest` don't need to be provided in the context of this API.
|
||||
|
||||
|
||||
[[ml-explain-dfanalytics-results]]
|
||||
==== {api-response-body-title}
|
||||
|
@ -62,38 +68,13 @@ about either an existing {dfanalytics-job} or one that has not been created yet.
|
|||
The API returns a response that contains the following:
|
||||
|
||||
`field_selection`::
|
||||
(array) An array of objects that explain selection for each field, sorted by the field names.
|
||||
Each object in the array has the following properties:
|
||||
|
||||
`name`:::
|
||||
(string) The field name.
|
||||
|
||||
`mapping_types`:::
|
||||
(string) The mapping types of the field.
|
||||
|
||||
`is_included`:::
|
||||
(boolean) Whether the field is selected to be included in the analysis.
|
||||
|
||||
`is_required`:::
|
||||
(boolean) Whether the field is required.
|
||||
|
||||
`feature_type`:::
|
||||
(string) The feature type of this field for the analysis. May be `categorical` or `numerical`.
|
||||
|
||||
`reason`:::
|
||||
(string) The reason a field is not selected to be included in the analysis.
|
||||
(array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=field-selection]
|
||||
|
||||
`memory_estimation`::
|
||||
(object) An object containing the memory estimates. The object has the following properties:
|
||||
(object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=memory-estimation]
|
||||
|
||||
`expected_memory_without_disk`:::
|
||||
(string) Estimated memory usage under the assumption that the whole {dfanalytics} should happen in memory
|
||||
(i.e. without overflowing to disk).
|
||||
|
||||
`expected_memory_with_disk`:::
|
||||
(string) Estimated memory usage under the assumption that overflowing to disk is allowed during {dfanalytics}.
|
||||
`expected_memory_with_disk` is usually smaller than `expected_memory_without_disk` as using disk allows to
|
||||
limit the main memory needed to perform {dfanalytics}.
|
||||
|
||||
[[ml-explain-dfanalytics-example]]
|
||||
==== {api-examples-title}
|
||||
|
@ -116,6 +97,7 @@ POST _ml/data_frame/analytics/_explain
|
|||
--------------------------------------------------
|
||||
// TEST[skip:TBD]
|
||||
|
||||
|
||||
The API returns the following results:
|
||||
|
||||
[source,console-result]
|
||||
|
|
|
@ -36,35 +36,24 @@ information, see <<security-privileges>> and <<built-in-roles>>.
|
|||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Optional, string) Identifier for the {dfanalytics-job}. If you do not specify
|
||||
one of these options, the API returns information for the first hundred
|
||||
{dfanalytics-jobs}.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-default]
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-stats-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_match`::
|
||||
(Optional, boolean) Specifies what to do when the request:
|
||||
+
|
||||
--
|
||||
* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
|
||||
* Contains the `_all` string or no identifiers and there are no matches.
|
||||
* Contains wildcard expressions and there are only partial matches.
|
||||
|
||||
The default value is `true`, which returns an empty `data_frame_analytics` array
|
||||
when there are no matches and the subset of results when there are partial
|
||||
matches. If this parameter is `false`, the request returns a `404` status code
|
||||
when there are no matches or only partial matches.
|
||||
--
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
|
||||
|
||||
`from`::
|
||||
(Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
|
||||
default value is `0`.
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=from]
|
||||
|
||||
`size`::
|
||||
(Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to
|
||||
obtain. The default value is `100`.
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=size]
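
For instance, the following sketch pages through the statistics of all
{dfanalytics-jobs}. The `_stats` request path is assumed from the request
section of this API and is not shown in this hunk:

[source,console]
--------------------------------------------------
GET _ml/data_frame/analytics/_all/_stats?from=0&size=25
--------------------------------------------------
// TEST[skip:TBD]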
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-stats-response-body]]
|
||||
|
@ -73,25 +62,8 @@ when there are no matches or only partial matches.
|
|||
The API returns the following information:
|
||||
|
||||
`data_frame_analytics`::
|
||||
(array) An array of statistics objects for {dfanalytics-jobs}, which are
|
||||
sorted by the `id` value in ascending order.
|
||||
|
||||
`id`::
|
||||
(string) The unique identifier of the {dfanalytics-job}.
|
||||
|
||||
`state`::
|
||||
(string) Current state of the {dfanalytics-job}.
|
||||
|
||||
`progress`::
|
||||
(array) The progress report of the {dfanalytics-job} by phase.
|
||||
|
||||
`phase`::
|
||||
(string) Defines the phase of the {dfanalytics-job}. Possible phases:
|
||||
`reindexing`, `loading_data`, `analyzing`, and `writing_results`.
|
||||
|
||||
`progress_percent`::
|
||||
(integer) The progress that the {dfanalytics-job} has made expressed in
|
||||
percentage.
|
||||
(array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics-stats]
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-stats-response-codes]]
|
||||
|
|
|
@ -11,6 +11,7 @@ Retrieves configuration information for {dfanalytics-jobs}.
|
|||
|
||||
experimental[]
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
|
@ -22,11 +23,13 @@ experimental[]
|
|||
|
||||
`GET _ml/data_frame/analytics/_all`
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-prereq]]
|
||||
==== {api-prereq-title}
|
||||
|
||||
* You must have `monitor_ml` privilege to use this API. For more
|
||||
information, see <<security-privileges>> and <<built-in-roles>>.
|
||||
* You must have the `monitor_ml` privilege to use this API. For more information,
|
||||
see <<security-privileges>> and <<built-in-roles>>.
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-desc]]
|
||||
==== {api-description-title}
|
||||
|
@ -34,47 +37,44 @@ information, see <<security-privileges>> and <<built-in-roles>>.
|
|||
You can get information for multiple {dfanalytics-jobs} in a single API request
|
||||
by using a comma-separated list of {dfanalytics-jobs} or a wildcard expression.
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Optional, string) Identifier for the {dfanalytics-job}. If you do not specify
|
||||
one of these options, the API returns information for the first hundred
|
||||
{dfanalytics-jobs}. You can get information for all {dfanalytics-jobs} by
|
||||
using _all, by specifying `*` as the `<data_frame_analytics_id>`, or by
|
||||
omitting the `<data_frame_analytics_id>`.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-default]
|
||||
+
|
||||
--
|
||||
You can get information for all {dfanalytics-jobs} by using _all, by specifying
|
||||
`*` as the `<data_frame_analytics_id>`, or by omitting the
|
||||
`<data_frame_analytics_id>`.
|
||||
--
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_match`::
|
||||
(Optional, boolean) Specifies what to do when the request:
|
||||
+
|
||||
--
|
||||
* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
|
||||
* Contains the `_all` string or no identifiers and there are no matches.
|
||||
* Contains wildcard expressions and there are only partial matches.
|
||||
|
||||
The default value is `true`, which returns an empty `data_frame_analytics` array
|
||||
when there are no matches and the subset of results when there are partial
|
||||
matches. If this parameter is `false`, the request returns a `404` status code
|
||||
when there are no matches or only partial matches.
|
||||
--
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
|
||||
|
||||
`from`::
|
||||
(Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
|
||||
default value is `0`.
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=from]
|
||||
|
||||
`size`::
|
||||
(Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to
|
||||
obtain. The default value is `100`.
|
||||
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=size]
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
`data_frame_analytics`::
|
||||
(array) An array of {dfanalytics-job} resources. For more information, see
|
||||
<<ml-dfanalytics-resources>>.
|
||||
(array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics]
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-response-codes]]
|
||||
==== {api-response-codes-title}
|
||||
|
@ -83,6 +83,7 @@ when there are no matches or only partial matches.
|
|||
If `allow_no_match` is `false`, this code indicates that there are no
|
||||
resources that match the request or only partial matches for the request.
|
||||
|
||||
|
||||
[[ml-get-dfanalytics-example]]
|
||||
==== {api-examples-title}
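
The original example body lies outside this hunk; a minimal sketch that
retrieves every {dfanalytics-job} configuration, using the `_all` form shown
above together with the documented `from` and `size` query parameters:

[source,console]
--------------------------------------------------
GET _ml/data_frame/analytics/_all?from=0&size=100
--------------------------------------------------
// TEST[skip:TBD]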
|
||||
|
||||
|
|
|
@ -14,6 +14,8 @@ You can use the following APIs to perform {ml} {dfanalytics} activities.
|
|||
* <<evaluate-dfanalytics,Evaluate {dfanalytics}>>
|
||||
* <<explain-dfanalytics,Explain {dfanalytics}>>
|
||||
|
||||
For the `analysis` object resources, see <<ml-dfa-analysis-objects>>.
|
||||
|
||||
See also <<ml-apis>>.
|
||||
|
||||
//CREATE
|
||||
|
|
|
@ -86,101 +86,62 @@ single number. For example, in case of age ranges, you can model the values as
|
|||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Required, string) A numerical character string that uniquely identifies the
|
||||
{dfanalytics-job}. This identifier can contain lowercase alphanumeric
|
||||
characters (a-z and 0-9), hyphens, and underscores. It must start and end with
|
||||
alphanumeric characters.
|
||||
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
|
||||
|
||||
[[ml-put-dfanalytics-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`analysis`::
|
||||
(Required, object) Defines the type of {dfanalytics} you want to perform on
|
||||
your source index. For example: `outlier_detection`. See
|
||||
<<dfanalytics-types>>.
|
||||
(Required, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
|
||||
|
||||
`analyzed_fields`::
|
||||
(Optional, object) Specify `includes` and/or `excludes` patterns to select
|
||||
which fields will be included in the analysis. If `analyzed_fields` is not
|
||||
set, only the relevant fields will be included. For example, all the numeric
|
||||
fields for {oldetection}. For the supported field types, see
|
||||
<<ml-put-dfanalytics-supported-fields>>. Also see the <<explain-dfanalytics>>
|
||||
which helps understand field selection.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields]
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT _ml/data_frame/analytics/loganalytics
|
||||
{
|
||||
"source": {
|
||||
"index": "logdata"
|
||||
},
|
||||
"dest": {
|
||||
"index": "logdata_out"
|
||||
},
|
||||
"analysis": {
|
||||
"outlier_detection": {
|
||||
}
|
||||
},
|
||||
"analyzed_fields": {
|
||||
"includes": [ "request.bytes", "response.counts.error" ],
|
||||
"excludes": [ "source.geo" ]
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:setup_logdata]
|
||||
|
||||
`includes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be
|
||||
included in the analysis.
|
||||
|
||||
`excludes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be
|
||||
excluded from the analysis. You do not need to add fields with unsupported
|
||||
data types to `excludes`; these fields are excluded from the analysis
|
||||
automatically.
|
||||
|
||||
`description`::
|
||||
(Optional, string) A description of the job.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=description-dfa]
|
||||
|
||||
`dest`::
|
||||
(Required, object) The destination configuration, consisting of `index` and
|
||||
optionally `results_field` (`ml` by default).
|
||||
|
||||
`index`:::
|
||||
(Required, string) Defines the _destination index_ to store the results of
|
||||
the {dfanalytics-job}.
|
||||
|
||||
`results_field`:::
|
||||
(Optional, string) Defines the name of the field in which to store the
|
||||
results of the analysis. Default to `ml`.
|
||||
(Required, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dest]
|
||||
|
||||
`model_memory_limit`::
|
||||
(Optional, string) The approximate maximum amount of memory resources that are
|
||||
permitted for analytical processing. The default value for {dfanalytics-jobs}
|
||||
is `1gb`. If your `elasticsearch.yml` file contains an
|
||||
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
|
||||
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
|
||||
that setting. For more information, see <<ml-settings>>.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
|
||||
|
||||
`source`::
|
||||
(object) The configuration of how to source the analysis data. It requires an
|
||||
`index`. Optionally, `query` and `_source` may be specified.
|
||||
|
||||
`index`:::
|
||||
(Required, string or array) Index or indices on which to perform the
|
||||
analysis. It can be a single index or index pattern as well as an array of
|
||||
indices or patterns.
|
||||
|
||||
`query`:::
|
||||
(Optional, object) The {es} query domain-specific language
|
||||
(<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
|
||||
search POST body. All the options that are supported by {es} can be used,
|
||||
as this object is passed verbatim to {es}. By default, this property has
|
||||
the following value: `{"match_all": {}}`.
|
||||
|
||||
`_source`:::
|
||||
(Optional, object) Specify `includes` and/or `excludes` patterns to select
|
||||
which fields will be present in the destination. Fields that are excluded
|
||||
cannot be included in the analysis.
|
||||
|
||||
`includes`::::
|
||||
(array) An array of strings that defines the fields that will be
|
||||
included in the destination.
|
||||
|
||||
`excludes`::::
|
||||
(array) An array of strings that defines the fields that will be
|
||||
excluded from the destination.
|
||||
(object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
|
||||
|
||||
`allow_lazy_start`::
|
||||
(Optional, boolean) Whether this job should be allowed to start when there
|
||||
is insufficient {ml} node capacity for it to be immediately assigned to a node.
|
||||
The default is `false`, which means that the <<start-dfanalytics>>
|
||||
will return an error if a {ml} node with capacity to run the
|
||||
job cannot immediately be found. (However, this is also subject to
|
||||
the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
|
||||
<<advanced-ml-settings>>.) If this option is set to `true` then
|
||||
the <<start-dfanalytics>> will not return an error, and the job will
|
||||
wait in the `starting` state until sufficient {ml} node capacity
|
||||
is available.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
|
||||
|
||||
|
||||
[[ml-put-dfanalytics-example]]
|
||||
|
@ -294,35 +255,33 @@ The API returns the following result:
|
|||
[source,console-result]
|
||||
----
|
||||
{
|
||||
"id" : "loganalytics",
|
||||
"description": "Outlier detection on log data",
|
||||
"source" : {
|
||||
"index" : [
|
||||
"logdata"
|
||||
],
|
||||
"query" : {
|
||||
"match_all" : { }
|
||||
}
|
||||
},
|
||||
"dest" : {
|
||||
"index" : "logdata_out",
|
||||
"results_field" : "ml"
|
||||
},
|
||||
"analysis": {
|
||||
"outlier_detection": {
|
||||
"compute_feature_influence": true,
|
||||
"outlier_fraction": 0.05,
|
||||
"standardization_enabled": true
|
||||
}
|
||||
},
|
||||
"model_memory_limit" : "1gb",
|
||||
"create_time" : 1562351429434,
|
||||
"version" : "7.3.0",
|
||||
"allow_lazy_start" : false
|
||||
"id": "loganalytics",
|
||||
"description": "Outlier detection on log data",
|
||||
"source": {
|
||||
"index": ["logdata"],
|
||||
"query": {
|
||||
"match_all": {}
|
||||
}
|
||||
},
|
||||
"dest": {
|
||||
"index": "logdata_out",
|
||||
"results_field": "ml"
|
||||
},
|
||||
"analysis": {
|
||||
"outlier_detection": {
|
||||
"compute_feature_influence": true,
|
||||
"outlier_fraction": 0.05,
|
||||
"standardization_enabled": true
|
||||
}
|
||||
},
|
||||
"model_memory_limit": "1gb",
|
||||
"create_time" : 1562265491319,
|
||||
"version" : "7.6.0",
|
||||
"allow_lazy_start" : false
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE[s/1562351429434/$body.$_path/]
|
||||
// TESTRESPONSE[s/"version" : "7.3.0"/"version" : $body.version/]
|
||||
// TESTRESPONSE[s/1562265491319/$body.$_path/]
|
||||
// TESTRESPONSE[s/"version": "7.6.0"/"version": $body.version/]
|
||||
|
||||
|
||||
[[ml-put-dfanalytics-example-r]]
|
||||
|
@ -410,9 +369,10 @@ PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
|
|||
--------------------------------------------------
|
||||
// TEST[skip:TBD]
|
||||
|
||||
<1> The `training_percent` defines the percentage of the data set that will be used
|
||||
for training the model.
|
||||
<2> The `randomize_seed` is the seed used to randomly pick which data is used for training.
|
||||
<1> The `training_percent` defines the percentage of the data set that will be
|
||||
used for training the model.
|
||||
<2> The `randomize_seed` is the seed used to randomly pick which data is used
|
||||
for training.
|
||||
|
||||
|
||||
[[ml-put-dfanalytics-example-c]]
|
||||
|
|
|
@ -29,16 +29,15 @@ more information, see <<security-privileges>> and <<built-in-roles>>.
|
|||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Required, string) Identifier for the {dfanalytics-job}. This identifier can
|
||||
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
|
||||
underscores. It must start and end with alphanumeric characters.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
|
||||
|
||||
[[ml-start-dfanalytics-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`timeout`::
|
||||
(Optional, time) Controls the amount of time to wait until the
|
||||
{dfanalytics-job} starts. The default value is 20 seconds.
|
||||
(Optional, <<time-units,time units>>)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=timeout-start]
|
||||
|
||||
[[ml-start-dfanalytics-example]]
|
||||
==== {api-examples-title}
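
The original example lies outside this hunk; a minimal sketch, assuming the
hypothetical `loganalytics` job from the create example and a start endpoint of
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_start`:

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/loganalytics/_start?timeout=2m
--------------------------------------------------
// TEST[skip:TBD]

When the job starts successfully, the API returns an acknowledgement of the
request.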
|
||||
|
|
|
@ -42,24 +42,23 @@ stop all {dfanalytics-job} by using _all or by specifying * as the
|
|||
==== {api-path-parms-title}
|
||||
|
||||
`<data_frame_analytics_id>`::
|
||||
(Required, string) Identifier for the {dfanalytics-job}. This identifier can
|
||||
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
|
||||
underscores. It must start and end with alphanumeric characters.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
|
||||
|
||||
[[ml-stop-dfanalytics-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_match`::
|
||||
(Optional, boolean) If `false` and the `data_frame_analytics_id` does not
|
||||
match any {dfanalytics-job} an error will be returned. The default value is
|
||||
`true`.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
|
||||
|
||||
|
||||
`force`::
|
||||
(Optional, boolean) If true, the {dfanalytics-job} is stopped forcefully.
|
||||
|
||||
`timeout`::
|
||||
(Optional, time) Controls the amount of time to wait until the
|
||||
{dfanalytics-job} stops. The default value is 20 seconds.
|
||||
(Optional, <<time-units,time units>>)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=timeout-stop]
|
||||
|
||||
|
||||
[[ml-stop-dfanalytics-example]]
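
The examples heading and body lie outside this hunk; a minimal sketch of a stop
request, assuming a stop endpoint of
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_stop`, the
hypothetical `loganalytics` job, and the `force` and `timeout` query parameters
documented above:

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/loganalytics/_stop?timeout=2m&force=false
--------------------------------------------------
// TEST[skip:TBD]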
|
||||
|
|
File diff suppressed because it is too large
|
@ -7,9 +7,7 @@ These resource definitions are used in APIs related to {ml-features} and
|
|||
|
||||
* <<ml-datafeed-resource,{dfeeds-cap}>>
|
||||
* <<ml-datafeed-counts,{dfeed-cap} counts>>
|
||||
* <<ml-dfanalytics-resources,{dfanalytics-cap}>>
|
||||
* <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
|
||||
* <<ml-job-resource,{anomaly-jobs-cap}>>
|
||||
* <<ml-dfa-analysis-objects>>
|
||||
* <<ml-jobstats,{anomaly-jobs-cap} statistics>>
|
||||
* <<ml-snapshot-resource,{anomaly-detect-cap} model snapshots>>
|
||||
* <<ml-results-resource,{anomaly-detect-cap} results>>
|
||||
|
@ -17,10 +15,9 @@ These resource definitions are used in APIs related to {ml-features} and
|
|||
* <<transform-resource,{transforms-cap}>>
|
||||
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/datafeedresource.asciidoc[]
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/dfanalyticsresources.asciidoc[]
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/evaluateresources.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/jobresource.asciidoc[]
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/jobcounts.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/jobresource.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/snapshotresource.asciidoc[]
|
||||
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/resultsresource.asciidoc[]
|
||||
|
|