[DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
parent
acd73dda1c
commit
4e1107d5d7
|
@ -1,217 +0,0 @@
|
||||||
[role="xpack"]
|
|
||||||
[testenv="platinum"]
|
|
||||||
[[ml-dfa-analysis-objects]]
|
|
||||||
=== Analysis configuration objects
|
|
||||||
|
|
||||||
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
|
|
||||||
create a {dfanalytics-job}, you must define the type of analysis it performs.
|
|
||||||
This page lists all the available parameters that you can use in the `analysis`
|
|
||||||
object grouped by {dfanalytics} types.
|
|
||||||
|
|
||||||
|
|
||||||
[discrete]
|
|
||||||
[[oldetection-resources]]
|
|
||||||
==== {oldetection-cap} configuration objects
|
|
||||||
|
|
||||||
An `outlier_detection` configuration object has the following properties:
|
|
||||||
|
|
||||||
`compute_feature_influence`::
|
|
||||||
(Optional, boolean)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
|
|
||||||
|
|
||||||
`feature_influence_threshold`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
|
|
||||||
|
|
||||||
`method`::
|
|
||||||
(Optional, string)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=method]
|
|
||||||
|
|
||||||
`n_neighbors`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
|
|
||||||
|
|
||||||
`outlier_fraction`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
|
|
||||||
|
|
||||||
`standardization_enabled`::
|
|
||||||
(Optional, boolean)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
|
|
||||||
|
|
||||||
|
|
||||||
[discrete]
|
|
||||||
[[regression-resources]]
|
|
||||||
==== {regression-cap} configuration objects
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
--------------------------------------------------
|
|
||||||
PUT _ml/data_frame/analytics/house_price_regression_analysis
|
|
||||||
{
|
|
||||||
"source": {
|
|
||||||
"index": "houses_sold_last_10_yrs" <1>
|
|
||||||
},
|
|
||||||
"dest": {
|
|
||||||
"index": "house_price_predictions" <2>
|
|
||||||
},
|
|
||||||
"analysis":
|
|
||||||
{
|
|
||||||
"regression": { <3>
|
|
||||||
"dependent_variable": "price" <4>
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[skip:TBD]
|
|
||||||
|
|
||||||
<1> Training data is taken from source index `houses_sold_last_10_yrs`.
|
|
||||||
<2> Analysis results will be output to destination index
|
|
||||||
`house_price_predictions`.
|
|
||||||
<3> The regression analysis configuration object.
|
|
||||||
<4> Regression analysis will use field `price` to train on. As no other
|
|
||||||
parameters have been specified it will train on 100% of eligible data, store its
|
|
||||||
prediction in destination index field `price_prediction` and use in-built
|
|
||||||
hyperparameter optimization to give minimum validation errors.
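The request above relies entirely on defaults. As a minimal sketch of setting the standard parameters explicitly, the same body could be assembled in Python. The helper name `build_regression_job` and the values `80` and `predicted_price` are illustrative assumptions, not part of the original example; the snippet only builds the JSON body and does not contact a cluster.

```python
import json


def build_regression_job(source_index, dest_index, dependent_variable,
                         training_percent=None, prediction_field_name=None):
    """Assemble a request body for PUT _ml/data_frame/analytics/<job_id>.

    Hypothetical helper: optional parameters are left out when not given,
    so the defaults described in the callouts above still apply.
    """
    regression = {"dependent_variable": dependent_variable}
    if training_percent is not None:
        regression["training_percent"] = training_percent
    if prediction_field_name is not None:
        regression["prediction_field_name"] = prediction_field_name
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {"regression": regression},
    }


# Illustrative values only (assumptions, not taken from the commit).
body = build_regression_job(
    "houses_sold_last_10_yrs",
    "house_price_predictions",
    "price",
    training_percent=80,
    prediction_field_name="predicted_price",
)
print(json.dumps(body, indent=2))
```

Leaving out the two optional arguments reproduces the body of the console example above.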
|
|
||||||
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[regression-resources-standard]]
|
|
||||||
===== Standard parameters
|
|
||||||
|
|
||||||
`dependent_variable`::
|
|
||||||
(Required, string)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
|
||||||
+
|
|
||||||
--
|
|
||||||
The data type of the field must be numeric.
|
|
||||||
--
|
|
||||||
|
|
||||||
`prediction_field_name`::
|
|
||||||
(Optional, string)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
|
||||||
|
|
||||||
`training_percent`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
|
||||||
|
|
||||||
`randomize_seed`::
|
|
||||||
(Optional, long)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
|
||||||
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[regression-resources-advanced]]
|
|
||||||
===== Advanced parameters
|
|
||||||
|
|
||||||
Advanced parameters are for fine-tuning {reganalysis}. They are set
|
|
||||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
|
||||||
to give minimum validation error. It is highly recommended to use the default
|
|
||||||
values unless you fully understand the function of these parameters. If these
|
|
||||||
parameters are not supplied, their values are automatically tuned to give
|
|
||||||
minimum validation error.
|
|
||||||
|
|
||||||
`eta`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
|
||||||
|
|
||||||
`feature_bag_fraction`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
|
||||||
|
|
||||||
`maximum_number_trees`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
|
||||||
|
|
||||||
`gamma`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
|
||||||
|
|
||||||
`lambda`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
|
||||||
|
|
||||||
|
|
||||||
[discrete]
|
|
||||||
[[classification-resources]]
|
|
||||||
==== {classification-cap} configuration objects
|
|
||||||
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[classification-resources-standard]]
|
|
||||||
===== Standard parameters
|
|
||||||
|
|
||||||
`dependent_variable`::
|
|
||||||
(Required, string)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
|
||||||
+
|
|
||||||
--
|
|
||||||
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
|
|
||||||
categorical (`ip`, `keyword`, `text`), or boolean.
|
|
||||||
--
|
|
||||||
|
|
||||||
`num_top_classes`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
|
|
||||||
|
|
||||||
`prediction_field_name`::
|
|
||||||
(Optional, string)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
|
||||||
|
|
||||||
`training_percent`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
|
||||||
|
|
||||||
`randomize_seed`::
|
|
||||||
(Optional, long)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
|
||||||
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[classification-resources-advanced]]
|
|
||||||
===== Advanced parameters
|
|
||||||
|
|
||||||
Advanced parameters are for fine-tuning {classanalysis}. They are set
|
|
||||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
|
||||||
to give minimum validation error. It is highly recommended to use the default
|
|
||||||
values unless you fully understand the function of these parameters. If these
|
|
||||||
parameters are not supplied, their values are automatically tuned to give
|
|
||||||
minimum validation error.
|
|
||||||
|
|
||||||
`eta`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
|
||||||
|
|
||||||
`feature_bag_fraction`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
|
||||||
|
|
||||||
`maximum_number_trees`::
|
|
||||||
(Optional, integer)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
|
||||||
|
|
||||||
`gamma`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
|
||||||
|
|
||||||
`lambda`::
|
|
||||||
(Optional, double)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
|
||||||
|
|
||||||
[discrete]
|
|
||||||
[[ml-hyperparam-optimization]]
|
|
||||||
==== Hyperparameter optimization
|
|
||||||
|
|
||||||
If you don't supply {regression} or {classification} parameters, hyperparameter
|
|
||||||
optimization will be performed by default to set a value for the undefined
|
|
||||||
parameters. The starting point is calculated for data dependent parameters by
|
|
||||||
examining the loss on the training data. Subject to the size constraint, this
|
|
||||||
operation provides an upper bound on the improvement in validation loss.
|
|
||||||
|
|
||||||
A fixed number of rounds is used for optimization which depends on the number of
|
|
||||||
parameters being optimized. The optimization starts with random search, then
|
|
||||||
Bayesian optimization is performed that is targeting maximum expected
|
|
||||||
improvement. If you override any parameters, then the optimization will
|
|
||||||
calculate the value of the remaining parameters accordingly and use the value
|
|
||||||
you provided for the overridden parameter. The number of rounds is reduced
|
|
||||||
accordingly. The validation error is estimated in each round by using 4-fold
|
|
||||||
cross validation.
|
|
|
@ -14,8 +14,6 @@ You can use the following APIs to perform {ml} {dfanalytics} activities.
|
||||||
* <<evaluate-dfanalytics,Evaluate {dfanalytics}>>
|
* <<evaluate-dfanalytics,Evaluate {dfanalytics}>>
|
||||||
* <<explain-dfanalytics,Explain {dfanalytics}>>
|
* <<explain-dfanalytics,Explain {dfanalytics}>>
|
||||||
|
|
||||||
For the `analysis` object resources, check <<ml-dfa-analysis-objects>>.
|
|
||||||
|
|
||||||
|
|
||||||
You can use the following APIs to perform {infer} operations.
|
You can use the following APIs to perform {infer} operations.
|
||||||
|
|
||||||
|
|
|
@ -53,41 +53,25 @@ If the destination index already exists, then it will be used as is. This makes
|
||||||
it possible to set up the destination index in advance with custom settings
|
it possible to set up the destination index in advance with custom settings
|
||||||
and mappings.
|
and mappings.
|
||||||
|
|
||||||
[[ml-put-dfanalytics-supported-fields]]
|
[discrete]
|
||||||
===== Supported fields
|
[[ml-hyperparam-optimization]]
|
||||||
|
===== Hyperparameter optimization
|
||||||
|
|
||||||
====== {oldetection-cap}
|
If you don't supply {regression} or {classification} parameters, _hyperparameter
|
||||||
|
optimization_ occurs, which sets a value for the undefined parameters. The
|
||||||
{oldetection-cap} requires numeric or boolean data to analyze. The algorithms
|
starting point is calculated for data dependent parameters by examining the loss
|
||||||
don't support missing values therefore fields that have data types other than
|
on the training data. Subject to the size constraint, this operation provides an
|
||||||
numeric or boolean are ignored. Documents where included fields contain missing
|
upper bound on the improvement in validation loss.
|
||||||
values, null values, or an array are also ignored. Therefore the `dest` index
|
|
||||||
may contain documents that don't have an {olscore}.
|
|
||||||
|
|
||||||
|
|
||||||
====== {regression-cap}
|
|
||||||
|
|
||||||
{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
|
|
||||||
and `ip`. It is also tolerant of missing values. Fields that are supported are
|
|
||||||
included in the analysis, other fields are ignored. Documents where included
|
|
||||||
fields contain an array with two or more values are also ignored. Documents in
|
|
||||||
the `dest` index that don’t contain a results field are not included in the
|
|
||||||
{reganalysis}.
|
|
||||||
|
|
||||||
|
|
||||||
====== {classification-cap}
|
|
||||||
|
|
||||||
{classification-cap} supports fields that are numeric, `boolean`, `text`,
|
|
||||||
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
|
|
||||||
supported are included in the analysis, other fields are ignored. Documents
|
|
||||||
where included fields contain an array with two or more values are also ignored.
|
|
||||||
Documents in the `dest` index that don’t contain a results field are not
|
|
||||||
included in the {classanalysis}.
|
|
||||||
|
|
||||||
{classanalysis-cap} can be improved by mapping ordinal variable values to a
|
|
||||||
single number. For example, in case of age ranges, you can model the values as
|
|
||||||
"0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
|
|
||||||
|
|
||||||
|
A fixed number of rounds is used for optimization which depends on the number of
|
||||||
|
parameters being optimized. The optimization starts with random search, then
|
||||||
|
Bayesian optimization is performed that is targeting maximum expected
|
||||||
|
improvement. If you override any parameters,
|
||||||
|
//TBD: What is meant by overriding them? Explicitly setting the parameter instead of letting it take the default?
|
||||||
|
the optimization calculates the value of the remaining parameters accordingly
|
||||||
|
and uses the value you provided for the overridden parameter. The number of
|
||||||
|
rounds is reduced accordingly. The validation error is estimated in each round
|
||||||
|
by using 4-fold cross validation.
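To make the effect of overriding a parameter concrete, here is a small bookkeeping sketch. It is not the optimizer itself: the function name `split_tuned_parameters` is hypothetical, and the parameter list is simply the set of advanced parameters documented for {regression} and {classification} in this commit. The sketch separates the values you supply from the parameters that would still be left to hyperparameter optimization.

```python
# Advanced parameters listed in this commit for regression and classification.
ADVANCED_PARAMETERS = (
    "eta",
    "feature_bag_fraction",
    "maximum_number_trees",
    "gamma",
    "lambda",
)


def split_tuned_parameters(overrides):
    """Return (fixed, remaining) for a dict of user-supplied values.

    fixed     -- the values the user set explicitly (used as given)
    remaining -- names of the parameters still left to the optimization
    """
    unknown = sorted(set(overrides) - set(ADVANCED_PARAMETERS))
    if unknown:
        raise ValueError(f"unknown advanced parameters: {unknown}")
    fixed = dict(overrides)
    remaining = [name for name in ADVANCED_PARAMETERS if name not in overrides]
    return fixed, remaining


# Illustrative override: 0.05 is an assumed value, not a recommendation.
fixed, remaining = split_tuned_parameters({"eta": 0.05})
print(fixed)
print(remaining)
```

With one parameter fixed, four remain to be tuned, which is why fewer optimization rounds are needed.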
|
||||||
|
|
||||||
[[ml-put-dfanalytics-path-params]]
|
[[ml-put-dfanalytics-path-params]]
|
||||||
==== {api-path-parms-title}
|
==== {api-path-parms-title}
|
||||||
|
@ -99,36 +83,170 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
|
||||||
[[ml-put-dfanalytics-request-body]]
|
[[ml-put-dfanalytics-request-body]]
|
||||||
==== {api-request-body-title}
|
==== {api-request-body-title}
|
||||||
|
|
||||||
|
`allow_lazy_start`::
|
||||||
|
(Optional, boolean)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
|
||||||
|
|
||||||
`analysis`::
|
`analysis`::
|
||||||
(Required, object)
|
(Required, object)
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
|
The analysis configuration, which contains the information necessary to perform
|
||||||
|
one of the following types of analysis: {classification}, {oldetection}, or
|
||||||
|
{regression}.
|
||||||
|
//include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
|
||||||
|
|
||||||
|
`analysis`.`classification`:::
|
||||||
|
(Required^*^, object)
|
||||||
|
The configuration information necessary to perform
|
||||||
|
{ml-docs}/dfa-classification.html[{classification}].
|
||||||
|
+
|
||||||
|
--
|
||||||
|
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set
|
||||||
|
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||||
|
to give minimum validation error. It is highly recommended to use the default
|
||||||
|
values unless you fully understand the function of these parameters.
|
||||||
|
|
||||||
|
--
|
||||||
|
|
||||||
|
`analysis`.`classification`.`dependent_variable`::::
|
||||||
|
(Required, string)
|
||||||
|
+
|
||||||
|
--
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||||
|
|
||||||
|
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
|
||||||
|
categorical (`ip`, `keyword`, `text`), or boolean.
|
||||||
|
--
|
||||||
|
|
||||||
|
`analysis`.`classification`.`eta`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`feature_bag_fraction`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`maximum_number_trees`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`gamma`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`lambda`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`num_top_classes`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`prediction_field_name`::::
|
||||||
|
(Optional, string)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`randomize_seed`::::
|
||||||
|
(Optional, long)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
||||||
|
|
||||||
|
`analysis`.`classification`.`training_percent`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
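The data type rule for the {classification} `dependent_variable` can be expressed as a small lookup. This is a sketch under the assumption that you already know the mapping type of the field; the function name `classify_dependent_type` is hypothetical, and the type groups are copied from the description above rather than from any API.

```python
# Type groups as listed for the classification dependent variable.
CLASSIFICATION_DEPENDENT_TYPES = {
    "numeric": {"integer", "short", "long", "byte"},
    "categorical": {"ip", "keyword", "text"},
    "boolean": {"boolean"},
}


def classify_dependent_type(mapping_type):
    """Return the group a mapping type belongs to, or None if it is not listed."""
    for group, types in CLASSIFICATION_DEPENDENT_TYPES.items():
        if mapping_type in types:
            return group
    return None


for candidate in ("keyword", "long", "boolean", "double"):
    group = classify_dependent_type(candidate)
    print(candidate, "->", group if group else "not listed for classification")
```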
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`:::
|
||||||
|
(Required^*^, object)
|
||||||
|
The configuration information necessary to perform
|
||||||
|
{ml-docs}/dfa-outlier-detection.html[{oldetection}]:
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`compute_feature_influence`::::
|
||||||
|
(Optional, boolean)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`feature_influence_threshold`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`method`::::
|
||||||
|
(Optional, string)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=method]
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`n_neighbors`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`outlier_fraction`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
|
||||||
|
|
||||||
|
`analysis`.`outlier_detection`.`standardization_enabled`::::
|
||||||
|
(Optional, boolean)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
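As a rough sketch of how the `outlier_detection` options combine with `analyzed_fields`, the following Python helper assembles a request body. The helper name `build_outlier_detection_job` is hypothetical, the index and field names are borrowed from the `loganalytics` example that this commit removes, and the option values (`n_neighbors=20`, `compute_feature_influence=True`) are assumptions chosen only for illustration.

```python
import json


def build_outlier_detection_job(source_index, dest_index,
                                includes=None, excludes=None, **options):
    """Assemble a body for PUT _ml/data_frame/analytics/<job_id>.

    Any keyword options are placed inside the outlier_detection object;
    with no options the object stays empty and the defaults apply.
    """
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {"outlier_detection": dict(options)},
    }
    analyzed_fields = {}
    if includes:
        analyzed_fields["includes"] = list(includes)
    if excludes:
        analyzed_fields["excludes"] = list(excludes)
    if analyzed_fields:
        body["analyzed_fields"] = analyzed_fields
    return body


body = build_outlier_detection_job(
    "logdata",
    "logdata_out",
    includes=["request.bytes", "response.counts.error"],
    excludes=["source.geo"],
    n_neighbors=20,                  # assumed value
    compute_feature_influence=True,  # assumed value
)
print(json.dumps(body, indent=2))
```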
|
||||||
|
|
||||||
|
`analysis`.`regression`:::
|
||||||
|
(Required^*^, object)
|
||||||
|
The configuration information necessary to perform
|
||||||
|
{ml-docs}/dfa-regression.html[{regression}].
|
||||||
|
+
|
||||||
|
--
|
||||||
|
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set
|
||||||
|
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||||
|
to give minimum validation error. It is highly recommended to use the default
|
||||||
|
values unless you fully understand the function of these parameters.
|
||||||
|
|
||||||
|
--
|
||||||
|
|
||||||
|
`analysis`.`regression`.`dependent_variable`::::
|
||||||
|
(Required, string)
|
||||||
|
+
|
||||||
|
--
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||||
|
|
||||||
|
The data type of the field must be numeric.
|
||||||
|
--
|
||||||
|
|
||||||
|
`analysis`.`regression`.`eta`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`feature_bag_fraction`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`maximum_number_trees`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`gamma`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`lambda`::::
|
||||||
|
(Optional, double)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`prediction_field_name`::::
|
||||||
|
(Optional, string)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`training_percent`::::
|
||||||
|
(Optional, integer)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
||||||
|
|
||||||
|
`analysis`.`regression`.`randomize_seed`::::
|
||||||
|
(Optional, long)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
||||||
|
|
||||||
`analyzed_fields`::
|
`analyzed_fields`::
|
||||||
(Optional, object)
|
(Optional, object)
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields]
|
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields]
|
||||||
|
|
||||||
[source,console]
|
`analyzed_fields`.`excludes`:::
|
||||||
--------------------------------------------------
|
(Optional, array)
|
||||||
PUT _ml/data_frame/analytics/loganalytics
|
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields-excludes]
|
||||||
{
|
|
||||||
"source": {
|
|
||||||
"index": "logdata"
|
|
||||||
},
|
|
||||||
"dest": {
|
|
||||||
"index": "logdata_out"
|
|
||||||
},
|
|
||||||
"analysis": {
|
|
||||||
"outlier_detection": {
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"analyzed_fields": {
|
|
||||||
"includes": [ "request.bytes", "response.counts.error" ],
|
|
||||||
"excludes": [ "source.geo" ]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[setup:setup_logdata]
|
|
||||||
|
|
||||||
|
`analyzed_fields`.`includes`:::
|
||||||
|
(Optional, array)
|
||||||
|
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields-includes]
|
||||||
|
|
||||||
`description`::
|
`description`::
|
||||||
(Optional, string)
|
(Optional, string)
|
||||||
|
@ -146,15 +264,9 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
|
||||||
(object)
|
(object)
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
|
include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
|
||||||
|
|
||||||
`allow_lazy_start`::
|
|
||||||
(Optional, boolean)
|
|
||||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
|
|
||||||
|
|
||||||
|
|
||||||
[[ml-put-dfanalytics-example]]
|
[[ml-put-dfanalytics-example]]
|
||||||
==== {api-examples-title}
|
==== {api-examples-title}
|
||||||
|
|
||||||
|
|
||||||
[[ml-put-dfanalytics-example-preprocess]]
|
[[ml-put-dfanalytics-example-preprocess]]
|
||||||
===== Preprocessing actions example
|
===== Preprocessing actions example
|
||||||
|
|
||||||
|
|
|
@ -93,22 +93,47 @@ end::analysis-limits[]
|
||||||
|
|
||||||
tag::analyzed-fields[]
|
tag::analyzed-fields[]
|
||||||
Specify `includes` and/or `excludes` patterns to select which fields will be
|
Specify `includes` and/or `excludes` patterns to select which fields will be
|
||||||
included in the analysis. If `analyzed_fields` is not set, only the relevant
|
included in the analysis.
|
||||||
fields will be included. For example, all the numeric fields for {oldetection}.
|
+
|
||||||
For the supported field types, see <<ml-put-dfanalytics-supported-fields>>. Also
|
--
|
||||||
see the <<explain-dfanalytics>> which helps understand field selection.
|
The supported fields for each type of analysis are as follows:
|
||||||
|
|
||||||
`includes`:::
|
* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
|
||||||
(Optional, array) An array of strings that defines the fields that will be
|
don't support missing values therefore fields that have data types other than
|
||||||
included in the analysis.
|
numeric or boolean are ignored. Documents where included fields contain missing
|
||||||
|
values, null values, or an array are also ignored. Therefore the `dest` index
|
||||||
`excludes`:::
|
may contain documents that don't have an {olscore}.
|
||||||
(Optional, array) An array of strings that defines the fields that will be
|
* {regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
|
||||||
excluded from the analysis. You do not need to add fields with unsupported
|
and `ip`. It is also tolerant of missing values. Fields that are supported are
|
||||||
data types to `excludes`, these fields are excluded from the analysis
|
included in the analysis, other fields are ignored. Documents where included
|
||||||
automatically.
|
fields contain an array with two or more values are also ignored. Documents in
|
||||||
|
the `dest` index that don’t contain a results field are not included in the
|
||||||
|
{reganalysis}.
|
||||||
|
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
|
||||||
|
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
|
||||||
|
supported are included in the analysis, other fields are ignored. Documents
|
||||||
|
where included fields contain an array with two or more values are also ignored.
|
||||||
|
Documents in the `dest` index that don’t contain a results field are not
|
||||||
|
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
|
||||||
|
ordinal variable values to a single number. For example, in case of age ranges,
|
||||||
|
you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
|
||||||
|
|
||||||
|
If `analyzed_fields` is not set, only the relevant fields will be included. For
|
||||||
|
example, all the numeric fields for {oldetection}. For more information about
|
||||||
|
field selection, see <<explain-dfanalytics>>.
|
||||||
|
--
|
||||||
end::analyzed-fields[]
|
end::analyzed-fields[]
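The advice about mapping ordinal values to a single number can be applied before the documents are indexed. The following is a minimal sketch: the field names `age_range` and `age_range_ordinal` and the helper `encode_age_range` are hypothetical, and only the three age ranges spelled out in the text are included.

```python
# Ordered labels from the example: "0-14" = 0, "15-24" = 1, "25-34" = 2.
AGE_RANGE_ORDER = ["0-14", "15-24", "25-34"]
AGE_RANGE_TO_ORDINAL = {label: rank for rank, label in enumerate(AGE_RANGE_ORDER)}


def encode_age_range(doc, field="age_range", target="age_range_ordinal"):
    """Return a copy of doc with the ordinal code added when the label is known."""
    encoded = dict(doc)
    label = doc.get(field)
    if label in AGE_RANGE_TO_ORDINAL:
        encoded[target] = AGE_RANGE_TO_ORDINAL[label]
    return encoded


docs = [{"age_range": "15-24"}, {"age_range": "0-14"}, {"age_range": "unknown"}]
for doc in docs:
    print(encode_age_range(doc))
```

Documents with an unrecognized label are passed through without the ordinal field, so the value is simply treated as missing.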
|
||||||
|
|
||||||
|
tag::analyzed-fields-excludes[]
|
||||||
|
An array of strings that defines the fields that will be excluded from the
|
||||||
|
analysis. You do not need to add fields with unsupported data types to
|
||||||
|
`excludes`; these fields are excluded from the analysis automatically.
|
||||||
|
end::analyzed-fields-excludes[]
|
||||||
|
|
||||||
|
tag::analyzed-fields-includes[]
|
||||||
|
An array of strings that defines the fields that will be included in the analysis.
|
||||||
|
end::analyzed-fields-includes[]
|
||||||
|
|
||||||
tag::background-persist-interval[]
|
tag::background-persist-interval[]
|
||||||
Advanced configuration option. The time between each periodic persistence of the
|
Advanced configuration option. The time between each periodic persistence of the
|
||||||
model. The default value is a randomized value between 3 to 4 hours, which
|
model. The default value is a randomized value between 3 to 4 hours, which
|
||||||
|
@ -511,11 +536,11 @@ identifier when you want to update a specific detector.
|
||||||
end::detector-index[]
|
end::detector-index[]
|
||||||
|
|
||||||
tag::eta[]
|
tag::eta[]
|
||||||
The shrinkage applied to the weights. Smaller values result
|
Advanced configuration option. The shrinkage applied to the weights. Smaller
|
||||||
in larger forests which have better generalization error. However, the smaller
|
values result in larger forests which have better generalization error. However,
|
||||||
the value the longer the training will take. For more information, see
|
the smaller the value the longer the training will take. For more information
|
||||||
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
|
about shrinkage, see
|
||||||
about shrinkage.
|
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article].
|
||||||
end::eta[]
|
end::eta[]
|
||||||
|
|
||||||
tag::exclude-frequent[]
|
tag::exclude-frequent[]
|
||||||
|
@ -532,8 +557,8 @@ included.
|
||||||
end::exclude-interim-results[]
|
end::exclude-interim-results[]
|
||||||
|
|
||||||
tag::feature-bag-fraction[]
|
tag::feature-bag-fraction[]
|
||||||
Defines the fraction of features that will be used when
|
Advanced configuration option. Defines the fraction of features that will be
|
||||||
selecting a random bag for each candidate split.
|
used when selecting a random bag for each candidate split.
|
||||||
end::feature-bag-fraction[]
|
end::feature-bag-fraction[]
|
||||||
|
|
||||||
tag::feature-influence-threshold[]
|
tag::feature-influence-threshold[]
|
||||||
|
@ -594,10 +619,10 @@ The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
|
||||||
end::function[]
|
end::function[]
|
||||||
|
|
||||||
tag::gamma[]
|
tag::gamma[]
|
||||||
Regularization parameter to prevent overfitting on the
|
Advanced configuration option. Regularization parameter to prevent overfitting
|
||||||
training dataset. Multiplies a linear penalty associated with the size of
|
on the training dataset. Multiplies a linear penalty associated with the size of
|
||||||
individual trees in the forest. The higher the value the more training will
|
individual trees in the forest. The higher the value the more training will
|
||||||
prefer smaller trees. The smaller this parameter the larger individual trees
|
prefer smaller trees. The smaller this parameter the larger individual trees
|
||||||
will be and the longer training will take.
|
will be and the longer training will take.
|
||||||
end::gamma[]
|
end::gamma[]
|
||||||
|
|
||||||
|
@ -691,10 +716,10 @@ For more information, see <<ml-jobstats>>.
|
||||||
end::jobs-stats-anomaly-detection[]
|
end::jobs-stats-anomaly-detection[]
|
||||||
|
|
||||||
tag::lambda[]
|
tag::lambda[]
|
||||||
Regularization parameter to prevent overfitting on the
|
Advanced configuration option. Regularization parameter to prevent overfitting
|
||||||
training dataset. Multiplies an L2 regularisation term which applies to leaf
|
on the training dataset. Multiplies an L2 regularisation term which applies to
|
||||||
weights of the individual trees in the forest. The higher the value the more
|
leaf weights of the individual trees in the forest. The higher the value the
|
||||||
training will attempt to keep leaf weights small. This makes the prediction
|
more training will attempt to keep leaf weights small. This makes the prediction
|
||||||
function smoother at the expense of potentially not being able to capture
|
function smoother at the expense of potentially not being able to capture
|
||||||
relevant relationships between the features and the {depvar}. The smaller this
|
relevant relationships between the features and the {depvar}. The smaller this
|
||||||
parameter the larger individual trees will be and the longer training will take.
|
parameter the larger individual trees will be and the longer training will take.
|
||||||
|
@ -723,8 +748,8 @@ until it is explicitly stopped. By default this setting is not set.
|
||||||
end::max-empty-searches[]
|
end::max-empty-searches[]
|
||||||
|
|
||||||
tag::maximum-number-trees[]
|
tag::maximum-number-trees[]
|
||||||
Defines the maximum number of trees the forest is allowed
|
Advanced configuration option. Defines the maximum number of trees the forest is
|
||||||
to contain. The maximum value is 2000.
|
allowed to contain. The maximum value is 2000.
|
||||||
end::maximum-number-trees[]
|
end::maximum-number-trees[]
|
||||||
|
|
||||||
tag::memory-estimation[]
|
tag::memory-estimation[]
|
||||||
|
|
|
@ -298,3 +298,9 @@ See <<ml-get-bucket>>,
|
||||||
<<ml-get-category>>, and
|
<<ml-get-category>>, and
|
||||||
[[ml-results-overall-buckets]]
|
[[ml-results-overall-buckets]]
|
||||||
<<ml-get-overall-buckets>>.
|
<<ml-get-overall-buckets>>.
|
||||||
|
|
||||||
|
[role="exclude",id="ml-dfa-analysis-objects"]
|
||||||
|
=== Analysis configuration objects
|
||||||
|
|
||||||
|
This page was deleted.
|
||||||
|
See <<put-dfanalytics>>.
|
|
@ -2,11 +2,10 @@
|
||||||
[[api-definitions]]
|
[[api-definitions]]
|
||||||
== Definitions
|
== Definitions
|
||||||
|
|
||||||
These resource definitions are used in APIs related to {ml-features} and
|
The role mappings resource definition you can find below is used in APIs related
|
||||||
{security-features} and in {kib} advanced {ml} job configuration options.
|
to security features.
|
||||||
|
|
||||||
|
* <<role-mapping-resources,Role mappings>>
|
||||||
|
|
||||||
* <<ml-dfa-analysis-objects>>
|
|
||||||
* <<role-mapping-resources,Role mappings>>
|
|
||||||
|
|
||||||
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
|
|
||||||
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
|
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
|
||||||
|
|