[DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Parent: acd73dda1c
Commit: 4e1107d5d7
|
@ -1,217 +0,0 @@
|
|||
[role="xpack"]
|
||||
[testenv="platinum"]
|
||||
[[ml-dfa-analysis-objects]]
|
||||
=== Analysis configuration objects
|
||||
|
||||
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
|
||||
create a {dfanalytics-job}, you must define the type of analysis it performs.
|
||||
This page lists all the available parameters that you can use in the `analysis`
|
||||
object grouped by {dfanalytics} types.
|
||||
|
||||
|
||||
[discrete]
|
||||
[[oldetection-resources]]
|
||||
==== {oldetection-cap} configuration objects
|
||||
|
||||
An `outlier_detection` configuration object has the following properties:
|
||||
|
||||
`compute_feature_influence`::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
|
||||
|
||||
`feature_influence_threshold`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
|
||||
|
||||
`method`::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=method]
|
||||
|
||||
`n_neighbors`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
|
||||
|
||||
`outlier_fraction`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
|
||||
|
||||
`standardization_enabled`::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
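
As a minimal sketch of how these properties fit together, the following request sets several of them explicitly in an `outlier_detection` object. The job ID, the index names, and every parameter value are illustrative assumptions, not recommended settings.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/logdata_outliers <1>
{
  "source": {
    "index": "logdata" <2>
  },
  "dest": {
    "index": "logdata_outliers_out" <2>
  },
  "analysis": {
    "outlier_detection": {
      "n_neighbors": 20, <3>
      "outlier_fraction": 0.05, <3>
      "compute_feature_influence": true,
      "feature_influence_threshold": 0.1, <3>
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

<1> Hypothetical job ID chosen for this sketch.
<2> Hypothetical source and destination index names.
<3> Illustrative values only. All of these properties are optional, so any of them can be omitted.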
|
||||
|
||||
|
||||
[discrete]
|
||||
[[regression-resources]]
|
||||
==== {regression-cap} configuration objects
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs" <1>
  },
  "dest": {
    "index": "house_price_predictions" <2>
  },
  "analysis": {
    "regression": { <3>
      "dependent_variable": "price" <4>
    }
  }
}
|
||||
--------------------------------------------------
|
||||
// TEST[skip:TBD]
|
||||
|
||||
<1> Training data is taken from source index `houses_sold_last_10_yrs`.
|
||||
<2> Analysis results will be output to destination index
|
||||
`house_price_predictions`.
|
||||
<3> The regression analysis configuration object.
|
||||
<4> Regression analysis trains on the `price` field. Because no other
parameters are specified, it trains on 100% of the eligible data, stores its
prediction in the `price_prediction` field of the destination index, and uses
built-in hyperparameter optimization to minimize the validation error.
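
For comparison, the following sketch overrides some of those defaults by setting standard parameters explicitly. It reuses the source index and dependent variable from the example above; the job ID, the destination index, and all values are assumptions made for illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_custom <1>
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions_custom" <1>
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "training_percent": 80, <2>
      "prediction_field_name": "predicted_price", <3>
      "randomize_seed": 42 <4>
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

<1> Hypothetical job ID and destination index.
<2> Trains on 80% of the eligible data instead of 100%. The value is illustrative.
<3> Stores the prediction in a field named `predicted_price` instead of `price_prediction`. The field name is hypothetical.
<4> An arbitrary seed value chosen for this sketch.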
|
||||
|
||||
|
||||
[float]
|
||||
[[regression-resources-standard]]
|
||||
===== Standard parameters
|
||||
|
||||
`dependent_variable`::
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||
+
|
||||
--
|
||||
The data type of the field must be numeric.
|
||||
--
|
||||
|
||||
`prediction_field_name`::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||
|
||||
`training_percent`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
||||
|
||||
`randomize_seed`::
|
||||
(Optional, long)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
||||
|
||||
|
||||
[float]
|
||||
[[regression-resources-advanced]]
|
||||
===== Advanced parameters
|
||||
|
||||
Advanced parameters are for fine-tuning {reganalysis}. They are set
|
||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters. If these
|
||||
parameters are not supplied, their values are automatically tuned to give
|
||||
minimum validation error.
|
||||
|
||||
`eta`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
`feature_bag_fraction`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||
|
||||
`maximum_number_trees`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||
|
||||
`gamma`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
`lambda`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
|
||||
[discrete]
|
||||
[[classification-resources]]
|
||||
==== {classification-cap} configuration objects
|
||||
|
||||
|
||||
[float]
|
||||
[[classification-resources-standard]]
|
||||
===== Standard parameters
|
||||
|
||||
`dependent_variable`::
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||
+
|
||||
--
|
||||
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
|
||||
categorical (`ip`, `keyword`, `text`), or boolean.
|
||||
--
|
||||
|
||||
`num_top_classes`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
|
||||
|
||||
`prediction_field_name`::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||
|
||||
`training_percent`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
||||
|
||||
`randomize_seed`::
|
||||
(Optional, long)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
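
As a minimal sketch, a `classification` object that uses only standard parameters might look like the following. The job ID, index names, field name, and values are all hypothetical and exist only to show the shape of the request.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_default_classification <1>
{
  "source": {
    "index": "loan_applications" <1>
  },
  "dest": {
    "index": "loan_default_predictions" <1>
  },
  "analysis": {
    "classification": {
      "dependent_variable": "defaulted", <2>
      "num_top_classes": 2, <3>
      "training_percent": 75 <3>
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

<1> Hypothetical job ID and index names.
<2> Hypothetical field. Its data type must be numeric, categorical, or boolean, as described for `dependent_variable`.
<3> Illustrative values. Both parameters are optional.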
|
||||
|
||||
|
||||
[float]
|
||||
[[classification-resources-advanced]]
|
||||
===== Advanced parameters
|
||||
|
||||
Advanced parameters are for fine-tuning {classanalysis}. They are set
|
||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters. If these
|
||||
parameters are not supplied, their values are automatically tuned to give
|
||||
minimum validation error.
|
||||
|
||||
`eta`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
`feature_bag_fraction`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||
|
||||
`maximum_number_trees`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||
|
||||
`gamma`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
`lambda`::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
[discrete]
|
||||
[[ml-hyperparam-optimization]]
|
||||
==== Hyperparameter optimization
|
||||
|
||||
If you don't supply {regression} or {classification} parameters, hyperparameter
|
||||
optimization will be performed by default to set a value for the undefined
|
||||
parameters. The starting point is calculated for data dependent parameters by
|
||||
examining the loss on the training data. Subject to the size constraint, this
|
||||
operation provides an upper bound on the improvement in validation loss.
|
||||
|
||||
A fixed number of rounds is used for optimization which depends on the number of
|
||||
parameters being optimized. The optimization starts with random search, then
|
||||
Bayesian optimization is performed, targeting maximum expected
|
||||
improvement. If you override any parameters, then the optimization will
|
||||
calculate the value of the remaining parameters accordingly and use the value
|
||||
you provided for the overridden parameter. The number of rounds is reduced
accordingly. The validation error is estimated in each round by using 4-fold
|
||||
cross validation.
|
|
@ -14,8 +14,6 @@ You can use the following APIs to perform {ml} {dfanalytics} activities.
|
|||
* <<evaluate-dfanalytics,Evaluate {dfanalytics}>>
|
||||
* <<explain-dfanalytics,Explain {dfanalytics}>>
|
||||
|
||||
For the `analysis` object resources, check <<ml-dfa-analysis-objects>>.
|
||||
|
||||
|
||||
You can use the following APIs to perform {infer} operations.
|
||||
|
||||
|
|
|
@ -53,41 +53,25 @@ If the destination index already exists, then it will be used as is. This makes
|
|||
it possible to set up the destination index in advance with custom settings
|
||||
and mappings.
|
||||
|
||||
[[ml-put-dfanalytics-supported-fields]]
|
||||
===== Supported fields
|
||||
[discrete]
|
||||
[[ml-hyperparam-optimization]]
|
||||
===== Hyperparameter optimization
|
||||
|
||||
====== {oldetection-cap}
|
||||
|
||||
{oldetection-cap} requires numeric or boolean data to analyze. The algorithms
|
||||
don't support missing values; therefore, fields that have data types other than
|
||||
numeric or boolean are ignored. Documents where included fields contain missing
|
||||
values, null values, or an array are also ignored. Therefore the `dest` index
|
||||
may contain documents that don't have an {olscore}.
|
||||
|
||||
|
||||
====== {regression-cap}
|
||||
|
||||
{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
|
||||
and `ip`. It is also tolerant of missing values. Fields that are supported are
|
||||
included in the analysis, other fields are ignored. Documents where included
|
||||
fields contain an array with two or more values are also ignored. Documents in
|
||||
the `dest` index that don’t contain a results field are not included in the
|
||||
{reganalysis}.
|
||||
|
||||
|
||||
====== {classification-cap}
|
||||
|
||||
{classification-cap} supports fields that are numeric, `boolean`, `text`,
|
||||
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
|
||||
supported are included in the analysis, other fields are ignored. Documents
|
||||
where included fields contain an array with two or more values are also ignored.
|
||||
Documents in the `dest` index that don’t contain a results field are not
|
||||
included in the {classanalysis}.
|
||||
|
||||
{classanalysis-cap} can be improved by mapping ordinal variable values to a
|
||||
single number. For example, in the case of age ranges, you can model the values as
|
||||
"0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
|
||||
If you don't supply {regression} or {classification} parameters, _hyperparameter
|
||||
optimization_ occurs, which sets a value for the undefined parameters. The
|
||||
starting point is calculated for data dependent parameters by examining the loss
|
||||
on the training data. Subject to the size constraint, this operation provides an
|
||||
upper bound on the improvement in validation loss.
|
||||
|
||||
A fixed number of rounds is used for optimization which depends on the number of
|
||||
parameters being optimized. The optimization starts with random search, then
|
||||
Bayesian optimization is performed, targeting maximum expected
|
||||
improvement. If you override any parameters,
|
||||
//TBD: What is meant by overriding them? Explicitly setting the parameter instead of letting it take the default?
|
||||
the optimization calculates the value of the remaining parameters accordingly
|
||||
and uses the value you provided for the overridden parameter. The number of
rounds is reduced accordingly. The validation error is estimated in each round
|
||||
by using 4-fold cross validation.
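
The following sketch shows one way to read "overriding" a parameter, assuming it means setting the parameter explicitly in the request. Here `eta` and `maximum_number_trees` are fixed, while `feature_bag_fraction`, `gamma`, and `lambda` are left unset so that hyperparameter optimization chooses them. The job ID, index names, field name, and values are assumptions made for illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_fixed_eta <1>
{
  "source": {
    "index": "houses_sold_last_10_yrs" <1>
  },
  "dest": {
    "index": "house_price_predictions_fixed_eta" <1>
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "eta": 0.05, <2>
      "maximum_number_trees": 500 <3>
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

<1> Hypothetical job ID, index names, and dependent variable.
<2> Explicitly set, so the optimization uses this value as provided. The value is illustrative.
<3> Explicitly set and below the documented maximum of 2000. The value is illustrative.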
|
||||
|
||||
[[ml-put-dfanalytics-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
@ -99,36 +83,170 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
|
|||
[[ml-put-dfanalytics-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`allow_lazy_start`::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
|
||||
|
||||
`analysis`::
|
||||
(Required, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
|
||||
The analysis configuration, which contains the information necessary to perform
|
||||
one of the following types of analysis: {classification}, {oldetection}, or
|
||||
{regression}.
|
||||
//include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
|
||||
|
||||
`analysis`.`classification`:::
|
||||
(Required^*^, object)
|
||||
The configuration information necessary to perform
|
||||
{ml-docs}/dfa-classification.html[{classification}].
|
||||
+
|
||||
--
|
||||
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set
|
||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters.
|
||||
|
||||
--
|
||||
|
||||
`analysis`.`classification`.`dependent_variable`::::
|
||||
(Required, string)
|
||||
+
|
||||
--
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||
|
||||
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
|
||||
categorical (`ip`, `keyword`, `text`), or boolean.
|
||||
--
|
||||
|
||||
`analysis`.`classification`.`eta`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
`analysis`.`classification`.`feature_bag_fraction`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||
|
||||
`analysis`.`classification`.`maximum_number_trees`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||
|
||||
`analysis`.`classification`.`gamma`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
`analysis`.`classification`.`lambda`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
`analysis`.`classification`.`num_top_classes`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
|
||||
|
||||
`analysis`.`classification`.`prediction_field_name`::::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||
|
||||
`analysis`.`classification`.`randomize_seed`::::
|
||||
(Optional, long)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
||||
|
||||
`analysis`.`classification`.`training_percent`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
||||
|
||||
`analysis`.`outlier_detection`:::
|
||||
(Required^*^, object)
|
||||
The configuration information necessary to perform
|
||||
{ml-docs}/dfa-outlier-detection.html[{oldetection}]:
|
||||
|
||||
`analysis`.`outlier_detection`.`compute_feature_influence`::::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
|
||||
|
||||
`analysis`.`outlier_detection`.`feature_influence_threshold`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
|
||||
|
||||
`analysis`.`outlier_detection`.`method`::::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=method]
|
||||
|
||||
`analysis`.`outlier_detection`.`n_neighbors`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
|
||||
|
||||
`analysis`.`outlier_detection`.`outlier_fraction`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
|
||||
|
||||
`analysis`.`outlier_detection`.`standardization_enabled`::::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
|
||||
|
||||
`analysis`.`regression`:::
|
||||
(Required^*^, object)
|
||||
The configuration information necessary to perform
|
||||
{ml-docs}/dfa-regression.html[{regression}].
|
||||
+
|
||||
--
|
||||
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set
|
||||
automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
|
||||
to give minimum validation error. It is highly recommended to use the default
|
||||
values unless you fully understand the function of these parameters.
|
||||
|
||||
--
|
||||
|
||||
`analysis`.`regression`.`dependent_variable`::::
|
||||
(Required, string)
|
||||
+
|
||||
--
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
|
||||
|
||||
The data type of the field must be numeric.
|
||||
--
|
||||
|
||||
`analysis`.`regression`.`eta`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
|
||||
|
||||
`analysis`.`regression`.`feature_bag_fraction`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
|
||||
|
||||
`analysis`.`regression`.`maximum_number_trees`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
|
||||
|
||||
`analysis`.`regression`.`gamma`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
|
||||
|
||||
`analysis`.`regression`.`lambda`::::
|
||||
(Optional, double)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
|
||||
|
||||
`analysis`.`regression`.`prediction_field_name`::::
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
|
||||
|
||||
`analysis`.`regression`.`training_percent`::::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
|
||||
|
||||
`analysis`.`regression`.`randomize_seed`::::
|
||||
(Optional, long)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
|
||||
|
||||
`analyzed_fields`::
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields]
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT _ml/data_frame/analytics/loganalytics
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
    }
  },
  "analyzed_fields": {
    "includes": [ "request.bytes", "response.counts.error" ],
    "excludes": [ "source.geo" ]
  }
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:setup_logdata]
|
||||
`analyzed_fields`.`excludes`:::
|
||||
(Optional, array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields-excludes]
|
||||
|
||||
`analyzed_fields`.`includes`:::
|
||||
(Optional, array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields-includes]
|
||||
|
||||
`description`::
|
||||
(Optional, string)
|
||||
|
@ -146,15 +264,9 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
|
|||
(object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
|
||||
|
||||
`allow_lazy_start`::
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
|
||||
|
||||
|
||||
[[ml-put-dfanalytics-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
|
||||
[[ml-put-dfanalytics-example-preprocess]]
|
||||
===== Preprocessing actions example
|
||||
|
||||
|
|
|
@ -93,22 +93,47 @@ end::analysis-limits[]
|
|||
|
||||
tag::analyzed-fields[]
|
||||
Specify `includes` and/or `excludes` patterns to select which fields will be
|
||||
included in the analysis. If `analyzed_fields` is not set, only the relevant
|
||||
fields will be included. For example, all the numeric fields for {oldetection}.
|
||||
For the supported field types, see <<ml-put-dfanalytics-supported-fields>>. Also
|
||||
see <<explain-dfanalytics>>, which helps you understand field selection.
|
||||
included in the analysis.
|
||||
+
|
||||
--
|
||||
The supported fields for each type of analysis are as follows:
|
||||
|
||||
`includes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be
|
||||
included in the analysis.
|
||||
|
||||
`excludes`:::
|
||||
(Optional, array) An array of strings that defines the fields that will be
|
||||
excluded from the analysis. You do not need to add fields with unsupported
|
||||
data types to `excludes`; these fields are excluded from the analysis
|
||||
automatically.
|
||||
* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
|
||||
don't support missing values; therefore, fields that have data types other than
|
||||
numeric or boolean are ignored. Documents where included fields contain missing
|
||||
values, null values, or an array are also ignored. Therefore the `dest` index
|
||||
may contain documents that don't have an {olscore}.
|
||||
* {regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
|
||||
and `ip`. It is also tolerant of missing values. Fields that are supported are
|
||||
included in the analysis, other fields are ignored. Documents where included
|
||||
fields contain an array with two or more values are also ignored. Documents in
|
||||
the `dest` index that don’t contain a results field are not included in the
|
||||
{reganalysis}.
|
||||
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
|
||||
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
|
||||
supported are included in the analysis, other fields are ignored. Documents
|
||||
where included fields contain an array with two or more values are also ignored.
|
||||
Documents in the `dest` index that don’t contain a results field are not
|
||||
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
|
||||
ordinal variable values to a single number. For example, in the case of age ranges,
|
||||
you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
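
One possible way to apply such an ordinal mapping is at ingest time, before the documents reach the source index. The sketch below uses an ingest pipeline with a `script` processor. The pipeline name and the `age_range` and `age_range_ordinal` field names are hypothetical, and only the three ranges quoted above are mapped.

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/age_range_to_ordinal <1>
{
  "description": "Maps age range labels to ordinal numbers",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.age_range_ordinal = params.ordinals.get(ctx.age_range);", <2>
        "params": {
          "ordinals": { "0-14": 0, "15-24": 1, "25-34": 2 } <3>
        }
      }
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

<1> Hypothetical pipeline name.
<2> Looks up the label in the hypothetical `age_range` field and writes the result to the hypothetical `age_range_ordinal` field. A label that is not in the map produces a null value.
<3> The mapping from the example above. Extend it with the remaining ranges in your data.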
|
||||
|
||||
If `analyzed_fields` is not set, only the relevant fields will be included. For
|
||||
example, all the numeric fields for {oldetection}. For more information about
|
||||
field selection, see <<explain-dfanalytics>>.
|
||||
--
|
||||
end::analyzed-fields[]
|
||||
|
||||
tag::analyzed-fields-excludes[]
|
||||
An array of strings that defines the fields that will be excluded from the
|
||||
analysis. You do not need to add fields with unsupported data types to
|
||||
`excludes`; these fields are excluded from the analysis automatically.
|
||||
end::analyzed-fields-excludes[]
|
||||
|
||||
tag::analyzed-fields-includes[]
|
||||
An array of strings that defines the fields that will be included in the analysis.
|
||||
end::analyzed-fields-includes[]
|
||||
|
||||
tag::background-persist-interval[]
|
||||
Advanced configuration option. The time between each periodic persistence of the
|
||||
model. The default value is a randomized value between 3 and 4 hours, which
|
||||
|
@ -511,11 +536,11 @@ identifier when you want to update a specific detector.
|
|||
end::detector-index[]
|
||||
|
||||
tag::eta[]
|
||||
The shrinkage applied to the weights. Smaller values result
|
||||
in larger forests which have better generalization error. However, the smaller
|
||||
the value the longer the training will take. For more information, see
|
||||
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
|
||||
about shrinkage.
|
||||
Advanced configuration option. The shrinkage applied to the weights. Smaller
|
||||
values result in larger forests which have better generalization error. However,
|
||||
the smaller the value the longer the training will take. For more information
|
||||
about shrinkage, see
|
||||
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article].
|
||||
end::eta[]
|
||||
|
||||
tag::exclude-frequent[]
|
||||
|
@ -532,8 +557,8 @@ included.
|
|||
end::exclude-interim-results[]
|
||||
|
||||
tag::feature-bag-fraction[]
|
||||
Defines the fraction of features that will be used when
|
||||
selecting a random bag for each candidate split.
|
||||
Advanced configuration option. Defines the fraction of features that will be
|
||||
used when selecting a random bag for each candidate split.
|
||||
end::feature-bag-fraction[]
|
||||
|
||||
tag::feature-influence-threshold[]
|
||||
|
@ -594,10 +619,10 @@ The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
|
|||
end::function[]
|
||||
|
||||
tag::gamma[]
|
||||
Regularization parameter to prevent overfitting on the
|
||||
training dataset. Multiplies a linear penalty associated with the size of
|
||||
Advanced configuration option. Regularization parameter to prevent overfitting
|
||||
on the training dataset. Multiplies a linear penalty associated with the size of
|
||||
individual trees in the forest. The higher the value the more training will
|
||||
prefer smaller trees. The smaller this parameter the larger individual trees
|
||||
prefer smaller trees. The smaller this parameter the larger individual trees
|
||||
will be and the longer training will take.
|
||||
end::gamma[]
|
||||
|
||||
|
@ -691,10 +716,10 @@ For more information, see <<ml-jobstats>>.
|
|||
end::jobs-stats-anomaly-detection[]
|
||||
|
||||
tag::lambda[]
|
||||
Regularization parameter to prevent overfitting on the
|
||||
training dataset. Multiplies an L2 regularization term which applies to leaf
|
||||
weights of the individual trees in the forest. The higher the value the more
|
||||
training will attempt to keep leaf weights small. This makes the prediction
|
||||
Advanced configuration option. Regularization parameter to prevent overfitting
|
||||
on the training dataset. Multiplies an L2 regularization term which applies to
|
||||
leaf weights of the individual trees in the forest. The higher the value the
|
||||
more training will attempt to keep leaf weights small. This makes the prediction
|
||||
function smoother at the expense of potentially not being able to capture
|
||||
relevant relationships between the features and the {depvar}. The smaller this
|
||||
parameter the larger individual trees will be and the longer training will take.
|
||||
|
@ -723,8 +748,8 @@ until it is explicitly stopped. By default this setting is not set.
|
|||
end::max-empty-searches[]
|
||||
|
||||
tag::maximum-number-trees[]
|
||||
Defines the maximum number of trees the forest is allowed
|
||||
to contain. The maximum value is 2000.
|
||||
Advanced configuration option. Defines the maximum number of trees the forest is
|
||||
allowed to contain. The maximum value is 2000.
|
||||
end::maximum-number-trees[]
|
||||
|
||||
tag::memory-estimation[]
|
||||
|
|
|
@ -298,3 +298,9 @@ See <<ml-get-bucket>>,
|
|||
<<ml-get-category>>, and
|
||||
[[ml-results-overall-buckets]]
|
||||
<<ml-get-overall-buckets>>.
|
||||
|
||||
[role="exclude",id="ml-dfa-analysis-objects"]
|
||||
=== Analysis configuration objects
|
||||
|
||||
This page was deleted.
|
||||
See <<put-dfanalytics>>.
|
|
@ -2,11 +2,10 @@
|
|||
[[api-definitions]]
|
||||
== Definitions
|
||||
|
||||
These resource definitions are used in APIs related to {ml-features} and
|
||||
{security-features} and in {kib} advanced {ml} job configuration options.
|
||||
The role mappings resource definition below is used in APIs related
to security features.
|
||||
|
||||
* <<role-mapping-resources,Role mappings>>
|
||||
|
||||
* <<ml-dfa-analysis-objects>>
|
||||
* <<role-mapping-resources,Role mappings>>
|
||||
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
|
||||
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
|
||||
|
|