OpenSearch/docs/reference/ml/ml-shared.asciidoc

85 lines
3.2 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

tag::dependent_variable[]
`dependent_variable`::
(Required, string) Defines which field of the document is to be predicted.
This parameter is supplied by field name and must match one of the fields in
the index being used to train. If this field is missing from a document, then
that document will not be used for training, but a prediction with the trained
model will be generated for it. It is also known as continuous target variable.
end::dependent_variable[]
tag::eta[]
`eta`::
(Optional, double) The shrinkage applied to the weights. Smaller values result
in larger forests which have better generalization error. However, the smaller
the value the longer the training will take. For more information, see
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
about shrinkage.
end::eta[]
tag::feature_bag_fraction[]
`feature_bag_fraction`::
(Optional, double) Defines the fraction of features that will be used when
selecting a random bag for each candidate split.
end::feature_bag_fraction[]
tag::gamma[]
`gamma`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies a linear penalty associated with the size of
individual trees in the forest. The higher the value the more training will
prefer smaller trees. The smaller this parameter the larger individual trees
will be and the longer train will take.
end::gamma[]
tag::lambda[]
`lambda`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies an L2 regularisation term which applies to leaf
weights of the individual trees in the forest. The higher the value the more
training will attempt to keep leaf weights small. This makes the prediction
function smoother at the expense of potentially not being able to capture
relevant relationships between the features and the {depvar}. The smaller this
parameter the larger individual trees will be and the longer train will take.
end::lambda[]
tag::maximum_number_trees[]
`maximum_number_trees`::
(Optional, integer) Defines the maximum number of trees the forest is allowed
to contain. The maximum value is 2000.
end::maximum_number_trees[]
tag::prediction_field_name[]
`prediction_field_name`::
(Optional, string) Defines the name of the prediction field in the results.
Defaults to `<dependent_variable>_prediction`.
end::prediction_field_name[]
tag::training_percent[]
`training_percent`::
(Optional, integer) Defines what percentage of the eligible documents that will
be used for training. Documents that are ignored by the analysis (for example
those that contain arrays) wont be included in the calculation for used
percentage. Defaults to `100`.
end::training_percent[]
tag::randomize_seed[]
`randomize_seed`::
(Optional, long) Defines the seed to the random generator that is used to pick
which documents will be used for training. By default it is randomly generated.
Set it to a specific value to ensure the same documents are used for training
assuming other related parameters (e.g. `source`, `analyzed_fields`, etc.) are the same.
end::randomize_seed[]
tag::use-null[]
Defines whether a new series is used as the null series when there is no value
for the by or partition fields. The default value is `false`.
end::use-null[]