tag::dependent_variable[]
`dependent_variable`::
(Required, string) Defines which field of the document is to be predicted.
This parameter is supplied by field name and must match one of the fields in
the index being used to train. If this field is missing from a document, then
that document will not be used for training, but a prediction with the trained
model will be generated for it. It is also known as the continuous target
variable.
end::dependent_variable[]
tag::eta[]
`eta`::
(Optional, double) The shrinkage applied to the weights. Smaller values result
in larger forests, which have better generalization error. However, the smaller
the value, the longer the training will take. For more information, see
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
about shrinkage.
end::eta[]
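
The trade-off described for `eta` can be illustrated with a toy Python sketch.
It is not how Elasticsearch trains a forest: each hypothetical "tree" here
predicts the current residual exactly, and only a fraction `eta` of that
prediction is added to the model. The function name `trees_needed` and its
arguments are assumptions made for this illustration.

```python
def trees_needed(target, eta, tolerance=0.01, max_trees=2000):
    """Count boosting rounds until the residual falls below tolerance.

    Toy model: every round fits the remaining residual perfectly, then adds
    only eta times that fit to the running prediction (shrinkage).
    """
    prediction = 0.0
    for n_trees in range(1, max_trees + 1):
        residual = target - prediction
        prediction += eta * residual  # shrunken update
        if abs(target - prediction) < tolerance:
            return n_trees
    return max_trees


if __name__ == "__main__":
    for eta in (0.5, 0.1):
        print(eta, trees_needed(1.0, eta))
```

In this toy setting a smaller `eta` needs more rounds to reach the same
residual, which matches the larger forests and longer training time noted
above.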
tag::feature_bag_fraction[]
`feature_bag_fraction`::
(Optional, double) Defines the fraction of features that will be used when
selecting a random bag for each candidate split.
end::feature_bag_fraction[]
tag::gamma[]
`gamma`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies a linear penalty associated with the size of
individual trees in the forest. The higher the value, the more training will
prefer smaller trees. The smaller this parameter, the larger individual trees
will be and the longer training will take.
end::gamma[]
tag::lambda[]
`lambda`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies an L2 regularization term which applies to leaf
weights of the individual trees in the forest. The higher the value, the more
training will attempt to keep leaf weights small. This makes the prediction
function smoother at the expense of potentially not being able to capture
relevant relationships between the features and the {depvar}. The smaller this
parameter, the larger individual trees will be and the longer training will
take.
end::lambda[]
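
The roles of `gamma` and `lambda` can be sketched as a single penalty term in
Python. This is an illustration under stated assumptions, not the exact
objective used during training: tree size is measured here as the number of
leaves, and the helper name `regularization_penalty` is hypothetical.

```python
def regularization_penalty(leaf_weights_per_tree, gamma, lam):
    """Return gamma * (total leaf count) + lam * (sum of squared leaf weights).

    leaf_weights_per_tree is a list with one list of leaf weights per tree.
    """
    # Linear penalty on tree size (assumed here to be the leaf count).
    size_term = sum(len(weights) for weights in leaf_weights_per_tree)
    # L2 penalty on the leaf weights themselves.
    l2_term = sum(w * w for weights in leaf_weights_per_tree for w in weights)
    return gamma * size_term + lam * l2_term


if __name__ == "__main__":
    forest = [[1.0, -1.0], [2.0]]  # two toy trees with three leaves in total
    print(regularization_penalty(forest, gamma=0.5, lam=0.1))
```

Raising `gamma` makes every additional leaf more expensive, while raising
`lam` makes large leaf weights more expensive.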
tag::maximum_number_trees[]
`maximum_number_trees`::
(Optional, integer) Defines the maximum number of trees the forest is allowed
to contain. The maximum value is 2000.
end::maximum_number_trees[]
tag::prediction_field_name[]
`prediction_field_name`::
(Optional, string) Defines the name of the prediction field in the results.
Defaults to `<dependent_variable>_prediction`.
end::prediction_field_name[]
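
As a minimal sketch of how these parameters fit together, the following Python
helper assembles a regression analysis object and fills in the documented
default for `prediction_field_name`. The helper name `regression_analysis`,
the field name `price`, and the nesting under a `regression` key are
assumptions for illustration; check the API reference for the exact request
shape.

```python
import json


def regression_analysis(dependent_variable, **params):
    """Build a regression analysis object (hypothetical helper).

    If prediction_field_name is not supplied, apply the documented default
    of <dependent_variable>_prediction.
    """
    settings = {"dependent_variable": dependent_variable}
    settings.update(params)
    settings.setdefault(
        "prediction_field_name", f"{dependent_variable}_prediction"
    )
    return {"regression": settings}


if __name__ == "__main__":
    analysis = regression_analysis(
        "price",  # hypothetical field name
        eta=0.1,
        maximum_number_trees=500,
        training_percent=80,
    )
    print(json.dumps(analysis, indent=2))
```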
tag::training_percent[]
`training_percent`::
(Optional, integer) Defines what percentage of the eligible documents will be
used for training. Documents that are ignored by the analysis (for example,
those that contain arrays) are not included in the calculation of the used
percentage. Defaults to `100`.
end::training_percent[]
tag::randomize_seed[]
`randomize_seed`::
(Optional, long) Defines the seed for the random generator that is used to pick
which documents will be used for training. By default it is randomly generated.
Set it to a specific value to ensure the same documents are used for training,
assuming other related parameters (for example, `source` and `analyzed_fields`)
are the same.
end::randomize_seed[]
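
The interaction between `training_percent` and `randomize_seed` can be
sketched in Python. This is not the sampling algorithm Elasticsearch uses; it
only shows why a fixed seed makes the selection repeatable when the set of
eligible documents is unchanged. The function name `pick_training_docs` and
the rounding rule are assumptions.

```python
import random


def pick_training_docs(eligible_ids, training_percent=100, randomize_seed=None):
    """Select a reproducible subset of eligible document IDs for training.

    Only eligible documents are passed in, so ignored documents never count
    toward the percentage. A seed of None gives a random selection each call.
    """
    ids = list(eligible_ids)
    rng = random.Random(randomize_seed)
    n_train = round(len(ids) * training_percent / 100)
    return set(rng.sample(ids, n_train))


if __name__ == "__main__":
    docs = range(10)
    first = pick_training_docs(docs, training_percent=80, randomize_seed=42)
    second = pick_training_docs(docs, training_percent=80, randomize_seed=42)
    print(first == second)
```

Changing the eligible documents (for example, through a different `source`
query) changes the input to the sampler, so the same seed no longer guarantees
the same training set.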
tag::use-null[]
Defines whether a new series is used as the null series when there is no value
for the by or partition fields. The default value is `false`.
end::use-null[]