tag::dependent_variable[]
`dependent_variable`::
(Required, string) Defines which field of the document is to be predicted.
This parameter is supplied by field name and must match one of the fields in
the index being used to train. If this field is missing from a document, that
document is not used for training, but a prediction from the trained model is
still generated for it. The dependent variable is also known as the continuous
target variable.
end::dependent_variable[]

tag::eta[]
`eta`::
(Optional, double) The shrinkage applied to the weights. Smaller values result
in larger forests, which have better generalization error. However, the smaller
the value, the longer training takes. For more information, see
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
about shrinkage.
end::eta[]

tag::feature_bag_fraction[]
`feature_bag_fraction`::
(Optional, double) Defines the fraction of features that will be used when
selecting a random bag for each candidate split.
end::feature_bag_fraction[]

tag::gamma[]
`gamma`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies a linear penalty associated with the size of
individual trees in the forest. The higher the value, the more training
prefers smaller trees. The smaller the value, the larger individual trees
will be and the longer training will take.
end::gamma[]

tag::lambda[]
`lambda`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies an L2 regularization term which applies to leaf
weights of the individual trees in the forest. The higher the value, the more
training attempts to keep leaf weights small. This makes the prediction
function smoother at the expense of potentially not being able to capture
relevant relationships between the features and the {depvar}. The smaller the
value, the larger individual trees will be and the longer training will take.
end::lambda[]

tag::maximum_number_trees[]
`maximum_number_trees`::
(Optional, integer) Defines the maximum number of trees the forest is allowed
to contain. The maximum value is 2000.
end::maximum_number_trees[]

tag::prediction_field_name[]
`prediction_field_name`::
(Optional, string) Defines the name of the prediction field in the results.
Defaults to `<dependent_variable>_prediction`.
end::prediction_field_name[]

tag::training_percent[]
`training_percent`::
(Optional, integer) Defines the percentage of eligible documents that will be
used for training. Documents that are ignored by the analysis (for example,
those that contain arrays) are not included in the calculation of the used
percentage. Defaults to `100`.
end::training_percent[]
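
As a rough illustration of how the parameters defined above might appear
together, the following sketch shows a regression analysis object in a create
{dfanalytics} job request. The job ID `house-price-regression`, the index names
`houses` and `houses-predictions`, and the field names `price` and
`predicted_price` are hypothetical, and the numeric values are placeholders
rather than recommended settings.

[source,console]
----
PUT _ml/data_frame/analytics/house-price-regression
{
  "source": { "index": "houses" },
  "dest": { "index": "houses-predictions" },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "eta": 0.05,
      "feature_bag_fraction": 0.7,
      "gamma": 0.5,
      "lambda": 1.0,
      "maximum_number_trees": 500,
      "prediction_field_name": "predicted_price",
      "training_percent": 80
    }
  }
}
----

In this sketch, only `dependent_variable` is required. Omitting
`prediction_field_name` would fall back to `price_prediction`, and omitting
`training_percent` would use all eligible documents for training.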