[DOCS] Adds outlier detection params to the data frame analytics resources (#46323)

* [DOCS] Adds outlier detection params to the data frame analytics resources.
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 2019-09-16 14:21:50 +02:00
parent c8f52ec4ff
commit fe8f33a8e1
1 changed files with 25 additions and 9 deletions

@@ -108,10 +108,13 @@ other types will be added, for example `regression`.
An {oldetection} configuration object has the following properties:
`n_neighbors`::
(integer) Defines the value for how many nearest neighbors each method of
{oldetection} will use to calculate its {olscore}. When the value is
not set, the system will dynamically detect an appropriate value.
`compute_feature_influence`::
(boolean) If `true`, the feature influence calculation is enabled. Defaults to
`true`.
`feature_influence_threshold`::
(double) The minimum {olscore} that a document needs to have in order to
calculate its {fiscore}. Value range: 0-1 (`0.1` by default).
`method`::
(string) Sets the method that {oldetection} uses. If the method is not set
@@ -119,8 +122,21 @@ An {oldetection} configuration object has the following properties:
combines their individual {olscores} to obtain the overall {olscore}. We
recommend using the ensemble method. Available methods are `lof`, `ldof`,
`distance_kth_nn`, and `distance_knn`.
`feature_influence_threshold`::
(double) The minimum {olscore} that a document needs to have in order to
calculate its {fiscore}.
Value range: 0-1 (`0.1` by default).
`n_neighbors`::
(integer) Defines the value for how many nearest neighbors each method of
{oldetection} will use to calculate its {olscore}. When the value is not set,
different values will be used for different ensemble members. This helps
improve diversity in the ensemble. Therefore, only override this if you are
confident that the value you choose is appropriate for the data set.
`outlier_fraction`::
(double) Sets the proportion of the data set that is assumed to be outlying prior to
{oldetection}. For example, `0.05` means it is assumed that 5% of values are real outliers
and 95% are inliers.
`standardize_columns`::
(boolean) If `true`, then the following operation is performed on the columns
before computing outlier scores: `(x_i - mean(x_i)) / sd(x_i)`. Defaults to
`true`. For more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
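
The standardization formula given for `standardize_columns` can be illustrated with a minimal Python sketch. This is not the {oldetection} implementation: the helper name `standardize_column` is hypothetical, the use of the population standard deviation is an assumption (the text only says `sd`), and mapping a constant column to zeros is an assumption made here to avoid division by zero.

```python
from statistics import mean, pstdev


def standardize_column(values):
    """Apply (x_i - mean(x_i)) / sd(x_i) to one numeric column."""
    mu = mean(values)
    # Assumption: population standard deviation; the source only says "sd".
    sd = pstdev(values)
    if sd == 0:
        # Assumption: a constant column carries no spread, so return zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sd for x in values]


# Example column with mean 5 and population standard deviation 2.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize_column(column)
print(scaled)
```

After this transformation each column has mean 0 and unit standard deviation, so a feature measured on a large scale does not dominate the distance calculations that the nearest-neighbor methods rely on.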