[DOCS] Adds outlier detection params to the data frame analytics resources (#46323)

* [DOCS] Adds outlier detection params to the data frame analytics resources.

Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
Date: 2019-09-16 14:21:50 +02:00
commit fe8f33a8e1
parent c8f52ec4ff
1 changed file with 25 additions and 9 deletions
--- a/docs/reference/ml/df-analytics/apis/dfanalyticsresources.asciidoc
+++ b/docs/reference/ml/df-analytics/apis/dfanalyticsresources.asciidoc
@@ -108,10 +108,13 @@ other types will be added, for example `regression`.

 An {oldetection} configuration object has the following properties:

-`n_neighbors`::
-  (integer) Defines the value for how many nearest neighbors each method of 
-  {oldetection} will use to calculate its {olscore}. When the value is 
-  not set, the system will dynamically detect an appropriate value.
+`compute_feature_influence`::
+  (boolean) If `true`, the feature influence calculation is enabled. Defaults to 
+  `true`.
+  
+`feature_influence_threshold`::
+  (double) The minimum {olscore} that a document needs to have for its 
+  {fiscore} to be calculated. Value range: 0-1 (`0.1` by default).

 `method`::
  (string) Sets the method that {oldetection} uses. If the method is not set 
@@ -119,8 +122,21 @@ An {oldetection} configuration object has the following properties:
  combines their individual {olscores} to obtain the overall {olscore}. We 
  recommend to use the ensemble method. Available methods are `lof`, `ldof`, 
  `distance_kth_nn`, `distance_knn`.
-
-`feature_influence_threshold`:: 
-  (double) The minimum {olscore} that a document needs to have in order to 
-  calculate its {fiscore}. 
-  Value range: 0-1 (`0.1` by default).
+  
+`n_neighbors`::
+  (integer) Defines how many nearest neighbors each method of {oldetection} 
+  uses to calculate its {olscore}. When the value is not set, different 
+  values are used for different ensemble members, which helps improve 
+  diversity in the ensemble. Therefore, override this value only if you are 
+  confident that the value you choose is appropriate for the data set.
+  
+`outlier_fraction`::
+  (double) Sets the proportion of the data set that is assumed to be outlying 
+  prior to {oldetection}. For example, `0.05` means that 5% of values are 
+  assumed to be real outliers and 95% are assumed to be inliers.
+  
+`standardize_columns`::
+  (boolean) If `true`, the following operation is performed on the columns 
+  before computing outlier scores: `(x_i - mean(x_i)) / sd(x_i)`. Defaults to 
+  `true`. For more information, see 
+  https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this Wikipedia article about standardization].
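The `standardize_columns` description gives the z-score formula `(x_i - mean(x_i)) / sd(x_i)`. The following Python sketch applies that formula column by column to a small in-memory table to show its effect. It is illustrative only and is not the Elasticsearch implementation. The function name is hypothetical. The sketch assumes that `sd` is the population standard deviation. It also only centers a constant column (where `sd` is 0) rather than dividing by zero, which is another assumption.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score standardize each column of a list of equal-length numeric rows.

    Hypothetical helper. Assumptions: sd is the population standard deviation,
    and a constant column (sd == 0) is only centered to avoid division by zero.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    scaled = []
    for col in columns:
        mu = mean(col)
        sd = pstdev(col)
        scaled.append([(x - mu) / sd if sd > 0 else x - mu for x in col])
    # Transpose back to row-major order.
    return [list(row) for row in zip(*scaled)]


rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
# The first column becomes roughly [-1.22, 0.0, 1.22]; the constant second column becomes all zeros.
print(standardize_columns(rows))
```

Each non-constant column ends up with mean 0 and standard deviation 1. As a result, columns measured on very different scales contribute more comparably to distance-based methods such as `lof` or `distance_kth_nn`.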
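The `n_neighbors` parameter is easiest to picture with the `distance_kth_nn` method. This page does not define that method in detail, so the sketch below uses the common textbook definition, which is an assumption. Under that definition, a point's score is its Euclidean distance to its k-th nearest neighbor, and k plays the role of `n_neighbors`. The function name is hypothetical. This is not the Elasticsearch implementation, which by default combines several methods in an ensemble.

```python
import math


def kth_nn_distance_scores(points, n_neighbors):
    """Score each point by its distance to its n_neighbors-th nearest neighbor.

    Hypothetical helper. Assumption: Euclidean distance and the textbook
    k-th nearest-neighbor distance score; larger scores mean more outlying.
    """
    if not 1 <= n_neighbors < len(points):
        raise ValueError("n_neighbors must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[n_neighbors - 1])
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
print(kth_nn_distance_scores(points, n_neighbors=2))
```

With `n_neighbors=2`, each of the four points of the unit square scores 1.0. The isolated point at (8, 8) scores about 10.63 (the square root of 113).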
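The following sketch shows how `compute_feature_influence` and `feature_influence_threshold` interact, as described above. Feature influence is calculated only when `compute_feature_influence` is `true`, and only for documents whose outlier score reaches the threshold. The sketch only selects which documents qualify. It does not compute the feature influence itself, because this page does not describe that calculation. The function name and document ids are hypothetical. Treating a score that equals the threshold as qualifying is an assumption based on the word "minimum".

```python
def documents_for_feature_influence(scores, compute_feature_influence=True,
                                    feature_influence_threshold=0.1):
    """Return the ids of documents whose feature influence would be calculated.

    Hypothetical helper. `scores` maps a document id to its outlier score in [0, 1].
    The defaults mirror the documented defaults (`true` and `0.1`).
    """
    if not compute_feature_influence:
        return []
    if not 0.0 <= feature_influence_threshold <= 1.0:
        raise ValueError("feature_influence_threshold must be in [0, 1]")
    # Assumption: the threshold is a minimum, so a score equal to it qualifies.
    return sorted(doc for doc, s in scores.items() if s >= feature_influence_threshold)


scores = {"doc-a": 0.02, "doc-b": 0.1, "doc-c": 0.87}
print(documents_for_feature_influence(scores))         # ['doc-b', 'doc-c']
print(documents_for_feature_influence(scores, False))  # []
```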