Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute or not
feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
to the feature values
Backport of #47600
* [DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs.
* [DOCS] Removes extra lines from examples.
* Update docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* Update docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* [DOCS] Explains examples.
* [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs.
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
* [DOCS] Adds outlier detection params to the data frame analytics resources.
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.
This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.
This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.
In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.
This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.
When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
This PR addresses the feedback in https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.
* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections