This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.
A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.
The APIs are:
- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}
When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:
1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index
In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:
- POST _ml/data_frame/_evaluate
The existing language was misleading about the model snapshots and where they are located. Saying "to disk" sounds like files external to Elasticsearch IMO. It raises the obvious question, where on disk? which node? Is it in the Elasticsearch snapshot repo? The model snapshots are held in an internal index.
* Example of how to set slow logs dynamically per-index
* Make _settings API example more explicit
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Add TEST directive to fix CI
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
the geo-bounding-box and phrase-suggest docs were susceptible to
failing due to other indices in the cluster. This change restricts
the queries to the index that is set up for the test.
relates to #43271.
This commit tweaks the docs for secure settings to ensure the user is
aware adding non secure settings to the keystore will result in
elasticsearch not starting.
fixes#43328
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Together with types removal, any mention of "fields with the same name in the same index" doesn't make sense anymore.
(cherry picked from commit c5190106cbd4c007945156249cce462956933326)
* [ML][Data Frame] adds new pipeline field to dest config (#43124)
* [ML][Data Frame] adds new pipeline field to dest config
* Adding pipeline support to _preview
* removing unused import
* moving towards extracting _source from pipeline simulation
* fixing permission requirement, adding _index entry to doc
* adjusting for java 8 compatibility
* adjusting bwc serialization version to 7.3.0
These docs were misleading for package installations of
Elasticsearch. Instead, we should refer to $ES_CONFIG/ingest-geoip as
the path to place the custom database files. For non-package
installations, this is the same as $ES_HOME/config, but for package
installations this is not the case as the config directory for package
installations is /etc/elasticsearch, and is not relative to
$ES_HOME. This commit corrects the docs.
* [DOCS] Add introduction to Elasticsearch.
* [DOCS] Incorporated review comments.
* [DOCS] Minor edits to add an abbreviated title and cross refs.
* [DOCS] Added sizing tips & link to quantatative sizing video.
To be consistent with the `search.max_buckets` default setting,
set the hard limit of the PriorityQueue used for in memory sorting,
when sorting on an aggregate function, to 10000.
Fixes: #43168
(cherry picked from commit 079e012fdea68ea0a7daae078359495047e9c407)
The machine learning feature of xpack has native binaries with a
different commit id than the rest of code. It is currently exposed in
the xpack info api. This commit adds that commit information to the ML
info api, so that it may be removed from the info api.