35 Commits

Author SHA1 Message Date
David Roberts
1421471556
[ML] Introduce a "starting" datafeed state for lazy jobs (#54065)
It is possible for ML jobs to open lazily if the "allow_lazy_open"
option in the job config is set to true.  Such jobs wait in the
"opening" state until a node has sufficient capacity to run them.

This commit fixes the bug that prevented datafeeds for jobs lazily
waiting assignment from being started.  The state of such datafeeds
is "starting", and they can be stopped by the stop datafeed API
while in this state with or without force.

Backport of #53918
2020-03-24 13:00:04 +00:00
Tom Veasey
690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-13 17:35:51 +00:00
István Zoltán Szabó
969164cc47 [DOCS] Adds a warning about reindexing docs with the same ID to the PUT DFA docs. (#53490) 2020-03-12 18:01:43 +01:00
Benjamin Trent
89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Dimitris Athanasiou
0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of #53300
2020-03-11 12:45:27 +02:00
István Zoltán Szabó
58ce56f6c8 [DOCS] Makes the naming convention of the DFA response objects coherent (#53172) 2020-03-05 16:26:57 +01:00
István Zoltán Szabó
48707ec55a [DOCS] Expands GET DFA stat API docs with response objects. (#53107) 2020-03-05 15:31:55 +01:00
Lisa Cawley
892f0d5848 [DOCS] Adds link in datafeed indices_options (#53067) 2020-03-03 10:31:32 -08:00
István Zoltán Szabó
6cece3a709 [DOCS] Adds response body documentation to GET inference API (#53050) 2020-03-03 16:26:40 +01:00
Lisa Cawley
4fbe1b0550
[DOCS] Adds cat anomaly detectors API (#52866) (#52970) 2020-03-02 07:28:55 -08:00
Dimitris Athanasiou
85b4e45093
[7.x]ML] Parse and report memory usage for DF Analytics (#52778) (#52980)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

Backport of #52778 and #52958
2020-02-29 13:03:40 +02:00
Benjamin Trent
eac38e9847
[ML] Add indices_options to datafeed config and update (#52793) (#52905)
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.

closes https://github.com/elastic/elasticsearch/issues/48056
2020-02-27 13:43:25 -05:00
Lisa Cawley
b788ec7157 [DOCS] Adds cat datafeeds API (#52738) 2020-02-26 09:28:57 -08:00
István Zoltán Szabó
f57422bbfd [DOCS] Adds cat data frame analytics API (#52764)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-26 11:10:42 +01:00
Lisa Cawley
123b3c6f55 [DOCS] Clarifies description of num_top_feature_importance_values (#52246)
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
2020-02-18 08:50:21 -08:00
Lisa Cawley
40b58e612d [DOCS] Fixes, sorts ML tagged regions (#52283) 2020-02-12 13:52:34 -08:00
Benjamin Trent
76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 08:26:58 -05:00
István Zoltán Szabó
8bdf654cc7 [DOCS] Refines description. (#51400) 2020-01-24 13:34:25 +01:00
Lisa Cawley
4590d4156a [DOCS] Clarify interval, frequency, and bucket span in ML APIs and example (#51280) 2020-01-22 08:15:46 -08:00
David Kyle
ca4b90a001
[ML] Calculate results and snapshot retention using latest bucket timestamps (#51061) (#51301)
The retention period is calculated relative to the last bucket result or snapshot
time rather than wall clock
2020-01-22 14:52:33 +00:00
Dimitris Athanasiou
1d8cb3c741
[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914) (#50976)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.

Backport of #50914
2020-01-14 16:46:09 +02:00
lcawl
8a5de4f56f [DOCS] Clarify detector_index property in ML APIs (#50723) 2020-01-09 08:34:34 -08:00
István Zoltán Szabó
4f150e4961
[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793) 2020-01-09 16:21:35 +01:00
István Zoltán Szabó
71afeec7d0 Revert "[DOCS] Moves analysis resources to PUT DFA API docs (#50704)"
This reverts commit 4e1107d5d717599ddf1632c3253de6f1df4a51af.
2020-01-09 14:31:35 +01:00
István Zoltán Szabó
4e1107d5d7 [DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-09 14:13:37 +01:00
István Zoltán Szabó
0bcbddecf8 [DOCS] Fine-tunes training_percent definition. (#50601) 2020-01-03 14:51:03 +01:00
Lisa Cawley
81a9cff16f
[7.x][DOCS] Remove redundant results from ML APIs (#50565) 2020-01-02 11:23:26 -08:00
Lisa Cawley
f8eef43fc6
[7.x][DOCS] Move model snapshot resource definitions into APIs (#50540) 2019-12-31 10:53:05 -08:00
Lisa Cawley
72840c0cb2
[7.x][DOCS] Move anomaly detection job resource definitions into APIs (#50490) 2019-12-27 13:30:26 -08:00
Lisa Cawley
d479e0563a
[7.x][DOCS] Augments ML shared definitions (#50487) 2019-12-24 10:22:05 -08:00
Lisa Cawley
2106a7b02a
[7.x][DOCS] Updates ML links (#50387) (#50409) 2019-12-20 10:01:19 -08:00
István Zoltán Szabó
5759a263cb [DOCS] Adds GET, GET stats and DELETE inference APIs (#50224)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-18 09:18:56 +01:00
István Zoltán Szabó
7611b3c9be
[7.x][DOCS] Moves data frame analytics job resource definitions into APIs (#50165)
* [7.x][DOCS] Moves data frame analytics job resource definitions into APIs.
2019-12-13 11:48:21 +01:00
Dimitris Athanasiou
8891f4db88
[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990) (#50023)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.

Backport of #49990
2019-12-10 15:29:19 +02:00
István Zoltán Szabó
3c9bd13dca [DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241) 2019-11-06 07:41:38 -05:00