Commit Graph

91 Commits

Author SHA1 Message Date
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-19 09:46:51 -04:00
Dimitris Athanasiou f49a14ce6f
[7.x][ML] Fix race condition when force stopping DF analytics job (#57680) (#57717)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.

Backport of #57680
2020-06-05 17:50:01 +03:00
Lisa Cawley a1514c9ffe
[DOCS] Replaces docdir attributes in ML APIs (#57390) (#57467) 2020-06-01 13:46:15 -07:00
Benjamin Trent 35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
István Zoltán Szabó e1cab4feb4 [DOCS] Puts a link into the loss_function variable description (#56678) 2020-05-28 09:46:11 +02:00
István Zoltán Szabó 27f258711a [DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196) 2020-05-27 13:43:55 +02:00
István Zoltán Szabó ebe1e4c4c4 [DOCS] Expands GET DFA stats API docs with new phases (#56407)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-11 09:26:15 +02:00
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
István Zoltán Szabó 9bcc975bd1 [DOCS] Simplifies footnote text in DFA APIs (#56105)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-05 09:05:08 +02:00
Lisa Cawley b816ab0c18
[DOCS] Synchs and links hyperparameter descriptions (#56131) 2020-05-04 10:37:26 -07:00
Lisa Cawley 006e00ed0a
[DOCS] Adds documentation for secondary authorization headers (#55365) (#55986) 2020-04-29 16:29:38 -07:00
István Zoltán Szabó a5cf4712e5 [DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:03:43 +02:00
Lisa Cawley 314ca78e31
[7.x][DOCS] Update example and nesting in get data frame analytics job stats API (#55612) 2020-04-22 10:58:26 -07:00
David Roberts da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
Benjamin Trent 24d41eb695
[ML] partitions model definitions into chunks (#55260) (#55484)
This paves the data layer way so that exceptionally large models are partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
2020-04-20 16:08:54 -04:00
Lisa Cawley c7cf6e621d [DOCS] Remove text fields from classification dependent variables (#54849) 2020-04-16 13:40:28 -07:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). 

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00
Lisa Cawley 2910d01179
[DOCS] Removes unshared sections from ml-shared.asciidoc (#55192) 2020-04-14 18:47:09 -07:00
lcawl fcd96db006 [DOCS] Edits create data frame analytics job API (#54751) 2020-04-13 10:43:52 -07:00
István Zoltán Szabó 374f633b6e [DOCS] Adds link points to the data frame analytics supported fields (#55004)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-09 11:27:57 -07:00
István Zoltán Szabó 4cba1e6368 [DOCS] Changes kibana_user to kibana_admin in DFA API prerequisites. (#54806) 2020-04-06 15:46:18 +02:00
István Zoltán Szabó d025b90cd1 [DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:48:53 +02:00
Benjamin Trent 4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
Benjamin Trent 65233383f6
[7.x] [ML] prefer secondary authorization header for data[feed|frame] authz (#54121) (#54645)
* [ML] prefer secondary authorization header for data[feed|frame] authz (#54121)

Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds.

Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided).

closes https://github.com/elastic/elasticsearch/issues/53801

* fixing for backport
2020-04-02 11:20:25 -04:00
Lisa Cawley 922ec8e961
[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526) 2020-03-31 12:51:04 -07:00
Dimitris Athanasiou 0b25e3b66c
[7.x][ML] Fix DF analytics explain API request in docs (#54510) (#54514)
The explain API expects a data frame analytics config
as its request.

Backport of #54410
2020-03-31 18:56:52 +03:00
István Zoltán Szabó eeb23e9e73 [DOCS] Adds description of analysis_stats object and its properties to GET DFA stats API docs (#53881)
Co-authored-by: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-03-31 13:30:06 +02:00
Tom Veasey 690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-13 17:35:51 +00:00
Dimitris Athanasiou 0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of #53300
2020-03-11 12:45:27 +02:00
István Zoltán Szabó 6cece3a709 [DOCS] Adds response body documentation to GET inference API (#53050) 2020-03-03 16:26:40 +01:00
Lisa Cawley 123b3c6f55 [DOCS] Clarifies description of num_top_feature_importance_values (#52246)
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
2020-02-18 08:50:21 -08:00
David Kyle 8f10a7c6ca [ML] Make Ensemble feature names optional (#51996)
The featureNames field is requisite in individual models but is not required by the Ensemble.
2020-02-07 10:08:37 +00:00
István Zoltán Szabó dfc9f2330c [DOCS] Adds PUT inference API docs (#51231)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-31 13:13:34 +01:00
Benjamin Trent 76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 08:26:58 -05:00
Dimitris Athanasiou b70ebdeb96
[7.x][ML] DF Analytics _explain API should skip object fields (#51115) (#51147)
Object fields cannot be used as features. At the moment _explain
API includes them and even worse it allows it does not error when
an object field is excluded. This creates the expectation to the
user that all children fields will also be excluded while it's not
the case.

This commit omits object fields from the _explain API and also
adds an error if an object field is included or excluded.

Backport of #51115
2020-01-17 14:02:59 +02:00
Christoph Büscher d291f189a8
Fix hardcoded version replacement in put-dfanalytics.asciidoc #51053
The version replacement for the code snippet should replace 7.6 with the current version,
but doesn't match because of a missing whitespace.

Closes #51052
2020-01-15 18:09:37 +01:00
Przemysław Witek b4a631277a
Add missing docs for new evaluation metrics (#50967) (#51041) 2020-01-15 15:53:42 +01:00
Dimitris Athanasiou 1d8cb3c741
[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914) (#50976)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.

Backport of #50914
2020-01-14 16:46:09 +02:00
István Zoltán Szabó 4f150e4961
[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793) 2020-01-09 16:21:35 +01:00
István Zoltán Szabó 71afeec7d0 Revert "[DOCS] Moves analysis resources to PUT DFA API docs (#50704)"
This reverts commit 4e1107d5d7.
2020-01-09 14:31:35 +01:00
István Zoltán Szabó 4e1107d5d7 [DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-09 14:13:37 +01:00