OpenSearch/docs/reference
Benjamin Trent afd90647c9
[ML] Adds feature importance to option to inference processor (#52218) (#52666)
This adds machine learning model feature importance calculations to the inference processor.

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```
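For context, an `inference` fragment like the one above normally sits inside the `processors` array of an ingest pipeline definition. The following is a minimal sketch that assembles such a pipeline body in Python and prints it as JSON. The description text and the variable names are illustrative assumptions and do not come from this commit.

```python
import json

# The processor configuration from the example above, unchanged.
inference_processor = {
    "inference": {
        "field_mappings": {},
        "model_id": "my_model",
        "inference_config": {
            "regression": {
                "num_top_feature_importance_values": 3
            }
        },
    }
}

# Hypothetical pipeline body: the description is made up for illustration.
pipeline_body = {
    "description": "Regression inference with feature importance (example)",
    "processors": [inference_processor],
}

# Serialize the body as it could be sent to the ingest pipeline API.
print(json.dumps(pipeline_body, indent=2))
```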

With this configuration, the processor writes the following to the document:
```
"inference" : {
   "feature_importance" : {
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

Feature importance is computed by calculating [SHAP values](https://arxiv.org/abs/1802.03888).

Feature importance requires that the model has `number_samples` populated for each tree node. Models created before 7.7 do not have this information.
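To illustrate why per-node sample counts matter, here is a minimal brute-force sketch of Shapley values for a single toy decision tree. It is not the TreeSHAP algorithm ported in this commit: it enumerates every feature subset, which is only feasible for a handful of features. When a split feature is treated as unknown, the sketch averages both branches weighted by `number_samples`, which is the step that needs those counts. The node layout (`feature`, `threshold`, `left`, `right`, `value`) and the toy tree are assumptions made for this example.

```python
from itertools import combinations
from math import factorial

# Toy tree (hypothetical). Only `number_samples` is a name taken from the
# commit message; the other keys are invented for this sketch.
tree = {
    "feature": "DistanceMiles", "threshold": 1000, "number_samples": 100,
    "left": {"value": 10.0, "number_samples": 60},
    "right": {
        "feature": "FlightTimeMin", "threshold": 120, "number_samples": 40,
        "left": {"value": 30.0, "number_samples": 30},
        "right": {"value": 80.0, "number_samples": 10},
    },
}


def expected_value(node, x, known):
    """Expected prediction when only the features in `known` are observed."""
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Feature is observed: follow the branch the document would take.
        child = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(child, x, known)
    # Feature is unknown: average both branches, weighted by sample counts.
    total = left["number_samples"] + right["number_samples"]
    return (
        left["number_samples"] * expected_value(left, x, known)
        + right["number_samples"] * expected_value(right, x, known)
    ) / total


def shapley_values(tree, x):
    """Exact Shapley values by enumerating all subsets of the other features."""
    features = sorted(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = expected_value(tree, x, s | {f}) - expected_value(tree, x, s)
                total += weight * gain
        phi[f] = total
    return phi


doc = {"DistanceMiles": 1500, "FlightTimeMin": 200}
importance = shapley_values(tree, doc)
baseline = expected_value(tree, doc, set())
prediction = expected_value(tree, doc, set(doc))
print(importance, baseline, prediction)
```

In this sketch the importances sum to the difference between the prediction and the baseline (the expected value with no features known), which is the additivity property of Shapley values.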

Additionally, if the inference config requests feature importance and not all nodes have been upgraded yet, the pipeline cannot be created. This is a safeguard for mixed-version environments where only some ingest nodes have been upgraded.
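The mixed-version safeguard can be pictured as a check made at pipeline creation time. The sketch below is a simplified illustration in Python, not the actual Elasticsearch code. The function names, the tuple representation of versions, and the use of 7.7.0 as the minimum node version are all assumptions made for this example.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]

# Assumed minimum node version for feature importance (hypothetical constant).
MIN_FEATURE_IMPORTANCE_VERSION: Version = (7, 7, 0)


def requests_feature_importance(inference_config: dict) -> bool:
    """True if any config section asks for at least one importance value."""
    return any(
        isinstance(settings, dict)
        and settings.get("num_top_feature_importance_values", 0) > 0
        for settings in inference_config.values()
    )


def validate_pipeline(inference_config: dict, node_versions: List[Version]) -> None:
    """Reject the pipeline if an older node could not compute importance.

    `node_versions` must be a non-empty list of (major, minor, patch) tuples.
    """
    oldest = min(node_versions)
    if requests_feature_importance(inference_config) and oldest < MIN_FEATURE_IMPORTANCE_VERSION:
        raise ValueError(
            "feature importance requires all nodes to be upgraded; "
            f"oldest node version is {'.'.join(map(str, oldest))}"
        )


config = {"regression": {"num_top_feature_importance_values": 3}}

# All nodes upgraded: validation passes silently.
validate_pipeline(config, [(7, 7, 0), (7, 7, 0)])

# Mixed-version cluster: validation fails.
try:
    validate_pipeline(config, [(7, 6, 2), (7, 7, 0)])
except ValueError as err:
    print(err)
```

A config that does not request feature importance passes this check regardless of node versions, so existing pipelines are unaffected during a rolling upgrade.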

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 18:42:31 -05:00
aggregations [DOCS] Links transforms in aggregation docs (#52563) 2020-02-21 08:23:34 +01:00
analysis [DOCS] Fixed typo in jump link. (#52302) 2020-02-12 17:53:00 -08:00
autoscaling Add autoscaling API skelton (#51564) 2020-02-06 21:55:01 -05:00
cat [DOCS] Fix `disk.used_percent` typo in `_cat/nodes` docs (#51854) 2020-02-04 09:15:56 -05:00
ccr Remove outdated requirement of CCR (#50859) 2020-01-13 20:00:23 -05:00
cluster [DOCS] Add missing `indices` parms returned by `_nodes/stats` (#52055) 2020-02-21 08:15:59 -05:00
commands [Docs] Fix typo in node-tool.asciidoc (#51667) 2020-01-31 10:36:21 +01:00
docs [DOCS] Fixes "enables you to" typos (#50225) 2019-12-23 14:39:14 -05:00
eql [DOCS] Add EQL limitations page (#52001) 2020-02-12 08:45:43 -05:00
graph [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
high-availability [DOCS] Move snapshot-restore out of modules. (#49618) (#50829) 2020-01-09 16:55:46 -08:00
how-to [DOCS] Fix index_prefixes link in 'faster prefix queries' docs (#51833) 2020-02-04 08:40:18 -05:00
ilm [7.x] Allow forcemerge in the hot phase for ILM policies (#520… (#52083) 2020-02-10 08:54:49 -07:00
images SQL: update ODBC docs, cover Cloud ID, latest params (#52291) 2020-02-19 17:42:28 +01:00
index-modules Deprecate translog retention settings (#51588) (#51638) 2020-01-30 09:03:10 -05:00
indices [DOC] Remove definition typo in update alias API docs (#52184) 2020-02-14 08:31:21 -05:00
ingest [ML] Adds feature importance to option to inference processor (#52218) (#52666) 2020-02-21 18:42:31 -05:00
licensing [DOCS] Augments update license API (#51903) 2020-02-05 11:08:11 -08:00
mapping Add support for multipoint geoshape queries (#52133) (#52553) 2020-02-21 07:45:53 +01:00
migration Make `date_range` query rounding consistent with `date` (#50237) (#51741) 2020-01-31 15:35:05 +01:00
ml [DOCS] Clarifies description of num_top_feature_importance_values (#52246) 2020-02-18 08:50:21 -08:00
modules Deprecate fixed_auto_queue_size thread pool type (#52399) 2020-02-20 11:11:06 +01:00
monitoring Stricter checks of setup and teardown in docs tests (#51430) 2020-01-28 16:52:23 +01:00
query-dsl Add a cluster setting to disallow expensive queries (#51385) (#52279) 2020-02-12 22:56:14 +01:00
release-notes Correct release notes for 7.5 (#52660) 2020-02-21 14:59:46 -05:00
rest-api [7.x][DOCS] Adds X-Pack usage API (#52592) 2020-02-21 06:57:11 -08:00
rollup [DOCS] Merge rollup config details into API (#49412) 2019-11-22 08:39:49 -08:00
scripting Scripting: Add char position of script errors (#51069) (#51266) 2020-01-21 13:45:59 -07:00
search [DOCS] Fixed typo. (#52071) 2020-02-07 11:04:56 -08:00
settings [DOCS] Correct important note for xpack.transform.enabled (#52194) 2020-02-11 13:02:10 +00:00
setup [DOCS] Switch to standard ESS trial links (#52552) 2020-02-21 12:07:10 -05:00
slm Correct SLM retention timezone documentation (#52533) 2020-02-19 13:46:43 -07:00
snapshot-restore Backporting updates to ILM org, overview, & GS (#51898) 2020-02-04 16:45:18 -08:00
sql SQL: specify command to run the CLI on a remote machine without Elasticsearch (#52626) 2020-02-21 13:29:58 +02:00
testing Uppercasing some docs section title (#37781) 2019-01-24 22:54:55 +01:00
transform [DOCS] Correct important note for xpack.transform.enabled (#52194) 2020-02-11 13:02:10 +00:00
upgrade Deprecate synced flush (#50835) 2020-01-13 19:54:38 -05:00
vectors Remove the 'experimental' marking from vector fields. (#49120) 2019-11-18 12:42:46 -08:00
aggregations.asciidoc [Docs] Update aggregations.asciidoc (#29265) 2018-03-28 15:01:45 +02:00
analysis.asciidoc [DOCS] Add attribute for Lucene analysis links (#51687) 2020-01-30 11:24:01 -05:00
api-conventions.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
cat.asciidoc [DOCS] Use `s` parameter in cat API overview example (#50616) 2020-01-14 08:22:07 -05:00
cluster.asciidoc Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
data-rollup-transform.asciidoc [DOCS] Adds transforms to Elasticsearch book (#46846) (#47055) 2019-09-25 08:11:37 -07:00
docs.asciidoc [DOCS] Remove heading offsets for REST APIs (#44568) 2019-07-19 14:36:06 -04:00
frozen-indices.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
getting-started.asciidoc [DOCS] Switch to standard ESS trial links (#52552) 2020-02-21 12:07:10 -05:00
glossary.asciidoc Backporting updates to ILM org, overview, & GS (#51898) 2020-02-04 16:45:18 -08:00
gs-index.asciidoc [DOCS] Adding index file for GS "mini book". 2017-07-18 13:44:08 -07:00
high-availability.asciidoc [DOCS] Remove leveloffset for CCR docs (#46818) 2019-09-18 09:44:43 -04:00
how-to.asciidoc Correct grammar in list in how-to docs 2017-01-17 20:57:22 -05:00
index-modules.asciidoc Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959) 2020-01-28 10:01:16 -07:00
index.asciidoc [DOCS] Include docs on permanently unreleased branches only (#51743) 2020-02-11 11:24:13 -05:00
index.x.asciidoc [DOCS] Removes redundant index.asciidoc files (#30707) 2018-05-18 11:05:40 -07:00
indices.asciidoc [DOCS] Reorder index APIs alphabetically (#46981) (#47402) 2019-10-01 17:07:28 -04:00
ingest.asciidoc Replace required pipeline with final pipeline (#49470) 2019-11-22 14:37:36 -05:00
intro.asciidoc [7.x][DOCS] Updates ML links (#50387) (#50409) 2019-12-20 10:01:19 -08:00
mapping.asciidoc [DOCS] Note clause limit in `index.mapping.total_fields.limit` docs (#48153) 2019-10-18 10:20:49 -04:00
modules.asciidoc Backporting updates to ILM org, overview, & GS (#51898) 2020-02-04 16:45:18 -08:00
query-dsl.asciidoc Add a cluster setting to disallow expensive queries (#51385) (#52279) 2020-02-12 22:56:14 +01:00
redirects.asciidoc [7.x] [DOCS] Re-add redirects for API relocation (#52628) 2020-02-21 05:32:10 -05:00
release-notes.asciidoc [DOCS] Adds placeholder for 7.5.2 release notes (#51124) 2020-01-16 14:42:24 -05:00
scripting.asciidoc [DOCS] Move 'Scripting' section to top-level navigation. (#42939) 2019-06-06 10:46:02 -04:00
search.asciidoc Use snake casing for document field (#45432) 2019-09-19 14:27:00 +02:00
setup.asciidoc [DOCS] Creates a cluster restart documentation page (#48583) 2019-11-12 14:50:53 +01:00
testing.asciidoc [Docs] Unify spelling of Elasticsearch (#27567) 2017-11-29 09:44:25 +01:00
upgrade.asciidoc [DOCS] Change prev version to 7.5 in upgrade docs (#48415) 2019-10-23 12:09:26 -05:00