OpenSearch/docs/reference/ingest/processors
Benjamin Trent afd90647c9
[ML] Adds feature importance to option to inference processor (#52218) (#52666)
This adds machine learning model feature importance calculations to the inference processor.

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

The processor then writes the following to the document:
```
"inference" : {
   "feature_importance" : {
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```
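
As a rough illustration of how a consumer might read this output, the sketch below ranks the features from the example document by the magnitude of their importance. The helper name `rank_features` and the `target_field` default are assumptions made for the example, not part of the processor.

```python
# The document fragment from the example above, as a Python dict.
doc = {
    "inference": {
        "feature_importance": {
            "FlightTimeMin": -76.90955548511226,
            "FlightDelayType": 114.13514762158526,
            "DistanceMiles": 13.731580450792187,
        },
        "predicted_value": 108.33165831875137,
        "model_id": "my_model",
    }
}


def rank_features(document, target_field="inference"):
    """Return (feature, importance) pairs, largest absolute importance first.

    Hypothetical helper: the field layout is copied from the example output.
    """
    importance = document[target_field]["feature_importance"]
    return sorted(importance.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_features(doc)
for name, value in ranked:
    # A positive value pushed this prediction up, a negative one pushed it down.
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} the prediction by {abs(value):.2f}")
```

Sorting by absolute value keeps strongly negative contributions such as `FlightTimeMin` near the top instead of at the bottom of the list.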

Feature importance is calculated as [SHAP values](https://arxiv.org/abs/1802.03888).

This requires every tree node in the model to have a populated `number_samples` value, which models created before 7.7 do not have.
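
To see why per-node `number_samples` matters, here is a minimal sketch that computes exact Shapley values for a toy tree by enumerating feature subsets. It is not the polynomial-time algorithm from the linked paper, nor the Java port this commit adds. The dict-based tree layout, the "less than threshold goes left" convention, and all function names are assumptions made for the example. When a feature is treated as unknown, both branches are averaged with weights taken from `number_samples`, which is the information older models lack.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy tree. Every node carries number_samples, the count of
# training rows that reached it.
TREE = {
    "feature": "x", "threshold": 0.5, "number_samples": 10,
    "left": {"value": 1.0, "number_samples": 4},
    "right": {
        "feature": "y", "threshold": 0.5, "number_samples": 6,
        "left": {"value": 2.0, "number_samples": 3},
        "right": {"value": 4.0, "number_samples": 3},
    },
}


def expected_value(node, row, known):
    """Expected prediction when only the features in `known` are observed."""
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Assumed convention: values below the threshold go left.
        child = left if row[node["feature"]] < node["threshold"] else right
        return expected_value(child, row, known)
    # Feature unknown: average both branches, weighted by number_samples.
    total = left["number_samples"] + right["number_samples"]
    return (left["number_samples"] * expected_value(left, row, known)
            + right["number_samples"] * expected_value(right, row, known)) / total


def shap_values(tree, row):
    """Exact Shapley values by brute-force enumeration of feature subsets."""
    features = sorted(row)
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                # Marginal contribution of f when added to this subset.
                phi += weight * (expected_value(tree, row, known | {f})
                                 - expected_value(tree, row, known))
        result[f] = phi
    return result


row = {"x": 0.9, "y": 0.9}
phi = shap_values(TREE, row)
baseline = expected_value(TREE, row, set())
prediction = expected_value(TREE, row, set(row))
print(phi, baseline, prediction)
```

For this toy tree the attributions come out to roughly 1.0 for `x` and 0.8 for `y`, and they sum to the prediction minus the baseline (4.0 minus 2.2). Enumeration is exponential in the number of features, so it is only useful for checking small cases.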

Additionally, if the inference config requests feature importance and not all nodes have been upgraded yet, the pipeline cannot be created. This safeguards mixed-version clusters in which only some ingest nodes have been upgraded.
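
The mixed-version guard could look roughly like the sketch below. Everything here is hypothetical: the function name, the tuple representation of versions, and the error text are illustrative, and the real check lives in the Java code this commit adds.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]

# The commit message names 7.7 as the first version that supports this.
FEATURE_IMPORTANCE_MIN_VERSION: Version = (7, 7, 0)


def validate_inference_config(inference_config: dict,
                              node_versions: Iterable[Version]) -> None:
    """Raise ValueError if feature importance is requested on a mixed cluster.

    Hypothetical helper: `inference_config` mirrors the example above, e.g.
    {"regression": {"num_top_feature_importance_values": 3}}.
    """
    requested = any(
        settings.get("num_top_feature_importance_values", 0) > 0
        for settings in inference_config.values()
    )
    if not requested:
        return  # Nothing version-sensitive was asked for.
    oldest = min(node_versions)
    if oldest < FEATURE_IMPORTANCE_MIN_VERSION:
        # Illustrative message, not the text of the real exception.
        raise ValueError(
            f"feature importance requires all nodes on 7.7.0 or later, "
            f"but the oldest node is on {'.'.join(map(str, oldest))}"
        )


config = {"regression": {"num_top_feature_importance_values": 3}}
validate_inference_config(config, [(7, 7, 0), (7, 7, 0)])  # accepted
try:
    validate_inference_config(config, [(7, 6, 2), (7, 7, 0)])
except ValueError as err:
    print("rejected:", err)
```

Checking the oldest node rather than the local node is what makes the guard useful: a pipeline accepted by an upgraded node could otherwise be run by an ingest node that cannot compute the values.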

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 18:42:31 -05:00
append.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
bytes.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
circle.asciidoc Geo: Switch generated GeoJson type names to camel case (#50400) 2019-12-20 15:37:22 -05:00
common-options.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
convert.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
csv.asciidoc Add empty_value parameter to CSV processor (#51567) (#51966) 2020-02-05 23:35:52 +01:00
date-index-name.asciidoc [DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
date.asciidoc Add documentation about breaking java time changes (#38886) 2019-02-14 10:18:12 +01:00
dissect.asciidoc [DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:20:17 -04:00
dot-expand.asciidoc [DOCS] Various spelling corrections (#37046) 2019-01-07 14:44:12 +01:00
drop.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
enrich.asciidoc [DOCS] Explicitly document enrich `target_field` includes `match_field` (#49407) 2019-12-02 09:13:24 -05:00
fail.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
foreach.asciidoc bad formatted JSON object (#38515) (#38526) 2019-02-06 13:01:45 -07:00
geoip.asciidoc Allow list of IPs in geoip ingest processor (#49573) (#49947) 2019-12-07 00:19:09 +01:00
grok.asciidoc Docs: Fix & test more grok processor documentation (#49447) 2019-12-03 11:55:49 +01:00
gsub.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
html_strip.asciidoc Add HTML strip processor (#41888) 2019-05-09 13:01:07 +02:00
inference.asciidoc [ML] Adds feature importance to option to inference processor (#52218) (#52666) 2020-02-21 18:42:31 -05:00
join.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
json.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
kv.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
lowercase.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
pipeline.asciidoc Backport: Add pipeline name to ingest metadata (#51050) 2020-01-16 10:50:47 +01:00
remove.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
rename.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
script.asciidoc [DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
set-security-user.asciidoc Expose more authentication info to ingest pipeline (#51305) (#52119) 2020-02-11 23:05:01 +11:00
set.asciidoc [DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
sort.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
split.asciidoc Add option to split processor for preserving trailing empty fields (#48685) 2019-10-30 08:25:03 -05:00
trim.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
uppercase.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
url-decode.asciidoc Split the ingest processor docs into multiple files (#36887) 2018-12-20 08:04:54 -05:00
user-agent.asciidoc [DOCS] Correct required file ext for user agent ingest processor (#48688) 2019-10-30 11:11:29 -04:00