Commit Graph

16 Commits

Author SHA1 Message Date
James Rodewig 14e1618fd9
[DOCS] Fix case of ingest processor titles (#61024) (#61039)
Converts page headings to sentence case.
Adds a title abbreviation.
2020-08-12 11:49:54 -04:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
István Zoltán Szabó 13aa8b8d9a [DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 13:15:15 +02:00
Lisa Cawley db5bf92acf
[7.x][DOCS] Replace docdir attribute with es-repo-dir (#57489) (#57494) 2020-06-01 16:42:53 -07:00
István Zoltán Szabó a5cf4712e5 [DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:03:43 +02:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). 

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00
István Zoltán Szabó d025b90cd1 [DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:48:53 +02:00
Benjamin Trent 4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
lcawl 949636944c [DOCS] Fixes shared attribute for feature importance 2020-04-01 14:52:08 -07:00
István Zoltán Szabó 487b273286 [DOCS] Adds feature importance mapping subsection to inference processor docs (#54190) 2020-03-26 09:26:50 +01:00
Benjamin Trent 4e43ede735
[ML] renaming inference processor field field_mappings to new name field_map (#53433) (#53502)
This renames the `inference` processor configuration field `field_mappings` to `field_map`.

`field_mappings` is now deprecated.
2020-03-13 15:40:57 -04:00
Benjamin Trent 89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Benjamin Trent afd90647c9
[ML] Adds feature importance to option to inference processor (#52218) (#52666)
This adds machine learning model feature importance calculations to the inference processor.

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : {
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.

Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 18:42:31 -05:00
David Kyle 289d4f4f4d [ML] Remove stray field from inference docs (#51870)
model_info_field is not a valid option
2020-02-05 10:50:51 +00:00
István Zoltán Szabó 30d1587ad5 [DOCS] Fixes indentation in inference processor code snippet (#51252) 2020-01-21 16:22:16 +01:00
István Zoltán Szabó 501ab83471 [DOCS] Adds inference processor documentation (#50204)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-19 12:21:04 +01:00