Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

This adds a new field, `prediction_field_type`, which indicates the desired type. Options are: `string` (DEFAULT), `number`, and `boolean` (where close_to(1.0) == true, false otherwise).

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00


[role="xpack"]
[testenv="basic"]
[[inference-processor]]
=== {infer-cap} Processor

Uses a pre-trained {dfanalytics} model to infer against the data that is being
ingested in the pipeline.

[[inference-options]]
.{infer-cap} Options
[options="header"]
|======
| Name | Required | Default | Description
| `model_id` | yes | - | (String) The ID of the model to load and infer against.
| `target_field` | no | `ml.inference.<processor_tag>` | (String) Field added to incoming documents to contain results objects.
| `field_map` | yes | - | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.
| `inference_config` | yes | - | (Object) Contains the inference type and its options. There are two types: <<inference-processor-regression-opt,`regression`>> and <<inference-processor-classification-opt,`classification`>>.
include::common-options.asciidoc[]
|======
[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "flight_delay_regression-1571767128603",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {},
    "inference_config": { "regression": {} }
  }
}
--------------------------------------------------
// NOTCONSOLE
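
The processor definition above is one entry in the `processors` array of an
ingest pipeline. As a minimal sketch, a body like the following could be sent
to the create pipeline API (`PUT _ingest/pipeline/<pipeline_id>`). The
description text and the `field_map` entry are hypothetical:
`original_fieldname` stands for a field in the incoming documents and
`expected_fieldname` for the field name the model knows.

[source,js]
--------------------------------------------------
{
  "description": "Hypothetical pipeline that adds flight delay predictions",
  "processors": [
    {
      "inference": {
        "model_id": "flight_delay_regression-1571767128603",
        "target_field": "FlightDelayMin_prediction_infer",
        "field_map": {
          "original_fieldname": "expected_fieldname"
        },
        "inference_config": { "regression": {} }
      }
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE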
[discrete]
[[inference-processor-regression-opt]]
==== {regression-cap} configuration options

Regression configuration for inference.

`results_field`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`num_top_feature_importance_values`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]
[discrete]
[[inference-processor-classification-opt]]
==== {classification-cap} configuration options

Classification configuration for inference.

`num_top_classes`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]
`num_top_feature_importance_values`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]
`results_field`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`top_classes_results_field`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
`prediction_field_type`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
[discrete]
[[inference-processor-config-example]]
==== `inference_config` examples
[source,js]
--------------------------------------------------
{
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
This configuration specifies a `regression` inference. The results are written
to the `my_regression` field, which is contained in the `target_field` results
object.
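
The `num_top_feature_importance_values` option can be set alongside
`results_field` in the same object. The following sketch asks for at most two
{feat-imp} values per document; the value `2` is an arbitrary illustration, not
a recommended setting.

[source,js]
--------------------------------------------------
{
  "inference_config": {
    "regression": {
      "results_field": "my_regression",
      "num_top_feature_importance_values": 2
    }
  }
}
--------------------------------------------------
// NOTCONSOLE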
[source,js]
--------------------------------------------------
{
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
This configuration specifies a `classification` inference. The number of
categories for which the predicted probabilities are reported is 2
(`num_top_classes`). The result is written to the `prediction` field and the top
classes to the `probabilities` field. Both fields are contained in the
`target_field` results object.
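
The `prediction_field_type` option can be added to a `classification`
configuration in the same way. The accepted values are `string` (the default),
`number`, and `boolean`, where a predicted value close to `1.0` is written as
`true` and anything else as `false`. The following sketch requests a boolean
result; it assumes a model whose classes can be interpreted as `0` and `1`,
which is an illustration rather than a requirement stated here.

[source,js]
--------------------------------------------------
{
  "inference_config": {
    "classification": {
      "results_field": "prediction",
      "prediction_field_type": "boolean"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE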
[discrete]
[[inference-processor-feature-importance]]
==== {feat-imp-cap} object mapping
To get the full benefit of aggregating and searching for
{ml-docs}/dfa-classification.html#dfa-classification-feature-importance[{feat-imp}],
update the index mapping of the {feat-imp} result field as shown below:
[source,js]
--------------------------------------------------
"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
The mapping field name for {feat-imp} is composed as follows:

`<ml.inference.target_field>`.`<inference.tag>`.`feature_importance`

If `inference.tag` is not provided in the processor definition, it is not part
of the field path. The `<ml.inference.target_field>` defaults to `ml.inference`.

For example, suppose you provide the tag `foo` in the processor definition, as
shown below:
[source,js]
--------------------------------------------------
{
  "tag": "foo",
  ...
}
--------------------------------------------------
// NOTCONSOLE
The {feat-imp} value is written to the `ml.inference.foo.feature_importance`
field.
You can also specify a target field as follows:
[source,js]
--------------------------------------------------
{
  "tag": "foo",
  "target_field": "my_field"
}
--------------------------------------------------
// NOTCONSOLE
In this case, {feat-imp} is exposed in the
`my_field.foo.feature_importance` field.
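
When both a tag and a target field are set, the {feat-imp} mapping has to be
placed at the matching path. The following is a sketch of a create index
request body for the `my_field.foo.feature_importance` case. Laying out
`my_field` and `foo` as enclosing object fields is an assumption about how the
dotted path maps onto the index mapping, so check it against your own index
before relying on it.

[source,js]
--------------------------------------------------
{
  "mappings": {
    "properties": {
      "my_field": {
        "properties": {
          "foo": {
            "properties": {
              "feature_importance": {
                "type": "nested",
                "dynamic": true,
                "properties": {
                  "feature_name": {
                    "type": "keyword"
                  },
                  "importance": {
                    "type": "double"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE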