[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-inference-bucket-aggregation]]
=== {infer-cap} Bucket Aggregation

A parent pipeline aggregation which loads a pre-trained model and performs 
{infer} on the collated result fields from the parent bucket aggregation.

To use the {infer} bucket aggregation, you need to have the same security 
privileges that are required for using the <<get-inference>>.

[[inference-bucket-agg-syntax]]
==== Syntax

An `inference` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The ID of the model to use.
<2> The optional inference config, which overrides the model's default settings.
<3> Maps the value of `avg_agg` to the model's input field `avg_cost`.


[[inference-bucket-params]]
.`inference` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
| `model_id`         | The ID of the model to load and infer against       | Required  | -
| `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>>  | Optional | -
| `buckets_path`     | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
See <<buckets-path-syntax>> for more details. | Required       | -
|===
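
For instance, a `buckets_path` entry can reference a single-value metric 
aggregation by name, or the document count of a single-bucket aggregation 
through the `_count` suffix. The following fragment is a sketch only: the 
aggregation names (`responses_total`, `success`) and the model field names 
(`response_count`, `success`) are illustrative and must match what your model 
expects.

[source,js]
--------------------------------------------------
"buckets_path": {
  "response_count": "responses_total", <1>
  "success": "success._count" <2>
}
--------------------------------------------------
// NOTCONSOLE
<1> Feeds the value of the sibling metric aggregation `responses_total` to the 
model's input field `response_count`.
<2> Feeds the document count of the sibling `filter` aggregation `success` to 
the model's input field `success`.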


==== Configuration options for {infer} models

The `inference_config` setting is optional and usually isn't required, because 
the pre-trained models come equipped with sensible defaults. In the context of 
aggregations, some options can be overridden for each of the two types of model.

[discrete]
[[inference-agg-regression-opt]]
===== Configuration options for {regression} models

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-agg-classification-opt]]
===== Configuration options for {classification} models

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
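
As a sketch of how these options might be combined, the following fragment 
overrides two {classification} options. It assumes that the {classification} 
settings are wrapped in a `classification_config` object, by analogy with the 
`regression_config` object shown in the syntax section. That key name and the 
model ID `a_classification_model` are assumptions, so check them against your 
version before use.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_classification_model", <1>
    "inference_config": {
      "classification_config": { <2>
        "num_top_classes": 2,
        "prediction_field_type": "string"
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> A hypothetical model ID.
<2> Assumed key name, mirroring `regression_config`.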


[[inference-bucket-agg-example]]
==== Example 

The following snippet groups web log data by `client_ip` and computes a number 
of features with metric and bucket sub-aggregations. These features are the 
input to the {infer} aggregation, which is configured with a model trained to 
identify suspicious client IPs:

[source,console]
-------------------------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> A composite bucket aggregation that aggregates the data by `client_ip`.
<2> A series of metric and bucket sub-aggregations.
<3> {infer-cap} bucket aggregation that contains the model ID and maps the 
aggregation names to the model's input fields.