OpenSearch/docs/reference/aggregations.asciidoc

[[search-aggregations]]
= Aggregations

[partintro]
--
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks
called aggregations, that can be composed in order to build complex summaries of the data.

An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of
the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed
query/filters of the search request).

There are many different types of aggregations, each with its own purpose and output. To better understand these types,
it is often easier to break them into two main families:

<<search-aggregations-bucket, _Bucketing_>>::
				A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
				criterion. When the aggregation is executed, all the buckets criteria are evaluated on every document in
				the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.
				By the end of the aggregation process, we'll end up with a list of buckets - each one with a set of
				documents that "belong" to it.

<<search-aggregations-metrics, _Metric_>>::
				Aggregations that keep track and compute metrics over a set of documents.

<<search-aggregations-reducer, _Reducer_>>::
				Aggregations that aggregate the output of other aggregations and their associated metrics

The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*

NOTE:	Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for
		the buckets which their parent aggregation generates. There is no hard limit on the level/depth of nested
		aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of
		another higher-level aggregation).

[float]
== Structuring Aggregations

The following snippet captures the basic structure of aggregations:

[source,js]
--------------------------------------------------
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
--------------------------------------------------

The `aggregations` object (the key `aggs` can also be used) in the JSON holds the aggregations to be computed. Each aggregation
is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would
make sense to name it `avg_price`). These logical names will also be used to uniquely identify the aggregations in the
response. Each aggregation has a specific type (`<aggregation_type>` in the above snippet) and is typically the first
key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).
At the same level of the aggregation type definition, one can optionally define a set of additional aggregations,
though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
sub-aggregations will be computed for the range buckets that are defined.

[float]
=== Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
<<modules-scripting,`script`>> which will generate the values (per document).

TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
`value script`.  While normal scripts are evaluated on a document level (i.e. the script has access to all the data
associated with the document), value scripts are evaluated on the *value* level. In this mode, the values are extracted
from the configured `field` and the `script` is used to apply a "transformation" over these value/s.

["NOTE",id="aggs-script-note"]
===============================
When working with scripts, the `lang` and `params` settings can also be defined. The former defines the scripting
language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin). The latter
enables defining all the "dynamic" expressions in the script as parameters, which enables the script to keep itself static
between calls (this will ensure the use of the cached compiled scripts in Elasticsearch).
===============================

Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the
`script_values_sorted` settings to indicate whether these values are sorted or not. Internally, Elasticsearch can
perform optimizations when dealing with sorted values (for example, with the `min` aggregations, knowing the values are
sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).

--

include::aggregations/metrics.asciidoc[]

include::aggregations/bucket.asciidoc[]

include::aggregations/reducer.asciidoc[]

include::aggregations/misc.asciidoc[]
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00			`[[search-aggregations]]`
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`= Aggregations`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`[partintro]`
			`--`
Facets: Removal from master. Close #7337 2014-08-13 15:43:47 +02:00			`The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks`
			`called aggregations, that can be composed in order to build complex summaries of the data.`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`query/filters of the search request).`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`There are many different types of aggregations, each with its own purpose and output. To better understand these types,`
			`it is often easier to break them into two main families:`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`<<search-aggregations-bucket, _Bucketing_>>::`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`criterion. When the aggregation is executed, all the buckets criteria are evaluated on every document in`
			`the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.`
			`By the end of the aggregation process, we'll end up with a list of buckets - each one with a set of`
			`documents that "belong" to it.`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`<<search-aggregations-metrics, _Metric_>>::`
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`Aggregations that keep track and compute metrics over a set of documents.`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`<<search-aggregations-reducer, _Reducer_>>::`
			`Aggregations that aggregate the output of other aggregations and their associated metrics`

[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to`
			`the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`of that bucket. This is where the real power of aggregations kicks in: aggregations can be nested!`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`NOTE: Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for`
			`the buckets which their parent aggregation generates. There is no hard limit on the level/depth of nested`
			`aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of`
			`another higher-level aggregation).`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation 2013-11-29 12:35:25 +01:00			`[float]`
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`== Structuring Aggregations`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
			`The following snippet captures the basic structure of aggregations:`

			`[source,js]`
			`--------------------------------------------------`
			`"aggregations" : {`
			`"<aggregation_name>" : {`
			`"<aggregation_type>" : {`
			`<aggregation_body>`
			`}`
[Aggregations] Meta data support This commit adds the ability to associate a bit of state with each individual aggregation. The aggregation response can be hard to stitch back together without having a reference to the aggregation request. In many cases this is not available, many json serializer frameworks cache types globally or have a static deserialisation override mechanism. In these cases making the original request available, if at all possible, would be a hack. The old facets returned `_type` which was just enough metadata to know what the originating facet type in the request was. This PR takes `_type` one step further by introducing ANY arbitrary meta data. This could be further <strike>ab</strike>used for instance by generic/automated aggregations that include UI state (color information, thresholds, user input states, etc) per aggregation. 2014-10-29 13:55:33 +01:00			`[,"meta" : { [<meta_data_body>] } ]?`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00			`[,"aggregations" : { [<sub_aggregation>]+ } ]?`
			`}`
			`[,"<aggregation_name_2>" : { ... } ]*`
			`}`
			`--------------------------------------------------`

Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			The `aggregations` object (the key `aggs` can also be used) in the JSON holds the aggregations to be computed. Each aggregation
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			make sense to name it `avg_price`). These logical names will also be used to uniquely identify the aggregations in the
			response. Each aggregation has a specific type (`<aggregation_type>` in the above snippet) and is typically the first
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`At the same level of the aggregation type definition, one can optionally define a set of additional aggregations,`
			`though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the`
			`sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the`
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`sub-aggregations will be computed for the range buckets that are defined.`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation 2013-11-29 12:35:25 +01:00			`[float]`
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`=== Values Source`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			<<modules-scripting,`script`>> which will generate the values (per document).
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
Docs: Mentioned script_id and script_file parameters across all aggs Closes #10760 2015-04-26 17:30:38 +02:00			TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
			`value script`. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
			`associated with the document), value scripts are evaluated on the value level. In this mode, the values are extracted`
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			from the configured `field` and the `script` is used to apply a "transformation" over these value/s.
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`["NOTE",id="aggs-script-note"]`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00			`===============================`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			When working with scripts, the `lang` and `params` settings can also be defined. The former defines the scripting
[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin). The latter`
			`enables defining all the "dynamic" expressions in the script as parameters, which enables the script to keep itself static`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`between calls (this will ensure the use of the cached compiled scripts in Elasticsearch).`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00			`===============================`

[DOCS] Various aggregation doc fixes 2014-03-12 18:28:40 -07:00			`Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the`
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`script_values_sorted` settings to indicate whether these values are sorted or not. Internally, Elasticsearch can
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			perform optimizations when dealing with sorted values (for example, with the `min` aggregations, knowing the values are
Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated 2014-01-23 13:53:49 +01:00			`sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the`
Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 01:54:42 +01:00			`minimum value among all other values associated with the same document).`
initial commit of the aggregations module Closes #3300 2013-11-24 03:13:08 -08:00
[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`--`
[Aggregations] Meta data support This commit adds the ability to associate a bit of state with each individual aggregation. The aggregation response can be hard to stitch back together without having a reference to the aggregation request. In many cases this is not available, many json serializer frameworks cache types globally or have a static deserialisation override mechanism. In these cases making the original request available, if at all possible, would be a hack. The old facets returned `_type` which was just enough metadata to know what the originating facet type in the request was. This PR takes `_type` one step further by introducing ANY arbitrary meta data. This could be further <strike>ab</strike>used for instance by generic/automated aggregations that include UI state (color information, thresholds, user input states, etc) per aggregation. 2014-10-29 13:55:33 +01:00
- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation 2013-11-29 12:35:25 +01:00			`include::aggregations/metrics.asciidoc[]`

			`include::aggregations/bucket.asciidoc[]`
Documentation for the derivative reducer 2015-04-16 14:07:40 +01:00
			`include::aggregations/reducer.asciidoc[]`

[DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00			`include::aggregations/misc.asciidoc[]`