[[search-aggregations]]
= Aggregations

[partintro]
--
The aggregations framework provides aggregated data based on a search query. It is based on simple building blocks
called aggregations, which can be composed to build complex summaries of the data.

An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of
the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed
query/filters of the search request).

There are many different types of aggregations, each with its own purpose and output. To better understand these types,
it is often easier to break them into three main families:

<<search-aggregations-bucket, _Bucketing_>>::
A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in
the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.
By the end of the aggregation process, we end up with a list of buckets, each one with a set of
documents that "belong" to it.

<<search-aggregations-metrics, _Metric_>>::
Aggregations that keep track of and compute metrics over a set of documents.

<<search-aggregations-pipeline, _Pipeline_>>::
Aggregations that aggregate the output of other aggregations and their associated metrics.

The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
the bucket), one can associate aggregations at the bucket level, and those will execute within the context
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*

NOTE: Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for
each bucket that their parent aggregation generates. There is no hard limit on the level/depth of nested
aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of
another higher-level aggregation).
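As a minimal sketch of nesting, the following search request body assumes a hypothetical index whose documents have a
`category` field and a numeric `price` field (neither is defined by this page). A `terms` bucketing aggregation, named
`by_category` here, creates one bucket per category, and the `avg` metric sub-aggregation, named `avg_price`, is
computed separately within each of those buckets.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "by_category" : {
            "terms" : { "field" : "category" }, <1>
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } } <2>
            }
        }
    }
}
--------------------------------------------------
<1> Bucketing aggregation over the hypothetical `category` field: one bucket per distinct term.
<2> Metric sub-aggregation, evaluated once per `by_category` bucket against the hypothetical `price` field.

Another bucketing aggregation could be placed under `by_category` in the same way to add a further level of nesting.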

[float]
== Structuring Aggregations

The following snippet captures the basic structure of aggregations:

[source,js]
--------------------------------------------------
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
--------------------------------------------------

The `aggregations` object (the key `aggs` can also be used) in the JSON holds the aggregations to be computed. Each aggregation
is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would
make sense to name it `avg_price`). These logical names are also used to uniquely identify the aggregations in the
response. Each aggregation has a specific type (`<aggregation_type>` in the above snippet), which is typically the first
key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).

At the same level as the aggregation type definition, one can optionally define a set of additional aggregations,
though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
sub-aggregations will be computed for the range buckets that are defined.
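To make the structure concrete, here is a sketch of a request body that follows it, assuming a hypothetical numeric
`price` field. The top-level `avg_price` aggregation is a metric, while `price_ranges` is a `range` bucketing aggregation
whose sub-aggregation (another `avg`, named `avg_price_in_range` purely for illustration) is computed once per range
bucket. The range boundaries are arbitrary example values.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_price" : {
            "avg" : { "field" : "price" }
        },
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 50, "to" : 100 },
                    { "from" : 100 }
                ]
            },
            "aggs" : { <1>
                "avg_price_in_range" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}
--------------------------------------------------
<1> Sub-aggregations sit at the same level as the `range` type definition and run for every range bucket.

In the response, results are keyed by the same logical names: `avg_price` and `price_ranges` at the top level, and
`avg_price_in_range` inside each range bucket.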

[float]
=== Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field, which is set using the `field` key of the aggregation. It is also possible to define a
<<modules-scripting,`script`>> which will generate the values (per document).
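For illustration, both `avg` aggregations below are sketched against a hypothetical numeric `price` field: the first reads
the values directly from the field, while the second generates them per document with an inline script. The script text
(`doc['price'].value`) assumes the default scripting language and that inline (dynamic) scripts are enabled on the
cluster.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_price_from_field" : {
            "avg" : { "field" : "price" } <1>
        },
        "avg_price_from_script" : {
            "avg" : { "script" : "doc['price'].value" } <2>
        }
    }
}
--------------------------------------------------
<1> Values are read from the hypothetical `price` field.
<2> Values are produced by an inline script, evaluated once per document; assumes dynamic scripting is enabled.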

TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
_value script_. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
associated with the document), value scripts are evaluated on the *value* level. In this mode, the values are extracted
from the configured `field` and the `script` is used to apply a "transformation" over these values.
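A minimal value-script sketch, again assuming a hypothetical numeric `price` field: the values come from `field`, and the
script rescales each one before it is averaged. Inside a value script the current value is exposed as `_value`; the
factor 1.2 is an arbitrary example.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_corrected_price" : {
            "avg" : {
                "field" : "price",
                "script" : "_value * 1.2" <1>
            }
        }
    }
}
--------------------------------------------------
<1> `_value` is each value extracted from `price`; the script transforms it rather than reading the whole document.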

["NOTE",id="aggs-script-note"]
===============================
When working with scripts, the `lang` and `params` settings can also be defined. The former defines the scripting
language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin).
The latter enables defining all the "dynamic" expressions in the script as parameters, which keeps the script itself
static between calls (this ensures that Elasticsearch can reuse its cached compiled scripts).
===============================
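As a sketch, a value script that multiplies each value of a hypothetical `price` field by a factor can receive that
factor through `params`, so the script text itself never changes between requests. This assumes the `groovy` language
is available and inline scripts are enabled; `correction` is a hypothetical parameter name.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_corrected_price" : {
            "avg" : {
                "field" : "price",
                "script" : "_value * correction",
                "lang" : "groovy", <1>
                "params" : {
                    "correction" : 1.2 <2>
                }
            }
        }
    }
}
--------------------------------------------------
<1> Assumes the `groovy` scripting language is installed and enabled.
<2> Changing this value leaves the script text untouched, so the cached compiled script can be reused.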

Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the
`script_values_sorted` setting to indicate whether these values are sorted or not. Internally, Elasticsearch can
perform optimizations when dealing with sorted values (for example, with the `min` aggregation, knowing the values are
sorted, Elasticsearch will skip iterating over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).
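A sketch of a multi-valued script combined with `script_values_sorted`, assuming a hypothetical multi-valued numeric
field `prices`. Setting the flag to `true` is only correct if the script really returns each document's values in
ascending order; that ordering is an assumption here, not something the request enforces.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "min_price" : {
            "min" : {
                "script" : "doc['prices'].values", <1>
                "script_values_sorted" : true <2>
            }
        }
    }
}
--------------------------------------------------
<1> Returns all values of the hypothetical `prices` field for each document.
<2> Declares the values sorted so the first one can be taken as the per-document minimum; if they are not actually
sorted, the result may be wrong.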

--

include::aggregations/metrics.asciidoc[]
include::aggregations/bucket.asciidoc[]

include::aggregations/pipeline.asciidoc[]

include::aggregations/misc.asciidoc[]