OpenSearch/docs/reference/aggregations/bucket.asciidoc

[[search-aggregations-bucket]]
== Bucket Aggregations

Unlike metrics aggregations, bucket aggregations don't calculate metrics over fields. Instead, they create
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) that determines
whether a document in the current context "falls" into it. In other words, the buckets effectively define document
sets. In addition to the buckets themselves, bucket aggregations also compute and return the number of documents
that "fell into" each bucket.
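
As a rough illustration of the idea, the following Python sketch pairs each bucket key with a criterion and counts the
documents that satisfy it. The documents, field names, and the `bucketize` helper are hypothetical and exist only for
this example. They are not part of any client API.

```python
# Hypothetical in-memory documents; the field names are illustrative only.
docs = [
    {"product": "shirt", "price": 15},
    {"product": "hat", "price": 40},
    {"product": "coat", "price": 120},
    {"product": "scarf", "price": 35},
]

# Each bucket pairs a key with a criterion that decides membership.
criteria = {
    "cheap": lambda d: d["price"] < 50,
    "expensive": lambda d: d["price"] >= 50,
}


def bucketize(documents, bucket_criteria):
    """Return one entry per bucket holding its key and document count."""
    buckets = []
    for key, matches in bucket_criteria.items():
        members = [d for d in documents if matches(d)]
        buckets.append({"key": key, "doc_count": len(members)})
    return buckets


result = bucketize(docs, criteria)
print(result)
```

Each entry mirrors the two pieces of information described above: the bucket itself (identified by its key) and the
number of documents that fell into it.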

Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. Each sub-aggregation is
computed separately for every bucket created by its "parent" bucket aggregation.
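
A minimal sketch of such nesting, assuming a `terms` parent with an `avg` sub-aggregation, is shown below as a
Python dictionary that serializes to a search request body. The `terms_with_avg` helper, the aggregation names, and
the `category` and `price` fields are assumptions made for illustration.

```python
import json


def terms_with_avg(bucket_field, metric_field):
    """Build a request body: a `terms` parent holding an `avg` sub-aggregation."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_" + bucket_field: {
                "terms": {"field": bucket_field},
                # The nested "aggs" object is evaluated once per parent bucket.
                "aggs": {
                    "avg_" + metric_field: {"avg": {"field": metric_field}}
                },
            }
        },
    }


body = terms_with_avg("category", "price")
print(json.dumps(body, indent=2))
```

With a body like this, each bucket in the response would be expected to carry its own `avg_price` value alongside
its document count.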

There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define a fixed number of buckets, and others dynamically create the buckets during the aggregation process.
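
The three strategies can be contrasted with a small Python sketch. It only mimics the spirit of a single-bucket
aggregation (such as `filter`), a fixed multi-bucket aggregation (such as `range`), and a dynamic one (such as
`terms`). The sample documents and field names are hypothetical, and the code does not reflect how the aggregators
are implemented.

```python
from collections import Counter

# Hypothetical documents used only for this illustration.
docs = [
    {"status": "ok", "latency": 12},
    {"status": "ok", "latency": 48},
    {"status": "error", "latency": 230},
    {"status": "timeout", "latency": 900},
]

# Single bucket: one criterion yields exactly one bucket.
single = sum(1 for d in docs if d["status"] == "error")

# Fixed number of buckets: the ranges are declared before any document is seen.
ranges = [(0, 50), (50, 500), (500, None)]


def in_range(value, lower, upper):
    """Lower bound inclusive, upper bound exclusive; None means unbounded."""
    return value >= lower and (upper is None or value < upper)


fixed = {
    (lower, upper): sum(1 for d in docs if in_range(d["latency"], lower, upper))
    for lower, upper in ranges
}

# Dynamic buckets: one bucket per distinct value found in the data.
dynamic = Counter(d["status"] for d in docs)

print(single, fixed, dict(dynamic))
```

In the first two cases the number of buckets is known from the request alone. In the last case it depends on the
data.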

NOTE: The maximum number of buckets allowed in a single response is limited by a dynamic cluster
setting named `search.max_buckets`. It defaults to 10,000. Requests that try to return more than
the limit fail with an exception.
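
Because the setting is dynamic, it can be changed through the cluster settings API (`PUT _cluster/settings`). The
sketch below builds such a request body in Python and adds a toy check that mirrors the documented behavior. The
value `20_000`, the helper names, and the `TooManyBuckets` class are illustrative assumptions, not server code.

```python
import json

DEFAULT_MAX_BUCKETS = 10_000  # default stated in the note above


def max_buckets_settings(limit, persistent=True):
    """Body for a cluster settings update that changes `search.max_buckets`."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"search.max_buckets": limit}}


class TooManyBuckets(Exception):
    """Illustrative stand-in for the error raised when the limit is exceeded."""


def check_bucket_count(count, limit=DEFAULT_MAX_BUCKETS):
    """Fail when a response would contain more buckets than the limit allows."""
    if count > limit:
        raise TooManyBuckets(f"{count} buckets exceeds the limit of {limit}")
    return count


body = max_buckets_settings(20_000)
print(json.dumps(body))
```

Raising the limit increases the memory a single request may consume, so reducing the number of requested buckets is
often the safer first step.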

include::bucket/adjacency-matrix-aggregation.asciidoc[]

include::bucket/children-aggregation.asciidoc[]

include::bucket/datehistogram-aggregation.asciidoc[]

include::bucket/daterange-aggregation.asciidoc[]

include::bucket/diversified-sampler-aggregation.asciidoc[]

include::bucket/filter-aggregation.asciidoc[]

include::bucket/filters-aggregation.asciidoc[]

include::bucket/geodistance-aggregation.asciidoc[]

include::bucket/geohashgrid-aggregation.asciidoc[]

include::bucket/global-aggregation.asciidoc[]

include::bucket/histogram-aggregation.asciidoc[]

include::bucket/iprange-aggregation.asciidoc[]

include::bucket/missing-aggregation.asciidoc[]

include::bucket/nested-aggregation.asciidoc[]

include::bucket/range-aggregation.asciidoc[]

include::bucket/reverse-nested-aggregation.asciidoc[]

include::bucket/sampler-aggregation.asciidoc[]

include::bucket/significantterms-aggregation.asciidoc[]

include::bucket/significanttext-aggregation.asciidoc[]

include::bucket/terms-aggregation.asciidoc[]

include::bucket/composite-aggregation.asciidoc[]