[[search-aggregations-bucket]]
== Bucket Aggregations

Bucket aggregations don't calculate metrics over fields the way metric aggregations do; instead, they create
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) that determines
whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
sets. In addition to the buckets themselves, bucket aggregations also compute and return the number of documents
that "fell into" each bucket.
Bucket aggregations, as opposed to metric aggregations, can hold sub-aggregations. These sub-aggregations are
computed for each of the buckets created by their "parent" bucket aggregation.
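
For instance, nesting an `avg` metric under a `terms` bucket aggregation computes the average once
per bucket. This is a sketch using the same hypothetical `sales` index, with an assumed numeric
`price` field:

[source,console]
--------------------------------------------------
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": { "field": "genre" },
      "aggs": { <1>
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
--------------------------------------------------
<1> The nested `aggs` block is evaluated separately for the documents in each `genre` bucket.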
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define a fixed number of buckets, and others dynamically create buckets during the aggregation process.

NOTE: The maximum number of buckets allowed in a single response is limited by a dynamic cluster
setting named `search.max_buckets`. It defaults to 10,000; requests that try to return more than
the limit fail with an exception.
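
Because `search.max_buckets` is dynamic, it can be adjusted at runtime through the cluster
settings API, for example:

[source,console]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}
--------------------------------------------------
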
include::bucket/adjacency-matrix-aggregation.asciidoc[]
include::bucket/autodatehistogram-aggregation.asciidoc[]
include::bucket/children-aggregation.asciidoc[]
include::bucket/composite-aggregation.asciidoc[]
include::bucket/datehistogram-aggregation.asciidoc[]
include::bucket/daterange-aggregation.asciidoc[]
include::bucket/diversified-sampler-aggregation.asciidoc[]
include::bucket/filter-aggregation.asciidoc[]
include::bucket/filters-aggregation.asciidoc[]
include::bucket/geodistance-aggregation.asciidoc[]
include::bucket/geohashgrid-aggregation.asciidoc[]
include::bucket/global-aggregation.asciidoc[]
include::bucket/histogram-aggregation.asciidoc[]
include::bucket/iprange-aggregation.asciidoc[]
include::bucket/missing-aggregation.asciidoc[]
include::bucket/nested-aggregation.asciidoc[]
include::bucket/parent-aggregation.asciidoc[]
include::bucket/range-aggregation.asciidoc[]
include::bucket/reverse-nested-aggregation.asciidoc[]
include::bucket/sampler-aggregation.asciidoc[]
include::bucket/significantterms-aggregation.asciidoc[]
include::bucket/significanttext-aggregation.asciidoc[]
include::bucket/terms-aggregation.asciidoc[]