OpenSearch/docs/reference/aggregations
Jim Ferenczi 5288235ca3
Optimize the composite aggregation for match_all and range queries (#28745)
This change refactors the composite aggregation to add an execution mode that visits documents in the order of the values
present in the leading source of the composite definition. This mode does not need to visit all documents because it can terminate
the collection early once the queue is full and the leading source value is greater than the leading value of the largest (least competitive) bucket in the queue.
Instead of collecting the documents in the order of their doc_id, this mode uses the inverted lists (or the BKD tree for numerics) to collect documents
in the order of the values present in the leading source.
For instance, the following aggregation:

```
"composite" : {
  "sources" : [
    { "value1": { "terms" : { "field": "timestamp", "order": "asc" } } }
  ],
  "size": 10
}
```
... can use the field `timestamp` to collect the documents with the 10 lowest values for the field instead of visiting all documents.
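The single-source case can be sketched in a few lines. This is a minimal illustration, not the actual implementation. It assumes a hypothetical in-memory stand-in for an inverted index (a dict from each distinct value to the doc ids containing it) and a hypothetical helper `collect_lowest_terms`. Terms are walked in ascending order, and the walk stops as soon as `size` buckets exist, because every later term is greater than all collected ones.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an inverted index: each distinct value of the
# field maps to the list of doc ids that contain it.
InvertedIndex = Dict[int, List[int]]


def collect_lowest_terms(index: InvertedIndex, size: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return up to `size` (value, doc_count) buckets for the lowest values,
    plus the number of documents visited before terminating."""
    if size <= 0:
        return [], 0
    buckets: List[Tuple[int, int]] = []
    docs_visited = 0
    # A real terms dictionary is already sorted; sorted() simulates that here.
    for value in sorted(index):
        postings = index[value]
        docs_visited += len(postings)
        buckets.append((value, len(postings)))
        if len(buckets) == size:
            # Every remaining term is greater than the values already
            # collected, so none of them can enter the result: stop early.
            break
    return buckets, docs_visited


index = {1700: [0, 4], 1500: [2], 1600: [1, 3, 5], 1800: [6]}
print(collect_lowest_terms(index, 2))  # ([(1500, 1), (1600, 3)], 4)
```

Only 4 of the 7 documents are visited. The documents for 1700 and 1800 are never read.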
For composite aggregations with more than one source, the execution can terminate early as soon as the lowest leading values visited so far have produced enough
composite buckets. For instance, if visiting the two lowest timestamps created 10 composite buckets, we can terminate the collection early, since it
is guaranteed that the third lowest timestamp cannot create a composite key that compares lower than the ones already visited.
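The multi-source rule can be illustrated with a two-source sketch. It assumes a hypothetical data model: the leading source is read from an inverted index (value to doc ids), and the second source from per-document values (doc id to value), roughly as doc values would provide. The helper name `collect_composite` is also hypothetical. For brevity it keeps counts in a dict rather than the bounded priority queue a real implementation would use. Each leading value's documents are fully visited before the stop check, so the counts of the keys that were kept are complete.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]


def collect_composite(
    leading_index: Dict[int, List[int]],
    second_values: Dict[int, str],
    size: int,
) -> Tuple[List[Tuple[Key, int]], List[int]]:
    """Return the `size` lowest composite buckets ((lead, second), count)
    and the leading values that were actually visited."""
    counts: Dict[Key, int] = {}
    visited_leads: List[int] = []
    for lead in sorted(leading_index):
        # Visit every document of this leading value before checking, so the
        # counts of all keys starting with `lead` are complete.
        for doc in leading_index[lead]:
            key = (lead, second_values[doc])
            counts[key] = counts.get(key, 0) + 1
        visited_leads.append(lead)
        if len(counts) >= size:
            # Any key built from a later (greater) leading value compares
            # greater than every key collected so far: terminate early.
            break
    lowest = sorted(counts)[:size]
    return [(key, counts[key]) for key in lowest], visited_leads


leading = {10: [0, 1], 20: [2, 3], 30: [4]}
hosts = {0: "a", 1: "b", 2: "a", 3: "a", 4: "c"}
print(collect_composite(leading, hosts, 3))
# ([((10, 'a'), 1), ((10, 'b'), 1), ((20, 'a'), 2)], [10, 20])
```

Here the two lowest leading values already yield 3 composite buckets, so the documents for leading value 30 are never read.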

This mode is used if and only if all of the following hold:
 * The leading source in the composite definition uses an indexed field of type `date` (this also works with the `date_histogram` source), `integer`, `long` or `keyword`.
 * The query is a `match_all` query or a `range` query over the field that is used as the leading source in the composite definition.
 * The sort order of the leading source is the natural order (ascending, since postings and numerics are sorted in ascending order only).

If these conditions are not met, this aggregation visits each document like any other aggregation.
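The eligibility conditions and the fallback can be expressed as a single predicate. This is a hedged sketch. The `LeadingSource` and `Query` classes, their fields, and the mode names `"sorted"` and `"doc_order"` are all hypothetical and are not the real aggregation API. A `date_histogram` source is assumed to be represented by its underlying `date` field.

```python
from dataclasses import dataclass
from typing import Optional

# Field types listed in the conditions; a date_histogram source is assumed
# to be represented here by its underlying "date" field.
SORTED_MODE_TYPES = {"date", "integer", "long", "keyword"}


@dataclass
class LeadingSource:
    field: str
    field_type: str   # mapping type of the field, e.g. "long"
    indexed: bool     # whether the field is indexed (inverted lists / BKD tree)
    order: str = "asc"


@dataclass
class Query:
    kind: str                     # hypothetical: "match_all", "range", or anything else
    field: Optional[str] = None   # target field when kind == "range"


def choose_execution_mode(source: LeadingSource, query: Query) -> str:
    """Return "sorted" when the early-terminating mode applies, else "doc_order"."""
    type_ok = source.indexed and source.field_type in SORTED_MODE_TYPES
    query_ok = query.kind == "match_all" or (
        query.kind == "range" and query.field == source.field
    )
    # Postings and points are only sorted ascending, so only "asc" qualifies.
    order_ok = source.order == "asc"
    return "sorted" if (type_ok and query_ok and order_ok) else "doc_order"


ts = LeadingSource("timestamp", "date", indexed=True)
print(choose_execution_mode(ts, Query("match_all")))       # sorted
print(choose_execution_mode(ts, Query("range", "price")))  # doc_order
```

A range query over a different field, a descending order, or an unindexed or unsupported field type all fall back to visiting documents in doc id order.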
2018-03-26 09:51:37 +02:00
bucket Optimize the composite aggregation for match_all and range queries (#28745) 2018-03-26 09:51:37 +02:00
matrix Allow `_doc` as a type. (#27816) 2017-12-14 17:47:53 +01:00
metrics Upgrade t-digest to 3.2 (#28295) (#28305) 2018-02-15 08:23:20 +00:00
pipeline Update bucket-sort-aggregation.asciidoc (#28937) 2018-03-08 15:05:34 +01:00
bucket.asciidoc Mark the composite aggregation as a beta feature (#28431) 2018-02-02 09:24:10 +01:00
matrix.asciidoc refactor matrix agg documentation from modules to main agg section 2016-06-06 07:39:00 -05:00
metrics.asciidoc Adds geo_centroid metric aggregator 2015-10-14 16:19:09 -05:00
misc.asciidoc Allow `_doc` as a type. (#27816) 2017-12-14 17:47:53 +01:00
pipeline.asciidoc Aggregations: bucket_sort pipeline aggregation (#27152) 2017-11-09 17:59:57 +00:00