OpenSearch/x-pack
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of its
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator`, which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
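
As a rough illustration of why building everything at once helps with the
second reason above, here is a minimal Java sketch. `BatchBuildSketch`,
`runDeferredOnce`, `buildAll`, and `buildOne` are hypothetical names made up
for this example and are not the classes or methods touched by the change.
It only assumes a replay step that may run once per aggregator, and it
simplifies by replaying the owning bucket ordinals directly.

```java
// Hypothetical sketch; not the aggregator code changed by this commit.
final class BatchBuildSketch {
    private boolean deferredRan = false;

    /** Stand-in for a replay step that may only run once per aggregator. */
    private void runDeferredOnce(long... bucketOrds) {
        if (deferredRan) {
            throw new IllegalStateException("deferred collection already ran");
        }
        deferredRan = true;
    }

    /** Builds one result per owning bucket in a single call, so the replay runs once. */
    String[] buildAll(long[] owningBucketOrds) {
        runDeferredOnce(owningBucketOrds);
        String[] results = new String[owningBucketOrds.length];
        for (int i = 0; i < owningBucketOrds.length; i++) {
            results[i] = "result for owning bucket " + owningBucketOrds[i];
        }
        return results;
    }

    /** One-at-a-time shape: a second call fails because the replay already ran. */
    String buildOne(long owningBucketOrd) {
        return buildAll(new long[] {owningBucketOrd})[0];
    }
}
```

With the one-at-a-time shape, a shared aggregator could serve only one
parent bucket before hitting the single-call limit; the batch shape hands
over every owning bucket ordinal up front.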

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
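
To make the idea concrete, the following is a minimal Java sketch of such a
structure, assuming it maps an (owning bucket ordinal, `long` key) pair to a
dense bucket ordinal. The class name `LongKeyedBucketOrdsSketch`, its methods,
and the negative-return convention for already-seen pairs are assumptions for
illustration only; the real structure presumably avoids boxed maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the data structure added by this commit.
final class LongKeyedBucketOrdsSketch {
    // owning bucket ordinal -> (long key -> bucket ordinal)
    private final Map<Long, Map<Long, Long>> ordsByOwner = new HashMap<>();
    private long nextOrd = 0;

    /**
     * Registers the (owningBucketOrd, value) pair.
     * Returns the newly assigned ordinal if the pair is new,
     * or (-1 - existingOrd) if the pair was already present.
     */
    long add(long owningBucketOrd, long value) {
        Map<Long, Long> ords =
            ordsByOwner.computeIfAbsent(owningBucketOrd, k -> new HashMap<>());
        Long existing = ords.get(value);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = nextOrd++;
        ords.put(value, ord);
        return ord;
    }

    /** Number of distinct keys seen under one owning bucket. */
    long bucketsInOwner(long owningBucketOrd) {
        Map<Long, Long> ords = ordsByOwner.get(owningBucketOrd);
        return ords == null ? 0 : ords.size();
    }

    /** Total number of ordinals assigned across all owning buckets. */
    long size() {
        return nextOrd;
    }
}
```

Because the same key under two different owning buckets gets two different
ordinals, a single aggregator can keep separate sub-aggregation state for
each parent bucket.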

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow-up
changes.
2020-05-08 20:38:53 -04:00
dev-tools
docs [DOCS] Create API key API requires `name` request body param (#56262) 2020-05-06 08:52:45 -04:00
license-tools Support "enterprise" license types (#49474) 2019-12-12 14:37:44 +11:00
plugin Save memory when numeric terms agg is not top (#55873) (#56454) 2020-05-08 20:38:53 -04:00
qa [Transform] unmute transform upgrade tests (#56296) 2020-05-08 10:48:58 +02:00
snapshot-tool Fix missing SHAs for Jackson 2.10.4 2020-05-06 17:28:24 -04:00
test Upgrade feature aware check usage of ASM to 7.3.1 (#54577) 2020-04-18 10:49:57 -04:00
transport-client Always use archive base name as the pom artifact id (#56447) (#56467) 2020-05-08 16:11:19 -07:00
NOTICE.txt
README.md
build.gradle [7.x] Update opensaml dependency (#44972) (#49512) 2019-11-29 00:17:16 +02:00

README.md

Elastic License Functionality

This directory tree contains files subject to the Elastic License. The files subject to the Elastic License are grouped in this directory to clearly separate them from files licensed under the Apache License 2.0.