OpenSearch/modules
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of its
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator`, which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
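Building everything at once is what lifts the `runDeferredCollections(long...)` restriction: once the surviving buckets of every parent bucket are known together, deferred sub-aggregations can run in a single pass. The following is a minimal sketch of that idea, not the actual `Aggregator` API. The class `BatchBuildSketch`, its `collect` and `buildAggregations` methods, and the `runDeferred` callback (a stand-in for the single deferred-collection call) are all hypothetical, and the sketch assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the real Aggregator API): one numeric terms aggregator that keeps
 * the buckets of every parent ("owning") bucket together and builds all results in one call.
 */
public class BatchBuildSketch {
    private record Key(long owningBucketOrd, long term) {}

    private final Map<Key, Long> bucketOrds = new HashMap<>(); // (owner, term) -> bucket ordinal
    private final List<Key> keys = new ArrayList<>();           // bucket ordinal -> (owner, term)
    private final List<Long> docCounts = new ArrayList<>();     // bucket ordinal -> doc count

    /** Counts one document with value {@code term} under parent bucket {@code owningBucketOrd}. */
    public void collect(long owningBucketOrd, long term) {
        Key key = new Key(owningBucketOrd, term);
        Long ord = bucketOrds.get(key);
        if (ord == null) {
            ord = (long) keys.size();
            bucketOrds.put(key, ord);
            keys.add(key);
            docCounts.add(0L);
        }
        int i = ord.intValue();
        docCounts.set(i, docCounts.get(i) + 1);
    }

    /**
     * Builds the top {@code size} terms for every requested owning bucket, then hands all
     * surviving bucket ordinals to {@code runDeferred} exactly once.
     */
    public List<List<Long>> buildAggregations(long[] owningBucketOrds, int size, Consumer<long[]> runDeferred) {
        List<List<Long>> results = new ArrayList<>(owningBucketOrds.length);
        List<Long> survivors = new ArrayList<>();
        for (long owner : owningBucketOrds) {
            // Linear scan over all buckets: fine for a sketch, not for production.
            List<Integer> ordsForOwner = new ArrayList<>();
            for (int ord = 0; ord < keys.size(); ord++) {
                if (keys.get(ord).owningBucketOrd() == owner) {
                    ordsForOwner.add(ord);
                }
            }
            // Highest doc count first, ties broken by ascending term.
            ordsForOwner.sort((a, b) -> {
                int byCount = Long.compare(docCounts.get(b), docCounts.get(a));
                return byCount != 0 ? byCount : Long.compare(keys.get(a).term(), keys.get(b).term());
            });
            List<Long> topTerms = new ArrayList<>();
            for (int i = 0; i < Math.min(size, ordsForOwner.size()); i++) {
                int ord = ordsForOwner.get(i);
                topTerms.add(keys.get(ord).term());
                survivors.add((long) ord);
            }
            results.add(topTerms);
        }
        // One deferred pass for the whole aggregator instead of one per per-bucket Aggregator.
        runDeferred.accept(survivors.stream().mapToLong(Long::longValue).toArray());
        return results;
    }
}
```

The design point is that the deferred callback fires once with the ordinals of every owning bucket, which is only possible because results are no longer built one parent bucket at a time.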

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
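The commit message does not show this data structure, so what follows is only a minimal sketch of the idea, assuming a hash map keyed by the (owning bucket ordinal, value) pair. One shared table hands out dense bucket ordinals for every parent bucket, which removes the need for a separate `Aggregator` per parent bucket. The class name `LongKeyedOrdsSketch`, its methods, and the `-1 - existingOrd` return convention for already-seen pairs are hypothetical choices for illustration. Records require Java 16+.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: one shared table that maps (owningBucketOrd, value) pairs to
 * dense bucket ordinals, so a single aggregator can serve every parent bucket.
 */
public class LongKeyedOrdsSketch {
    private record Key(long owningBucketOrd, long value) {}

    private final Map<Key, Long> ords = new HashMap<>();

    /** Returns a fresh ordinal for an unseen pair, or {@code -1 - existingOrd} if already present. */
    public long add(long owningBucketOrd, long value) {
        Key key = new Key(owningBucketOrd, value);
        Long existing = ords.get(key);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = ords.size(); // ordinals are dense: 0, 1, 2, ...
        ords.put(key, ord);
        return ord;
    }

    /** Returns the ordinal for the pair, or -1 if it was never added. */
    public long find(long owningBucketOrd, long value) {
        return ords.getOrDefault(new Key(owningBucketOrd, value), -1L);
    }

    /** Number of distinct (owningBucketOrd, value) buckets across all parents. */
    public long size() {
        return ords.size();
    }
}
```

With this shape, collecting value `v` under parent bucket `p` calls `add(p, v)` and uses the resulting ordinal to address doc counts and sub-aggregation state, so the same value under two different parents gets two distinct buckets.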

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This substantially
reduces the memory consumption of numeric `terms` aggregations that are
not the "top level", especially when those aggregations contain many
sub-aggregations. It is also a pretty big speedup, especially when the
aggregation is under a non-selective aggregation like the
`date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow-up
changes.
2020-05-08 20:38:53 -04:00
aggs-matrix-stats Expose agg usage in Feature Usage API (#55732) (#56048) 2020-04-30 12:53:36 -04:00
analysis-common Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432) (#56100) 2020-05-04 11:31:28 +01:00
geo Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037) (#55500) 2020-04-22 08:12:54 -07:00
ingest-common Ensure auto close of HTMLStripCharFilter in HtmlStripProcessor 2020-05-01 17:31:53 -05:00
ingest-geoip Upgrade to Jackson 2.10.4 (#56188) 2020-05-06 17:20:23 -04:00
ingest-user-agent Always use deprecateAndMaybeLog for deprecation warnings (#55319) 2020-04-23 09:20:54 +01:00
kibana Reintroduce system index APIs for Kibana (#54935) 2020-04-08 09:08:49 -06:00
lang-expression Upgrade to lucene 8.5.1 release (#55229) (#55235) 2020-04-15 17:35:42 +02:00
lang-mustache Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151) 2020-05-04 22:38:33 +02:00
lang-painless Always use archive base name as the pom artifact id (#56447) (#56467) 2020-05-08 16:11:19 -07:00
mapper-extras Simplify signature of FieldMapper#parseCreateField. (#56144) 2020-05-06 11:12:09 -07:00
parent-join Save memory when numeric terms agg is not top (#55873) (#56454) 2020-05-08 20:38:53 -04:00
percolator Simplify signature of FieldMapper#parseCreateField. (#56144) 2020-05-06 11:12:09 -07:00
rank-eval [7.x] Lazy test cluster module and plugins (#54852) (#55087) 2020-04-13 10:53:35 -05:00
reindex Fix CancelTests#testDeleteByQueryCancelWithWorkers (#56242) 2020-05-06 09:55:40 -04:00
repository-url Rename MetaData to Metadata in all of the places (#54519) 2020-03-31 17:24:38 -04:00
systemd Encapsulate systemd extender 2020-04-20 21:17:42 -04:00
tasks Reintroduce system index APIs for Kibana (#54935) 2020-04-08 09:08:49 -06:00
transport-netty4 Upgrade netty to 4.1.49.Final (#56059) 2020-05-05 10:40:23 -06:00
build.gradle Apply 2-space indent to all gradle scripts (#49071) 2019-11-14 11:01:23 +00:00