Nik Everett ab2c6d9696
Save memory when auto_date_histogram is not on top (backport of #57304) (#58190)
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it when the `auto_date_histogram` used to use
`asMultiBucketAggregator` which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of time if there are many
buckets.

The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

All of that does buy us some speed improvements when the aggregator is
a child of multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

Relates to #56487
2020-06-17 08:48:41 -04:00
..
2020-06-17 07:35:08 -04:00