druid/extensions-core/datasketches
Gian Merlino 93aeaf4801
Improve on-heap aggregator footprint estimates. (#11950)
Add a "guessAggregatorHeapFootprint" method to AggregatorFactory that
mitigates #6743 by enabling heap footprint estimates based on a specific
number of rows. The idea is that at ingestion time, the number of rows
that go into an aggregator will be 1 (if rollup is off) or will likely
be a small number (if rollup is on).

It's a heuristic, because of course nothing guarantees that the rollup
ratio is a small number. But it's a common case, and I expect this logic
to go wrong much less often than the current logic. Also, when it does
go wrong, users can fix it by lowering maxRowsInMemory or
maxBytesInMemory. The current situation is unintuitive: when the
estimation goes wrong, users get an OOME, but actually they need to
*raise* these limits to fix it.
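
The difference between a worst-case estimate and a row-aware guess can be sketched as below. This is a minimal illustration, not Druid's code: the class `FootprintSketch`, the byte constants, and the linear growth model are all assumptions made up for the example. Only the method name `guessAggregatorHeapFootprint` comes from the commit message above.

```java
// Hypothetical stand-in for illustration only; this is NOT Druid's AggregatorFactory.
public class FootprintSketch {
    // Assumed numbers, chosen only to make the example concrete.
    static final int MAX_INTERMEDIATE_BYTES = 16_384; // worst-case size of a full sketch
    static final int FIXED_OVERHEAD_BYTES = 32;       // assumed per-aggregator overhead
    static final int BYTES_PER_ENTRY = 8;             // assumed cost of one retained entry

    // Worst-case estimate: ignores how many rows actually reach the aggregator.
    static int maxIntermediateSize() {
        return MAX_INTERMEDIATE_BYTES;
    }

    // Row-aware guess: grows with the row count and is capped at the worst case.
    static int guessAggregatorHeapFootprint(long rows) {
        // Clamp the row count first so the multiplication below cannot overflow.
        long cappedRows = Math.min(Math.max(rows, 0L), (long) MAX_INTERMEDIATE_BYTES);
        long guess = FIXED_OVERHEAD_BYTES + cappedRows * BYTES_PER_ENTRY;
        return (int) Math.min(guess, (long) MAX_INTERMEDIATE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(guessAggregatorHeapFootprint(1));         // rollup off: one row
        System.out.println(guessAggregatorHeapFootprint(10));        // modest rollup ratio
        System.out.println(guessAggregatorHeapFootprint(1_000_000)); // capped at worst case
        System.out.println(maxIntermediateSize());
    }
}
```

Under these assumed constants, an aggregator that sees one row is budgeted far less heap than the worst case, so more rows fit under maxBytesInMemory before a persist is triggered. If the guess is too low for a given dataset, lowering maxRowsInMemory or maxBytesInMemory compensates, as the commit message describes.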
2021-11-28 13:21:24 +05:30
src Improve on-heap aggregator footprint estimates. (#11950) 2021-11-28 13:21:24 +05:30
README.md update links datasketches.github.io to datasketches.apache.org (#10107) 2020-07-01 14:56:17 -07:00
pom.xml bump version to 0.23.0-SNAPSHOT (#11670) 2021-09-08 15:56:04 -07:00

README.md

This module provides Druid aggregators based on the Apache DataSketches library (https://datasketches.apache.org/).

Credits: This module is the result of feedback and work by the following people.

https://github.com/cheddar https://github.com/himanshug https://github.com/leerho https://github.com/will-lauer