druid/indexing-service
Chi Cao Minh 33a37d85d7
Fix native batch range partition segment sizing (#10089)
* Fix native batch range partition segment sizing

Fixes #10057.

Native batch range partitioning considered only the partition dimension
value when grouping rows, instead of all of the row's dimension values.
For schemas with multiple dimensions, rollup was therefore
overestimated, so too many partition dimension values were packed into
the same range partition. The resulting segments were overly large and
did not honor the target or max partition sizes.
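
To illustrate the overestimate described above, here is a minimal sketch,
not the actual Druid code. The class, the method names, and the sample
rows are all hypothetical, and timestamps are left out for brevity. It
counts how many rows would survive rollup under each grouping key:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Druid.
class RollupEstimateSketch {

    // Old behavior (sketch): rows are treated as distinct only when their
    // partition dimension value differs, so rollup is overestimated.
    static int groupsByPartitionDimensionOnly(
            List<Map<String, String>> rows, String partitionDimension) {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> row : rows) {
            keys.add(row.get(partitionDimension));
        }
        return keys.size();
    }

    // Fixed behavior (sketch): the grouping key is built from every
    // dimension value, so only truly identical rows are rolled up.
    static int groupsByAllDimensions(
            List<Map<String, String>> rows, List<String> dimensions) {
        Set<List<String>> keys = new HashSet<>();
        for (Map<String, String> row : rows) {
            List<String> key = new ArrayList<>();
            for (String dimension : dimensions) {
                key.add(row.get(dimension));
            }
            keys.add(key);
        }
        return keys.size();
    }

    // Made-up input: the first two rows roll up, the other two do not.
    static List<Map<String, String>> sampleRows() {
        return List.of(
            Map.of("country", "US", "city", "NYC"),
            Map.of("country", "US", "city", "NYC"),
            Map.of("country", "US", "city", "SF"),
            Map.of("country", "US", "city", "LA"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        // Partition dimension only: all four rows collapse into one group.
        System.out.println(groupsByPartitionDimensionOnly(rows, "country")); // prints 1
        // All dimensions: three distinct rows remain after rollup.
        System.out.println(groupsByAllDimensions(rows, List.of("country", "city"))); // prints 3
    }
}
```

In this sketch the partition-dimension-only key reports one row for
"US" while three rows actually remain after rollup, so a range sized
from that estimate would hold about three times the intended rows.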

Main changes:

- PartialDimensionDistributionTask: Consider all dimension values when
  grouping rows

- RangePartitionMultiPhaseParallelIndexingTest: Add a regression test
  whose input contains both rows that should roll up and rows that
  should not

* Use hadoop & native hash ingestion row group key
2020-06-29 17:49:52 -07:00
src Fix native batch range partition segment sizing (#10089) 2020-06-29 17:49:52 -07:00
pom.xml Fix missing temp dir for native single_dim (#10046) 2020-06-25 14:41:22 -07:00