This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.
Both of these are described in more detail in #2987.
There are two goals of this patch:
1. Make it possible for historical/realtime nodes to return larger groupBy
result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
columns, avoiding materialization.
This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
* add get dimension rangeset to filters
* add get domain to ShardSpec and added chunk filter in caching clustered client
* add null check and modified not filter, started with unit test
* add filter test with caching
* refactor and some comments
* extract filtershard to helper function
* fixup
* minor changes
* update javadoc
* fix caching for search results
properly read count when reading from cache.
* fix NPE during merging search count and add test
* Update cache key to invalidate prev results
* Add lookup optimization for InDimFilter
* tests for in filter with lookup extraction fn
* refactor
* refactor2 and modified filter test
* make optimizeLookup private
* Allows RegisteredLookupExtractionFn to find its lookups lazily
* Use raw variables instead of AtomicReference
* Make sure to use volatile
* Remove extra local variable.
* Move from BAOS to ByteBuffer
* Optimize filter for timeseries, search, and select queries
* exception at failed toolchest type check
* took out query type check
* java7 error fix and test improvement
* Fix parsing fail of segment id with underscored datasource (Fix for #2786)
* addressed comment
* renamed and moved code into api. added log4 dependency for tests
* addressed comments
* fixed test fails
* make isSingleThreaded groupBy query processing overridable at query time
* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
* Document how to use roaring bitmaps
This fixes#2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.
* fix
* fix
* fix
* fix
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.
Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.
Fixes#2750.
BoundFilter:
- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points,
then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values
matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls,
now it matches nulls if "" is allowed by the lower limit and not excluded by the
upper limit.
Interface changes:
- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a
value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.
Also added tests for SelectorFilter, NotFilter, and BoundFilter.
I believe that the instanceof chain in Filters exists because in the past, Filter
and DimFilter were in different packages (DimFilter was in druid-client and Filter
was in druid-processing). And since druid-client didn't depend on druid-processing,
DimFilter couldn't have a toFilter method. But now it can.
This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a
ValueMatcherFactory implementation to FilteredAggregatorFactory so it can
take advantage of existing makeMatcher(ValueMatcherFactory) implementations.
This patch also removes the Bound-based method from ValueMatcherFactory. Its
only user was the SpatialFilter, which could use the Predicate-based method.
Fixes#2604.
- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
"multi-value") in favor of "multi-value".