* add first and last aggregator
* add test and fix
* moving around
* separate aggregator valueType
* address PR comment
* add finalize inner query and adjust v1 inner indexing
* better test and fixes
* java-util import fixes
* PR comments
* Add first/last aggs to ITWikipediaQueryTest
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.
Also use more iterations.
* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.
Specifically:
- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
* sortByDimsFirst flag for groupBy query
* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>
* fix review comments
* fix review comments regarding removing code duplication of dim/time comparison
* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde
* remove unnecessary system.out.println
* make access static var NATURAL_NULLS_FIRST directly
* further review comments addressing
* More robust Filter tests.
All Filter tests now exercise the CNF and post-filtering features.
* Fixes to RowBasedValueMatcherFactory and to bound filters.
- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.
These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
This also involved some other test changes:
- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics
* Remove unused imports
* Use empty string instead of "testing-version" as a version placeholder
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
* Add "like" filter.
* Addressed some PR comments.
* Slight simplifications to LikeFilter.
* Additional simplifications.
* Fix comment in LikeFilter.
* Clarify comment in LikeFilter.
* Simplify LikeMatcher a bit.
* No use going through the optimized path if prefix is empty.
* Add more tests.
* Support string type in math expression
addressed comments
addressed comments
Addressed comments
* Updated math function document
* Addressed comments
* Remove unused ComplexColumnImpl class
* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs
* Use concise try-with-resources syntax in several places
* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter
* Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts thew exceptions
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820)
* Addressed comments
* addressed comments
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()
* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()
* Remove unused import
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.
This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.
This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9
* Don't mask index in MergeIntIterator.makeQueueElement()
* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator
* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable
* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }