* No more singleton. Reduce iterations
* Granularities
* Fix the delay in the test
* Add license header
* Remove unused imports
* Lot more unused imports from all the rearranging
* CR feedback
* Move javadoc to constructor
* Refactor Segment Granularity
* Beginning of one granularity
* Copy the fix for custom periods in segment granularity over here.
* Remove the custom serialization for now.
* Compilation cleanup
* Reformat code
* Fixing unit tests
* Unify to use a single iterable
* Backward compatibility for rolling upgrade
* Minor checkstyle fixes. Cosmetic changes.
* Rename length and millis to duration
* CR feedback
* Minor changes.
* initial commits for finalizeFieldAccess #2433
* fix some bugs to run a query
* change name of method Queries.verifyAggregations to Queries.prepareAggregations
* add UTs
* fix UT failures
* rebased to master
* address comments and add a UT for arithmetic post aggregators
* rebased to the master
* address the comment about injection within the arithmetic post aggregator
* address comments and introduce decorate() in the PostAggregator interface.
* Address comments. 1. Implement getComparator in FinalizingFieldAccessPostAggregator and add UTs for it. 2. Some minor changes, such as renaming a method.
* Fix a code style mismatch.
* Rebased to the master
* Removing Integer.MAX column size limit.
* On-demand creation of headerLong; use v2 instead of v3
* Avoid reusing the same object from a previous test.
* Avoid reusing the same object from a previous test part#2
* code formatting.
* GenericIndexed/Writer code review changes.
* GenericIndexed/writer code review requested changes.
* checkIndex() to static
* native endianness for genericIndexedV2, code review requested changes.
* Formatting
* Hll fix.
* use native endianness during bag size calculation.
* Code review requested changes.
* IOPeon close() changes.
* use different tmp directory path for testing.
* Code review requested changes.
* Less use of File.deleteOnExit()
* removed deleteOnExit from most of the tests/benchmarks/iopeon
* Made IOPeon closeable
* Formatting.
* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
* Close all aggregators when closing onHeapIncrementalIndex
* Aggregators are now handled as Closeables, remove unnecessary mock in test
* Fix variable shadowing
* Add PostAggregators to cache key generation for top-n queries
* Add tests for strings
* Remove debug comments
* Add type keys and list sizes to cache key
* Make sure post aggregators used for sort are considered for cache key generation
* Use assertArrayEquals()
* Improve findPostAggregatorsForSort()
* Address comments
* fix test failure
* address comments
* Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity.
LikeFilter:
- Reduce code duplication and simplify methods, at the cost of an extra wrapping
of ImmutableBitmap in a SingletonImmutableList. I think this is fine, since this
should be cheap and the code path is not hot (it runs just once per filter).
Filters:
- Make the estimateSelectivity methods public, since they seem intended to be used by Filter
implementations, and Filters from extensions may want to use them too. Removed
@VisibleForTesting for the same reason.
- Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since
predicates aren't involved.
* Address PR comments.
* Remove unused import
* Change List to Collection
* Add filter selectivity estimation for auto search strategy
* Addressed comments
* Lazy bitmap materialization for bitmap sampling and java docs
* Addressed comments.
- Fix wrong non-overlap ratio computation and add unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>
* Addressed comments
- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()
* Fix test failure and add a test for RangeIntList
* fix code style
* Disabled selectivity estimation for multi-valued dimensions
* Address comment
* Introduce SegmentizerFactory
- which knows how to deserialize a specific type of segment
- Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment
- Unit test for the default behavior
* review comments
* Add virtual column types, holder serde, and safety features.
Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core
Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
when a new base column is added that happens to have the same name as a
virtual column.
* Fix ExtractionDimensionSpecs with virtual dimensions.
* Fix unused imports.
* CR comments
* Merge one more time, with feeling.
* Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of the features a DimensionSelector supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils;
* Add PredicateFilteringDimensionSelector, so that BaseFilteredDimensionSpec can decorate DimensionSelectors with unknown cardinality;
* Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers;
* Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector;
* Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher;
* Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn
* Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher
* Make ArrayBasedIndexedInts constructor private, replace its usages with of() static factory method
* Cache baseIdLookup in ForwardingFilteredDimensionSelector
* Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull)
* Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available
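The precomputed-BitSet idea in the bullet above can be sketched as follows. This is a hypothetical illustration, not Druid's `DimensionSelector` or `ValueMatcher` API: it assumes a dictionary of known cardinality plus a `lookupName`-style function from id to value, evaluates the predicate (for a single-value match, just `equals`) once per distinct id, and then answers each row with bit tests.

```java
import java.util.BitSet;
import java.util.function.IntFunction;
import java.util.function.Predicate;

// Hypothetical sketch; not Druid's real DimensionSelector/ValueMatcher API.
final class BitSetMatcherSketch {
  private final BitSet matchingIds;

  // cardinality: number of distinct dictionary ids (0..cardinality-1)
  // lookupName: maps an id back to its string value
  BitSetMatcherSketch(int cardinality, IntFunction<String> lookupName, Predicate<String> predicate) {
    matchingIds = new BitSet(cardinality);
    for (int id = 0; id < cardinality; id++) {
      // Evaluate the predicate once per distinct value, not once per row.
      if (predicate.test(lookupName.apply(id))) {
        matchingIds.set(id);
      }
    }
  }

  // A row matches if any of its dictionary ids is in the precomputed set.
  // Empty rows never match here; matchNull handling is omitted from this sketch.
  boolean matches(int[] rowIds) {
    for (int id : rowIds) {
      if (matchingIds.get(id)) {
        return true;
      }
    }
    return false;
  }
}
```

The trade-off is one pass over the dictionary and `cardinality` bits of memory per matcher, which pays off when there are many more rows than distinct values.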
* Doc fixes
* Addressed comments
* Fix
* Fix
* Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector
* throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm
* Removing unused code from io.druid.java.util.common.guava package; fix#3563 (more consistent and paranoid resource handling in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617)
* Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close()
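A minimal sketch of the guarantee described in this entry, using a hypothetical `closeWithEffect` helper rather than the actual `Sequences.withEffect()` signature: the effect runs in a `finally` block, so it executes whether or not `close()` throws, and the exception from `close()` still propagates to the caller.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper; not the real Sequences.withEffect() API.
final class WithEffectSketch {
  static void closeWithEffect(Closeable resource, Runnable effect) throws IOException {
    try {
      resource.close();
    } finally {
      // Runs even if close() threw; the close() exception is rethrown afterwards.
      // Note: if effect.run() itself throws, that exception replaces the one from close().
      effect.run();
    }
  }
}
```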
* Fix strange code in MetricsEmittingQueryRunner
* Add comment on why YieldingSequenceBase is used in Sequences.withEffect()
* Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders
* streaming version of select query
* use columns instead of dimensions and metrics; prepare for valueVector; remove granularity
* respect query limit within historical
* use constant
* fix thread name corruption bug when a Jetty QTP thread, rather than a processing thread, is used with SpecificSegmentQueryRunner
* add some tests for scan query
* add scan query document
* fix merge conflicts
* add compactedList resultFormat, which is better for JSON serde
* respect query timeout
* respect query limit on broker
* use static consts and remove unused code
* SQL support for nested groupBys.
Allows, for example, doing exact count distinct by writing:
SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)
Contrast with approximate count distinct, which is:
SELECT COUNT(DISTINCT col) FROM druid.foo
* Add deeply-nested groupBy docs, tests, and maxQueryCount config.
* Extract magic constants into statics.
* Rework rules to put preconditions in the "matches" method.
* Add an option to SearchQuery to choose a search query execution strategy.
Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query
* Add SearchStrategy and SearchQueryExecutor
* Address comments
* Rename strategies and set UseIndexesStrategy as the default strategy
* Add a cost-based planner for auto strategy
* Add document
* Fix code style
* apply code style
* apply comments
* Filters: Use ColumnSelectorFactory directly for building row-based matchers.
* Adjustments based on code review.
- BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing.
- HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec
not thread safe.
- Renamed rowType to rowSignature.
- Added specializations for time-based vs non-time-based DimensionSelector in RBCSF.
- Added convenience method DimensionHandlerUtils.createColumnSelectorPlus.
- Added singleton ZeroIndexedInts.
- Added test cases for DimFilterHavingSpec.
* Make ValueMatcherColumnSelectorStrategy actually use the associated selector.
* Add RangeIndexedInts.
* DimFilterHavingSpec: Fix concurrent usage guard on jdk7.
* Add assertion to ZeroIndexedInts.
* Rename no-longer-volatile members.
* add first and last aggregator
* add test and fix
* moving around
* separate aggregator valueType
* address PR comment
* add finalize inner query and adjust v1 inner indexing
* better test and fixes
* java-util import fixes
* PR comments
* Add first/last aggs to ITWikipediaQueryTest
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.
Also use more iterations.
* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.
Specifically:
- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
* sortByDimsFirst flag for groupBy query
* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>
* fix review comments
* fix review comments regarding removing code duplication of dim/time comparison
* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde
* remove unnecessary System.out.println
* access static var NATURAL_NULLS_FIRST directly
* address further review comments
* More robust Filter tests.
All Filter tests now exercise the CNF and post-filtering features.
* Fixes to RowBasedValueMatcherFactory and to bound filters.
- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.
These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
This also involved some other test changes:
- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
* Update emitter dependency to 0.4.0 and emit "version" dimension for all Druid metrics, not only query metrics
* Remove unused imports
* Use empty string instead of "testing-version" as a version placeholder
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
* Add "like" filter.
* Addressed some PR comments.
* Slight simplifications to LikeFilter.
* Additional simplifications.
* Fix comment in LikeFilter.
* Clarify comment in LikeFilter.
* Simplify LikeMatcher a bit.
* No use going through the optimized path if prefix is empty.
* Add more tests.
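The LIKE matching with a prefix fast path can be sketched as below. This assumes standard SQL `LIKE` semantics (`%` matches any sequence, `_` matches exactly one character) with no escape character; the class and method names are hypothetical, not Druid's `LikeMatcher`. The literal prefix before the first wildcard is checked with `startsWith` before running the regex, and that check is skipped when the prefix is empty.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a LIKE matcher; no escape-character support.
final class LikeSketch {
  private final String prefix;    // literal characters before the first wildcard
  private final Pattern pattern;  // full LIKE pattern compiled to a regex

  LikeSketch(String likePattern) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    int firstWildcard = -1;
    for (int i = 0; i < likePattern.length(); i++) {
      char c = likePattern.charAt(i);
      if (c == '%' || c == '_') {
        if (firstWildcard < 0) {
          firstWildcard = i;
        }
        // Flush pending literal text, quoted so regex metacharacters are inert.
        if (literal.length() > 0) {
          regex.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        regex.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
    }
    this.prefix = firstWildcard < 0 ? likePattern : likePattern.substring(0, firstWildcard);
    // DOTALL so '.' (from '_' or '%') also matches line terminators.
    this.pattern = Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  boolean matches(String value) {
    if (value == null) {
      return false; // null values never match in this sketch
    }
    // Cheap prefix rejection; skipped when there is no literal prefix.
    if (!prefix.isEmpty() && !value.startsWith(prefix)) {
      return false;
    }
    return pattern.matcher(value).matches();
  }
}
```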
* Support string type in math expression
addressed comments
addressed comments
Addressed comments
* Updated math function document
* Addressed comments
* Remove unused ComplexColumnImpl class
* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs
* Use concise try-with-resources syntax in several places
* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter
* Use Closer in the Iterable returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts threw exceptions
* Support expression-parameterized aggregators (squashed and rebased on master); also includes math post aggregator (was #2820)
* Addressed comments
* addressed comments
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()
* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()
* Remove unused import
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.
This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.
This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
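The hazard described above follows from `java.nio.ByteBuffer` semantics: `duplicate()` returns a view that shares the underlying bytes but has its own position, limit, and mark, whereas saving and restoring `position()` on a shared buffer mutates state that another thread may be reading at the same moment. The sketch below shows a hypothetical `get()` that hands out a duplicate, in the spirit of the fix; `CollectorHolder` is illustrative and not a Druid class.

```java
import java.nio.ByteBuffer;

// Hypothetical holder standing in for an aggregator that owns a collector buffer.
final class CollectorHolder {
  private final ByteBuffer storage;

  CollectorHolder(int capacity) {
    storage = ByteBuffer.allocate(capacity);
  }

  ByteBuffer storage() {
    return storage;
  }

  // Return a view with independent position/limit. The bytes themselves are still
  // shared, so concurrent writes to the contents remain visible, as noted above.
  ByteBuffer get() {
    return storage.duplicate();
  }
}
```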
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9
* Don't mask index in MergeIntIterator.makeQueueElement()
* DRY conversion of RoaringBitmap's IntIterator to fastutil's IntIterator
* Do implement skip(n) in IntIterators extending AbstractIntIterator because the original implementation is not reliable
* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
- Fix GroupByRowProcessor config overrides
- Fix GroupByRowProcessor resource limit checking
- Invert subquery context overrides such that for the subquery, its own
keys override keys from the outer query, not the other way around.
The last bit is necessary for the test to work, and seems like a better
way to do it anyway.
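A minimal sketch of the corrected precedence, assuming query contexts are plain `Map<String, Object>` values (the class and method names are hypothetical): start from the outer query's context and let the subquery's own keys win on conflict.

```java
import java.util.HashMap;
import java.util.Map;

final class ContextMergeSketch {
  // Hypothetical helper: subquery keys override outer-query keys.
  static Map<String, Object> subqueryContext(Map<String, Object> outer, Map<String, Object> inner) {
    Map<String, Object> merged = new HashMap<>(outer);
    merged.putAll(inner); // inner (subquery) values replace outer values for shared keys
    return merged;
  }
}
```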