* add first and last aggregator
* add test and fix
* moving around
* separate aggregator valueType
* address PR comment
* add finalize inner query and adjust v1 inner indexing
* better test and fixes
* java-util import fixes
* PR comments
* Add first/last aggs to ITWikipediaQueryTest
* sortByDimsFirst flag for groupBy query
* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>
* fix review comments
* fix review comments regarding removing code duplication of dim/time comparison
* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde
* remove unnecessary system.out.println
* make access static var NATURAL_NULLS_FIRST directly
* further review comments addressing
* Add "like" filter.
* Addressed some PR comments.
* Slight simplifications to LikeFilter.
* Additional simplifications.
* Fix comment in LikeFilter.
* Clarify comment in LikeFilter.
* Simplify LikeMatcher a bit.
* No use going through the optimized path if prefix is empty.
* Add more tests.
Also change defaults:
- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
for most heaps.
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.
With this patch, the error fields are:
- error, one of the specific strings "Query interrupted", "Query timeout",
"Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
in the causality chain.
- host, the host that failed the query.
* Add time interval dim filter and retention analysis example
* Use closed-open matching for intervals, update cache key generation
* Fix time filtering tests for interval boundary change
* ability to not rollup at index time, make pre aggregation an option
* rename getRowIndexForRollup to getPriorIndex
* fix doc misspelling
* test query using no-rollup indexes
* fix benchmark fail due to jmh bug
* Add numeric StringComparator
* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query
* Address PR comments, add multithreaded BoundDimFilter test
* Add comment on strlen tie handling
* Add timeseries interval filter benchmark
* Adjust docs
* Use jackson for StringComparator, address PR comments
* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)
* More TopNMetricSpec and SearchSortSpec tests
* Fix NewSearchSortSpec serde
* Update docs for new DimensionTopNMetricSpec
* Delete NumericDimensionTopNMetricSpec
* Delete old SearchSortSpec
* Rename NewSearchSortSpec to SearchSortSpec
* Add TopN numeric comparator benchmark, address PR comments
* Refactor OrderByColumnSpec
* Add null checks to NumericComparator and String->BigDecimal conversion function
* Add more OrderByColumnSpec serde tests
This fixes a potential issue where groupBy resources could be allocated to
create a Sequence, but then the Sequence is never used, and thus the resources
are never freed.
Also simplifies how groupBy handles config overrides (this made the new
unit test easier to write).
* Support filtering on __time column
* Rename DruidPredicate
* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities
* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate
* Address PR comments (support filter on all long columns)
* Use predicate factory instead of composite predicate
* Address PR comments
* Lazily initialize long handling in selector/in filter
* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe
* Add multithreaded selector/in filter test
* Fix non-final lock object in SelectorDimFilter
This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.
It's kind of silly for timeseries, but why not.
* support alphanumeric sort in search query
* address a comment about handling equals() and hashCode()
* address comments
* add Ut for string comparators
* address a comment about space indentations.
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.
Both of these are described in more detail in #2987.
There are two goals of this patch:
1. Make it possible for historical/realtime nodes to return larger groupBy
result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
columns, avoiding materialization.
This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
* docs: replace OR by AND inside topnquery docs about multi value dimensions
* docs: replace OR by AND inside groupby docs about multi value dimensions
* Cleanup the base lookup cluster wide config docs
* Add better examples in lookups-cached-global.md
* Use actual valid stock lookups
* Fixed maps with :
* Add mix of lookups
* Better examples in extension
* Remove unneeded namespace requirement
* Add extra line space
* Add link to lookup tiers
* Renamed header
* Async lookups-cached-global by default
* Also better lookup docs
* Fix test timeouts
* Fix timing of deserialized test
* Fix problem with 0 wait failing immediately
There is no such thing as a "Java aggregator" in Druid from a user's point of view, there are just native aggregator that happen to be implemented in Java.
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.
* Fix bad docs for lookups lookupTier
* Add Datasource name holder
* Move task and datasource to be pulled from Task file
* Make LookupModule pull from bound dataSource
* Fix test
* Fix code style on imports
* Fix formatting
* Make naming better
* Address code comments about naming
* make isSingleThreaded groupBy query processing overridable at query time
* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent