Because timestamps at the end instant are not actually part of the interval. This
affected benchmark numbers, since it meant some data points would not be queried
(the interval for the query was based on getDataInterval) and also the
TimestampCheckingOffsets could not use the allWithinThreshold optimization.
* Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming
* Missed query from #2859
* Make inReadOnlyTransaction part of SQLMetadataConnector
* Initial commit of caffeine cache
* Address code comments
* Move and fixup README.md a bit
* Improve caffeine readme information
* Cleanup caffeine pom
* Address review comments
* Bump caffeine to 2.3.1
* Bump druid version to 0.9.2-SNAPSHOT
* Make test not fail randomly.
See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation
* Fix distribution and documentation
* Add caffeine to extensions.md
* Fix links in extensions.md
* Lexicographic
This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.
It's kind of silly for timeseries, but why not.
* support alphanumeric sort in search query
* address a comment about handling equals() and hashCode()
* address comments
* add Ut for string comparators
* address a comment about space indentations.
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.
Both of these are described in more detail in #2987.
There are two goals of this patch:
1. Make it possible for historical/realtime nodes to return larger groupBy
result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
columns, avoiding materialization.
This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
* Add status command to launch scripts
* make druid init script to pick up config directories from environment variables
make druid init script to pick up config directories from environment
variables
* add get dimension rangeset to filters
* add get domain to ShardSpec and added chunk filter in caching clustered client
* add null check and modified not filter, started with unit test
* add filter test with caching
* refactor and some comments
* extract filtershard to helper function
* fixup
* minor changes
* update javadoc
* Better handling for parseExceptions
* make parseException handling consistent with Realtime
* change combiner default val to true
* review comments
* review comments
* docs: replace OR by AND inside topnquery docs about multi value dimensions
* docs: replace OR by AND inside groupby docs about multi value dimensions
* Allow S3 version finder to search entire s3 object key
* Previously only was able to search immediate "directory"
* Update method javadoc
* Expand docs a bit better