* report message gap, source gap and sink count in RealtimePlumber
* report message gap, sink count in Appenderator
* add ingest/events/sourceGap in metrics.md
* remove source gap
* Deduplicate looking for bitset.nextSetBit() in BitSetIterator.next() and hasNext()
* Add BitmapIterationTest
* More elaborate comment on why Roaring is not tested in BitmapIterationTest
* Fix PathChildrenCache's executorService leak in Announcer, CuratorInventoryManager and RemoteTaskRunner
* Use a single ExecutorService for all workerStatusPathChildrenCaches in RemoteTaskRunner
Noticed this in our internal testing, sometimes realtime index tasks in
kafkaIndexing service can get stuck waiting for handoff if datasource
is disabled before there task completion.
This is a workaround to ensure integration tests do not hit this case
until https://github.com/druid-io/druid/issues/1729 is fixed.
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.
Also use more iterations.
* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.
Specifically:
- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
* make sure CliCoordinator initializes and starts DerbyMetadataStorage first if configured
* Revert "make sure CliCoordinator initializes and starts DerbyMetadataStorage first if configured"
This reverts commit 54f5644054626d4a9e2448bb4bd5e6ce9a9fca1d.
* make sure CliCoordinator initializes and starts DerbyMetadataStorage first if configured
* sortByDimsFirst flag for groupBy query
* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>
* fix review comments
* fix review comments regarding removing code duplication of dim/time comparison
* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde
* remove unnecessary system.out.println
* make access static var NATURAL_NULLS_FIRST directly
* further review comments addressing
* Normalized Cost Balancer
* Adding documentation and renaming to use diskNormalizedCostBalancer
* Remove balancer from the strings
* Update docs and include random cost balancer
* Fix checkstyle issues
This retries when the start condition is met but SELECT -> INSERT/UPDATE
fails, which indicates a race. If the start condition isn't met, there
won't be any retrying.
* More robust Filter tests.
All Filter tests now exercise the CNF and post-filtering features.
* Fixes to RowBasedValueMatcherFactory and to bound filters.
- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
* Blacklist workers if they fail for too many times
* Adding documentation
* Changing to timeout to period and updating docs
* 1. Add configurable maxPercentageBlacklistWorkers
2. Rename variable
* Change maxPercentageBlacklistWorkers to double
* Remove thread.sleep
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.
These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
* option to reset offset automatically in case of OffsetOutOfRangeException
if the next offset is less than the earliest available offset for that partition
* review comments
* refactoring
* refactor
* review comments
This also involved some other test changes:
- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.