This also involved some other test changes:
- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
* Use Long timestamp as key instead of DateTime.
DateTime representation is screwed up when you store with an obj
and read with a different DateTime obj.
For example: The code below fails when you use DateTime as key
```
DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles")));
HashMap<DateTime, String> map = new HashMap<>();
map.put(odt, "abc");
DateTime dt = new DateTime(odt.getMillis());
System.out.println(map.get(dt));
```
* Respect timezone when creating the file.
* Update docs with timezone caveat in granularity spec
* Remove unused imports
* InputRowParser to decode OrcStruct from OrcNewInputFormat
* add unit test for orc hadoop indexing
* update docs and fix test code bug
* doc updated
* resove maven dependency conflict
* remove unused imports
* fix returning array type from Object[] to correct primitive array type
* fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list
* rebase and updated based on comments
* updated based on comments
* on reflecting review comments
* fix bug in typeStringFromParseSpec() and add unit test
* add license header
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.
Both of these are described in more detail in #2987.
There are two goals of this patch:
1. Make it possible for historical/realtime nodes to return larger groupBy
result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
columns, avoiding materialization.
This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
* Eliminate exclusion groups from pull-deps
* Only consider dependency nodes in pull-deps if they are not in the following scopes
* provided
* test
* system
* Fix a bunch of `<scope>provided</scope>` missing tags
* Better exclusions for a couple of problematic libs