HadoopDruidIndexerConfig:
- Add partitionsSpec (backwards compatible with targetPartitionSize and partitionDimension)
- Add assumeGrouped flag to partitionsSpec
DeterminePartitionsJob:
- Skip group-by job if assumeGrouped is set
- Clean up code a bit
2) Fix bug with IndexMerger and null columns
3) Add QueryableIndexIndexableAdapter so that QueryableIndexes can be merged
4) Adjust twitter example to have multiple values for each hash tag
5) Adjusted GroupByQueryEngine to just drop dimensions that don't exist instead of throwing an NPE
- Can handle non-rolled-up input (by grouping input rows using an additional MR stage)
- Can select its own partitioning dimension, if none is supplied
- Can detect and avoid oversized shards due to bad dimension value distribution
- Shares input parsing code with IndexGeneratorJob
2) Have the DbUpdaterJob read descriptors from the temporary working directory instead of looking in the final segment output location (often the eventually consistent S3)
3) 1 and 2 Fixes#30
2) Add some docs to InputRow/Row to indicate that column names passed into the methods are *always* lowercase and that the rows need to act accordingly. (fixes#29, or at least clarifies the behavior...)