Task:
- Add TaskCallback to run method (for early returns)
TaskStatus:
- Remove CONTINUED status
- Add segmentsNuked (placeholder for future deletion support)
- Add more builder methods
- Add validations to constructor
TaskStorage:
- Add TaskStorageQueryAdapter, a concrete class that wraps TaskStorages and
provides various read-only convenience methods
- Add getTask method for benefit of TaskStorageQueryAdapter
TaskQueue:
- Rename "done" to "notify"
- notify is responsible for deciding if we should commit
- Add optional commitRunnable to "notify", which gets called when it's time to commit
- Allow nextTasks and commits to run early (statusCode RUNNING)
- Move getStatus, collapseStatus functionality to TaskStorageQueryAdapter
generates index segments.
The HadoopIndexTask run method wraps a HadoopDruidIndexerJob run method. The
key modifications to the HadoopDruidIndexerJob are as follows:
- The UpDaterJobSpec field of the config that is used to set up the indexer job
is set to null. This ensures that the job does not push a list of published
segments to the database, in order to allow the indexing service to handle this
later.
- Set the version field of the config file based on the TaskContext. Also
changed config.setVersion method to take a string (as opposed to a Date) as
input, and propogated this change where necessary.
- Set the SegmentOutputDir field of the config file based on the TaskToolbox,
to allow the indexing service to handle where to write the segments too.
- Added a method to IndexGeneratorJob called getPublishedSegments, that simply
returns a list of published segments without publishing this list to the
database.
HadoopDruidIndexerConfig:
- Add partitionsSpec (backwards compatible with targetPartitionSize and partitionDimension)
- Add assumeGrouped flag to partitionsSpec
DeterminePartitionsJob:
- Skip group-by job if assumeGrouped is set
- Clean up code a bit
2) Fix bug with IndexMerger and null columns
3) Add QueryableIndexIndexableAdapter so that QueryableIndexes can be merged
4) Adjust twitter example to have multiple values for each hash tag
5) Adjusted GroupByQueryEngine to just drop dimensions that don't exist instead of throwing an NPE
- Can handle non-rolled-up input (by grouping input rows using an additional MR stage)
- Can select its own partitioning dimension, if none is supplied
- Can detect and avoid oversized shards due to bad dimension value distribution
- Shares input parsing code with IndexGeneratorJob