* add s3 input source for native batch ingestion
* add docs
* fixes
* checkstyle
* lazy splits
* fixes and hella tests
* fix it
* re-use better iterator
* use key
* javadoc and checkstyle
* exception
* oops
* refactor to use S3Coords instead of URI
* remove unused code, add retrying stream to handle s3 stream
* remove unused parameter
* update to latest master
* use list of objects instead of object
* serde test
* refactor and such
* now with the ability to compile
* fix signature and javadocs
* fix conflicts yet again, fix S3 uri stuffs
* more tests, enforce uri for bucket
* javadoc
* oops
* abstract class instead of interface
* null or empty
* better error
* Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor
* Fix docs and javadoc
* Add unit tests for large or small estimated num splits
* add override
* Add FileUtils.createTempDir() and enforce its usage.
The purpose of this is to improve error messages. Previously, the error
message on a nonexistent or unwritable temp directory would be
"Failed to create directory within 10,000 attempts".
* Further updates.
* Another update.
* Remove commons-io from benchmark.
* Fix tests.
* add TsvInputFormat
* refactor code
* fix grammar
* use enum replace string literal
* code refactor
* code refactor
* mark abstract for base class meant not to be instantiated
* remove constructor for test
* add parquet support to native batch
* cleanup
* implement toJson for sampler support
* better binaryAsString test
* docs
* i hate spellcheck
* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest
* add comment, fix some stuff
* adjustments
* fix accident
* tweaks
* Refactor parallel indexing perfect rollup partitioning
Refactoring to make it easier to later add range partitioning for
perfect rollup parallel indexing. This is accomplished by adding several
new base classes (e.g., PerfectRollupWorkerTask) and new classes for
encapsulating logic that needs to be changed for different partitioning
strategies (e.g., IndexTaskInputRowIteratorBuilder).
The code is functionally equivalent to before except for the following
small behavior changes:
1) PartialSegmentMergeTask: Previously, this task had a priority of
DEFAULT_TASK_PRIORITY. It now has a priority of
DEFAULT_BATCH_INDEX_TASK_PRIORITY (via the new PerfectRollupWorkerTask
base class), since it is a batch index task.
2) ParallelIndexPhaseRunner: A decorator was added to
subTaskSpecIterator to ensure the subtasks are generated with unique
ids. Previously, only tests (i.e., MultiPhaseParallelIndexingTest)
would have this decorator, but this behavior is desired for non-test
code as well.
* Fix forbidden apis and pmd warnings
* Fix analyze dependencies warnings
* Fix IndexTask json and add IT diags
* Fix parallel index supervisor<->worker serde
* Fix TeamCity inspection errors/warnings
* Fix TeamCity inspection errors/warnings again
* Integrate changes with those from #8823
* Address review comments
* Address more review comments
* Fix forbidden apis
* Address more review comments
* HDFS input source
Add support for using HDFS as an input source. In this version, commas
or globs are not supported in HDFS paths.
* Fix forbidden api
* Address review comments
* Tidy up lifecycle, query, and ingestion logging.
The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.
Concretely, this patch does the following:
- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
goal of reducing redundancy and improving clarity by avoiding
showing rarely-useful log messages. This includes most "starting"
and "stopping" messages, and most messages related to individual
columns.
- Adds new log4j2 templates that show operators how to enabled DEBUG
logging for certain important packages.
- Eliminate stack traces for query errors, unless log level is DEBUG
or more. This is useful because query errors often indicate user
error rather than system error, but dumping stack trace often gave
operators the impression that there was a system failure.
- Adds task id to Appenderator, AppenderatorDriver thread names. In
the default log4j2 configuration, this will put them in log lines
as well. It's very useful if a user is using the Indexer, where
multiple tasks run in the same JVM.
- More consistent terminology when it comes to "sequences" (sets of
segments that are handed-off together by Kafka ingestion) and
"offsets" (cursors in partitions). These terms had been confused in
some log messages due to the fact that Kinesis calls offsets
"sequence numbers".
- Replaces some ugly toString calls with either the JSONification or
something more operator-accessible (like a URL or segment identifier,
instead of JSON object representing the same).
* Adjustments.
* Adjust integration test.
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
* Use earliest offset on kafka newly discovered partitions
* resolve conflicts
* remove redundant check cases
* simplified unit tests
* change test case
* rewrite comments
* add regression test
* add junit ignore annotation
* minor modifications
* indent
* override testableKafkaSupervisor and KafkaRecordSupplier to make the test runable
* modified test constructor of kafkaRecordSupplier
* simplify
* delegated constructor
* first steps
* clean licenses
* fix capabilities
* fix specs
* more tests
* new web console on coordinator and overlord, remove setup for old consoles, old configs
* better message
* update licenses
* sync license files
* more button
* fix tslint issue
* jetty-rewrite dependency to add redirects for old console paths
* put dependency in the right place
* fix overlord detection
* fix notices, dedupe licenses
* make segment timeline work in no SQL mode
* update license
* revert hard coded coordinator mode from testing
* update restricted mode copy
* transformSpec + array expressions
changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes
* remove docs about being unsupported
* re-arrange test assert
* unused imports
* imports
* fix tests
* preserve types
* suppress warning, fixes, add test
* formatting
* cleanup
* better list to array type conversion and tests
* fix oops
* use peekable iterator for numeric column selector null checking instead of bitmap.get for those sweet sweet nanoseconds
* remove unused method
* slight optimization i think
* remove clone from wrappers since we do not use and is confusing
* fixes and tests
* int instead of Integer
* fix it
* fixes, more tests
* fix
* Add reference to `druid.storage.type`
This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct.
* Update s3.md
Add global storage parameter to knob table.
* Update s3.md
* Add missing button for include future, and handle logic for default true case
* Remove duplicate go to tasks button
* Fix lgtm issue
* Revert changes on old console
* Made changes based on PR comments