* add prefixes support to google input source, making it symmetrical-ish with s3
* docs
* more better, and tests
* unused
* formatting
* javadoc
* dependencies
* oops
* review comments
* better javadoc
* add s3 input source for native batch ingestion
* add docs
* fixes
* checkstyle
* lazy splits
* fixes and hella tests
* fix it
* re-use better iterator
* use key
* javadoc and checkstyle
* exception
* oops
* refactor to use S3Coords instead of URI
* remove unused code, add retrying stream to handle s3 stream
* remove unused parameter
* update to latest master
* use list of objects instead of object
* serde test
* refactor and such
* now with the ability to compile
* fix signature and javadocs
* fix conflicts yet again, fix S3 uri stuffs
* more tests, enforce uri for bucket
* javadoc
* oops
* abstract class instead of interface
* null or empty
* better error
* Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor
* Fix docs and javadoc
* Add unit tests for large or small estimated num splits
* add override
* add parquet support to native batch
* cleanup
* implement toJson for sampler support
* better binaryAsString test
* docs
* i hate spellcheck
* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest
* add comment, fix some stuff
* adjustments
* fix accident
* tweaks
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
* transformSpec + array expressions
changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes
* remove docs about being unsupported
* re-arrange test assert
* unused imports
* imports
* fix tests
* preserve types
* suppress warning, fixes, add test
* formatting
* cleanup
* better list to array type conversion and tests
* fix oops
* Add reference to `druid.storage.type`
This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct.
* Update s3.md
Add global storage parameter to knob table.
* Update s3.md
* SQL: EARLIEST, LATEST aggregators.
I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.
* Finalify.
* SQL updates.
* Adjust aggregator calls.
* Validations, test updates.
* Review docs.
* sketch of broker parallel merges done in small batches on fork join pool
* fix non-terminating sequences, auto compute parallelism
* adjust benches
* adjust benchmarks
* now hella more faster, fixed dumb
* fix
* remove comments
* log.info for debug
* javadoc
* safer block for sequence to yielder conversion
* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool
* smooth yield rate adjustment, more logs to help tune
* cleanup, less logs
* error handling, bug fixes, on by default, more parallel, more tests
* remove unused var
* comments
* timeboundary mergeFn
* simplify, more javadoc
* formatting
* pushdown config
* use nanos consistently, move logs back to debug level, bit more javadoc
* static terminal result batch
* javadoc for nullability of createMergeFn
* cleanup
* oops
* fix race, add docs
* spelling, remove todo, add unhandled exception log
* cleanup, revert unintended change
* another unintended change
* review stuff
* add ParallelMergeCombiningSequenceBenchmark, fixes
* hyper-threading is the enemy
* fix initial start delay, lol
* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2
* fix those important style issues with the benchmarks code
* lazy sequence creation for benchmarks
* more benchmark comments
* stable sequence generation time
* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs
* add jmh thread based benchmarks, cleanup some stuff
* oops
* style
* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose
* retool benchmark to allow modeling more typical heterogenous heavy workloads
* spelling
* fix
* refactor benchmarks
* formatting
* docs
* add maxThreadStartDelay parameter to threaded benchmark
* why does catch need to be on its own line but else doesnt
* Add option lateMessageRejectionStartDate
* Use option lateMessageRejectionStartDate
* Fix tests
* Add lateMessageRejectionStartDate to kafka indexing service
* Update tests kafka indexing service
* Fix tests for KafkaSupervisorTest
* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig
* Fix var name
* Update documentation
* Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.
* remove select query
* thanks teamcity
* oops
* oops
* add back a SelectQuery class that throws RuntimeExceptions linking to docs
* adjust text
* update docs per review
* deprecated
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
* Support assign tasks to run on different tiers of MiddleManagers
* address comments
* address comments
* rename tier to category and docs
* doc
* fix doc
* fix spelling errors
* docs