* SQL: EARLIEST, LATEST aggregators.
I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.
* Finalify.
* SQL updates.
* Adjust aggregator calls.
* Validations, test updates.
* Review docs.
There is a class of bugs due to the fact that BaseObjectColumnValueSelector
has both "getObject" and "isNull" methods, but in most selector implementations
and most call sites, it is clear that the intent of "isNull" is only to apply
to the primitive getters, not the object getter. This makes sense, because the
purpose of isNull is to enable detection of nulls in otherwise-primitive columns.
Imagine a string column with a numeric selector built on top of it. You would
want it to return isNull = true, so numeric aggregators don't treat it as
all zeroes.
Sometimes this design leads people to accidentally guard non-primitive get
methods with "selector.isNull" checks, which is improper.
This patch has three goals:
1) Fix null-handling bugs that already exist in this class.
2) Make interface and doc changes that reduce the probability of future bugs.
3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast
aggregators while fixing null-handling bugs. I thought about splitting this
into its own patch, but it ended up being tough to split from the
null-handling fixes.
For (1) the fixes are,
- Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject
calls on isNull, by no longer extending NullableAggregatorFactory. Now uses
-1 as a sigil value for null, to differentiate nulls and empty strings.
- Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use
eval.asBoolean() to avoid calling getLong on the selector after already
calling getObject.
- Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls
on isNull. Also, refactored slightly to avoid the overhead of calling
getObject followed by another getter (see BloomFilterAggregatorFactory for
part of this).
For (2) the main changes are,
- Remove the "isNull" method from BaseObjectColumnValueSelector.
- Clarify "isNull" doc on BaseNullableColumnValueSelector.
- Rename NullableAggregatorFactory -> NullbleNumericAggregatorFactory to emphasize
that it only works on aggregators that take numbers as input.
- Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner.
- Similar naming changes to helper methods for groupBy, ValueMatchers, etc.
For (3) the other fixes for StringFirst and StringLastAggregatorFactory are,
- Fixed buffer overrun in the buffer aggregators when some characters in the string
code into more than one byte (the old code used "substring" to apply a byte limit,
which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method.
- Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp.
- Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex
behavior.
- Refactored to share code between the four aggregators.
- Improved test coverage.
- Made the base stringFirst, stringLast aggregators adaptive, and streamlined the
xFold versions into aliases. The adaptiveness is similar to how other aggregators
like hyperUnique work.
* Created temporary interval input component
* Make reusable interval component
* Fixed errors with typing invalid dates
* Fix interval input styling and place into autoform
* Fix styling of popover calendar that opens off the page
* Add snapshot test and change interval to required props
* Add functionality to enter hours minutes second
* Fix min date limit
* Remove console log
* Fix difference in timezone
* Update snapshot test
* Fixed snapshot test without changing min max date
* Made changes based on discussion before converting to hooks
* Rewrote using hooks and deleted duplicate states
* Remove unused states
* Change sql query view numbers to monospace
* Made changes based on discussion
* Removed duplicate state
* sketch of broker parallel merges done in small batches on fork join pool
* fix non-terminating sequences, auto compute parallelism
* adjust benches
* adjust benchmarks
* now hella more faster, fixed dumb
* fix
* remove comments
* log.info for debug
* javadoc
* safer block for sequence to yielder conversion
* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool
* smooth yield rate adjustment, more logs to help tune
* cleanup, less logs
* error handling, bug fixes, on by default, more parallel, more tests
* remove unused var
* comments
* timeboundary mergeFn
* simplify, more javadoc
* formatting
* pushdown config
* use nanos consistently, move logs back to debug level, bit more javadoc
* static terminal result batch
* javadoc for nullability of createMergeFn
* cleanup
* oops
* fix race, add docs
* spelling, remove todo, add unhandled exception log
* cleanup, revert unintended change
* another unintended change
* review stuff
* add ParallelMergeCombiningSequenceBenchmark, fixes
* hyper-threading is the enemy
* fix initial start delay, lol
* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2
* fix those important style issues with the benchmarks code
* lazy sequence creation for benchmarks
* more benchmark comments
* stable sequence generation time
* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs
* add jmh thread based benchmarks, cleanup some stuff
* oops
* style
* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose
* retool benchmark to allow modeling more typical heterogenous heavy workloads
* spelling
* fix
* refactor benchmarks
* formatting
* docs
* add maxThreadStartDelay parameter to threaded benchmark
* why does catch need to be on its own line but else doesnt
* In DirectDruidClient, don't run Future cancellation listener in HTTP library executor
* extract cancelQuery as a method of DirectDruidClient
* Fix testCancel
* Add exception as the first argument to log.error
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals()
* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever
* Fix tests
* More fixes
* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code
* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase
* More test fixes
* More test fixes
* Add a comment to VersionedIntervalTimelineTestBase
* Fix tests
* Set DataSegment.size(0) in more tests
* Specify DataSegment.size(0) in more places in tests
* Fix more tests
* Fix DruidSchemaTest
* Set DataSegment's size in more tests and benchmarks
* Fix HdfsDataSegmentPusherTest
* Doc changes addressing comments
* Extended doc for visibility
* Typo
* Typo 2
* Address comment
* fine grained capabilities
* fix tests
* configure all cards
* better detection
* update tests
* rename server to service
* node -> service
* remove console log
* better loader in data loader
* Add option lateMessageRejectionStartDate
* Use option lateMessageRejectionStartDate
* Fix tests
* Add lateMessageRejectionStartDate to kafka indexing service
* Update tests kafka indexing service
* Fix tests for KafkaSupervisorTest
* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig
* Fix var name
* Update documentation
* Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.
* Startup scripts: verify Java 8 (exactly), improve port/java verification messages.
Java 11 compatibility isn't fully baked yet (users have reported various
issues on Java 11), so block startup with an error message unless Java 8
is found. Allow overriding this decision with an environment variable.
* Message adjustments.
* remove select query
* thanks teamcity
* oops
* oops
* add back a SelectQuery class that throws RuntimeExceptions linking to docs
* adjust text
* update docs per review
* deprecated
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
When following the instructions from Hadoop batch load tutorial in the
docs, building the Docker images fails with:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Updating nss allows the curl command for downloading Hadoop to succeed
while building the Docker image.