* Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality
* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()
* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()
* Style
* typo
* Disable StaticPseudoFunctionalStyleMethod inspection because of too many false positives
* Fixes
* Add compaction dialog in Druid console which allows users to add/edit data source compaction configuration
* Addressed naming issues; changed JSON input validation process
* Fixed a bug where the retention dialog becomes unclickable when there are too many retention rules; fixed a bug where the retention dialog becomes unreachable when it is at the bottom
* Using class for CSS instead of inline style
* Logic adjustments to SeekableStreamIndexTaskRunner.
A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:
- Remove "initialOffsetsSnapshot", which was used to determine when to
skip start offsets. Replace it with "lastReadOffsets", which I hope
is more intuitive. (There is a connection: start offsets must be
skipped if and only if they have already been read, either by a
previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
be the opposite of isEndOffsetExclusive. The reason is that starts are
exclusive exactly when the prior ends are inclusive: they must match
up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
no reason to, since we expect to read continuous message streams
throughout the task. And calling it makes offset-tracking logic
trickier, so better to avoid the need for trickiness. I believe the
call being here was causing a bug in Kinesis ingestion where a
message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
because they prevent keeping currOffsets and lastReadOffsets up to
date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
"verifyRecordInRange". It no longer has side effects. It does a sanity
check on the message offset and also makes sure that it is not past
the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
"isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
believe this fixes an off-by-one error with Kinesis where the last
record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
sequence, rather than potentially adjusting any unpublished sequence.
Adjusting sequences other than the last one is a mistake since it
will extend their endOffsets beyond what they actually read. (I'm not
sure if this was an issue in practice, since I'm not sure if real
world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
shorter and clearer, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.
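The offset checks described in the list above can be sketched roughly as
follows. This is a minimal illustration, not the actual
SeekableStreamIndexTaskRunner code: the class name, the use of plain long
offsets, and the "countRecordsRead" helper are assumptions made for the
example, and the real method signatures may differ.

```java
public class OffsetRangeSketch
{
  // Assumption: true models exclusive end offsets, false models inclusive
  // end offsets (the Kinesis semantics described above).
  private final boolean isEndOffsetExclusive;

  public OffsetRangeSketch(boolean isEndOffsetExclusive)
  {
    this.isEndOffsetExclusive = isEndOffsetExclusive;
  }

  // A record is skipped if and only if it was already read, either by a
  // previous task or by a previous sequence in the same task.
  public boolean isRecordAlreadyRead(Long lastReadOffset, long recordOffset)
  {
    return lastReadOffset != null && recordOffset <= lastReadOffset;
  }

  // Checks, before reading a record, that it is not past the end offset.
  public boolean isMoreToReadBeforeReadingRecord(long recordOffset, long endOffset)
  {
    return isEndOffsetExclusive ? recordOffset < endOffset : recordOffset <= endOffset;
  }

  // Hypothetical helper: counts the records a read loop would consume when
  // the stream is positioned at startOffset.
  public int countRecordsRead(long startOffset, long endOffset, Long lastReadOffset)
  {
    int count = 0;
    for (long offset = startOffset; isMoreToReadBeforeReadingRecord(offset, endOffset); offset++) {
      if (!isRecordAlreadyRead(lastReadOffset, offset)) {
        count++;
      }
    }
    return count;
  }
}
```

In this sketch a start offset is effectively exclusive only when it equals
the last read offset, so no separate start-exclusivity flag is needed.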
Kafka test changes:
- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
the very end of the task lifecycle still work properly.
Kinesis test changes:
- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
given Kinesis semantics, the right behavior when start offset equals
end offset (and there aren't exclusive partitions set) is to read that
single offset. This is because they are both meant to be treated as
inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
message read. I believe the old test was wrong; it expected the task
not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
checkpoint starting from 1 rather than 2. I believe the old test was
wrong here too; it was expecting the task to start reading from the
checkpointed offset, but it actually should have started reading from
one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
11 messages read instead of 12. It's starting at message 0 and reading
up to 10, which should be 11 messages.
* Changes from code review.
* Throw caught exception.
* Throw caught exceptions.
* Related checkstyle rule is added to prevent further bugs.
* RuntimeException() is used instead of Throwables.propagate().
* Missing import is added.
* Throwables are propagated if possible.
* Throwables are propagated if possible.
* Throwables are propagated if possible.
* Throwables are propagated if possible.
* Checkstyle definition is improved.
* Throwables.propagate() usages are removed.
* Checkstyle pattern is changed to scan only for "Throwables.propagate(" instead of checking lookbehind.
* Throwable is kept as the cause before throwing a RuntimeException.
* Fix unused assignments.
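A minimal sketch of the rethrow pattern that replaces Throwables.propagate()
in the commits above, using only the JDK. The class and method names
("RethrowSketch", "callUnchecked", "thrownBy") are hypothetical and exist
only for this illustration; the actual call sites in the codebase will look
different.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RethrowSketch
{
  // Unchecked exceptions pass through unchanged; checked exceptions are
  // wrapped in a RuntimeException that keeps the original as its cause.
  public static <T> T callUnchecked(Callable<T> action)
  {
    try {
      return action.call();
    }
    catch (RuntimeException e) {
      throw e;
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Demo helper: returns whatever callUnchecked throws, or null if nothing.
  public static Throwable thrownBy(Callable<?> action)
  {
    try {
      callUnchecked(action);
      return null;
    }
    catch (Throwable t) {
      return t;
    }
  }

  public static void main(String[] args)
  {
    Throwable wrapped = thrownBy(() -> {
      throw new IOException("disk unavailable");
    });
    // The original IOException is still reachable through getCause().
    System.out.println(wrapped.getCause());
  }
}
```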
* Locale problem that caused tests to fail is fixed.
* Forbidden APIs definition is improved to prevent using com.ibm.icu.text.SimpleDateFormat and com.ibm.icu.text.DateFormatSymbols without an explicit Locale.
* Error message is improved.
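A minimal sketch of why an explicit Locale matters for date formatting. To
stay self-contained it uses java.text.SimpleDateFormat as a stand-in for the
com.ibm.icu.text classes named above; the class and method names are
hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ExplicitLocaleSketch
{
  // Passing the Locale explicitly keeps day and month names independent of
  // the JVM default locale, so results are the same on every machine.
  public static String formatUtc(long epochMillis)
  {
    SimpleDateFormat format = new SimpleDateFormat("EEE, d MMM yyyy", Locale.ENGLISH);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(new Date(epochMillis));
  }

  public static void main(String[] args)
  {
    // Changing the default locale does not affect the formatted output.
    Locale.setDefault(Locale.forLanguageTag("tr-TR"));
    System.out.println(formatUtc(0L));
  }
}
```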
* wip
* fix tests, stop reading if we are at end offset
* fix build
* remove restore at end offsets fix in favor of a separate PR
* use typereference from method for serialization too
Similar to other bugs fixed in #6220, but this one was missed. This bug could
cause "extraction" dimensionSpecs on the "__time" column with non-STRING
outputTypes to sometimes be output as STRING instead of LONG, causing
incompletely merged results.
* write null byte in hadoop indexing for numeric dimensions
* Add test case to check output serializing null numeric dimensions
* Remove extra line
* Add @Nullable annotations
Follow-up to #7223 that fixes a doc bug (a result-level cache property
was misspelled), changes the recommended "small cluster" threshold from
20 to 5 servers, and clarifies behavior of the various caching options.
* Refactor SQL planning to reuse expression virtual columns when possible when constructing a DruidQuery. This allows virtual columns to be defined in filter expressions and makes the resulting native Druid queries more concise. Also a minor refactor of built-in SQL aggregators to maximize code reuse.
* fix it
* fix it in the right place
* fixup for base64 stuff
* fixup tests
* fix merge conflict on import order
* fixup
* fix imports
* fix tests
* review comments
* refactor
* re-arrange
* better javadoc
* fixup merge
* fixup tests
* fix accidental changes
For selectors with internal caches (like SingleScanTimeDimensionSelector,
SingleLongInputCachingExpressionColumnValueSelector, etc) we can get a perf
boost and memory usage decrease by sharing selectors.
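A minimal sketch of the idea, assuming a factory that hands out one selector
per column name. All names here ("SharedSelectorSketch", "CachingSelector",
"SelectorFactory") are hypothetical stand-ins and not the actual selector
classes mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedSelectorSketch
{
  // Hypothetical stand-in for a selector with an internal cache.
  public static class CachingSelector
  {
    private final Map<Long, String> cache = new HashMap<>();
    private int computations = 0;

    public String get(long input)
    {
      return cache.computeIfAbsent(input, in -> {
        computations++; // counts cache misses only
        return "v" + in;
      });
    }

    public int computations()
    {
      return computations;
    }
  }

  // Hypothetical factory: repeated requests for a column share one selector,
  // and therefore share one cache.
  public static class SelectorFactory
  {
    private final Map<String, CachingSelector> selectors = new HashMap<>();

    public CachingSelector makeSelector(String column)
    {
      return selectors.computeIfAbsent(column, c -> new CachingSelector());
    }
  }

  public static int computationsWithSharing()
  {
    SelectorFactory factory = new SelectorFactory();
    CachingSelector first = factory.makeSelector("__time");
    CachingSelector second = factory.makeSelector("__time");
    first.get(1L);
    second.get(1L); // served from the cache populated through "first"
    return first.computations();
  }
}
```

Without sharing, each consumer would hold its own cache and compute the same
value once per selector.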
* integration-tests: make ITParallelIndexTest still work in parallel
Follow-up to #7181, which made the default behavior for index_parallel tasks
non-parallel.
* Validate that parallel index subtasks were run
* Reduce # of max subTasks to 2
* fix typo and add more doc
* add more doc and link
* change default and add warning
* fix doc
* add test
* fix it test