Currently, the CNF conversion of a filter is unbounded, which means that it can create as many filters as possible thereby also leading to OOMs in historical heap. We should throw an error or disable CNF conversion if the filter count starts getting out of hand. There are ways to do CNF conversion with linear increase in filters as well but that has been left out of the scope of this change since those algorithms add new variables in the predicate - which can be contentious.
* Tombstone support for replace functionality
* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec
* Update compaction test to match replace behavior
* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.
* Style plus simple queriableindex test
* Add segment cache loader tombstone test
* Add more tests
* Add a method to the LogicalSegment to test whether it has any data
* Test filter with some empty logical segments
* Refactor more compaction/dropexisting tests
* Code coverage
* Support for all empty segments
* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.
* Fix null ptr when segment does not have a queriable index
* Add support for empty replace interval (all input data has been filtered out)
* Fixed coverage & style
* Find tombstone versions from lock versions
* Test failures & style
* Interner was making this fail since the two segments were consider equal due to their id's being equal
* Cleanup tombstone version code
* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used
* Reject replace spec when input intervals are empty
* Documentation
* Style and unit test
* Restore test code deleted by mistake
* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.
* Unused imports. Dead code. Test coverage.
* Coverage.
* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.
* Fix OmniKiller + more test coverage.
* Tombstones are now marked using a shard spec
* Drop a segment factory.json in the segment cache for tombstones
* Style
* Style + coverage
* style
* Add TombstoneLoadSpec.class to mapper in test
* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java
Typo
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md
Missing
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo
* Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold.
* Range does not work with multi-dim
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* add topn heap optimization when string is dictionary encoded, but not uniquely
* use array instead
* is same
* fix javadoc
* fix
* Update StringTopNColumnAggregatesProcessor.java
* GroupBy: Cap dictionary-building selector memory usage.
New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.
Includes:
- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
the v2SmallDictionary suite. (Both the selector dictionary and
the merging dictionary will be small in that suite.)
- Tests for the new config parameter.
* Fix issues from tests.
* Add "pre-existing" to dictionary.
* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.
* Adjustments from review comments.
* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.
When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.
The main changes here are:
- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
into a single condition. Also, make it required, since every usage
is passing in a specific condition anyway.
* Test fixes.
* Fix read impl.
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.
The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
* add a new query laning metrics to visualize lane assignment
* fixes :spotbugs check
* Update docs/operations/metrics.md
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* perf: improve ZkWorker task lookup performance
This improves the performance of the ZkWorker task lookup loop by
eliminating repeat calls to getRunningTasks() in toImmutable(),
and reduces the work performed in isRunningTask() to stream-parse
the id field instead of entire JSON blob.
Row stats are reported for single phase tasks in the `/liveReports` and `/rowStats` APIs
and are also a part of the overall task report. This commit adds changes to report
row stats for multiphase tasks too.
Changes:
- Add `TaskReport` in `GeneratedPartitionsReport` generated during hash and range partitioning
- Collect the reports for `index_generate` phase in `ParallelIndexSupervisorTask`
These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader)
Following changes are addressed in this PR:
A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error.
This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past.
* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
* upgrade Airline to Airline 2
https://github.com/airlift/airline is no longer maintained, updating to
https://github.com/rvesse/airline (Airline 2) to use an actively
maintained version, while minimizing breaking changes.
Note, this is a backwards incompatible change, and extensions relying on
the CliCommandCreator extension point will also need to be updated.
* fix dependency checks where jakarta.inject is now resolved first instead
of javax.inject, due to Airline 2 using jakarta
* remove use of mocks for ServiceMetricEvent
* simplify KafkaEmitterTests by moving to Mockito
* speed up KafkaEmitterTest by adjusting reporting frequency in tests
* remove unnecessary easymock and JUnitParams dependencies
* perf: indexing: Introduce a bulk getValuesInto function to read values in bulk
If large number of values are required from DimensionDictionary
during indexing, fetch them all in a single lock/unlock instead of
lock/unlock each individual item.
* refactor: rename key to keys in function args
* fix: check explicitly that argument length on arrays match
* refactor: getValuesInto renamed to getValues, now creates and returns a new T[] rather than filling
Azure Blob storage has multiple modes of authentication. One of them is Shared access resource
. This is very useful in cases when we do not want to add the account key in the druid properties .
Problem:
- When a kinesis stream is resharded, the original shards are closed.
Any intermediate shard created in the process is eventually closed as well.
- If a shard is closed before any record is put into it, it can be safely ignored for ingestion.
- It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster.
Changes:
- Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor`
- Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig`
- The caches are used and updated only when `skipIgnorableShards = true`
Problem:
When using a `CachingCostBalancerStrategy` with segments of granularity ALL,
no segment gets loaded.
- With granularity ALL, segments of eternity interval are created which have
`start = Long.MIN_VALUE / 2` and `end = Long.MAX_VALUE / 2`.
- For cost calculation in the balancer strategy, `toLocalInterval()` method is invoked where
`Long.MIN_VALUE / 2` or `Long.MAX_VALUE / 2` cause an overflow thus resulting in no overlap.
- The strategy is unable to find any eligible server for loading a given segment.
Fix:
- Reverse order of operations to divide by `MILLIS_FACTOR` (~10^8) first,
then do the subtraction to prevent Long overflow.
As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key.
Added a config groupByEnableMultiValueUnnesting which can be set in the query context.
The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior.
If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.
* Moving in filter check to broker
* Adding more unit tests, making error message meaningful
* Spelling and doc changes
* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100
* Removing upper limit of 100, updated docs
* Making documentation more meaningful
* Moving check outside to PlannerConfig, updating test cases and adding back max limit
* Updated with some additional code comments
* Missed removing one line during the checkin
* Addressing doc changes and one forbidden API correction
* Final doc change
* Adding a speling exception, correcting a testcase
* Reading entire filter tree to address combinations of ANDs and ORs
* Specifying in docs that, this case works only for ORs
* Revert "Reading entire filter tree to address combinations of ANDs and ORs"
This reverts commit 81ca8f8496.
* Covering a class cast exception and updating docs
* Counting changed
Co-authored-by: Jihoon Son <jihoonson@apache.org>
In extreme cases where many parallel indexing jobs are submitted together, it is possible
that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule
their own sub-tasks thus stalling progress of all the indexing jobs.
Key changes:
- Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots
for `ParallelIndexSupervisorTasks` per worker
- `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior)
- `ratio = 0` implies supervisor tasks can not use any slot on a worker
(actually, at least 1 slot is always available to ensure progress of parallel indexing jobs)
- `ImmutableWorkerInfo.canRunTask()`
- `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`
This fixes intermittent, spurious failures that we've observed in
the Kinesis sharding integration tests due to Kinesis taking
longer than the code expected to start a sharding operation. The
method that's changed is part of the integration test suite and
only used by the test cases that we've seen are flaky.
Prior to this change, the tests expected a sharding operation to
start in 9 seconds (30 retries * 300ms delay/retry). This change
bumps the number of retries to 100, giving Kinesis 30 seconds to
start the sharding.
This PR also makes a small, clarifying change to the condition
used to determine if sharding has started. Instead of checking if
the number of shards has increased (which was technically correct
even if the test is reducing the number of shards due to a Kinesis
implementation detail), we now just check if the shard count has
changed.
* refactor and link fixes
* add sql docs to left nav
* code format for needle
* updated web console script
* link fixes
* update earliest/latest functions
* edits for grammar and style
* more link fixes
* another link
* update with #12226
* update .spelling file
#12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.
When `ParallelIndexSupervisorTask` converts `BucketNumberedShardSpecs`
to corresponding `BuildingShardSpecs`, the bucketId order gets lost.
Particularly, for range partitioning, this results in the partitionIds not being in the same order
as increasing partition boundaries.
Changes
- Refactor `ParallelIndexSupervisorTask.groupGenericPartitionLocationsPerPartition()`
* rework sql planner expression and virtual column handling
* simplify a bit
* add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests
* spotbugs
* fix bugs with multi-value string array expression handling
* javadocs and adjust test
* better
* fix tests
* working
* Lazily load segmentKillers, segmentMovers, and segmentArchivers
* more tests
* test-jar plugin
* more coverage
* lazy client
* clean up changes
* checkstyle
* i did not change the branch condition
* adjust failure rate to run tests faster
* javadocs
* checkstyle