This PR adds checks for verification of DataSourceCompactionConfig and CompactionTask with msq engine to ensure:
each aggregator in metricsSpec is idempotent
metricsSpec is non-null when rollup is set to true
Unit tests and existing compaction ITs have been updated accordingly.
This PR fixes query correctness issues for MSQ window functions when using more than 1 worker (that is, maxNumTasks > 2).
Currently, we were keeping the shuffle spec of the previous stage when we didn't have any partition columns for window stage. This PR changes it to override the shuffle spec of the previous stage to MixShuffleSpec (if we have a window function with empty over clause) so that the window stage gets a single partition to work on.
A test has been added for a query which returned incorrect results prior to this change when using more than 1 workers.
* enables to launch a fake broker based on test resources (druidtest uri)
* could record queries into new testfiles during usage
* instead of re-purpose Calcite's Hook migrates to use DruidHook which we can add further keys
* added a quidem-ut module which could be the place for tests which could iteract with modules/etc
This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_SET for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_SET is the default). Case insensitive values are allowed.
While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs.
This fixes a long-standing issue where compaction will honor the configured cluster wide property instead of rewriting it as the default SORTED_ARRAY always, even if the data was originally ingested with ARRAY or SORTED_SET.
* SQL syntax error should target USER persona
* * revert change to queryHandler and related tests, based on review comments
* * add test
* Introduce KinesisRecordEntity to support Kinesis headers in InputFormats
* * add kinesisInputFormat and Reader, and tests
* * bind KinesisInputFormat class to module
* * improve test coverage
* * remove references to kafka
* * resolve review comments
* * remove comment
* * fix grammer of comment
* * fix comment again
* * fix comment again
* * more review comments
* * add partitionKey
* * add check for same timestamp and partitionKey column name
* * fix intellij inspection
If the optional query parameter detail is supplied, then the response also includes the following:
* A stages object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning.
* A counters object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress.
* A warnings object that provides details about any warnings.
* MSQ worker: Support in-memory shuffles.
This patch is a follow-up to #16168, adding worker-side support for
in-memory shuffles. Changes include:
1) Worker-side code now respects the same context parameter "maxConcurrentStages"
that was added to the controller in #16168. The parameter remains undocumented
for now, to give us a chance to more fully develop and test this functionality.
1) WorkerImpl is broken up into WorkerImpl, RunWorkOrder, and RunWorkOrderListener
to improve readability.
2) WorkerImpl has a new StageOutputHolder + StageOutputReader concept, which
abstract over memory-based or file-based stage results.
3) RunWorkOrder is updated to create in-memory stage output channels when
instructed to.
4) ControllerResource is updated to add /doneReadingInput/, so the controller
can tell when workers that sort, but do not gather statistics, are done reading
their inputs.
5) WorkerMemoryParameters is updated to consider maxConcurrentStages.
Additionally, WorkerChatHandler is split into WorkerResource, so as to match
ControllerChatHandler and ControllerResource.
* Updates for static checks, test coverage.
* Fixes.
* Remove exception.
* Changes from review.
* Address static check.
* Changes from review.
* Improvements to docs and method names.
* Update comments, add test.
* Additional javadocs.
* Fix throws.
* Fix worker stopping in tests.
* Fix stuck test.
Follow-up to #16291, this commit enables a subset of existing native compaction ITs on the MSQ engine.
In the process, the following changes have been introduced in the MSQ compaction flow:
- Populate `metricsSpec` in `CompactionState` from `querySpec` in `MSQControllerTask` instead of `dataSchema`
- Add check for pre-rolled-up segments having `AggregatorFactory` with different input and output column names
- Fix passing missing cluster-by clause in scan queries
- Add annotation of `CompactionState` to tombstone segments
Changes:
- Add API `/druid/coordinator/v1/config/compaction/global` to update cluster level compaction config
- Add class `CompactionConfigUpdateRequest`
- Fix bug in `CoordinatorCompactionConfig` which caused compaction engine to not be persisted.
Use json field name `engine` instead of `compactionEngine` because JSON field names must align
with the getter name.
- Update MSQ validation error messages
- Complete overhaul of `CoordinatorCompactionConfigResourceTest` to remove unnecessary mocking
and add more meaningful tests.
- Add `TuningConfigBuilder` to easily build tuning configs for tests.
- Add `DatasourceCompactionConfigBuilder`