1623 Commits

Author SHA1 Message Date
Zoltan Haindrich 5bc32d5678 explore failure 2024-08-06 20:04:58 +00:00
Zoltan Haindrich 8a92ec6b2e Merge remote-tracking branch 'apache/master' into quidem-msq 2024-08-06 18:46:39 +00:00
Zoltan Haindrich dbe2674971 remove/cleanup/etc 2024-08-06 16:45:43 +00:00
Zoltan Haindrich 3e2b59f808 Revert "shared tries"
This reverts commit 34651e70e5.
2024-08-06 16:14:16 +00:00
Zoltan Haindrich 34651e70e5 shared tries 2024-08-06 16:14:12 +00:00
Zoltan Haindrich 449a7f3a73 move stuff 2024-08-06 16:13:56 +00:00
Zoltan Haindrich 69a39a42da Revert "connection supplies properties approach"
This reverts commit 2700557a55.
2024-08-06 15:28:58 +00:00
Zoltan Haindrich 2700557a55 connection supplies properties approach 2024-08-06 15:28:56 +00:00
Vishesh Garg 593c3b2150
Do not support non-idempotent aggregator in MSQ compaction (#16846)
This PR adds checks for verification of DataSourceCompactionConfig and CompactionTask with msq engine to ensure:

- each aggregator in metricsSpec is idempotent
- metricsSpec is non-null when rollup is set to true

Unit tests and existing compaction ITs have been updated accordingly.
2024-08-06 20:58:08 +05:30
Zoltan Haindrich d5b82af4a9 cleanup 2024-08-06 14:38:30 +00:00
Zoltan Haindrich 29b2b559d9 Merge remote-tracking branch 'apache/master' into quidem-msq 2024-08-06 11:42:04 +00:00
Zoltan Haindrich 6d38e8f075 cleanup 2024-08-06 11:41:54 +00:00
Akshat Jain c3aa033e14
MSQ window functions: Fix query correctness issues when using multiple workers (#16804)
This PR fixes query correctness issues for MSQ window functions when using more than 1 worker (that is, maxNumTasks > 2).

Previously, we kept the shuffle spec of the previous stage when the window stage had no partition columns. This PR overrides the previous stage's shuffle spec with MixShuffleSpec (when a window function has an empty OVER clause) so that the window stage gets a single partition to work on.

A test has been added for a query that returned incorrect results prior to this change when using more than 1 worker.
2024-08-06 16:11:18 +05:30
Zoltan Haindrich f867db774a fix pom 2024-08-06 10:31:32 +00:00
Zoltan Haindrich 61c3b16b17 checkstyle 2024-08-06 09:44:27 +00:00
Zoltan Haindrich 130252bb5e fix; move class 2024-08-06 09:20:13 +00:00
Zoltan Haindrich 34ab911399 fix intellij errors 2024-08-06 07:25:42 +00:00
Zoltan Haindrich 5e5c94d6d8 cleanup 2024-08-06 07:23:08 +00:00
Zoltan Haindrich 12cfde805e update 2024-08-06 05:50:11 +00:00
Zoltan Haindrich 181458c873 mask more 2024-08-05 14:19:52 +00:00
Zoltan Haindrich 4c722f271f Revert "this doesnt work"
This reverts commit 1063948749.
2024-08-05 14:15:06 +00:00
Zoltan Haindrich 1063948749 this doesnt work 2024-08-05 14:15:02 +00:00
Zoltan Haindrich c40474285c updates 2024-08-05 13:49:08 +00:00
Zoltan Haindrich f4af51ef7f extend/cleanup/etc 2024-08-05 13:41:53 +00:00
Zoltan Haindrich bc70443c7f update few more 2024-08-05 13:20:13 +00:00
Zoltan Haindrich 841ab462dd Merge branch 'quidem-record' into quidem-msq 2024-08-05 13:00:59 +00:00
Zoltan Haindrich 436ba18815 x 2024-08-05 12:59:19 +00:00
Zoltan Haindrich 26e3c44f4b
Quidem record (#16624)
* enables launching a fake broker based on test resources (druidtest uri)
* can record queries into new test files during usage
* instead of repurposing Calcite's Hook, migrates to DruidHook, to which we can add further keys
* adds a quidem-ut module, which could be the place for tests that interact with modules/etc
2024-08-05 14:58:32 +02:00
Akshat Jain 08f9ec1cae
Memoize the redundant calls to overlord in sql statements endpoint (#16839) 2024-08-05 16:52:56 +05:30
Zoltan Haindrich 70e46eadb9 update 2024-08-05 09:07:46 +00:00
Zoltan Haindrich 090f937d58 Merge branch 'quidem-record' into quidem-msq 2024-08-05 09:03:53 +00:00
Laksh Singla c84e689eb8
Don't use ComplexMetricExtractor to fetch the class of the object in field readers (#16825)
This patch fixes queries like `SELECT COUNT(DISTINCT json_col) FROM foo`
2024-08-05 14:13:56 +05:30
Zoltan Haindrich e6add9ea84 Merge remote-tracking branch 'apache/master' into quidem-record 2024-08-05 07:04:02 +00:00
Abhishek Radhakrishnan 31b43753fb
Add `druid.indexing.formats.stringMultiValueHandlingMode` system config (#16822)
This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_ARRAY for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_ARRAY is the default). Case-insensitive values are allowed.
While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs.

This fixes a long-standing issue: compaction now honors the configured cluster-wide property instead of always rewriting the mode as the default SORTED_ARRAY, even if the data was originally ingested with ARRAY or SORTED_SET.
2024-08-03 10:23:44 -07:00
Abhishek Radhakrishnan fe6772a101
Rename test builder `MSQTester.setExpectedSegment` (#16837)
* Rename setExpectedSegment to setExpectedSegments in MSQTestBase.

* Add expected segments for max num segments test cases.
2024-08-02 10:01:55 -07:00
zachjsh 9b731e8f0a
Kinesis Input Format for timestamp, and payload parsing (#16813)
* SQL syntax error should target USER persona

* revert change to queryHandler and related tests, based on review comments

* add test

* Introduce KinesisRecordEntity to support Kinesis headers in InputFormats

* add kinesisInputFormat and Reader, and tests

* bind KinesisInputFormat class to module

* improve test coverage

* remove references to kafka

* resolve review comments

* remove comment

* fix grammar of comment

* fix comment again

* fix comment again

* more review comments

* add partitionKey

* add check for same timestamp and partitionKey column name

* fix intellij inspection
2024-08-02 08:48:44 -04:00
Akshat Jain 63ba5a4113
Fix issues with fetching task reports in SQL statements endpoint for middlemanager (#16832) 2024-08-01 23:37:15 -04:00
Akshat Jain bb4d6cc001
Add task report fields in response of SQL statements endpoint (#16808)
If the optional query parameter detail is supplied, then the response also includes the following:

* A stages object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning.
* A counters object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress.
* A warnings object that provides details about any warnings.
2024-08-01 10:26:04 +05:30
Gian Merlino 01f6cfcbf5
MSQ worker: Support in-memory shuffles. (#16790)
* MSQ worker: Support in-memory shuffles.

This patch is a follow-up to #16168, adding worker-side support for
in-memory shuffles. Changes include:

1) Worker-side code now respects the same context parameter "maxConcurrentStages"
   that was added to the controller in #16168. The parameter remains undocumented
   for now, to give us a chance to more fully develop and test this functionality.

2) WorkerImpl is broken up into WorkerImpl, RunWorkOrder, and RunWorkOrderListener
   to improve readability.

3) WorkerImpl has a new StageOutputHolder + StageOutputReader concept, which
   abstract over memory-based or file-based stage results.

4) RunWorkOrder is updated to create in-memory stage output channels when
   instructed to.

5) ControllerResource is updated to add /doneReadingInput/, so the controller
   can tell when workers that sort, but do not gather statistics, are done reading
   their inputs.

6) WorkerMemoryParameters is updated to consider maxConcurrentStages.

Additionally, WorkerChatHandler is split into WorkerResource, so as to match
ControllerChatHandler and ControllerResource.

* Updates for static checks, test coverage.

* Fixes.

* Remove exception.

* Changes from review.

* Address static check.

* Changes from review.

* Improvements to docs and method names.

* Update comments, add test.

* Additional javadocs.

* Fix throws.

* Fix worker stopping in tests.

* Fix stuck test.
2024-07-30 18:41:24 -07:00
Zoltan Haindrich 5f6290eb54 use updated hook class 2024-07-30 16:11:57 +00:00
Zoltan Haindrich b1ab252b31 Merge branch 'quidem-record' into quidem-msq 2024-07-30 16:03:33 +00:00
Zoltan Haindrich eb2a047e4b Merge remote-tracking branch 'apache/master' into quidem-record 2024-07-30 14:24:37 +00:00
Zoltan Haindrich 78b75d3e8e move more to non-static 2024-07-30 10:42:41 +00:00
Zoltan Haindrich f6cc540368 use druidhookdispatcherr#1 2024-07-30 10:33:57 +00:00
Zoltan Haindrich 4157a8f105 add/.etc 2024-07-30 10:16:03 +00:00
Vishesh Garg e9ea243d97
Enable compaction ITs on MSQ engine (#16778)
Follow-up to #16291, this commit enables a subset of existing native compaction ITs on the MSQ engine.

In the process, the following changes have been introduced in the MSQ compaction flow:
- Populate `metricsSpec` in `CompactionState` from `querySpec` in `MSQControllerTask` instead of `dataSchema`
- Add check for pre-rolled-up segments having `AggregatorFactory` with different input and output column names
- Fix passing missing cluster-by clause in scan queries
- Add annotation of `CompactionState` to tombstone segments
2024-07-30 09:34:46 +05:30
Kashif Faraz caedeb66cd
Add API to update compaction engine (#16803)
Changes:
- Add API `/druid/coordinator/v1/config/compaction/global` to update cluster level compaction config
- Add class `CompactionConfigUpdateRequest`
- Fix bug in `CoordinatorCompactionConfig` which caused compaction engine to not be persisted.
Use json field name `engine` instead of `compactionEngine` because JSON field names must align
with the getter name.
- Update MSQ validation error messages
- Complete overhaul of `CoordinatorCompactionConfigResourceTest` to remove unnecessary mocking
and add more meaningful tests.
- Add `TuningConfigBuilder` to easily build tuning configs for tests.
- Add `DatasourceCompactionConfigBuilder`
2024-07-27 09:14:51 +05:30
Clint Wylie 14954c7eb9
serialize legacy as false for scan query for rolling downgrade/upgrade (#16793)
Fixes rolling downgrades/upgrades after #16659 by hard coding scan query "legacy":false since it is a required property during deserialization.
2024-07-25 14:51:58 +05:30
Clint Wylie 5da69a01cb
change arrayIngestMode default to array (#16789)
* change arrayIngestMode default to array

* remove arrayIngestMode flag option none

* fix space

* fix test
2024-07-25 15:09:40 +08:00
Zoltan Haindrich 8bb38a04a5 fix FIXME 2024-07-25 03:33:33 +00:00