14460 Commits

Author SHA1 Message Date
Zoltan Haindrich 449a7f3a73 move stuff 2024-08-06 16:13:56 +00:00
Zoltan Haindrich 69a39a42da Revert "connection supplies properties approach"
This reverts commit 2700557a55.
2024-08-06 15:28:58 +00:00
Zoltan Haindrich 2700557a55 connection supplies properties approach 2024-08-06 15:28:56 +00:00
Zoltan Haindrich 42cc5d62a8 apidoc 2024-08-06 15:28:36 +00:00
Zoltan Haindrich d5b82af4a9 cleanup 2024-08-06 14:38:30 +00:00
Zoltan Haindrich 21016a28a2 remove ifs 2024-08-06 14:21:28 +00:00
Zoltan Haindrich 3f507d8648 ignore duplicate queries 2024-08-06 14:20:42 +00:00
Zoltan Haindrich 19fe5867b1 fix one fixme 2024-08-06 14:13:50 +00:00
Zoltan Haindrich 941c39aae5 it does work 2024-08-06 13:59:13 +00:00
Zoltan Haindrich 00ee182d75 add fixme/etc 2024-08-06 13:55:23 +00:00
Zoltan Haindrich 91d5d14bf1 remove builtintypes 2024-08-06 11:44:02 +00:00
Zoltan Haindrich 29b2b559d9 Merge remote-tracking branch 'apache/master' into quidem-msq 2024-08-06 11:42:04 +00:00
Zoltan Haindrich 6d38e8f075 clenaup 2024-08-06 11:41:54 +00:00
Akshat Jain c3aa033e14
MSQ window functions: Fix query correctness issues when using multiple workers (#16804)
This PR fixes query correctness issues for MSQ window functions when using more than 1 worker (that is, maxNumTasks > 2).

Previously, when the window stage had no partition columns, the shuffle spec of the previous stage was kept as-is. This PR overrides the previous stage's shuffle spec with MixShuffleSpec when the window function has an empty OVER clause, so that the window stage gets a single partition to work on.

A test has been added for a query that returned incorrect results prior to this change when using more than one worker.
2024-08-06 16:11:18 +05:30
Zoltan Haindrich f867db774a fix pom 2024-08-06 10:31:32 +00:00
Zoltan Haindrich 61c3b16b17 checkstyle 2024-08-06 09:44:27 +00:00
Zoltan Haindrich 130252bb5e fix; move class 2024-08-06 09:20:13 +00:00
Zoltan Haindrich 3b77784e6e add test 2024-08-06 08:13:25 +00:00
Sree Charan Manamala ed6b547481
Handle default bounds correctly in WINDOW clause (#16833)
When a window is defined as WINDOW W AS <DEF> using the syntax (PARTITION BY col1 ORDER BY col2 ROWS x PRECEDING), the other bound needs to default to CURRENT ROW.

This defaulting was already implemented, but when the window is defined as WINDOW W AS <DEF>, Calcite takes a different route to validate the window.
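
To illustrate the defaulting rule described above, here is a minimal sketch, assuming a single partition whose rows are already ordered. The function name rows_frame_sum is hypothetical and is not part of Druid or Calcite; it only shows that a frame written as ROWS x PRECEDING behaves like ROWS BETWEEN x PRECEDING AND CURRENT ROW.

```python
def rows_frame_sum(values, preceding):
    """Sum each row's frame, treating `ROWS <preceding> PRECEDING` as
    `ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW`.

    `values` is assumed to be one partition, already in ORDER BY order.
    """
    sums = []
    for i in range(len(values)):
        # The lower bound is `preceding` rows back, clamped to the partition start.
        start = max(0, i - preceding)
        # The upper bound defaults to the current row (inclusive).
        sums.append(sum(values[start:i + 1]))
    return sums


if __name__ == "__main__":
    print(rows_frame_sum([1, 2, 3, 4], 1))
```

With preceding set to 0 the frame contains only the current row, so the function returns the input values unchanged.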
2024-08-06 09:58:44 +02:00
Zoltan Haindrich 22d8a4b872 add compositemodule 2024-08-06 07:37:24 +00:00
Zoltan Haindrich 34ab911399 fix intellij errors 2024-08-06 07:25:42 +00:00
Zoltan Haindrich 5e5c94d6d8 cleanup 2024-08-06 07:23:08 +00:00
Zoltan Haindrich 93457c6b3e cleanup 2024-08-06 05:50:39 +00:00
Zoltan Haindrich 12cfde805e update 2024-08-06 05:50:11 +00:00
Vadim Ogievetsky aeace28ccb
Web console: Add columnMapping information to the Explain dialog (#16598)
* Add columnMapping information in the Explain dialog

* use arrow char
2024-08-05 13:21:51 -07:00
Alberic Liu 461727de12
Fix Druid Console cannot open submit supervisor dialog (#16736) 2024-08-05 10:44:11 -07:00
Zoltan Haindrich c8f9147810 minor fixes 2024-08-05 14:23:54 +00:00
Zoltan Haindrich 6d339d1706 rename class 2024-08-05 14:21:25 +00:00
Zoltan Haindrich 181458c873 mask more 2024-08-05 14:19:52 +00:00
Zoltan Haindrich 4c722f271f Revert "this doesnt work"
This reverts commit 1063948749.
2024-08-05 14:15:06 +00:00
Zoltan Haindrich 1063948749 this doesnt work 2024-08-05 14:15:02 +00:00
Zoltan Haindrich c40474285c updates 2024-08-05 13:49:08 +00:00
Zoltan Haindrich f4af51ef7f extend/cleanup/etc 2024-08-05 13:41:53 +00:00
Zoltan Haindrich bc70443c7f update few more 2024-08-05 13:20:13 +00:00
Zoltan Haindrich 841ab462dd Merge branch 'quidem-record' into quidem-msq 2024-08-05 13:00:59 +00:00
Zoltan Haindrich fda0d63e44 Merge remote-tracking branch 'apache/master' into quidem-record 2024-08-05 13:00:50 +00:00
Zoltan Haindrich 929e68c11a undo unrelated 2024-08-05 12:59:50 +00:00
Zoltan Haindrich 436ba18815 x 2024-08-05 12:59:19 +00:00
Zoltan Haindrich 26e3c44f4b
Quidem record (#16624)
* enables launching a fake broker based on test resources (druidtest uri)
* can record queries into new test files during usage
* instead of repurposing Calcite's Hook, migrates to DruidHook, to which we can add further keys
* adds a quidem-ut module, which could be the place for tests that interact with modules/etc
2024-08-05 14:58:32 +02:00
Akshat Jain 08f9ec1cae
Memoize the redundant calls to overlord in sql statements endpoint (#16839) 2024-08-05 16:52:56 +05:30
Rushikesh Bankar c8323d1a7c
Add indexer task success and failure metrics (#16829)
This PR adds two indexer-level task metrics:

"indexer/task/failed/count"
"indexer/task/success/count"

The current "worker/task/completed/count" metric counts all completed tasks irrespective of success or failure, so these metrics give more visibility into the status of completed tasks.
2024-08-05 16:21:27 +05:30
Zoltan Haindrich 70e46eadb9 update 2024-08-05 09:07:46 +00:00
Zoltan Haindrich 090f937d58 Merge branch 'quidem-record' into quidem-msq 2024-08-05 09:03:53 +00:00
Zoltan Haindrich bb23ace518 builtintypes instead nesteddata 2024-08-05 08:59:48 +00:00
Laksh Singla c84e689eb8
Don't use ComplexMetricExtractor to fetch the class of the object in field readers (#16825)
This patch fixes queries like `SELECT COUNT(DISTINCT json_col) FROM foo`
2024-08-05 14:13:56 +05:30
Laksh Singla 0411c4e67e
Add metrics for number of rows/bytes materialized while running subqueries (#16835)
subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.
2024-08-05 14:13:20 +05:30
Zoltan Haindrich e6add9ea84 Merge remote-tracking branch 'apache/master' into quidem-record 2024-08-05 07:04:02 +00:00
Sree Charan Manamala c7eacd079e
fallback SQL IN filter to expression filter when VirtualColumnRegistry is null (#16836) 2024-08-05 11:27:51 +05:30
Abhishek Radhakrishnan 31b43753fb
Add `druid.indexing.formats.stringMultiValueHandlingMode` system config (#16822)
This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_ARRAY for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_ARRAY is the default). Case-insensitive values are allowed.
While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs.
 
This fixes a long-standing issue: compaction now honors the configured cluster-wide property instead of always rewriting the mode to the default SORTED_ARRAY, even if the data was originally ingested with ARRAY or SORTED_SET.
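
As a rough illustration of what the three modes named above do to a multi-value string, here is a minimal sketch. It assumes the commonly documented meaning of each mode (SORTED_SET sorts and deduplicates, SORTED_ARRAY sorts and keeps duplicates, ARRAY keeps values as ingested). The function apply_multi_value_handling is hypothetical and is not Druid code.

```python
def apply_multi_value_handling(values, mode):
    """Return `values` as they would be stored under the given mode.

    Assumed semantics (not taken from this commit):
      SORTED_SET   -> sorted, duplicates removed
      SORTED_ARRAY -> sorted, duplicates kept
      ARRAY        -> original order, duplicates kept
    """
    normalized = mode.upper()  # the config accepts case-insensitive values
    if normalized == "SORTED_SET":
        return sorted(set(values))
    if normalized == "SORTED_ARRAY":
        return sorted(values)
    if normalized == "ARRAY":
        return list(values)
    raise ValueError("unknown multi-value handling mode: " + mode)


if __name__ == "__main__":
    row = ["b", "a", "b"]
    for mode in ("SORTED_SET", "SORTED_ARRAY", "ARRAY"):
        print(mode, apply_multi_value_handling(row, mode))
```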
2024-08-03 10:23:44 -07:00
Kashif Faraz 9dc2569f22
Track and emit segment loading rate for HttpLoadQueuePeon on Coordinator (#16691)
Design:
The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads.
To account for multiple loading threads on a server, we use the concept of a batch to track load times.
A batch is a set of segments added by the coordinator to the load queue of a server in one go.

Computation:
batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server)
batchBytes = total bytes successfully loaded in batch
avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis
overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis)
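
The computation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LoadingRateTracker from this change: the class name LoadingRateSketch, its methods, and the eviction policy (drop the oldest batches while the remaining ones still cover the window) are all hypothetical.

```python
from collections import deque


class LoadingRateSketch:
    """Hypothetical moving average over at least the last `window_bytes` of loads."""

    def __init__(self, window_bytes=10 * 1024 ** 3):  # 10 GiB, as in the design note
        self.window_bytes = window_bytes
        self.batches = deque()  # entries are (batch_bytes, batch_duration_millis)
        self.total_bytes = 0
        self.total_millis = 0

    def add_batch(self, batch_bytes, batch_duration_millis):
        """Record one completed batch of successful segment loads."""
        self.batches.append((batch_bytes, batch_duration_millis))
        self.total_bytes += batch_bytes
        self.total_millis += batch_duration_millis
        # Evict the oldest batch only if the rest still covers the window.
        while (len(self.batches) > 1
               and self.total_bytes - self.batches[0][0] >= self.window_bytes):
            old_bytes, old_millis = self.batches.popleft()
            self.total_bytes -= old_bytes
            self.total_millis -= old_millis

    def rate_kbps(self):
        """8 * bytes / millis gives bits per millisecond, i.e. kilobits per second."""
        if self.total_millis == 0:
            return None
        return 8 * self.total_bytes / self.total_millis


if __name__ == "__main__":
    tracker = LoadingRateSketch(window_bytes=1000)
    tracker.add_batch(500, 100)
    tracker.add_batch(500, 100)
    print(tracker.rate_kbps())
```

Summing bytes and durations over the window before dividing weights each batch by its size, which matches the "overall avg loading rate" formula rather than averaging the per-batch rates.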

Changes:
- Add `LoadingRateTracker` which computes a moving average load rate based on
the last few GBs of successful segment loads.
- Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may
also consider emitting this metric from the historicals themselves.
- Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`
2024-08-03 13:14:21 +05:30