Commit Graph

1685 Commits

Author SHA1 Message Date
Vishesh Garg 1b9a6dde9f
Fix compilation error for MSQCompactionRunnerTest (#17516) 2024-11-27 12:46:30 +01:00
Vishesh Garg 5333c53d71
Support non time order in MSQ compaction (#17318)
This patch supports sorting segments by non-time columns (added in #16849) to MSQ compaction.
Specifically, if `forceSegmentSortByTime` is set in the data schema, either via the user-supplied
compaction config or in the inferred schema, the following steps are taken:
- Skip adding `__time` explicitly as the first column to the dimension schema since it already comes
as part of the schema
- Ensure column mappings propagate `__time` in the order specified by the schema
- Set `forceSegmentSortByTime` in the MSQ context.
2024-11-27 13:26:10 +05:30
Akshat Jain dd46c7722d
Remove pre-java-11 profile (#17511)
We have removed support for Java 8 in #17466. This PR removes an unused profile pre-java-11 which activated for JDK < 11.
2024-11-26 08:43:20 +01:00
Zoltan Haindrich 20aea29a51
Rename d1/d2 columns in tests (#17471) 2024-11-22 14:58:56 +01:00
Rishabh Singh 74422b58f5
Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360)
This change is to emit following metrics as part of GroupByStatsMonitor monitor,
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes-> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
2024-11-22 14:22:03 +05:30
Adarsh Sanjeev 2726c6f388
Minor refactors to processing
Some refactors across druid to clean up the code and add utility functions where required.
2024-11-21 15:37:55 +05:30
Akshat Jain 17215cd677
Remove support for Java 8 (#17466)
All JDK 8 based CI checks have been removed.
    Images used in Dockerfile(s) have been updated to Java 17 based images.
    Documentation has been updated accordingly.
2024-11-21 15:33:08 +05:30
Adithya Chakilam 6f436301be
supervisor: make rejection periods work with stopTasksCount (#17442)
* kafka-indexing: Report consumer io time

* commit

* backward

* tests

* remove unwanted changes

* comments

* comments

* coverage

* change name

* fixes

* fixes

* comments
2024-11-18 13:12:24 -08:00
Zoltan Haindrich f296102f05
ScanQuery should not ignore columnTypes in equals/hashCode (#17463)
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes of Align ScanQuery column order with its desired signature #17457
* ScanQueryTest: add equalsverifer test
2024-11-12 14:26:59 +05:30
Akshat Jain c571e6905d
Refactor WindowOperatorQueryKit to use WindowStage class for representing different window stages (#17158) 2024-11-12 14:18:16 +05:30
Akshat Jain 3f56b57c7e
MSQ WF: Pass a flag from broker to determine operator chain transformation (#17443) 2024-11-12 09:28:28 +05:30
Akshat Jain 73cbce9109
WindowOperatorQueryFrameProcessorFactory: Pass QueryContext instead of WindowOperatorQuery to WindowOperatorQueryFrameProcessor (#17405)
* WindowOperatorQueryKit: Pass QueryContext instead of WindowOperatorQuery to subsequent layers

* Add serializer for QueryContext class

* Revert changes of WindowOperatorQueryFrameProcessorFactory json param

* Fix checkstyle

* Address review comment: Remove older method in favor of calling new method inline
2024-11-07 11:29:49 +05:30
Zoltan Haindrich 2eac8318f8
Support Union in Decoupled planning (#17354)
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic` - which could later enable to reduce some complexity in `ClientQuerySegmentWalker`
* to run the subqueries of `UnionQuery` there was a need to access the `conglomerate` from the `Runner`; to enable that some refactors were done
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` have taken the place of `QueryToolChestWarehouse` which shaves of some unnecessary things here and there
* small cleanup/refactors
2024-11-05 16:58:57 +01:00
Tom e4cdbca23c
make planner errors be user persona (#17437)
Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Low quality error messages being returned to the correct audience is better than hiding all error messages.

The errors that can be returned back can be user solvable, and other times requires a druid expert. But the errors do not leak information that should only be seen by more expert/privileged personas.

The original ADMIN persona showed some reticence to tag low-quality error messages with a USER persona. but it really does seem user-directed to me so USER to me would make sense.
2024-11-04 10:48:35 -08:00
Akshat Jain 21e7e5cddd
Add benchmark suite for MSQ window functions (#17377)
* Add benchmark suite for MSQ window functions

* Fix inspection checks

* Address review comment: Rename method
2024-10-30 11:32:28 +05:30
Akshat Jain 63c91ad813
Fix backward compatibility issues in WindowOperatorQueryFrameProcessorFactory and WindowOperatorQueryFrameProcessor (#17433) 2024-10-30 11:32:02 +05:30
Adarsh Sanjeev b7c661b801
Make tempStorageDirectory configuration optional and rely on task dir instead (#17015)
Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.

Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.

Please note that preference would be given to the user configured, druid.*.storage.temp*Dir, on the tasks. If that is not configured, we then use the configured temporary directory.

Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.
2024-10-29 13:36:59 +05:30
Gian Merlino 446a8f466f
Update errorprone, mockito, jacoco, checkerframework. (#17414)
* Update errorprone, mockito, jacoco, checkerframework.

This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.

* Update licenses, tests.

* Remove assertEquals.

* Repair two tests.

* Update some more tests.
2024-10-28 11:34:03 -07:00
Suraj Goel 7306d280cc
Migrate jaxb bind dependency to jakarta (#17370)
- Migrated from javax.xml.bind 2.3.1  to jakarta.xml.bind 2.3.3.
- Minor version is modified to avoid any breaking changes.
2024-10-26 21:24:17 -07:00
Akshat Jain fe0f4150c9
MSQ ingestion: Improve error message on encountering non-long timestamp column (#17411)
This PR improves the error message during MSQ ingestion if we encounter a non-long timestamp column.
2024-10-25 15:02:32 +05:30
Akshat Jain 1e96c85b38
WindowOperatorQueryFrameProcessor: Avoid writing multiple frames to output channel in runIncrementally() (#17373)
WindowOperatorQueryFrameProcessor: Avoid writing multiple frames to output channel in runIncrementally()
2024-10-23 10:34:37 +05:30
Gian Merlino 60daddedf8
SeekableStreamSupervisor: Use workerExec as the client connectExec. (#17394)
* SeekableStreamSupervisor: Use workerExec as the client connectExec.

This patch uses the already-existing per-supervisor workerExec as the
connectExec for task clients, rather than using the process-wide default
ServiceClientFactory pool.

This helps prevent callbacks from backlogging on the process-wide pool.
It's especially useful for retries, where callbacks may need to establish
new TCP connections or perform TLS handshakes.

* Fix compilation, tests.

* Fix style.
2024-10-22 20:21:21 -07:00
Vishesh Garg 5da9949992
Fail MSQ compaction if multi-valued partition dimensions are found (#17344)
MSQ currently supports only single-valued string dimensions as partition keys.
This patch adds a check to ensure that partition keys are single-valued in case
this info is available by virtue of segment download for schema inference.

During compaction, if MSQ finds multi-valued dimensions (MVDs) declared as part
of `range` partitionsSpec, it switches partitioning type to dynamic, ending up in
repeated compactions of the same interval. To avoid this scenario, the segment
download logic is also updated to always download segments if info on multi-valued
dimensions is required.
2024-10-19 13:33:33 +05:30
Abhishek Radhakrishnan 9a16d4e219
Move SqlTaskStatus and SqlTaskStausTest from msq module to sql module. (#17380)
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places.
- This refactor is extracted from this PR to facilitate easier review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
2024-10-18 14:39:01 -07:00
Laksh Singla 5b09329479
Fixes an issue with AppendableMemory that can cause MSQ jobs to fail (#17369) 2024-10-18 09:05:53 +05:30
Adithya Chakilam e834e49290
supervisor/autoscaler: Fix clearing of collected lags on skipped scale actions (#17356)
* superviosr/autoscaler: Fix clearing of collected lags on skipped scale actions

* comments

* supervisor/autoscaler: Skip scaling when partitions are less than minTaskCount (#17335)

* Fix pip installation after ubuntu upgrade (#17358)

* fix tests

---------

Co-authored-by: Pranav <pranavbhole@gmail.com>
2024-10-17 11:05:16 -07:00
Akshat Jain 8c52be81d3
Fix postgres metadata storage warning logs because of tablename causing issues (#17351) 2024-10-17 15:48:22 +05:30
Akshat Jain 450fb0147b
Add GlueingPartitioningOperator + Corresponding changes in window function layer to consume it for MSQ (#17038)
*    GlueingPartitioningOperator: It continuously receives data, and outputs batches of partitioned RACs. It maintains a last-partitioning-boundary of the last-pushed-RAC, and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly, even across multiple RACs. You can check GlueingPartitioningOperatorTest for some good examples of the "glueing" work.
*    PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.
2024-10-17 10:54:52 +05:30
TessaIO a9f582711e
Fix loading lookup extension (#17212)
We introduce the option to iterate over fetched data from the dataFetcher for loadingLookups in the lookups-cached-single extension. Also, added the handling of a use case where the data exists in Druid but not in the actual data fetcher, which is in our use-case JDBC Data fetcher, where the value returned is null.

Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-10-16 07:28:32 -07:00
Gian Merlino b287b219a8
MSQ: Include stageId, workerNumber in processing thread names. (#17324)
* MSQ: Include stageId, workerNumber in processing thread names.

Helps identify which query was running in a thread dump.

* s/dart/msq/
2024-10-11 08:37:15 -07:00
Gian Merlino a0c29f8bbb
MSQ WorkerResource: Fix timeout handler for httpGetChannelData. (#17328)
The timeout handler should fire if the response has not been handled yet
(i.e. if responseResolved was previously false). However, it erroneously
fires only if the response *was* handled. This causes HTTP 500 errors if
the timeout actually does fire. The timeout is 30 seconds, which can be
hit during pipelined queries, if an earlier stage of the query hasn't
produced its first frame within 30 seconds.

This fixes a regression introduced in #17140.
2024-10-11 16:29:04 +05:30
Karan Kumar 034bb9dbea
Removing enable windowing from MSQ tests. (#17276) 2024-10-11 05:33:27 +02:00
Clint Wylie a6236c3d15
add substituteCombiningFactory implementations for datasketches aggs (#17314)
Follow up to #17214, adds implementations for substituteCombiningFactory so that more
datasketches aggs can match projections, along with some projections tests for datasketches.
2024-10-10 16:14:06 +05:30
Gian Merlino 074944e02c
Dart: Only use historicals as workers. (#17319)
Only historicals load the Dart worker modules. Other types of servers in
the server view (such as realtime tasks) should not be included.
2024-10-10 13:47:58 +05:30
Gian Merlino 4092f3fe47
MSQ: Call "onQueryComplete" after the query is closed. (#17313)
This fixes a concurrency issue where, for failed queries, "onQueryComplete"
could be called concurrently with "onResultsStart" or "onResultRow". Fully
closing the controller ensures that the result reader is no longer active,
which eliminates the race.
2024-10-10 10:44:44 +05:30
Gian Merlino b27712933e
MSQ: Use leaf worker count for stages that have any leaf inputs. (#17312)
Previously, the leaf worker count was used for stages that have *no*
stage inputs. It should actually be used for stages that have *any*
non-broadcast, non-stage inputs.

This fixes a bug with broadcast joins. In a broadcast join, the stage has
both a table and a broadcast stage as input. Previously, it would be planned
using the non-leaf worker count. It should actually be planned using the
leaf worker count.
2024-10-10 10:44:31 +05:30
Gian Merlino baa16f30f6
DartWorkerContext: Return the correct workerId(). (#17280)
Prior to this patch, the workerId() method did not actually return
the worker ID. It returned some other string that had similar information,
but was different.

This caused the /druid/dart-worker/workers API, to return an internal
server error. The API is useful for debugging, although it is not used
during actual queries.
2024-10-08 09:52:55 -07:00
Gian Merlino 152330c5a8
WorkerManager: Correct javadoc for "stop". (#17279)
The javadoc had a factual error: Dart's implementation does not in
fact always return immediately.
2024-10-08 15:49:43 +05:30
Gian Merlino 0a279e634a
DartSqlResource: Return HTTP 202 on cancellation even if no such query. (#17278)
Return HTTP 202 (Accepted) on cancellation, even if the requested query
ID was not found.

The main reason for this is that when the Router broadcasts DELETE requests
to all Brokers, it returns the response from one of them randomly. If we
return 404 when a query ID isn't found, then the Router randomly returns 404s
even when the query really was found and canceled.

This is also arguably still correct behavior. The cancellation request
*was* accepted, it just won't do anything because the query was not in
fact running.
2024-10-08 15:49:34 +05:30
Gian Merlino 01baf99148
DartWorkerModule: Replace en dash with regular dash. (#17281)
Due to a typo, the thread name of the worker executor used an en dash (–)
rather than a regular hyphen (-). This was unintentional, and makes it
difficult to search for in thread dumps.
2024-10-08 15:48:10 +05:30
Gian Merlino 2309aa7bdf
DartSqlResource: Add controllerHost to GetQueriesResponse. (#17283)
This helps find the specific Broker that is executing a query.
2024-10-08 15:47:32 +05:30
Gian Merlino 9921ac1b19
DartSqlResource: Sort queries by start time. (#17282)
* DartSqlResource: Sort queries by start time.

This keeps the list of queries returned by the API in a consistent order.

* Fix test.
2024-10-08 15:47:21 +05:30
Gian Merlino 06bbdb38ce
MSQ: Allow for worker gaps. (#17277)
In a Dart query, all Historicals are given worker IDs, but not all of them
are going to actually be started or receive work orders. This can create gaps
in the set of workers. For example, workers 1 and 3 could have work assigned
while workers 0 and 2 do not.

This patch updates ControllerStageTracker and WorkerInputs to handle such
gaps, by using the set of actual worker numbers, rather than 0..workerCount,
in various places.
2024-10-08 15:07:57 +05:30
Vishesh Garg 7e35e50052
Fix issues with MSQ Compaction (#17250)
The patch makes the following changes:
1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns
2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering.
3. Ensures only string columns are specified as partition dimensions
4. Ensures `rollup` is true if and only if metricsSpec is non-empty
5. Ensures disjoint intervals aren't submitted for compaction
6. Adds `compactionReason` to compaction task context.
2024-10-06 21:48:26 +05:30
Clint Wylie 0bd13bcd51
Projections prototype (#17214) 2024-10-05 04:38:57 -07:00
Clint Wylie 04fe56835d
add druid.expressions.allowVectorizeFallback and default to false (#17248)
changes:

adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
2024-10-05 12:42:42 +05:30
Gian Merlino d1709a329f
Dart: Skip final getCounters, postFinish to idle historicals. (#17255)
In a Dart query, all Historicals are given worker IDs, but not all of them
are going to actually be started or receive work orders.

Attempting to send a getCounters or postFinish command to a worker that
never received a work order is not only wasteful, but it causes errors due
to the workers not knowing about that query ID.
2024-10-04 23:05:21 -07:00
Shivam Garg 93b5a8326b
Upgrade commons-io to 2.17.0 (#17227) 2024-10-04 09:56:56 +02:00
Gian Merlino fc00664760
KafkaInputFormat: Fix handling of CSV/TSV keyFormat. (#17226)
* KafkaInputFormat: Fix handling of CSV/TSV keyFormat.

Follow-up to #16630, which fixed a similar issue for the valueFormat.

* Simplify.
2024-10-03 13:05:09 -07:00
Gian Merlino db7cc4634c
Dart: Smoother handling of stage early-exit. (#17228)
Stages can be instructed to exit before they finish, especially when a
downstream stage includes a "LIMIT". This patch has improvements related
to early-exiting stages.

Bug fix:

- WorkerStageKernel: Don't allow fail() to set an exception if the stage is
  already in a terminal state (FINISHED or FAILED). If fail() is called while
  in a terminal state, log the exception, then throw it away. If it's a
  cancellation exception, don't even log it. This fixes a bug where a stage
  that exited early could transition to FINISHED and then to FAILED, causing
  the overall query to fail.

Performance:

- DartWorkerManager previously sent stopWorker commands to workers
  even when "interrupt" was false. Now it only sends those commands when
  "interrupt" is true. The method javadoc already claimed this is what the
  method did, but the implementation did not match the javadoc. This reduces
  the number of RPCs by 1 per worker per query.

Quieter logging:

- In ReadableByteChunksFrameChannel, skip logging exception from setError if
  the channel has been closed. Channels are closed when readers are done with
  them, so at that point, we wouldn't be interested in the errors.

- In RunWorkOrder, skip calling notifyListener on failure of the main work,
  in the case when stop() has already been called. The stop() method will
  set its own error using CanceledFault. This enables callers to detect
  when a stage was canceled vs. failed for some other reason.

- In WorkerStageKernel, skip logging cancellation errors in fail(). This is
  made possible by the previous change in RunWorkOrder.
2024-10-03 20:09:02 +05:30