This patch adds support for sorting segments by non-time columns (added in #16849) to MSQ compaction.
Specifically, if `forceSegmentSortByTime` is set in the data schema, either via the user-supplied
compaction config or in the inferred schema, the following steps are taken:
- Skip adding `__time` explicitly as the first column to the dimension schema since it already comes
as part of the schema
- Ensure column mappings propagate `__time` in the order specified by the schema
- Set `forceSegmentSortByTime` in the MSQ context.
This change emits the following metrics as part of the GroupByStatsMonitor:
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes -> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
All JDK 8 based CI checks have been removed.
Images used in Dockerfile(s) have been updated to Java 17 based images.
Documentation has been updated accordingly.
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes from "Align ScanQuery column order with its desired signature" (#17457)
* ScanQueryTest: add EqualsVerifier test
* WindowOperatorQueryKit: Pass QueryContext instead of WindowOperatorQuery to subsequent layers
* Add serializer for QueryContext class
* Revert changes of WindowOperatorQueryFrameProcessorFactory json param
* Fix checkstyle
* Address review comment: Remove older method in favor of calling new method inline
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic`, which could later make it possible to reduce some complexity in `ClientQuerySegmentWalker`
* running the subqueries of a `UnionQuery` requires access to the `conglomerate` from the `Runner`; some refactoring was done to enable that
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` has taken the place of `QueryToolChestWarehouse`, which shaves off some unnecessary things here and there
* small cleanup/refactors
Change the persona for errors within the planner from ADMIN to USER. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Returning low-quality error messages to the correct audience is better than hiding all error messages.
The errors returned are sometimes solvable by the user and other times require a Druid expert, but they do not leak information that should only be seen by more expert or privileged personas.
The original choice of the ADMIN persona reflected some reticence to tag low-quality error messages with the USER persona, but these errors really do seem user-directed to me, so USER makes sense.
Currently, durable storage and export both require a temporary directory to be configured, via druid.msq.intermediate.storage.tempDir and druid.export.storage.<connectorType>.tempLocalDir respectively.
Tasks on the Middle Manager already have a configured temporary directory. This PR reduces the number of configs a user has to set by using the task directory as the default when the temporary directory is not explicitly configured.
Note that on tasks, preference is given to the user-configured druid.*.storage.temp*Dir. If that is not configured, the task's temporary directory is used.
The Overlord and Brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries, respectively), but do not have a default temporary task directory. The configuration is still required for these services.
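The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the actual Druid code: `TempDirResolverSketch` and its parameters are hypothetical names, and a `null` argument stands in for "not configured" or "this service has no task directory".

```java
import java.io.File;

// Hypothetical helper for illustration only; not a Druid class.
final class TempDirResolverSketch
{
  /**
   * @param configuredTempDir value of the explicit temp-dir property, or null if unset
   * @param taskTempDir       the task's own temporary directory, or null on services
   *                          that have none (for example the Overlord or a Broker)
   */
  static File resolve(File configuredTempDir, File taskTempDir)
  {
    // An explicitly configured directory always wins.
    if (configuredTempDir != null) {
      return configuredTempDir;
    }
    // Otherwise fall back to the task directory, when one exists.
    if (taskTempDir != null) {
      return taskTempDir;
    }
    // Services without a task directory still need explicit configuration.
    throw new IllegalStateException("A temporary directory must be configured explicitly on this service");
  }
}
```

On a task, `resolve(null, taskDir)` yields the task directory; on a service with no task directory, the missing configuration surfaces as an error instead of a silent default.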
* Update errorprone, mockito, jacoco, checkerframework.
This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.
* Update licenses, tests.
* Remove assertEquals.
* Repair two tests.
* Update some more tests.
* SeekableStreamSupervisor: Use workerExec as the client connectExec.
This patch uses the already-existing per-supervisor workerExec as the
connectExec for task clients, rather than using the process-wide default
ServiceClientFactory pool.
This helps prevent callbacks from backlogging on the process-wide pool.
It's especially useful for retries, where callbacks may need to establish
new TCP connections or perform TLS handshakes.
* Fix compilation, tests.
* Fix style.
MSQ currently supports only single-valued string dimensions as partition keys.
This patch adds a check to ensure that partition keys are single-valued when
this information is available because segments were downloaded for schema inference.
During compaction, if MSQ finds multi-valued dimensions (MVDs) declared as part
of a `range` partitionsSpec, it switches the partitioning type to dynamic, which results in
repeated compactions of the same interval. To avoid this scenario, the segment
download logic is also updated to always download segments if information on multi-valued
dimensions is required.
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places.
- This refactor is extracted from this PR to facilitate easier review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
* GlueingPartitioningOperator: It continuously receives data, and outputs batches of partitioned RACs. It maintains a last-partitioning-boundary of the last-pushed-RAC, and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly, even across multiple RACs. You can check GlueingPartitioningOperatorTest for some good examples of the "glueing" work.
* PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.
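To illustrate the "glueing" idea, here is a heavily simplified sketch. It is not the operator's real implementation: rows are reduced to their partition key (a `String`), plain lists stand in for RACs, and `GlueingSketch`, `push`, and `finish` are hypothetical names. It assumes each batch arrives already ordered by partition key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the glueing behavior; not a Druid class.
final class GlueingSketch
{
  // Rows of the most recent partition, which may continue in the next batch.
  private final List<String> pending = new ArrayList<>();

  /** Accepts one batch and returns only the partitions known to be complete. */
  List<List<String>> push(List<String> batch)
  {
    List<List<String>> complete = new ArrayList<>();
    for (String key : batch) {
      // A change of key means the pending partition cannot grow any further.
      if (!pending.isEmpty() && !pending.get(0).equals(key)) {
        complete.add(new ArrayList<>(pending));
        pending.clear();
      }
      pending.add(key);
    }
    return complete;
  }

  /** Called when the input is exhausted; flushes the last partition, if any. */
  List<List<String>> finish()
  {
    List<List<String>> complete = new ArrayList<>();
    if (!pending.isEmpty()) {
      complete.add(new ArrayList<>(pending));
      pending.clear();
    }
    return complete;
  }
}
```

The last partition of each batch is held back on purpose: only the next batch (or the end of input) can tell whether it is complete, which is what lets a partition span multiple batches.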
We introduce the option to iterate over data fetched from the dataFetcher for loadingLookups in the lookups-cached-single extension. We also handle the case where a key exists in Druid but not in the underlying data fetcher (in our use case, the JDBC data fetcher), so the value returned is null.
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
The timeout handler should fire if the response has not been handled yet
(i.e. if responseResolved was previously false). However, it erroneously
fires only if the response *was* handled. This causes HTTP 500 errors if
the timeout actually does fire. The timeout is 30 seconds, which can be
hit during pipelined queries, if an earlier stage of the query hasn't
produced its first frame within 30 seconds.
This fixes a regression introduced in #17140.
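The intended guard can be sketched as below. This is an illustration under assumptions rather than the patched code: the `responseResolved` name comes from the description above, but modeling it as an `AtomicBoolean` and the `ResponseGuardSketch`, `onTimeout`, and `onResult` names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a "first handler wins" guard; not a Druid class.
final class ResponseGuardSketch
{
  private final AtomicBoolean responseResolved = new AtomicBoolean(false);

  /** Called by the timeout timer. Returns true if the timeout response should be written. */
  boolean onTimeout()
  {
    // Fire only if nothing has handled the response yet (false -> true).
    // The regression corresponds to inverting this check.
    return responseResolved.compareAndSet(false, true);
  }

  /** Called when a real result arrives. Returns true if this result should be written. */
  boolean onResult()
  {
    return responseResolved.compareAndSet(false, true);
  }
}
```

Whichever of the two callbacks runs first flips the flag and gets to write the response; the other one sees `false` and does nothing.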
Follow-up to #17214. Adds implementations of substituteCombiningFactory so that more
datasketches aggregators can match projections, along with some projection tests for datasketches.
This fixes a concurrency issue where, for failed queries, "onQueryComplete"
could be called concurrently with "onResultsStart" or "onResultRow". Fully
closing the controller ensures that the result reader is no longer active,
which eliminates the race.
Previously, the leaf worker count was used for stages that have *no*
stage inputs. It should actually be used for stages that have *any*
non-broadcast, non-stage inputs.
This fixes a bug with broadcast joins. In a broadcast join, the stage has
both a table and a broadcast stage as input. Previously, it would be planned
using the non-leaf worker count. It should actually be planned using the
leaf worker count.
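The difference between the old and new rule can be shown with a small sketch. The types here are hypothetical simplifications (each input is reduced to two flags), not the planner's real classes.

```java
import java.util.List;

// Hypothetical model of stage inputs; not the real MSQ planner types.
final class WorkerCountSketch
{
  static final class Input
  {
    final boolean stageInput; // true if the input is the output of another stage
    final boolean broadcast;  // true if the input is broadcast to all workers

    Input(boolean stageInput, boolean broadcast)
    {
      this.stageInput = stageInput;
      this.broadcast = broadcast;
    }
  }

  /** Old rule: leaf worker count only when the stage has no stage inputs at all. */
  static boolean usesLeafWorkerCountOld(List<Input> inputs)
  {
    for (Input input : inputs) {
      if (input.stageInput) {
        return false;
      }
    }
    return true;
  }

  /** New rule: leaf worker count when any input is non-broadcast and non-stage. */
  static boolean usesLeafWorkerCount(List<Input> inputs)
  {
    for (Input input : inputs) {
      if (!input.stageInput && !input.broadcast) {
        return true;
      }
    }
    return false;
  }
}
```

For a broadcast join (a table input plus a broadcast stage input), the old rule returns `false` while the new rule returns `true`, matching the bug described above.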
Prior to this patch, the workerId() method did not actually return
the worker ID. It returned some other string that had similar information,
but was different.
This caused the /druid/dart-worker/workers API to return an internal
server error. The API is useful for debugging, although it is not used
during actual queries.
Return HTTP 202 (Accepted) on cancellation, even if the requested query
ID was not found.
The main reason for this is that when the Router broadcasts DELETE requests
to all Brokers, it returns the response from one of them randomly. If we
return 404 when a query ID isn't found, then the Router randomly returns 404s
even when the query really was found and canceled.
This is also arguably still correct behavior. The cancellation request
*was* accepted; it just won't do anything because the query was not in
fact running.
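A minimal sketch of the resulting behavior, assuming a simple in-memory registry of running query IDs. `CancelSketch` and its methods are hypothetical and do not mirror the actual resource class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-Broker registry of running queries; not a Druid class.
final class CancelSketch
{
  static final int ACCEPTED = 202;

  private final Set<String> running = ConcurrentHashMap.newKeySet();

  void start(String queryId)
  {
    running.add(queryId);
  }

  boolean isRunning(String queryId)
  {
    return running.contains(queryId);
  }

  /** Cancels the query if it is running here; returns 202 either way. */
  int cancel(String queryId)
  {
    // remove() is a no-op for unknown IDs, so no 404 branch is needed.
    running.remove(queryId);
    return ACCEPTED;
  }
}
```

Because every Broker answers with the same status, it no longer matters which Broker's response the Router happens to return.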
Due to a typo, the thread name of the worker executor used an en dash (–)
rather than a regular hyphen (-). This was unintentional, and makes it
difficult to search for in thread dumps.
In a Dart query, all Historicals are given worker IDs, but not all of them
are going to actually be started or receive work orders. This can create gaps
in the set of workers. For example, workers 1 and 3 could have work assigned
while workers 0 and 2 do not.
This patch updates ControllerStageTracker and WorkerInputs to handle such
gaps, by using the set of actual worker numbers, rather than 0..workerCount,
in various places.
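The sketch below shows why iterating over `0..workerCount` goes wrong when there are gaps. It is only an illustration: `WorkerGapsSketch` and its methods are hypothetical and much simpler than ControllerStageTracker or WorkerInputs.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical completion checks; not the real tracker logic.
final class WorkerGapsSketch
{
  /** Dense assumption: every worker number in 0..workerCount must report. */
  static boolean allDoneDense(int workerCount, Set<Integer> reported)
  {
    for (int worker = 0; worker < workerCount; worker++) {
      if (!reported.contains(worker)) {
        return false;
      }
    }
    return true;
  }

  /** Gap-aware check: only workers that actually received work must report. */
  static boolean allDone(Set<Integer> assigned, Set<Integer> reported)
  {
    return reported.containsAll(assigned);
  }

  /** Workers that were assigned work but have not reported yet, in order. */
  static SortedSet<Integer> pending(Set<Integer> assigned, Set<Integer> reported)
  {
    SortedSet<Integer> result = new TreeSet<>(assigned);
    result.removeAll(reported);
    return result;
  }
}
```

With workers 1 and 3 assigned and both reported, the dense check over four workers never completes, while the gap-aware check does.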
The patch makes the following changes:
1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns.
2. Updates the compaction status check to be aware of partition dimensions when comparing dimension ordering.
3. Ensures only string columns are specified as partition dimensions.
4. Ensures `rollup` is true if and only if metricsSpec is non-empty.
5. Ensures disjoint intervals aren't submitted for compaction.
6. Adds `compactionReason` to compaction task context.
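Checks 3 and 4 from the list above could look roughly like this. It is a sketch with hypothetical names and simplified inputs (column types as plain strings, metrics as a list of names), not the validation code added by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical validator for illustration; not a Druid class.
final class CompactionCheckSketch
{
  /** Returns a list of human-readable problems; empty means the config passes. */
  static List<String> validate(
      Map<String, String> dimensionTypes,
      List<String> partitionDimensions,
      boolean rollup,
      List<String> metrics
  )
  {
    List<String> errors = new ArrayList<>();
    // Check 3: every partition dimension must be a string column.
    for (String dim : partitionDimensions) {
      String type = dimensionTypes.get(dim);
      if (!"string".equalsIgnoreCase(type)) {
        errors.add("Partition dimension [" + dim + "] must be a string column, got [" + type + "]");
      }
    }
    // Check 4: rollup is true if and only if there is at least one metric.
    if (rollup != !metrics.isEmpty()) {
      errors.add("rollup must be true if and only if metricsSpec is non-empty");
    }
    return errors;
  }
}
```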
changes:
* adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until a few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
* adds cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
In a Dart query, all Historicals are given worker IDs, but not all of them
are going to actually be started or receive work orders.
Attempting to send a getCounters or postFinish command to a worker that
never received a work order is not only wasteful, but it causes errors due
to the workers not knowing about that query ID.
Stages can be instructed to exit before they finish, especially when a
downstream stage includes a "LIMIT". This patch has improvements related
to early-exiting stages.
Bug fix:
- WorkerStageKernel: Don't allow fail() to set an exception if the stage is
already in a terminal state (FINISHED or FAILED). If fail() is called while
in a terminal state, log the exception, then throw it away. If it's a
cancellation exception, don't even log it. This fixes a bug where a stage
that exited early could transition to FINISHED and then to FAILED, causing
the overall query to fail.
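The terminal-state guard in fail() can be sketched as follows. This is a simplified model, not WorkerStageKernel itself: the class and method names are hypothetical, a list of strings stands in for the logger, and `java.util.concurrent.CancellationException` stands in for whatever cancellation exception the real code checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;

// Hypothetical, reduced model of a stage kernel; not the real WorkerStageKernel.
final class StageKernelSketch
{
  enum Phase { RUNNING, FINISHED, FAILED }

  private Phase phase = Phase.RUNNING;
  private Throwable exception;
  private final List<String> log = new ArrayList<>(); // stand-in for a logger

  void finish()
  {
    if (phase == Phase.RUNNING) {
      phase = Phase.FINISHED;
    }
  }

  void fail(Throwable t)
  {
    if (phase != Phase.RUNNING) {
      // Already FINISHED or FAILED: never overwrite the terminal state.
      // Log the late exception unless it is just a cancellation.
      if (!(t instanceof CancellationException)) {
        log.add("Ignoring exception in terminal phase " + phase + ": " + t);
      }
      return;
    }
    phase = Phase.FAILED;
    exception = t;
  }

  Phase phase() { return phase; }
  Throwable exception() { return exception; }
  List<String> log() { return log; }
}
```

A stage that exits early and reaches FINISHED therefore stays FINISHED, even if a late failure or cancellation arrives afterwards.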
Performance:
- DartWorkerManager previously sent stopWorker commands to workers
even when "interrupt" was false. Now it only sends those commands when
"interrupt" is true. The method javadoc already claimed this is what the
method did, but the implementation did not match the javadoc. This reduces
the number of RPCs by 1 per worker per query.
Quieter logging:
- In ReadableByteChunksFrameChannel, skip logging exception from setError if
the channel has been closed. Channels are closed when readers are done with
them, so at that point, we wouldn't be interested in the errors.
- In RunWorkOrder, skip calling notifyListener on failure of the main work,
in the case when stop() has already been called. The stop() method will
set its own error using CanceledFault. This enables callers to detect
when a stage was canceled vs. failed for some other reason.
- In WorkerStageKernel, skip logging cancellation errors in fail(). This is
made possible by the previous change in RunWorkOrder.