Currently, durable storage and export both require a temporary directory to be explicitly configured, via druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir respectively.
Tasks on the MiddleManager already have a configured temporary directory. This PR uses the task directory as the default when no temporary directory is explicitly configured, reducing the number of configs a user has to set.
Note that on tasks, preference is given to the user-configured druid.*.storage.temp*Dir; only if that is not set do we fall back to the task's configured temporary directory.
The Overlord and Brokers also require storage connector configuration (for the durableStorageCleanerOverlordDuty and for fetching async query results, respectively), but they have no default temporary task directory, so the configuration is still required for these services.
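As a hedged illustration of this precedence (the property names come from the description above; the paths and the `s3` connector type are example values):

```properties
# If set explicitly, these properties take priority, on tasks as well as elsewhere.
druid.msq.intermediate.storage.tempDir=/mnt/fast-disk/msq-tmp
druid.export.storage.s3.tempLocalDir=/mnt/fast-disk/export-tmp

# If they are omitted on tasks running on the MiddleManager, the task's own
# temporary directory is used as the default. The Overlord and Brokers have no
# task directory to fall back on, so the properties must still be set there.
```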
* DruidOverlord: Move becomeLeader/stopBeingLeader earlier.
On becoming leader, it is helpful for the TaskRunner and TaskQueue to be
available when the SupervisorManager starts up, to aid the supervisors
in discovering their tasks.
On stopping leadership, it is helpful for the TaskRunner and TaskQueue
to be available until the SupervisorManager has finished shutting down.
They are only available when the TaskMaster is in "leader" mode, so to
achieve the above, this patch moves these calls earlier in the sequence.
* Adjust leadership into two phases.
* Update test.
* Adjustments for coverage.
* Stop mirrors start better.
* Update errorprone, mockito, jacoco, checkerframework.
This patch updates various build and test dependencies, to see whether they
make unit tests on JDK 21 behave more reliably.
* Update licenses, tests.
* Remove assertEquals.
* Repair two tests.
* Update some more tests.
Following #17394, workerExec can get deadlocked with itself, because it
waits for task futures and is also used as the connectExec for the task
client. To fix this, we need to never await task futures in the workerExec.
There are two specific changes: in "verifyAndMergeCheckpoints" and
"checkpointTaskGroup", two "coalesceAndAwait" calls that formerly occurred
in workerExec are replaced with Futures.transform (using a callback in
workerExec).
Because this adjustment removes a source of blocking, it may also improve
supervisor responsiveness for high task counts. This is not the primary
goal, however. The primary goal is to fix the bug introduced by #17394.
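As a hedged, simplified sketch of the pattern (not the actual supervisor code): instead of blocking a workerExec thread until the task futures resolve, the continuation is attached as a callback that workerExec runs once the futures complete, so workerExec never waits on work that it must also drive.

```java
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.List;
import java.util.concurrent.Executors;

public class NonBlockingAwaitSketch
{
  public static void main(String[] args) throws Exception
  {
    // Stand-in for the per-supervisor workerExec.
    ListeningExecutorService workerExec =
        MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());

    // Stand-ins for the per-task futures the supervisor waits on.
    List<ListenableFuture<Boolean>> taskFutures =
        List.of(Futures.immediateFuture(true), Futures.immediateFuture(false));

    // Before (conceptually): a coalesceAndAwait-style call blocked inside workerExec.
    // After: transform the combined future and let workerExec run the callback
    // only once all task futures have completed.
    ListenableFuture<Void> done = Futures.transform(
        Futures.allAsList(taskFutures),
        results -> {
          System.out.println("Task results: " + results);
          return null;
        },
        workerExec
    );

    done.get();
    workerExec.shutdown();
  }
}
```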
Calling toString on newConfig is unnecessary, because the logger does it
automatically when the message is actually emitted. Skipping the explicit call
saves some effort when the log level is higher than DEBUG.
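A hedged illustration of the idea (shown here with SLF4J rather than the logger used in the code; the parameterized form is the point):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DeferredLoggingSketch
{
  private static final Logger log = LoggerFactory.getLogger(DeferredLoggingSketch.class);

  public static void main(String[] args)
  {
    Object newConfig = new Object();  // stand-in for the config object

    // Eager: the message string, including newConfig.toString(), is built even
    // when DEBUG logging is disabled.
    log.debug("Applying new config: " + newConfig.toString());

    // Deferred: newConfig is rendered only if the DEBUG message is actually logged.
    log.debug("Applying new config: {}", newConfig);
  }
}
```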
* SeekableStreamSupervisor: Use workerExec as the client connectExec.
This patch uses the already-existing per-supervisor workerExec as the
connectExec for task clients, rather than using the process-wide default
ServiceClientFactory pool.
This helps prevent callbacks from backlogging on the process-wide pool.
It's especially useful for retries, where callbacks may need to establish
new TCP connections or perform TLS handshakes.
* Fix compilation, tests.
* Fix style.
Changes:
* adds `SqlBenchmarkDatasets` which contains commonly used benchmark data generator schemas
* adds `SqlBaseBenchmark` which contains common benchmark segment generation methods for any benchmark using `SqlBenchmarkDatasets`
* adds `SqlBaseQueryBenchmark` and `SqlBasePlanBenchmark` for benchmarks measuring queries and planning respectively
* migrate all existing SQL jmh benchmarks to extend `SqlBaseQueryBenchmark`, dramatically reducing the boilerplate needed to create benchmarks and allowing the use of multiple datasources within a benchmark file
* adjustments to the data generator code to allow passing in an ObjectMapper, so that the same mapper can be used for both benchmark queries and segment generation, avoiding the need to register everything with both mappers for benchmarks
* adds `SqlProjectionsBenchmark` and `SqlComplexMetricsColumnsBenchmark` for measuring projections and measuring complex metric compression respectively
This patch is extracted from PR 17353.
Changes:
- Added BrokerClient and BrokerClientImpl to the sql package, leveraging the ServiceClient functionality, similar to the OverlordClient and CoordinatorClient implementations in the server module.
- For now, only two broker API stubs are added: submitSqlTask() and fetchExplainPlan(). A rough sketch of this surface follows the list below.
- Added a new POJO class ExplainPlan that encapsulates explain plan info.
- Deprecated org.apache.druid.discovery.BrokerClient in favor of the new BrokerClient in this patch.
- Cleaned up ExplainAttributesTest a bit and added serde verification.
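A rough sketch of what the new asynchronous client surface might look like; only the method names and the SqlTaskStatus / ExplainPlan classes come from the change description, while the ListenableFuture return types are assumptions (in the style of OverlordClient) and the request parameter is simplified to a String:

```java
import com.google.common.util.concurrent.ListenableFuture;
import java.util.List;

public interface BrokerClientSketch
{
  /** Submits a SQL query to the Broker as an async task and returns its status. */
  ListenableFuture<SqlTaskStatus> submitSqlTask(String sqlQuery);

  /** Fetches the EXPLAIN PLAN output for the given SQL query. */
  ListenableFuture<List<ExplainPlan>> fetchExplainPlan(String sqlQuery);
}
```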
MSQ currently supports only single-valued string dimensions as partition keys.
This patch adds a check to ensure that partition keys are single-valued, in cases
where this information is available from the segments downloaded for schema inference.
During compaction, if MSQ finds multi-valued dimensions (MVDs) declared as part
of the `range` partitionsSpec, it switches the partitioning type to dynamic, resulting in
repeated compactions of the same interval. To avoid this scenario, the segment
download logic is also updated to always download segments when info on multi-valued
dimensions is required.
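A hedged sketch of the validation idea; all names here are hypothetical stand-ins rather than the actual MSQ code. The point is simply to fail fast when a declared range partition key is known, from the downloaded segment schemas, to be multi-valued, instead of silently switching to dynamic partitioning:

```java
import java.util.List;
import java.util.Map;

public class RangePartitionKeyCheckSketch
{
  /**
   * @param rangePartitionColumns   columns declared in the range partitionsSpec
   * @param knownMultiValuedColumns column name -> whether it was observed to be
   *                                multi-valued; columns absent from the map have
   *                                unknown status and are not rejected
   */
  static void validate(List<String> rangePartitionColumns, Map<String, Boolean> knownMultiValuedColumns)
  {
    for (String column : rangePartitionColumns) {
      if (Boolean.TRUE.equals(knownMultiValuedColumns.get(column))) {
        throw new IllegalArgumentException(
            "Column[" + column + "] is multi-valued and cannot be used as a range partition key"
        );
      }
    }
  }
}
```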
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module, so the class can be reused in other places.
- This refactor is extracted from this PR to facilitate review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
The Delta Decimal type wasn't handled correctly in the Druid Delta connector, resulting in the error: Unsupported fieldType[Decimal(4, 2)] for fieldName[price].
There were no tests or existing tables with the Decimal type, so I've updated the existing table, complex-types-table, to include this data type.
Note that Druid can handle the Decimal type as a double at best. A big decimal that cannot fit inside a double should be ingested as a string.
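A hedged sketch of the coercion idea (not the connector's actual code): Delta Decimal values surface as java.math.BigDecimal, which Druid can ingest as a double at best; a value too large for a double falls back to its string form:

```java
import java.math.BigDecimal;

public class DeltaDecimalCoercionSketch
{
  static Object coerceDecimal(BigDecimal value)
  {
    double asDouble = value.doubleValue();
    // BigDecimal.doubleValue() overflows to +/- infinity when the magnitude is
    // too large for a double; in that case, fall back to the string representation.
    if (Double.isInfinite(asDouble)) {
      return value.toPlainString();
    }
    return asDouble;
  }
}
```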
* GlueingPartitioningOperator: It continuously receives data and outputs batches of partitioned RACs. It keeps track of the last partitioning boundary of the most recently pushed RAC and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly even across multiple RACs (a toy illustration follows this list). See GlueingPartitioningOperatorTest for some good examples of the "glueing" behavior.
* PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.
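A toy illustration of the "glueing" idea using plain lists rather than the real operator or RAC APIs: the trailing partition of the previous batch is held back and merged with the leading rows of the next batch when they share the same partition key:

```java
import java.util.ArrayList;
import java.util.List;

public class GlueingExample
{
  public static void main(String[] args)
  {
    // Two incoming batches, partitioned on a single "country" key.
    List<List<String>> batches = List.of(
        List.of("US", "US", "FR"),
        List.of("FR", "FR", "DE")
    );

    List<String> held = new ArrayList<>();  // trailing partition carried across batches
    for (List<String> batch : batches) {
      for (String key : batch) {
        if (!held.isEmpty() && !held.get(held.size() - 1).equals(key)) {
          System.out.println("partition: " + held);  // a complete partition can be emitted
          held.clear();
        }
        held.add(key);
      }
      // The trailing run is retained here, in case the next batch continues it.
    }
    if (!held.isEmpty()) {
      System.out.println("partition: " + held);  // flush the final partition
    }
    // Prints: [US, US], then [FR, FR, FR] (glued across the two batches), then [DE].
  }
}
```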
We introduce the option to iterate over the data fetched from the dataFetcher for loadingLookups in the lookups-cached-single extension. We also add handling for the case where a key exists in Druid but not in the underlying data fetcher (in our use case, the JDBC data fetcher), in which case the returned value is null.
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
The timeout handler should fire if the response has not been handled yet
(i.e. if responseResolved was previously false). However, it erroneously
fires only if the response *was* handled. This causes HTTP 500 errors if
the timeout actually does fire. The timeout is 30 seconds, which can be
hit during pipelined queries, if an earlier stage of the query hasn't
produced its first frame within 30 seconds.
This fixes a regression introduced in #17140.
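A hedged sketch of the inverted check; the field and method names here are illustrative, not the actual handler code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutCheckSketch
{
  private final AtomicBoolean responseResolved = new AtomicBoolean(false);

  // Buggy: fired only when the response had *already* been handled, which
  // turned a genuine timeout into a spurious HTTP 500.
  void onTimeoutBuggy()
  {
    if (responseResolved.getAndSet(true)) {
      resolveWithTimeout();
    }
  }

  // Fixed: fires only when this handler is the first to resolve the response,
  // i.e. when responseResolved was previously false.
  void onTimeoutFixed()
  {
    if (!responseResolved.getAndSet(true)) {
      resolveWithTimeout();
    }
  }

  void resolveWithTimeout()
  {
    // Resolve the pending HTTP response as a timeout.
  }
}
```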
Follow-up to #17214: adds implementations of substituteCombiningFactory so that more
datasketches aggregators can match projections, along with some projection tests for datasketches.
The duty group is a low-cardinality dimension and can provide insight
into whether a particular duty group is running too slowly on the Coordinator.