* ScanQuery: equals/hashCode/toString
* DruidQuery: changes from "Align ScanQuery column order with its desired signature" #17457
* ScanQueryTest: add EqualsVerifier test
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic`, which could later help reduce some complexity in `ClientQuerySegmentWalker`
* running the subqueries of `UnionQuery` requires access to the `conglomerate` from the `Runner`; some refactoring was done to enable that
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` has taken the place of `QueryToolChestWarehouse`, which shaves off some unnecessary things here and there
* small cleanup/refactors
Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Low quality error messages being returned to the correct audience is better than hiding all error messages.
Some of the errors that can be returned are user-solvable, while others require a Druid expert. But the errors do not leak information that should only be seen by more expert/privileged personas.
The original choice of the ADMIN persona showed some reticence to tag low-quality error messages with a USER persona, but these errors really do seem user-directed to me, so USER makes sense.
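As a rough illustration of what retagging means, here is a minimal, self-contained sketch. The `Persona`, `PlannerError`, and `PlannerErrors` types are hypothetical stand-ins written for this example; they are not Druid's actual exception classes, and the message text is invented.

```java
// Hypothetical, simplified model of persona-tagged errors (not Druid's real classes).
enum Persona { USER, ADMIN }

final class PlannerError extends RuntimeException {
    private final Persona targetPersona;

    PlannerError(Persona targetPersona, String message) {
        super(message);
        this.targetPersona = targetPersona;
    }

    Persona getTargetPersona() {
        return targetPersona;
    }
}

final class PlannerErrors {
    private PlannerErrors() {}

    // Planner failures are tagged USER here, so a query API that only
    // surfaces USER-directed errors would show them instead of hiding them.
    static PlannerError planningFailed(String detail) {
        return new PlannerError(Persona.USER, "Query could not be planned: " + detail);
    }
}
```

In this sketch, the only behavioral difference from an ADMIN-tagged error is the persona value carried by the exception; the message quality is unchanged.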
* Update errorprone, mockito, jacoco, checkerframework.
This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.
* Update licenses, tests.
* Remove assertEquals.
* Repair two tests.
* Update some more tests.
This patch is extracted from PR 17353.
Changes:
- Added BrokerClient and BrokerClientImpl to the sql package that leverages the ServiceClient functionality; similar to OverlordClient and CoordinatorClient implementations in the server module.
- For now, only two broker API stubs are added: submitSqlTask() and fetchExplainPlan().
- Added a new POJO class ExplainPlan that encapsulates explain plan info.
- Deprecated org.apache.druid.discovery.BrokerClient in favor of the new BrokerClient in this patch.
- Cleaned up ExplainAttributesTest a bit and added serde verification.
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places.
- This refactor is extracted from this PR to facilitate easier review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
Changes:
- Adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until the few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types).
- Adds cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future.
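The idea of a default-off flag plus a helper that derives test expectations from it can be sketched as follows. This is an illustrative model only: `VectorizeFallbackSketch` and its methods are hypothetical names for this example and do not reproduce the signatures in ExpressionProcessing or the test base classes.

```java
// Hypothetical stand-in for the config flag and the expectation helper.
final class VectorizeFallbackSketch {
    // Defaults to false, mirroring the default described above.
    private static volatile boolean allowVectorizeFallback = false;

    private VectorizeFallbackSketch() {}

    static void setAllowVectorizeFallback(boolean allow) {
        allowVectorizeFallback = allow;
    }

    static boolean allowVectorizeFallback() {
        return allowVectorizeFallback;
    }

    // A test case marked this way expects vectorization to be impossible
    // only while the fallback is disabled. Flipping the default flips the
    // expectation everywhere, with no per-test edits.
    static boolean expectCannotVectorize() {
        return !allowVectorizeFallback;
    }
}
```

Because every affected test goes through one helper, removing the config later amounts to deleting the helper and its call sites.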
* adds support for `UNNEST` expressions
* introduces `LogicalUnnestRule` to transform a `Correlate` doing UNNEST into a `LogicalUnnest`
* `UnnestInputCleanupRule` can move the final unnested expr into the `LogicalUnnest` itself (usually it's an `mv_to_array` expression)
* enhanced source unwrapping to utilize `FilteredDataSource` if it looks right
This patch adds a profile of MSQ named "Dart" that runs on Brokers and
Historicals, and which is compatible with the standard SQL query API.
For more high-level description, and notes on future work, refer to #17139.
This patch contains the following changes, grouped into packages.
Controller (org.apache.druid.msq.dart.controller):
The controller runs on Brokers. The main classes are:
- DartSqlResource, which serves /druid/v2/sql/dart/.
- DartSqlEngine and DartQueryMaker, the entry points from SQL that actually
run the MSQ controller code.
- DartControllerContext, which configures the MSQ controller.
- DartMessageRelays, which sets up relays (see "message relays" below) to read
messages from workers' DartControllerClients.
- DartTableInputSpecSlicer, which assigns work based on a TimelineServerView.
Worker (org.apache.druid.msq.dart.worker):
The worker runs on Historicals. The main classes are:
- DartWorkerResource, which supplies the regular MSQ WorkerResource, plus
Dart-specific APIs.
- DartWorkerRunner, which runs MSQ worker code.
- DartWorkerContext, which configures the MSQ worker.
- DartProcessingBuffersProvider, which provides processing buffers from
sliced-up merge buffers.
- DartDataSegmentProvider, which provides segments from the Historical's
local cache.
Message relays (org.apache.druid.messages):
To avoid the need for Historicals to contact Brokers during a query, which
would create opportunities for queries to get stuck, all connections are
opened from Broker to Historical. This is made possible by a message relay
system, where the relay server (worker) has an outbox of messages.
The relay client (controller) connects to the outbox and retrieves messages.
Code for this system lives in the "server" package to keep it separate from
the MSQ extension and make it easier to maintain. The worker-to-controller
ControllerClient is implemented using message relays.
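The outbox idea can be sketched in memory as below. This is a simplified illustration, not the code in org.apache.druid.messages: the class name `OutboxSketch`, the sequence-number acknowledgement scheme, and the method names are all assumptions made for this example, and the real system communicates over connections opened from Broker to Historical rather than through direct method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory outbox held by the relay server (worker).
final class OutboxSketch<T> {
    private final List<T> messages = new ArrayList<>();
    private long firstSequence = 0; // sequence number of messages.get(0)

    // Worker side: enqueue a message for the controller to pick up later.
    synchronized void send(T message) {
        messages.add(message);
    }

    // Controller side: asking for startSequence acknowledges everything
    // before it, then returns all messages available from that point on.
    synchronized List<T> poll(long startSequence) {
        long endSequence = firstSequence + messages.size();
        if (startSequence < firstSequence || startSequence > endSequence) {
            throw new IllegalArgumentException("Sequence out of range: " + startSequence);
        }
        int acknowledged = (int) (startSequence - firstSequence);
        messages.subList(0, acknowledged).clear();
        firstSequence = startSequence;
        return new ArrayList<>(messages);
    }
}
```

In this sketch the worker never initiates contact; a message is dropped only once the controller asks for a later sequence number, so a retried poll returns the same messages again.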
Other changes:
- Controller: Added the method "hasWorker". Used by the ControllerMessageListener
to notify the appropriate controllers when a worker fails.
- WorkerResource: No longer tries to respond more than once in the
"httpGetChannelData" API. This came up when a response due to a resolved future
was ready at about the same time as a timeout occurred.
- MSQTaskQueryMaker: Refactor to separate out some useful functions for reuse
in DartQueryMaker.
- SqlEngine: Add "queryContext" to "resultTypeForSelect" and "resultTypeForInsert".
This allows the DartSqlEngine to modify result format based on whether a "fullReport"
context parameter is set.
- LimitedOutputStream: New utility class. Used when in "fullReport" mode.
- TimelineServerView: Add getDruidServerMetadata as a performance optimization.
- CliHistorical: Add SegmentWrangler, so it can query inline data, lookups, etc.
- ServiceLocation: Add "fromUri" method, relocating some code from ServiceClientImpl.
- FixedServiceLocator: New locator for a fixed set of service locations. Useful for
URI locations.
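The respond-at-most-once behavior described for "httpGetChannelData" can be illustrated with a compare-and-set guard. This is a generic sketch of the technique, not the WorkerResource code: `RespondOnceSketch` and its `sink` parameter are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical guard ensuring only one of several racing paths
// (for example a resolved future and a timeout) writes the response.
final class RespondOnceSketch<T> {
    private final AtomicBoolean responded = new AtomicBoolean(false);
    private final Consumer<T> sink;

    RespondOnceSketch(Consumer<T> sink) {
        this.sink = sink;
    }

    // Returns true if this call delivered the response,
    // false if another path had already responded.
    boolean respond(T response) {
        if (responded.compareAndSet(false, true)) {
            sink.accept(response);
            return true;
        }
        return false;
    }
}
```

Both the future callback and the timeout handler would call `respond`; whichever wins the compare-and-set writes the response, and the other becomes a no-op.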
* SQL: Use regular filters for time filtering in subqueries.
Using the "intervals" feature on subqueries, or any non-table, should be
avoided because it isn't a meaningful optimization in those cases, and
it's simpler for runtime implementations if they can assume all filters
are located in the regular filter object.
Two changes:
1) Fix the logic in DruidQuery.canUseIntervalFiltering. It was intended
to return false for QueryDataSource, but actually returned true.
2) Add a validation to ScanQueryFrameProcessor to ensure that when running
on an input channel (which would include any subquery), the query has
"intervals" set to ONLY_ETERNITY.
Prior to this patch, the new test case in testTimeFilterOnSubquery would
throw a "Can only handle a single interval" error in the native engine,
and "QueryNotSupported" in the MSQ engine.
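The two changes can be sketched in simplified form as follows. Everything here is a hypothetical model for illustration: `IntervalFilteringSketch`, the `SourceKind` enum, and the string-based interval representation are assumptions, and the real checks in DruidQuery and ScanQueryFrameProcessor operate on Druid's own datasource and interval types.

```java
import java.util.Collections;
import java.util.List;

final class IntervalFilteringSketch {
    // Hypothetical classification of a query's input.
    enum SourceKind { TABLE, QUERY }

    // Placeholder for the single "all of time" interval.
    static final List<String> ONLY_ETERNITY = Collections.singletonList("ETERNITY");

    private IntervalFilteringSketch() {}

    // Change 1 (sketch): only plain tables get time filters moved into
    // "intervals"; subqueries keep them in the regular filter.
    static boolean canUseIntervalFiltering(SourceKind kind) {
        return kind == SourceKind.TABLE;
    }

    // Change 2 (sketch): a query reading from an input channel must not
    // carry a narrowed "intervals" value.
    static void validateChannelQueryIntervals(List<String> intervals) {
        if (!ONLY_ETERNITY.equals(intervals)) {
            throw new IllegalStateException(
                "Expected intervals to be ONLY_ETERNITY on an input channel, got: " + intervals);
        }
    }
}
```

With the first check in place, the second acts as a safety net: it should never fire for correctly planned queries, but it fails fast if a narrowed interval reaches a channel-based input.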
* Mark new case as having extra columns in decoupled mode.
* Adjust test.