Commit Graph

1104 Commits

Author SHA1 Message Date
Akshat Jain ca8f24edd3
Upgrade Guice to 5.1.0 (#17578)
* Move Guice to 5.1.0 and fix tests
* Fix checkstyle
* Revert overrideCurrentGuiceModules() and related changes
* Fix the tests
* Try using maven:3-openjdk-17-slim
* Try enabling debugging for mvn command
* Use maven:3.9 image
* Address review comment: Fix formatting
* Address review comment: Add brief javadoc for ExceptionMatcher
---------
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2024-12-19 09:08:20 +05:30
Kashif Faraz d9a58a7bbd
Move segment update APIs from Coordinator to Overlord (#17545)
Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly
- Audit these actions only on the Overlord

Other minor changes
---------------------
- Do not perform null check on `OverlordClient` on the coordinator side `DataSourcesResource`.
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`

New Overlord APIs
------------------
- Mark all segments of a datasource as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all (non-overshadowed) segments of a datasource as used:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple segments as used
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple (non-overshadowed) segments as unused
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
2024-12-19 09:05:00 +05:30
Ashwin Tumma f7c2c0acdd
Remove duplicate context from Request Logging (#17582)
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-19 09:03:28 +05:30
Akshat Jain 6fad11fe57
Revert "Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)" (#17584)
This reverts commit cd6083fb94.
2024-12-18 08:29:10 -08:00
Clint Wylie a44ab109d5
remove druid.expressions.useStrictBooleans in favor of always being true (#17568) 2024-12-17 18:49:16 -08:00
Adarsh Sanjeev bb4416a17b
Join context hints (#17541)
* join hints draft

* join algo

* propagate join hints

* review comments

* Use direct hints instead

* Add tests

* Pass preferred algo through pre join clause

* Refactors

* Fix tests

* Revert test changes

* Fix serialization

* Fix tests

* Fix test

* Fix test

* Fix test for sql compat mode

* Increase coverage

* Refactored hint class

---------

Co-authored-by: sreemanamala <sree.manamala@imply.io>
2024-12-17 22:25:22 +05:30
Akshat Jain 98b960c6ac
Refactor: Replace explicit type arguments with diamond operator (#17567)
Since we aren't supporting Java 8 anymore, we can switch to diamond operators
without specifying explicit type arguments.
2024-12-17 14:37:45 +05:30
Akshat Jain cd6083fb94
Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)
* Add back UnnecessaryFullyQualifiedName rule in pmd ruleset

* Fix checkstyle
2024-12-17 12:43:12 +05:30
Zoltan Haindrich 1a38434d8d
Restore usage of filtered SUM (#17378) 2024-12-12 10:30:42 +01:00
Clint Wylie 3c1b488cb7
remove druid.sql.planner.serializeComplexValues config in favor of always serializing complex values (#17549) 2024-12-11 13:07:56 -08:00
Abhishek Radhakrishnan 3a2220c68d
Refactor: Move some classes from `sql` to `processing` & `server` for reusability (#17542)
This PR contains non-functional / refactoring changes of the following classes in the sql module:

1. Move ExplainPlan and ExplainAttributes fromsql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself which provides all the clients as well.

This is done so that we can reuse the said classes in #17353 without brining in Calcite and other dependencies to the Overlord.
2024-12-06 09:32:03 -08:00
Zoltan Haindrich c1ef38b052
Minor fixes and enhancements in UnionQuery handling (#17483)
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move decoupled related setting from PlannerConfig to QueryContexts
2024-11-28 10:05:12 +01:00
Zoltan Haindrich 20aea29a51
Rename d1/d2 columns in tests (#17471) 2024-11-22 14:58:56 +01:00
Akshat Jain 17215cd677
Remove support for Java 8 (#17466)
All JDK 8 based CI checks have been removed.
    Images used in Dockerfile(s) have been updated to Java 17 based images.
    Documentation has been updated accordingly.
2024-11-21 15:33:08 +05:30
Zoltan Haindrich f296102f05
ScanQuery should not ignore columnTypes in equals/hashCode (#17463)
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes of Align ScanQuery column order with its desired signature #17457
* ScanQueryTest: add equalsverifer test
2024-11-12 14:26:59 +05:30
Zoltan Haindrich 2eac8318f8
Support Union in Decoupled planning (#17354)
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic` - which could later enable to reduce some complexity in `ClientQuerySegmentWalker`
* to run the subqueries of `UnionQuery` there was a need to access the `conglomerate` from the `Runner`; to enable that some refactors were done
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` have taken the place of `QueryToolChestWarehouse` which shaves of some unnecessary things here and there
* small cleanup/refactors
2024-11-05 16:58:57 +01:00
Tom e4cdbca23c
make planner errors be user persona (#17437)
Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Low quality error messages being returned to the correct audience is better than hiding all error messages.

The errors that can be returned back can be user solvable, and other times requires a druid expert. But the errors do not leak information that should only be seen by more expert/privileged personas.

The original ADMIN persona showed some reticence to tag low-quality error messages with a USER persona. but it really does seem user-directed to me so USER to me would make sense.
2024-11-04 10:48:35 -08:00
Akshat Jain 21e7e5cddd
Add benchmark suite for MSQ window functions (#17377)
* Add benchmark suite for MSQ window functions

* Fix inspection checks

* Address review comment: Rename method
2024-10-30 11:32:28 +05:30
Gian Merlino 446a8f466f
Update errorprone, mockito, jacoco, checkerframework. (#17414)
* Update errorprone, mockito, jacoco, checkerframework.

This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.

* Update licenses, tests.

* Remove assertEquals.

* Repair two tests.

* Update some more tests.
2024-10-28 11:34:03 -07:00
Abhishek Radhakrishnan 43b325b6aa
Add missing `@Nullable` annotations to SqlQuery (#17398) 2024-10-22 20:34:46 -07:00
Abhishek Radhakrishnan 187e21afae
Add `BrokerClient` implementation (#17382)
This patch is extracted from PR 17353.

Changes:

- Added BrokerClient and BrokerClientImpl to the sql package that leverages the ServiceClient functionality; similar to OverlordClient and CoordinatorClient implementations in the server module.
- For now, only two broker API stubs are added: submitSqlTask() and fetchExplainPlan().
- Added a new POJO class ExplainPlan that encapsulates explain plan info.
- Deprecated org.apache.druid.discovery.BrokerClient in favor of the new BrokerClient in this patch.
- Clean up ExplainAttributesTest a bit and added serde verification.
2024-10-21 11:05:53 -07:00
Abhishek Radhakrishnan 9a16d4e219
Move SqlTaskStatus and SqlTaskStausTest from msq module to sql module. (#17380)
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places.
- This refactor is extracted from this PR to facilitate easier review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
2024-10-18 14:39:01 -07:00
Shivam Garg 6898a5a359
Removed Microsecond from Extract function (#17247) 2024-10-11 05:32:26 +02:00
Clint Wylie ab0d6eb620
Fix string array grouping comparator (#17183) 2024-10-08 09:47:28 +05:30
Clint Wylie 04fe56835d
add druid.expressions.allowVectorizeFallback and default to false (#17248)
changes:

adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
2024-10-05 12:42:42 +05:30
Zoltan Haindrich 65277b17a9
Decoupled planning: add support for unnest (#17177)
* adds support for `UNNEST` expressions
* introduces `LogicalUnnestRule` to transform a `Correlate` doing UNNEST into a `LogicalUnnest`
* `UnnestInputCleanupRule` could move the final unnested expr into the `LogicalUnnest` itself (usually its an `mv_to_array` expression)
* enhanced source unwrapping to utilize `FilteredDataSource` if it looks right
2024-10-02 08:54:56 +02:00
Gian Merlino 878adff9aa
MSQ profile for Brokers and Historicals. (#17140)
This patch adds a profile of MSQ named "Dart" that runs on Brokers and
Historicals, and which is compatible with the standard SQL query API.
For more high-level description, and notes on future work, refer to #17139.

This patch contains the following changes, grouped into packages.

Controller (org.apache.druid.msq.dart.controller):

The controller runs on Brokers. Main classes are,

- DartSqlResource, which serves /druid/v2/sql/dart/.
- DartSqlEngine and DartQueryMaker, the entry points from SQL that actually
  run the MSQ controller code.
- DartControllerContext, which configures the MSQ controller.
- DartMessageRelays, which sets up relays (see "message relays" below) to read
  messages from workers' DartControllerClients.
- DartTableInputSpecSlicer, which assigns work based on a TimelineServerView.

Worker (org.apache.druid.msq.dart.worker)

The worker runs on Historicals. Main classes are,

- DartWorkerResource, which supplies the regular MSQ WorkerResource, plus
  Dart-specific APIs.
- DartWorkerRunner, which runs MSQ worker code.
- DartWorkerContext, which configures the MSQ worker.
- DartProcessingBuffersProvider, which provides processing buffers from
  sliced-up merge buffers.
- DartDataSegmentProvider, which provides segments from the Historical's
  local cache.

Message relays (org.apache.druid.messages):

To avoid the need for Historicals to contact Brokers during a query, which
would create opportunities for queries to get stuck, all connections are
opened from Broker to Historical. This is made possible by a message relay
system, where the relay server (worker) has an outbox of messages.

The relay client (controller) connects to the outbox and retrieves messages.
Code for this system lives in the "server" package to keep it separate from
the MSQ extension and make it easier to maintain. The worker-to-controller
ControllerClient is implemented using message relays.

Other changes:

- Controller: Added the method "hasWorker". Used by the ControllerMessageListener
  to notify the appropriate controllers when a worker fails.
- WorkerResource: No longer tries to respond more than once in the
  "httpGetChannelData" API. This comes up when a response due to resolved future
  is ready at about the same time as a timeout occurs.
- MSQTaskQueryMaker: Refactor to separate out some useful functions for reuse
  in DartQueryMaker.
- SqlEngine: Add "queryContext" to "resultTypeForSelect" and "resultTypeForInsert".
  This allows the DartSqlEngine to modify result format based on whether a "fullReport"
  context parameter is set.
- LimitedOutputStream: New utility class. Used when in "fullReport" mode.
- TimelineServerView: Add getDruidServerMetadata as a performance optimization.
- CliHistorical: Add SegmentWrangler, so it can query inline data, lookups, etc.
- ServiceLocation: Add "fromUri" method, relocating some code from ServiceClientImpl.
- FixedServiceLocator: New locator for a fixed set of service locations. Useful for
  URI locations.
2024-10-01 14:38:55 -07:00
Sree Charan Manamala 661614129e
Window Functions : Context Parameter to Enable Transfer of RACs over wire (#17150) 2024-09-28 08:04:22 +02:00
Gian Merlino dc223f22db
SQL: Use regular filters for time filtering in subqueries. (#17173)
* SQL: Use regular filters for time filtering in subqueries.

Using the "intervals" feature on subqueries, or any non-table, should be
avoided because it isn't a meaningful optimization in those cases, and
it's simpler for runtime implementations if they can assume all filters
are located in the regular filter object.

Two changes:

1) Fix the logic in DruidQuery.canUseIntervalFiltering. It was intended
   to return false for QueryDataSource, but actually returned true.

2) Add a validation to ScanQueryFrameProcessor to ensure that when running
   on an input channel (which would include any subquery), the query has
   "intervals" set to ONLY_ETERNITY.

Prior to this patch, the new test case in testTimeFilterOnSubquery would
throw a "Can only handle a single interval" error in the native engine,
and "QueryNotSupported" in the MSQ engine.

* Mark new case as having extra columns in decoupled mode.

* Adjust test.
2024-09-27 10:32:30 +05:30
Zoltan Haindrich 6f7e8ca74a
Decoupled planning: improve join predicate handling (#17105)
* enforces to only allow supported predicates in join conditions
* fixed a recursive query building issue by caching the `source` in `DruidQueryGenerator`
* moved `DruidAggregateRemoveRedundancyRule.instance` higher up; as if `CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES` runs earlier the resulting `GROUPING` might become invalid
2024-09-25 16:00:25 +02:00
Clint Wylie 77a362c555
various fixes and improvements to vectorization fallback (#17098)
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
2024-09-24 04:29:08 -07:00
Abhishek Radhakrishnan 37a2a12d79
rerwrite node so dynamic parameter applies to ingest node as well. (#17126) 2024-09-23 12:49:46 -07:00
Sree Charan Manamala 67d361c9bf
Window Functions : Remove enable windowing flag (#17087) 2024-09-23 08:24:26 +02:00
Zoltan Haindrich 2eee470f6e
Support explain in decoupled planning and log native plan consistently with DruidHook (#17101)
* enables to use DruidHook for native plan logging
* qudiem tests doesn't necessarily need to run the query to get an explain - this helps during development as if there is a runtime issue it could still be explained in the test
2024-09-20 10:53:43 +02:00
Abhishek Radhakrishnan 635e418131
Support to parse numbers in text-based input formats (#17082)
Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec).
To workaround this, the web-console and other tools need to further inspect the sample data returned to sample data returned by the Druid sampler API to parse them as numbers. 

This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.
2024-09-19 13:21:18 -07:00
Pranav d1bd6a8156
Update doc for allowedHeaders (#17045)
Update doc for allowedHeaders and make allowedHeaders more restrictive
2024-09-19 08:37:39 +05:30
Zoltan Haindrich d84d53c017
Decoupled planning: improve join support (#17039)
There were some problematic cases

join branches are run with finalize=false instead of finalize=true like normal subqueries
this inconsistency is not good - but fixing it is a bigger thing
ensure that right hand sides of joins are always subqueries - or accessible globally
To achieve the above:

operand indexes were needed for the upstream reltree nodes in the generator
source unwrapping now takes the join situation into account as well
2024-09-18 08:56:36 +05:30
Lasse Mammen 307b8e3357
feat: json_merge expression and sql function (#17081) 2024-09-17 18:27:34 +05:30
Sree Charan Manamala bb1c3c1749
Add serde for ColumnBasedRowsAndColumns to fix window queries without group by (#16658)
Register a Ser-De for RowsAndColumns so that the window operator query running on leaf operators would be transferred properly on the wire. Would fix the empty response given by window queries without group by on the native engine.
2024-09-17 06:44:40 +02:00
Laksh Singla bb487a4193
Support maxSubqueryBytes for window functions (#16800)
Window queries now acknowledge maxSubqueryBytes.
2024-09-17 10:06:24 +05:30
Gian Merlino 27443a0600
Fix formatting of error message from validateNoIllegalRightyJoins. (#17061)
The prior formatting was inconsistent in terms of punctuation and
capitalization.
2024-09-14 15:20:48 -07:00
Rishabh Singh a8c06e93aa
Skip tombstone segment refresh in metadata cache (#17025)
This PR #16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
2024-09-13 11:47:11 +05:30
Akshat Jain fff3e81dcc
Add window function drill tests for array_concat_agg for empty over scenarios (#17026)
* Add window function drill tests for array_concat_agg for empty over scenarios

* Cleanup sqlNativeIncompatible() as it's not needed now

* Address review comment
2024-09-13 11:35:45 +05:30
Pranav a95397e712
Allow request headers in HttpInputSource in native and MSQ Ingestion (#16974)
Support for adding the request headers in http input source. we can now pass the additional headers as json in both native and MSQ.
2024-09-12 11:18:44 +05:30
Abhishek Agarwal 78775ad398
Prepare master for 32.0.0 release (#17022) 2024-09-10 11:01:20 +05:30
Clint Wylie f57cd6f7af
transition away from StorageAdapter (#16985)
* transition away from StorageAdapter
changes:
* CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter
* StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead
* StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector
* added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference
* added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment
* added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment
* Updated all engines and tests to no longer use StorageAdapter
2024-09-09 14:55:29 -07:00
Sree Charan Manamala 51fe3c08ab
Window Functions : Reject MVDs during window processing (#17002)
This commit aims to reject MVDs in window processing as we do not support them.
Earlier to this commit, query running a window aggregate partitioned by an MVD column would fail with ClassCastException
2024-09-09 12:07:54 +05:30
Clint Wylie b0f36c1b89
fix bug with CastOperatorConversion with types which cannot be mapped to native druid types (#17011) 2024-09-06 17:07:32 -07:00
Gian Merlino 76b8c20f4d
Create fewer temporary maps when querying sys.segments. (#16981)
Eliminates two map creations (availableSegmentMetadata, partialSegmentDataMap).
The segmentsAlreadySeen set remains.
2024-09-03 20:04:44 -07:00
Sree Charan Manamala 619d8ef964
Window Functions : Numeric Arrays Frame Column Writers - fix class cast exception (#16983)
Fix ClassCastException in ArrayFrameCoulmnWriters
2024-09-03 11:44:52 +05:30