Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly (see the sketch below)
- Audit these actions only on the Overlord
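The delegation-with-fallback flow works roughly as sketched below (interface and class names here are illustrative placeholders, not the actual Druid classes):

```java
// Illustrative sketch only; the real code lives in the Coordinator's
// DataSourcesResource and uses OverlordClient / the segments metadata manager.
interface OverlordSegmentActions
{
  int markAllSegmentsAsUnused(String dataSource) throws Exception;
}

interface MetadataStoreSegmentActions
{
  int markAllSegmentsAsUnused(String dataSource);
}

class CoordinatorSegmentUpdater
{
  private final OverlordSegmentActions overlord;
  private final MetadataStoreSegmentActions metadataStore;

  CoordinatorSegmentUpdater(OverlordSegmentActions overlord, MetadataStoreSegmentActions metadataStore)
  {
    this.overlord = overlord;
    this.metadataStore = metadataStore;
  }

  int markAllSegmentsAsUnused(String dataSource)
  {
    try {
      // Preferred path: the Overlord performs the update and audits the action.
      return overlord.markAllSegmentsAsUnused(dataSource);
    }
    catch (Exception e) {
      // Fallback path: update the metadata store directly from the Coordinator.
      return metadataStore.markAllSegmentsAsUnused(dataSource);
    }
  }
}
```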
Other minor changes
---------------------
- Do not perform a null check on `OverlordClient` in the Coordinator-side `DataSourcesResource`.
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`
New Overlord APIs
------------------
- Mark all (non-overshadowed) segments of a datasource as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all segments of a datasource as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple (non-overshadowed) segments as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple segments as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
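For illustration, a client could invoke one of these endpoints (here, marking a single segment as unused) with the JDK HTTP client; the host, port, datasource, and segment ID below are placeholders:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class MarkSegmentUnusedExample
{
  public static void main(String[] args) throws Exception
  {
    String dataSource = "wikipedia";
    // Segment IDs contain characters that must be URL-encoded.
    String segmentId = URLEncoder.encode(
        "wikipedia_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_2024-01-01T00:00:00.000Z",
        StandardCharsets.UTF_8
    );
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(
            "http://overlord-host:8090/druid/indexer/v1/datasources/" + dataSource + "/segments/" + segmentId))
        .DELETE()
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + ": " + response.body());
  }
}
```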
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
This PR contains non-functional / refactoring changes of the following classes in the sql module:
1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package, which provides all the other clients as well.
This is done so that we can reuse the said classes in #17353 without bringing in Calcite and other dependencies to the Overlord.
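For context, the "thin POJO version of SqlQuery" is a Calcite-free query holder along these lines; the fields below are assumptions for illustration, not the actual ClientSqlQuery definition:

```java
import java.util.Map;

// Hypothetical sketch only; field names and types are assumed.
public class ClientSqlQuerySketch
{
  private final String query;
  private final String resultFormat;
  private final Map<String, Object> context;

  public ClientSqlQuerySketch(String query, String resultFormat, Map<String, Object> context)
  {
    this.query = query;
    this.resultFormat = resultFormat;
    this.context = context;
  }

  public String getQuery()
  {
    return query;
  }

  public String getResultFormat()
  {
    return resultFormat;
  }

  public Map<String, Object> getContext()
  {
    return context;
  }
}
```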
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move the decoupled-related setting from PlannerConfig to QueryContexts
All JDK 8-based CI checks have been removed.
Images used in Dockerfile(s) have been updated to Java 17-based images.
Documentation has been updated accordingly.
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes from "Align ScanQuery column order with its desired signature" (#17457)
* ScanQueryTest: add EqualsVerifier test
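The EqualsVerifier-based test is of this general shape (the actual ScanQueryTest may need prefab values or suppressed warnings for some fields):

```java
import nl.jqno.equalsverifier.EqualsVerifier;
import org.apache.druid.query.scan.ScanQuery;
import org.junit.Test;

public class ScanQueryEqualsVerifierSketch
{
  @Test
  public void testEqualsAndHashCodeContract()
  {
    // Verifies the equals/hashCode contract for ScanQuery.
    EqualsVerifier.forClass(ScanQuery.class)
                  .usingGetClass()
                  .verify();
  }
}
```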
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic` - which could later help reduce some complexity in `ClientQuerySegmentWalker`
* to run the subqueries of `UnionQuery`, the `Runner` needed access to the `conglomerate`; some refactors were done to enable that
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` has taken the place of `QueryToolChestWarehouse`, which shaves off some unnecessary things here and there
* small cleanup/refactors
Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Low quality error messages being returned to the correct audience is better than hiding all error messages.
The errors that are returned may sometimes be solvable by the user, while at other times they require a Druid expert. But the errors do not leak information that should only be seen by more expert/privileged personas.
The original choice of the ADMIN persona reflected some reticence to tag low-quality error messages with the USER persona, but they really do seem user-directed to me, so USER makes sense.
* Update errorprone, mockito, jacoco, checkerframework.
This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.
* Update licenses, tests.
* Remove assertEquals.
* Repair two tests.
* Update some more tests.
This patch is extracted from PR 17353.
Changes:
- Added BrokerClient and BrokerClientImpl to the sql package that leverages the ServiceClient functionality; similar to OverlordClient and CoordinatorClient implementations in the server module.
- For now, only two broker API stubs are added: submitSqlTask() and fetchExplainPlan() (see the sketch below).
- Added a new POJO class ExplainPlan that encapsulates explain plan info.
- Deprecated org.apache.druid.discovery.BrokerClient in favor of the new BrokerClient in this patch.
- Cleaned up ExplainAttributesTest a bit and added serde verification.
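A rough sketch of the two new stubs; the argument and return types below are assumptions, not the actual BrokerClient signatures:

```java
import com.google.common.util.concurrent.ListenableFuture;
import java.util.List;
import org.apache.druid.sql.SqlTaskStatus;
import org.apache.druid.sql.http.ExplainPlan;
import org.apache.druid.sql.http.SqlQuery;

// Hypothetical interface sketch; not the actual BrokerClient definition.
public interface BrokerClientSketch
{
  /** Submits a SQL query to the Broker as an async task and returns its status. */
  ListenableFuture<SqlTaskStatus> submitSqlTask(SqlQuery query);

  /** Fetches the explain plan for a SQL query without running it. */
  ListenableFuture<List<ExplainPlan>> fetchExplainPlan(SqlQuery query);
}
```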
- This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places.
- This refactor is extracted from this PR to facilitate easier review.
- Fix a minor spacing issue in the TaskStartTimeoutFault error message.
changes:
adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until a few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
* adds support for `UNNEST` expressions
* introduces `LogicalUnnestRule` to transform a `Correlate` doing UNNEST into a `LogicalUnnest`
* `UnnestInputCleanupRule` could move the final unnested expr into the `LogicalUnnest` itself (usually it's an `mv_to_array` expression)
* enhanced source unwrapping to utilize `FilteredDataSource` if it looks right
This patch adds a profile of MSQ named "Dart" that runs on Brokers and
Historicals, and which is compatible with the standard SQL query API.
For more high-level description, and notes on future work, refer to #17139.
This patch contains the following changes, grouped into packages.
Controller (org.apache.druid.msq.dart.controller):
The controller runs on Brokers. Main classes are,
- DartSqlResource, which serves /druid/v2/sql/dart/.
- DartSqlEngine and DartQueryMaker, the entry points from SQL that actually
run the MSQ controller code.
- DartControllerContext, which configures the MSQ controller.
- DartMessageRelays, which sets up relays (see "message relays" below) to read
messages from workers' DartControllerClients.
- DartTableInputSpecSlicer, which assigns work based on a TimelineServerView.
Worker (org.apache.druid.msq.dart.worker)
The worker runs on Historicals. Main classes are,
- DartWorkerResource, which supplies the regular MSQ WorkerResource, plus
Dart-specific APIs.
- DartWorkerRunner, which runs MSQ worker code.
- DartWorkerContext, which configures the MSQ worker.
- DartProcessingBuffersProvider, which provides processing buffers from
sliced-up merge buffers.
- DartDataSegmentProvider, which provides segments from the Historical's
local cache.
Message relays (org.apache.druid.messages):
To avoid the need for Historicals to contact Brokers during a query, which
would create opportunities for queries to get stuck, all connections are
opened from Broker to Historical. This is made possible by a message relay
system, where the relay server (worker) has an outbox of messages.
The relay client (controller) connects to the outbox and retrieves messages.
Code for this system lives in the "server" package to keep it separate from
the MSQ extension and make it easier to maintain. The worker-to-controller
ControllerClient is implemented using message relays.
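A minimal sketch of the outbox idea (types and method names here are illustrative only, not the actual org.apache.druid.messages classes):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// The worker appends messages to its outbox; the controller-side relay client
// polls over a connection opened from Broker to Historical and acknowledges
// what it has read, so the worker never needs to connect to the Broker.
class MessageOutboxSketch<T>
{
  private final Queue<T> pending = new ArrayDeque<>();

  synchronized void send(T message)
  {
    pending.add(message);
  }

  /** Returns up to maxMessages without removing them, so a failed read can be retried. */
  synchronized List<T> peek(int maxMessages)
  {
    List<T> batch = new ArrayList<>();
    for (T message : pending) {
      if (batch.size() >= maxMessages) {
        break;
      }
      batch.add(message);
    }
    return batch;
  }

  /** Drops messages once the relay client confirms receipt of a batch. */
  synchronized void acknowledge(int count)
  {
    for (int i = 0; i < count && !pending.isEmpty(); i++) {
      pending.remove();
    }
  }
}
```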
Other changes:
- Controller: Added the method "hasWorker". Used by the ControllerMessageListener
to notify the appropriate controllers when a worker fails.
- WorkerResource: No longer tries to respond more than once in the
"httpGetChannelData" API. This comes up when a response due to a resolved
future is ready at about the same time as a timeout occurs.
- MSQTaskQueryMaker: Refactor to separate out some useful functions for reuse
in DartQueryMaker.
- SqlEngine: Add "queryContext" to "resultTypeForSelect" and "resultTypeForInsert".
This allows the DartSqlEngine to modify result format based on whether a "fullReport"
context parameter is set.
- LimitedOutputStream: New utility class. Used when in "fullReport" mode.
- TimelineServerView: Add getDruidServerMetadata as a performance optimization.
- CliHistorical: Add SegmentWrangler, so it can query inline data, lookups, etc.
- ServiceLocation: Add "fromUri" method, relocating some code from ServiceClientImpl.
- FixedServiceLocator: New locator for a fixed set of service locations. Useful for
URI locations.
* SQL: Use regular filters for time filtering in subqueries.
Using the "intervals" feature on subqueries, or any non-table, should be
avoided because it isn't a meaningful optimization in those cases, and
it's simpler for runtime implementations if they can assume all filters
are located in the regular filter object.
Two changes:
1) Fix the logic in DruidQuery.canUseIntervalFiltering. It was intended
to return false for QueryDataSource, but actually returned true.
2) Add a validation to ScanQueryFrameProcessor to ensure that when running
on an input channel (which would include any subquery), the query has
"intervals" set to ONLY_ETERNITY.
Prior to this patch, the new test case in testTimeFilterOnSubquery would
throw a "Can only handle a single interval" error in the native engine,
and "QueryNotSupported" in the MSQ engine.
* Mark new case as having extra columns in decoupled mode.
* Adjust test.
* enforces that only supported predicates are allowed in join conditions
* fixed a recursive query building issue by caching the `source` in `DruidQueryGenerator`
* moved `DruidAggregateRemoveRedundancyRule.instance` higher up, because if `CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES` runs earlier, the resulting `GROUPING` might become invalid
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce the number of cases which can block vectorization in the expression planner to only unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression and apply map expression so that the actual evaluated type matches the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array, since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
* enables using DruidHook for native plan logging
* quidem tests don't necessarily need to run the query to get an explain - this helps during development, since a query with a runtime issue can still be explained in the test
Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec.
To work around this, the web console and other tools need to further inspect the sample data returned by the Druid sampler API to parse them as numbers.
This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.
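Illustratively (this is not the actual Druid implementation), the parsing rule behaves like this:

```java
// Integers parse to Long, floating-point values parse to Double, and anything
// that fails to parse is kept as the original String.
class NumberParsingSketch
{
  static Object tryParseNumber(String value)
  {
    try {
      return Long.parseLong(value);
    }
    catch (NumberFormatException ignored) {
      // not an integer; fall through and try floating point
    }
    try {
      return Double.parseDouble(value);
    }
    catch (NumberFormatException ignored) {
      // not a number; keep the original string
    }
    return value;
  }
}
```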
There were some problematic cases:
- join branches are run with finalize=false instead of finalize=true like normal subqueries; this inconsistency is not good, but fixing it is a bigger undertaking
- ensure that the right-hand sides of joins are always subqueries, or accessible globally
To achieve the above:
- operand indexes were needed for the upstream reltree nodes in the generator
- source unwrapping now takes the join situation into account as well
Registers a Ser-De for RowsAndColumns so that window operator queries running on leaf operators are transferred properly over the wire. This fixes the empty response given by window queries without GROUP BY on the native engine.
PR #16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that, as a side effect, tombstone segments appear unavailable in the console. This happens because the availability of a segment on the Broker is determined from the metadata cache.
The fix is to keep the segments in the metadata cache but skip them during refresh.
This doesn't affect any functionality, since the metadata query for a tombstone returns an empty result and would otherwise just cause continuous refresh of those segments.
* Add window function drill tests for array_concat_agg for empty over scenarios
* Cleanup sqlNativeIncompatible() as it's not needed now
* Address review comment
* transition away from StorageAdapter
changes:
* CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter
* StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead
* StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector
* added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference
* added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment
* added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment
* Updated all engines and tests to no longer use StorageAdapter
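In code, the new access pattern described above looks roughly like this (a sketch; package locations and null-handling are assumptions):

```java
import org.apache.druid.segment.CursorFactory;
import org.apache.druid.segment.PhysicalSegmentInspector;
import org.apache.druid.segment.Segment;

class SegmentAccessSketch
{
  static CursorFactory cursorFactoryFor(Segment segment)
  {
    // Cursors are now obtained directly from the segment instead of through a StorageAdapter.
    return segment.asCursorFactory();
  }

  static PhysicalSegmentInspector physicalInspectorFor(Segment segment)
  {
    // Metadata-style information moves behind Segment.as(...) inspectors;
    // this may return null for segment types that do not implement the interface.
    return segment.as(PhysicalSegmentInspector.class);
  }
}
```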
This commit rejects MVDs in window processing as we do not support them.
Prior to this commit, a query running a window aggregate partitioned by an MVD column would fail with a ClassCastException.