druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	c1ef38b052	Minor fixes and enhancements in UnionQuery handling (#17483 ) * plan consistently with either UnionDataSource or UnionQuery for decoupled mode * expose errors * move decoupled related setting from PlannerConfig to QueryContexts	2024-11-28 10:05:12 +01:00
Vishesh Garg	1b9a6dde9f	Fix compilation error for MSQCompactionRunnerTest (#17516 )	2024-11-27 12:46:30 +01:00
Vishesh Garg	5333c53d71	Support non time order in MSQ compaction (#17318 ) This patch supports sorting segments by non-time columns (added in #16849) to MSQ compaction. Specifically, if `forceSegmentSortByTime` is set in the data schema, either via the user-supplied compaction config or in the inferred schema, the following steps are taken: - Skip adding `__time` explicitly as the first column to the dimension schema since it already comes as part of the schema - Ensure column mappings propagate `__time` in the order specified by the schema - Set `forceSegmentSortByTime` in the MSQ context.	2024-11-27 13:26:10 +05:30
Akshat Jain	dd46c7722d	Remove pre-java-11 profile (#17511 ) We have removed support for Java 8 in #17466. This PR removes an unused profile pre-java-11 which activated for JDK < 11.	2024-11-26 08:43:20 +01:00
Zoltan Haindrich	20aea29a51	Rename d1/d2 columns in tests (#17471 )	2024-11-22 14:58:56 +01:00
Rishabh Singh	74422b58f5	Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360 ) This change is to emit following metrics as part of GroupByStatsMonitor monitor, mergeBuffer/used -> Number of merge buffers used. mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer. mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers. groupBy/spilledQueries -> Number of queries that spilled onto the disk. groupBy/spilledBytes-> Spilled bytes on the disk. groupBy/mergeDictionarySize -> Size of the merging dictionary.	2024-11-22 14:22:03 +05:30
Adarsh Sanjeev	2726c6f388	Minor refactors to processing Some refactors across druid to clean up the code and add utility functions where required.	2024-11-21 15:37:55 +05:30
Akshat Jain	17215cd677	Remove support for Java 8 (#17466 ) All JDK 8 based CI checks have been removed. Images used in Dockerfile(s) have been updated to Java 17 based images. Documentation has been updated accordingly.	2024-11-21 15:33:08 +05:30
Adithya Chakilam	6f436301be	supervisor: make rejection periods work with stopTasksCount (#17442 ) * kafka-indexing: Report consumer io time * commit * backward * tests * remove unwanted changes * comments * comments * coverage * change name * fixes * fixes * comments	2024-11-18 13:12:24 -08:00
Zoltan Haindrich	f296102f05	ScanQuery should not ignore columnTypes in equals/hashCode (#17463 ) * ScanQuery: equals/hashCode/toString * DruidQuery: changes of Align ScanQuery column order with its desired signature #17457 * ScanQueryTest: add equalsverifer test	2024-11-12 14:26:59 +05:30
Akshat Jain	c571e6905d	Refactor WindowOperatorQueryKit to use WindowStage class for representing different window stages (#17158 )	2024-11-12 14:18:16 +05:30
Akshat Jain	3f56b57c7e	MSQ WF: Pass a flag from broker to determine operator chain transformation (#17443 )	2024-11-12 09:28:28 +05:30
Akshat Jain	73cbce9109	WindowOperatorQueryFrameProcessorFactory: Pass QueryContext instead of WindowOperatorQuery to WindowOperatorQueryFrameProcessor (#17405 ) * WindowOperatorQueryKit: Pass QueryContext instead of WindowOperatorQuery to subsequent layers * Add serializer for QueryContext class * Revert changes of WindowOperatorQueryFrameProcessorFactory json param * Fix checkstyle * Address review comment: Remove older method in favor of calling new method inline	2024-11-07 11:29:49 +05:30
Zoltan Haindrich	2eac8318f8	Support Union in Decoupled planning (#17354 ) * introduces `UnionQuery` * some changes to enable a `UnionQuery` to have multiple input datasources * `UnionQuery` execution is driven by the `QueryLogic` - which could later enable to reduce some complexity in `ClientQuerySegmentWalker` * to run the subqueries of `UnionQuery` there was a need to access the `conglomerate` from the `Runner`; to enable that some refactors were done * renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner` * `QueryRunnerFactoryConglomerate` have taken the place of `QueryToolChestWarehouse` which shaves of some unnecessary things here and there * small cleanup/refactors	2024-11-05 16:58:57 +01:00
Tom	e4cdbca23c	make planner errors be user persona (#17437 ) Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API, it's a query API. Low quality error messages being returned to the correct audience is better than hiding all error messages. The errors that can be returned back can be user solvable, and other times requires a druid expert. But the errors do not leak information that should only be seen by more expert/privileged personas. The original ADMIN persona showed some reticence to tag low-quality error messages with a USER persona. but it really does seem user-directed to me so USER to me would make sense.	2024-11-04 10:48:35 -08:00
Akshat Jain	21e7e5cddd	Add benchmark suite for MSQ window functions (#17377 ) * Add benchmark suite for MSQ window functions * Fix inspection checks * Address review comment: Rename method	2024-10-30 11:32:28 +05:30
Akshat Jain	63c91ad813	Fix backward compatibility issues in WindowOperatorQueryFrameProcessorFactory and WindowOperatorQueryFrameProcessor (#17433 )	2024-10-30 11:32:02 +05:30
Adarsh Sanjeev	b7c661b801	Make tempStorageDirectory configuration optional and rely on task dir instead (#17015 ) Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir. Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set. Please note that preference would be given to the user configured, druid..storage.tempDir, on the tasks. If that is not configured, we then use the configured temporary directory. Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.	2024-10-29 13:36:59 +05:30
Gian Merlino	446a8f466f	Update errorprone, mockito, jacoco, checkerframework. (#17414 ) * Update errorprone, mockito, jacoco, checkerframework. This patch updates various build and test dependencies, to see if they cause unit tests on JDK 21 to behave more reliably. * Update licenses, tests. * Remove assertEquals. * Repair two tests. * Update some more tests.	2024-10-28 11:34:03 -07:00
Suraj Goel	7306d280cc	Migrate jaxb bind dependency to jakarta (#17370 ) - Migrated from javax.xml.bind 2.3.1 to jakarta.xml.bind 2.3.3. - Minor version is modified to avoid any breaking changes.	2024-10-26 21:24:17 -07:00
Akshat Jain	fe0f4150c9	MSQ ingestion: Improve error message on encountering non-long timestamp column (#17411 ) This PR improves the error message during MSQ ingestion if we encounter a non-long timestamp column.	2024-10-25 15:02:32 +05:30
Akshat Jain	1e96c85b38	WindowOperatorQueryFrameProcessor: Avoid writing multiple frames to output channel in runIncrementally() (#17373 ) WindowOperatorQueryFrameProcessor: Avoid writing multiple frames to output channel in runIncrementally()	2024-10-23 10:34:37 +05:30
Gian Merlino	60daddedf8	SeekableStreamSupervisor: Use workerExec as the client connectExec. (#17394 ) * SeekableStreamSupervisor: Use workerExec as the client connectExec. This patch uses the already-existing per-supervisor workerExec as the connectExec for task clients, rather than using the process-wide default ServiceClientFactory pool. This helps prevent callbacks from backlogging on the process-wide pool. It's especially useful for retries, where callbacks may need to establish new TCP connections or perform TLS handshakes. * Fix compilation, tests. * Fix style.	2024-10-22 20:21:21 -07:00
Vishesh Garg	5da9949992	Fail MSQ compaction if multi-valued partition dimensions are found (#17344 ) MSQ currently supports only single-valued string dimensions as partition keys. This patch adds a check to ensure that partition keys are single-valued in case this info is available by virtue of segment download for schema inference. During compaction, if MSQ finds multi-valued dimensions (MVDs) declared as part of `range` partitionsSpec, it switches partitioning type to dynamic, ending up in repeated compactions of the same interval. To avoid this scenario, the segment download logic is also updated to always download segments if info on multi-valued dimensions is required.	2024-10-19 13:33:33 +05:30
Abhishek Radhakrishnan	9a16d4e219	Move SqlTaskStatus and SqlTaskStausTest from msq module to sql module. (#17380 ) - This is a non-functional change that moves SqlTaskStatus and its unit test SqlTaskStatusTest from the msq module to the sql module to help class reuse in other places. - This refactor is extracted from this PR to facilitate easier review. - Fix a minor spacing issue in the TaskStartTimeoutFault error message.	2024-10-18 14:39:01 -07:00
Laksh Singla	5b09329479	Fixes an issue with AppendableMemory that can cause MSQ jobs to fail (#17369 )	2024-10-18 09:05:53 +05:30
Adithya Chakilam	e834e49290	supervisor/autoscaler: Fix clearing of collected lags on skipped scale actions (#17356 ) * superviosr/autoscaler: Fix clearing of collected lags on skipped scale actions * comments * supervisor/autoscaler: Skip scaling when partitions are less than minTaskCount (#17335) * Fix pip installation after ubuntu upgrade (#17358) * fix tests --------- Co-authored-by: Pranav <pranavbhole@gmail.com>	2024-10-17 11:05:16 -07:00
Akshat Jain	8c52be81d3	Fix postgres metadata storage warning logs because of tablename causing issues (#17351 )	2024-10-17 15:48:22 +05:30
Akshat Jain	450fb0147b	Add GlueingPartitioningOperator + Corresponding changes in window function layer to consume it for MSQ (#17038 ) * GlueingPartitioningOperator: It continuously receives data, and outputs batches of partitioned RACs. It maintains a last-partitioning-boundary of the last-pushed-RAC, and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly, even across multiple RACs. You can check GlueingPartitioningOperatorTest for some good examples of the "glueing" work. * PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.	2024-10-17 10:54:52 +05:30
TessaIO	a9f582711e	Fix loading lookup extension (#17212 ) We introduce the option to iterate over fetched data from the dataFetcher for loadingLookups in the lookups-cached-single extension. Also, added the handling of a use case where the data exists in Druid but not in the actual data fetcher, which is in our use-case JDBC Data fetcher, where the value returned is null. Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-10-16 07:28:32 -07:00
Gian Merlino	b287b219a8	MSQ: Include stageId, workerNumber in processing thread names. (#17324 ) * MSQ: Include stageId, workerNumber in processing thread names. Helps identify which query was running in a thread dump. * s/dart/msq/	2024-10-11 08:37:15 -07:00
Gian Merlino	a0c29f8bbb	MSQ WorkerResource: Fix timeout handler for httpGetChannelData. (#17328 ) The timeout handler should fire if the response has not been handled yet (i.e. if responseResolved was previously false). However, it erroneously fires only if the response was handled. This causes HTTP 500 errors if the timeout actually does fire. The timeout is 30 seconds, which can be hit during pipelined queries, if an earlier stage of the query hasn't produced its first frame within 30 seconds. This fixes a regression introduced in #17140.	2024-10-11 16:29:04 +05:30
Karan Kumar	034bb9dbea	Removing enable windowing from MSQ tests. (#17276 )	2024-10-11 05:33:27 +02:00
Clint Wylie	a6236c3d15	add substituteCombiningFactory implementations for datasketches aggs (#17314 ) Follow up to #17214, adds implementations for substituteCombiningFactory so that more datasketches aggs can match projections, along with some projections tests for datasketches.	2024-10-10 16:14:06 +05:30
Gian Merlino	074944e02c	Dart: Only use historicals as workers. (#17319 ) Only historicals load the Dart worker modules. Other types of servers in the server view (such as realtime tasks) should not be included.	2024-10-10 13:47:58 +05:30
Gian Merlino	4092f3fe47	MSQ: Call "onQueryComplete" after the query is closed. (#17313 ) This fixes a concurrency issue where, for failed queries, "onQueryComplete" could be called concurrently with "onResultsStart" or "onResultRow". Fully closing the controller ensures that the result reader is no longer active, which eliminates the race.	2024-10-10 10:44:44 +05:30
Gian Merlino	b27712933e	MSQ: Use leaf worker count for stages that have any leaf inputs. (#17312 ) Previously, the leaf worker count was used for stages that have no stage inputs. It should actually be used for stages that have any non-broadcast, non-stage inputs. This fixes a bug with broadcast joins. In a broadcast join, the stage has both a table and a broadcast stage as input. Previously, it would be planned using the non-leaf worker count. It should actually be planned using the leaf worker count.	2024-10-10 10:44:31 +05:30
Gian Merlino	baa16f30f6	DartWorkerContext: Return the correct workerId(). (#17280 ) Prior to this patch, the workerId() method did not actually return the worker ID. It returned some other string that had similar information, but was different. This caused the /druid/dart-worker/workers API, to return an internal server error. The API is useful for debugging, although it is not used during actual queries.	2024-10-08 09:52:55 -07:00
Gian Merlino	152330c5a8	WorkerManager: Correct javadoc for "stop". (#17279 ) The javadoc had a factual error: Dart's implementation does not in fact always return immediately.	2024-10-08 15:49:43 +05:30
Gian Merlino	0a279e634a	DartSqlResource: Return HTTP 202 on cancellation even if no such query. (#17278 ) Return HTTP 202 (Accepted) on cancellation, even if the requested query ID was not found. The main reason for this is that when the Router broadcasts DELETE requests to all Brokers, it returns the response from one of them randomly. If we return 404 when a query ID isn't found, then the Router randomly returns 404s even when the query really was found and canceled. This is also arguably still correct behavior. The cancellation request was accepted, it just won't do anything because the query was not in fact running.	2024-10-08 15:49:34 +05:30
Gian Merlino	01baf99148	DartWorkerModule: Replace en dash with regular dash. (#17281 ) Due to a typo, the thread name of the worker executor used an en dash (–) rather than a regular hyphen (-). This was unintentional, and makes it difficult to search for in thread dumps.	2024-10-08 15:48:10 +05:30
Gian Merlino	2309aa7bdf	DartSqlResource: Add controllerHost to GetQueriesResponse. (#17283 ) This helps find the specific Broker that is executing a query.	2024-10-08 15:47:32 +05:30
Gian Merlino	9921ac1b19	DartSqlResource: Sort queries by start time. (#17282 ) * DartSqlResource: Sort queries by start time. This keeps the list of queries returned by the API in a consistent order. * Fix test.	2024-10-08 15:47:21 +05:30
Gian Merlino	06bbdb38ce	MSQ: Allow for worker gaps. (#17277 ) In a Dart query, all Historicals are given worker IDs, but not all of them are going to actually be started or receive work orders. This can create gaps in the set of workers. For example, workers 1 and 3 could have work assigned while workers 0 and 2 do not. This patch updates ControllerStageTracker and WorkerInputs to handle such gaps, by using the set of actual worker numbers, rather than 0..workerCount, in various places.	2024-10-08 15:07:57 +05:30
Vishesh Garg	7e35e50052	Fix issues with MSQ Compaction (#17250 ) The patch makes the following changes: 1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns 2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering. 3. Ensures only string columns are specified as partition dimensions 4. Ensures `rollup` is true if and only if metricsSpec is non-empty 5. Ensures disjoint intervals aren't submitted for compaction 6. Adds `compactionReason` to compaction task context.	2024-10-06 21:48:26 +05:30
Clint Wylie	0bd13bcd51	Projections prototype (#17214 )	2024-10-05 04:38:57 -07:00
Clint Wylie	04fe56835d	add druid.expressions.allowVectorizeFallback and default to false (#17248 ) changes: adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types) add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future	2024-10-05 12:42:42 +05:30
Gian Merlino	d1709a329f	Dart: Skip final getCounters, postFinish to idle historicals. (#17255 ) In a Dart query, all Historicals are given worker IDs, but not all of them are going to actually be started or receive work orders. Attempting to send a getCounters or postFinish command to a worker that never received a work order is not only wasteful, but it causes errors due to the workers not knowing about that query ID.	2024-10-04 23:05:21 -07:00
Shivam Garg	93b5a8326b	Upgrade commons-io to 2.17.0 (#17227 )	2024-10-04 09:56:56 +02:00
Gian Merlino	fc00664760	KafkaInputFormat: Fix handling of CSV/TSV keyFormat. (#17226 ) * KafkaInputFormat: Fix handling of CSV/TSV keyFormat. Follow-up to #16630, which fixed a similar issue for the valueFormat. * Simplify.	2024-10-03 13:05:09 -07:00

1 2 3 4 5 ...

1686 Commits