druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	baa16f30f6	DartWorkerContext: Return the correct workerId(). (#17280 ) Prior to this patch, the workerId() method did not actually return the worker ID. It returned some other string that had similar information, but was different. This caused the /druid/dart-worker/workers API, to return an internal server error. The API is useful for debugging, although it is not used during actual queries.	2024-10-08 09:52:55 -07:00
Charles Smith	5ed68622c3	[Docs] Update known issues for window functions (#17097 ) * draft update to known issues * Update known issues Remove addressed known issues. Clarify the issue with SELECT * queries.	2024-10-08 08:47:13 -07:00
Gian Merlino	152330c5a8	WorkerManager: Correct javadoc for "stop". (#17279 ) The javadoc had a factual error: Dart's implementation does not in fact always return immediately.	2024-10-08 15:49:43 +05:30
Gian Merlino	0a279e634a	DartSqlResource: Return HTTP 202 on cancellation even if no such query. (#17278 ) Return HTTP 202 (Accepted) on cancellation, even if the requested query ID was not found. The main reason for this is that when the Router broadcasts DELETE requests to all Brokers, it returns the response from one of them randomly. If we return 404 when a query ID isn't found, then the Router randomly returns 404s even when the query really was found and canceled. This is also arguably still correct behavior. The cancellation request was accepted, it just won't do anything because the query was not in fact running.	2024-10-08 15:49:34 +05:30
Gian Merlino	01baf99148	DartWorkerModule: Replace en dash with regular dash. (#17281 ) Due to a typo, the thread name of the worker executor used an en dash (–) rather than a regular hyphen (-). This was unintentional, and makes it difficult to search for in thread dumps.	2024-10-08 15:48:10 +05:30
Gian Merlino	2309aa7bdf	DartSqlResource: Add controllerHost to GetQueriesResponse. (#17283 ) This helps find the specific Broker that is executing a query.	2024-10-08 15:47:32 +05:30
Gian Merlino	9921ac1b19	DartSqlResource: Sort queries by start time. (#17282 ) * DartSqlResource: Sort queries by start time. This keeps the list of queries returned by the API in a consistent order. * Fix test.	2024-10-08 15:47:21 +05:30
Gian Merlino	06bbdb38ce	MSQ: Allow for worker gaps. (#17277 ) In a Dart query, all Historicals are given worker IDs, but not all of them are going to actually be started or receive work orders. This can create gaps in the set of workers. For example, workers 1 and 3 could have work assigned while workers 0 and 2 do not. This patch updates ControllerStageTracker and WorkerInputs to handle such gaps, by using the set of actual worker numbers, rather than 0..workerCount, in various places.	2024-10-08 15:07:57 +05:30
Gian Merlino	4fbb129027	Improve javadocs for SegmentDescriptor. (#17274 ) The javadoc for SegmentDescriptor discusses differences between it and SegmentId, but misses the most important difference: SegmentDescriptor can have a narrower interval than the segment being referenced.	2024-10-08 00:59:55 -07:00
Clint Wylie	ab0d6eb620	Fix string array grouping comparator (#17183 )	2024-10-08 09:47:28 +05:30
Edgar Melendrez	a67a3c8e0a	[docs] update tutorial for Theta sketches (#16953 ) * from start to step 3 of Ingest data using Theta sketche * updated upto "Query the Theta sketch column" * fixed sentence * another typo * using sql ingestion instead of batch-sql * waiting for explanations on DS_THETA * Revert "using sql ingestion instead of batch-sql" This reverts commit `b95fcb9b32`. * Revert "using sql ingestion instead of batch-sql" This reverts commit `b95fcb9b32`. * just copy and pasting to where I was * updated tutorial * fixing images, and removing unused * slightly updating explanatio * Update docs/tutorials/tutorial-sketches-theta.md * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * addressing comments in review * made filter clause consitent with other instances * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-10-08 10:44:37 +08:00
317brian	9932f2e70a	docs: concurrent append and replace is gA (#17269 )	2024-10-08 07:55:55 +05:30
AmatyaAvadhanula	f42ecc9f25	Fail concurrent replace tasks with finer segment granularity than append (#17265 )	2024-10-08 07:35:13 +05:30
George Shiqi Wu	5d7c7a87ec	Add maximumCapacity to taskRunner (#17107 ) * Add maximumCapacity to taskRunner * fix tests * pr comments	2024-10-07 15:03:51 -04:00
AmatyaAvadhanula	ff97c67945	Fix batch segment allocation failure with replicas (#17262 ) Fixes #16587 Streaming ingestion tasks operate by allocating segments before ingesting rows. These allocations happen across replicas which may send different requests but must get the same segment id for a given (datasource, interval, version, sequenceName) across replicas. This patch fixes the bug by ignoring the previousSegmentId when skipLineageCheck is true.	2024-10-07 19:52:38 +05:30
Karan Kumar	6a4352f466	When removeNullBytes is set, length calculations did not take into account null bytes. (#17232 ) * When replaceNullBytes is set, length calculations did not take into account null bytes.	2024-10-07 18:02:52 +05:30
Adarsh Sanjeev	c9201ad658	Minor refactors to processing module (#17136 ) Refactors a few things. - Adds SemanticUtils maps to columns. - Add some addAll functions to reduce duplication, and for future reuse. - Refactor VariantColumnAndIndexSupplier to only take a SmooshedFileMapper instead. - Refactor LongColumnSerializerV2 to have separate functions for serializing a value and null.	2024-10-07 13:18:35 +05:30
Vishesh Garg	7e35e50052	Fix issues with MSQ Compaction (#17250 ) The patch makes the following changes: 1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns 2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering. 3. Ensures only string columns are specified as partition dimensions 4. Ensures `rollup` is true if and only if metricsSpec is non-empty 5. Ensures disjoint intervals aren't submitted for compaction 6. Adds `compactionReason` to compaction task context.	2024-10-06 21:48:26 +05:30
Shivam Garg	7d9e6d36fd	Upgraded Protobuf to 3.25.5 (#17249 ) * Bump com.google.protobuf:protobuf-java from 3.24.0 to 3.25.5 Bumps [com.google.protobuf:protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.24.0 to 3.25.5. - [Release notes](https://github.com/protocolbuffers/protobuf/releases) - [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl) - [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.24.0...v3.25.5) --- updated-dependencies: - dependency-name: com.google.protobuf:protobuf-java dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Updated the license * Updated licenses.yaml --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-06 12:34:02 +05:30
Clint Wylie	0bd13bcd51	Projections prototype (#17214 )	2024-10-05 04:38:57 -07:00
Clint Wylie	04fe56835d	add druid.expressions.allowVectorizeFallback and default to false (#17248 ) changes: adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types) add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future	2024-10-05 12:42:42 +05:30
Gian Merlino	d1709a329f	Dart: Skip final getCounters, postFinish to idle historicals. (#17255 ) In a Dart query, all Historicals are given worker IDs, but not all of them are going to actually be started or receive work orders. Attempting to send a getCounters or postFinish command to a worker that never received a work order is not only wasteful, but it causes errors due to the workers not knowing about that query ID.	2024-10-04 23:05:21 -07:00
Vadim Ogievetsky	babf7f2ef6	Web console: don't assume that activeTasks is an array (#17254 )	2024-10-04 16:01:13 -07:00
Vadim Ogievetsky	2ffe7b177c	Explore view fix spin when applying defaults (#17252 )	2024-10-04 13:02:15 -07:00
Shivam Garg	93b5a8326b	Upgrade commons-io to 2.17.0 (#17227 )	2024-10-04 09:56:56 +02:00
Kashif Faraz	e41648f5bf	Simplify release script find-missing-backports.py (#17218 )	2024-10-04 10:37:04 +05:30
Vadim Ogievetsky	d1bc369f06	Web console: Final explore QA pass (#17240 ) * cleanup * remove redundancy * aggregate works for multiple queries	2024-10-03 19:52:17 -07:00
Gian Merlino	b9634a8613	SuperSorter: Don't set allDone if it's already set. (#17238 ) This fixes a race where, if there is no output at all, setAllDoneIfPossible could be called twice (once when the output partitions future resolves, and once when the batcher finishes). If the calls happen in that order, it would try to create nil output channels both times, resulting in a "Channel already set" error.	2024-10-04 06:41:16 +05:30
Charles Smith	acd973273f	Docs: adds MSQ examples to front coded dict. migration (#17236 ) * add msq example * adjust json formatting	2024-10-03 16:33:34 -07:00
Vadim Ogievetsky	fb94428a58	Web console: Explore view QA with live data (#17234 ) * Explore view QA with live data * update snapshots * add t for preview also * use pulse icon consistently	2024-10-03 13:50:08 -07:00
Gian Merlino	fc00664760	KafkaInputFormat: Fix handling of CSV/TSV keyFormat. (#17226 ) * KafkaInputFormat: Fix handling of CSV/TSV keyFormat. Follow-up to #16630, which fixed a similar issue for the valueFormat. * Simplify.	2024-10-03 13:05:09 -07:00
Gian Merlino	db7cc4634c	Dart: Smoother handling of stage early-exit. (#17228 ) Stages can be instructed to exit before they finish, especially when a downstream stage includes a "LIMIT". This patch has improvements related to early-exiting stages. Bug fix: - WorkerStageKernel: Don't allow fail() to set an exception if the stage is already in a terminal state (FINISHED or FAILED). If fail() is called while in a terminal state, log the exception, then throw it away. If it's a cancellation exception, don't even log it. This fixes a bug where a stage that exited early could transition to FINISHED and then to FAILED, causing the overall query to fail. Performance: - DartWorkerManager previously sent stopWorker commands to workers even when "interrupt" was false. Now it only sends those commands when "interrupt" is true. The method javadoc already claimed this is what the method did, but the implementation did not match the javadoc. This reduces the number of RPCs by 1 per worker per query. Quieter logging: - In ReadableByteChunksFrameChannel, skip logging exception from setError if the channel has been closed. Channels are closed when readers are done with them, so at that point, we wouldn't be interested in the errors. - In RunWorkOrder, skip calling notifyListener on failure of the main work, in the case when stop() has already been called. The stop() method will set its own error using CanceledFault. This enables callers to detect when a stage was canceled vs. failed for some other reason. - In WorkerStageKernel, skip logging cancellation errors in fail(). This is made possible by the previous change in RunWorkOrder.	2024-10-03 20:09:02 +05:30
Abhishek Agarwal	421aae39ad	Upgrade avro - minor version (#17230 )	2024-10-03 18:02:11 +05:30
Gian Merlino	316f8c81d2	RunWorkOrder: Account for two simultaneous statistics collectors. (#17216 ) * RunWorkOrder: Account for two simultaneous statistics collectors. As a follow up to #17057, divide the amount of partitionStatsMemory by two, to account for the fact that there are at some times going to be two copies of the full collector. First there will be one for processors and one for the accumulated collector. Then, after the processor ones are GCed, a snapshot of the accumulated collector will be created. Also includes an optimization to "addAll" for the two KeyCollectors, for the case where we're adding into an empty collector. This is always going to happen once per stage due to the "withAccumulation" call. * Fix missing variable. * Don't divide by numProcessingThreads twice. * Fix test.	2024-10-03 16:25:01 +05:30
Akshat Jain	edc235cfe1	WindowOperatorQueryFrameProcessor: Avoid unnecessary re-runs of runIncrementally() (#17211 )	2024-10-03 15:33:50 +05:30
Vadim Ogievetsky	8c4db8aeed	explore QA (#17225 )	2024-10-02 23:05:19 -07:00
Akshat Jain	135ca8f6a7	WindowOperatorQueryFrameProcessor: Fix frame writer capacity issues + adhere to FrameProcessor's contract (#17209 ) This PR fixes the above issue by maintaining the state of last rowId flushed to output channel, and triggering another iteration of runIncrementally() method if frame writer has rows pending flush to the output channel. The above is done keeping in mind FrameProcessor's contract which enforces that we should write only a single frame to each output channel in any given iteration of runIncrementally().	2024-10-03 10:39:22 +05:30
Gian Merlino	fbc1221837	DartTableInputSpecSlicer: Fix for TLS workers. (#17224 ) We should use getHost(), which returns TLS if configured or plaintext otherwise. getHostAndPort() returns plaintext only.	2024-10-03 11:01:11 +08:00
Vadim Ogievetsky	715ae5ece0	Web console: misc fixes to the Explore view (#17213 ) * make record table able to hide column * stickyness * refactor query log * fix measure drag * start nested column dialog * nested expand * fix filtering on Measures * use output name * fix scrolling * select all / none * use ARRAY_CONCAT_AGG * no need to limit if aggregating * remove magic number * better search * update arg list * add, don't replace	2024-10-02 08:52:08 -07:00
Arun Ramani	e5d027ee1c	Skip generating task context reports for sub tasks (#17219 ) * Skip task context for sub tasks * DRY a little + skip context for live report	2024-10-02 09:32:50 -04:00
Zoltan Haindrich	65277b17a9	Decoupled planning: add support for unnest (#17177 ) * adds support for `UNNEST` expressions * introduces `LogicalUnnestRule` to transform a `Correlate` doing UNNEST into a `LogicalUnnest` * `UnnestInputCleanupRule` could move the final unnested expr into the `LogicalUnnest` itself (usually its an `mv_to_array` expression) * enhanced source unwrapping to utilize `FilteredDataSource` if it looks right	2024-10-02 08:54:56 +02:00
Vadim Ogievetsky	c8529294eb	Web console: add support for Dart engine (#17147 ) * add console support for Dart engine This reverts commit 6e46edf15dd55e5c51a1a4068e83deba4f22529b. * feedback fixes * surface new fields * prioratize error over results * better metadata refresh * feedback fixes	2024-10-01 17:53:36 -07:00
317brian	1fc82a96bd	docs: update future development blurbs (#16939 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-10-01 15:02:05 -07:00
Gian Merlino	878adff9aa	MSQ profile for Brokers and Historicals. (#17140 ) This patch adds a profile of MSQ named "Dart" that runs on Brokers and Historicals, and which is compatible with the standard SQL query API. For more high-level description, and notes on future work, refer to #17139. This patch contains the following changes, grouped into packages. Controller (org.apache.druid.msq.dart.controller): The controller runs on Brokers. Main classes are, - DartSqlResource, which serves /druid/v2/sql/dart/. - DartSqlEngine and DartQueryMaker, the entry points from SQL that actually run the MSQ controller code. - DartControllerContext, which configures the MSQ controller. - DartMessageRelays, which sets up relays (see "message relays" below) to read messages from workers' DartControllerClients. - DartTableInputSpecSlicer, which assigns work based on a TimelineServerView. Worker (org.apache.druid.msq.dart.worker) The worker runs on Historicals. Main classes are, - DartWorkerResource, which supplies the regular MSQ WorkerResource, plus Dart-specific APIs. - DartWorkerRunner, which runs MSQ worker code. - DartWorkerContext, which configures the MSQ worker. - DartProcessingBuffersProvider, which provides processing buffers from sliced-up merge buffers. - DartDataSegmentProvider, which provides segments from the Historical's local cache. Message relays (org.apache.druid.messages): To avoid the need for Historicals to contact Brokers during a query, which would create opportunities for queries to get stuck, all connections are opened from Broker to Historical. This is made possible by a message relay system, where the relay server (worker) has an outbox of messages. The relay client (controller) connects to the outbox and retrieves messages. Code for this system lives in the "server" package to keep it separate from the MSQ extension and make it easier to maintain. The worker-to-controller ControllerClient is implemented using message relays. Other changes: - Controller: Added the method "hasWorker". Used by the ControllerMessageListener to notify the appropriate controllers when a worker fails. - WorkerResource: No longer tries to respond more than once in the "httpGetChannelData" API. This comes up when a response due to resolved future is ready at about the same time as a timeout occurs. - MSQTaskQueryMaker: Refactor to separate out some useful functions for reuse in DartQueryMaker. - SqlEngine: Add "queryContext" to "resultTypeForSelect" and "resultTypeForInsert". This allows the DartSqlEngine to modify result format based on whether a "fullReport" context parameter is set. - LimitedOutputStream: New utility class. Used when in "fullReport" mode. - TimelineServerView: Add getDruidServerMetadata as a performance optimization. - CliHistorical: Add SegmentWrangler, so it can query inline data, lookups, etc. - ServiceLocation: Add "fromUri" method, relocating some code from ServiceClientImpl. - FixedServiceLocator: New locator for a fixed set of service locations. Useful for URI locations.	2024-10-01 14:38:55 -07:00
Hardik Bajaj	3d56fa6f56	Improve logging to include taskId in segment handoff notifier thread (#17185 )	2024-10-01 15:34:39 +05:30
Vadim Ogievetsky	f33f60b32e	fix input step typo (#17202 )	2024-09-30 21:26:22 -07:00
George Shiqi Wu	5ad6ed0b73	Refactor pod template logic to make it easier to test (#17178 ) * Refactoring of pod template logic * fix javadoc * Fix intellij * remove unneeded throw * PR comments * fix style * Fix unit tests	2024-09-30 15:34:13 -07:00
Kashif Faraz	28fead58b4	MSQ: Use task context flag useConcurrentLocks to determine task lock type (#17193 )	2024-09-30 21:15:25 +05:30
Abhishek Radhakrishnan	15987f51f1	Update Delta Kernel to 3.2.1 (#17179 ) Updated Delta Kernel from 3.2.0 to 3.2.1. This upstream version bump contains fixes to reading long columns, class loader and better retry mechanism when reading checkpoint files.	2024-09-30 07:22:49 -07:00
Vadim Ogievetsky	d982727a29	Web console: revamp the experimental explore view (#17180 ) * explore revamp * remove ToDo * fix CodeQL * add tooltips * show issue on echart chars * fix: browser back does not refresh chart * fix maxRows 0 * be more resiliant to missing __time	2024-09-29 23:15:21 -07:00

14642 Commits
