14487 Commits

Author SHA1 Message Date
Clint Wylie eccb6bd88a
add dependency updates to 31.0.1 release notes (#17518)
* add dependency updates to release notes

* ignore spelling
2024-11-27 13:35:11 -08:00
Clint Wylie c2d8ed7bc9
update kafka dependency version to 3.9.0 (#17513) (#17517) 2024-11-27 12:01:35 -08:00
317brian 718ea2a5ea
docs: relnotes cleanup (#17514)
* docs: relnotes cleanup

* Update release-notes.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-11-26 14:07:37 -08:00
Clint Wylie 8cfbf15572
Resolve CVEs: Upgrade jetty version and suppress azure cve (#17385) (#17512) 2024-11-26 12:38:49 -08:00
Clint Wylie 5dcf5f1cff [maven-release-plugin] prepare for next development iteration 2024-11-25 16:53:44 -08:00
Clint Wylie 7e60c8dab3 [maven-release-plugin] prepare release druid-31.0.1-rc1 2024-11-25 16:53:43 -08:00
Clint Wylie aa761a9f6e bump web-console version to 31.0.1, prepare docker-compose file for 31.0.1 2024-11-25 16:25:10 -08:00
Clint Wylie 2058fb4c0d
31.0.1 release notes (#17510) 2024-11-25 16:15:33 -08:00
Clint Wylie 6c56794ebc
QueryableIndexSegment: Re-use time boundary inspector. (#17397) (#17506)
This patch re-uses timeBoundaryInspector for each cursor holder, which
enables caching of minDataTimestamp and maxDataTimestamp.

Fixes a performance regression introduced in #16533, where these fields
stopped being cached across cursors. Prior to that patch, they were
cached in the QueryableIndexStorageAdapter.
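
The caching described above can be sketched roughly as follows. This is a minimal illustration, not Druid's actual code: `CachingTimeBoundarySketch`, `SegmentSketch`, and `inspectorForNewCursorHolder` are hypothetical names, and the two `LongSupplier` loaders stand in for whatever lookup computes the boundaries.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a time boundary inspector; not Druid's actual class. */
final class CachingTimeBoundarySketch {
  private final LongSupplier minLoader; // assumed expensive lookup of the minimum timestamp
  private final LongSupplier maxLoader; // assumed expensive lookup of the maximum timestamp
  private Long minDataTimestamp;        // null until first requested
  private Long maxDataTimestamp;        // null until first requested

  CachingTimeBoundarySketch(LongSupplier minLoader, LongSupplier maxLoader) {
    this.minLoader = minLoader;
    this.maxLoader = maxLoader;
  }

  synchronized long getMinTime() {
    if (minDataTimestamp == null) {
      minDataTimestamp = minLoader.getAsLong(); // computed once, then cached
    }
    return minDataTimestamp;
  }

  synchronized long getMaxTime() {
    if (maxDataTimestamp == null) {
      maxDataTimestamp = maxLoader.getAsLong(); // computed once, then cached
    }
    return maxDataTimestamp;
  }
}

/** Hypothetical segment that hands the same inspector to every cursor holder. */
final class SegmentSketch {
  private final CachingTimeBoundarySketch inspector;

  SegmentSketch(LongSupplier minLoader, LongSupplier maxLoader) {
    this.inspector = new CachingTimeBoundarySketch(minLoader, maxLoader);
  }

  CachingTimeBoundarySketch inspectorForNewCursorHolder() {
    return inspector; // re-used, so cached values survive across cursors
  }
}
```

Creating a fresh inspector per cursor holder would instead repeat both lookups for every cursor, which is the kind of regression the message describes.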

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-11-23 23:07:30 -08:00
Clint Wylie c9aac1bb40
projection segment merge fixes (#17460) (#17503)
changes:
* fix an issue when merging projections from multiple incremental persists: the merge assumed that some 'dim conversion' buffers were still open, but the merging iterator had already closed them. The fix selectively persists these conversion buffers to temp files in the segment write out directory, maps them, and ties them to the segment level closer so that they remain available after the lifetime of the parent merger
* modify auto column serializers to use segment write out directory for temp files instead of java.io.tmpdir
* fix queryable index projection to not put the time-like column as a dimension, instead only adding it as __time
* use smoosh for temp files so that any Serializer can be safely written to a temp smoosh
2024-11-22 14:58:35 -08:00
Clint Wylie 2bb2acca6f
use big endian for compressed complex column values to fit object strategy expectations (#17422) (#17502) 2024-11-22 14:37:34 -08:00
Clint Wylie b1802c4ff3
Web console: fix progress indication for table input (#17334) (#17505)
* fix progress indication for table input

* fix snapshot

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2024-11-22 11:50:38 -08:00
Clint Wylie c5d0fcde50
Run JDK 21 workflows with 21.0.4. (#17458) (#17504)
* Run JDK 21 workflows with 21.0.4.

To work around #17429, run our JDK 21 workflows with
version 21.0.4. It does not appear to have this problem.

* Undo changes in standard-its.yml

* Add comments.

---------

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
2024-11-22 11:50:15 -08:00
Jill Osborne e97270664b
Druid 31.0.0 release notes (#17092) 2024-10-29 11:58:45 +05:30
Laksh Singla 3ba7badda3
Backport "Fixes an issue with AppendableMemory that can cause MSQ jobs to fail" (#17369) (#17372)
Fixes an issue with AppendableMemory that can cause MSQ jobs to fail (#17369)
2024-10-18 10:17:07 +05:30
317brian 26d5e95073
[Backport] docs: backport msq autocompact docs [#16681] (#17374)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Vishesh Garg <vishesh.garg@imply.io>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-10-17 18:05:20 -07:00
Pranav 9dac0f7366 Fix pip installation after ubuntu upgrade (#17358) 2024-10-16 09:33:34 +05:30
Abhishek Agarwal fd487d7ded
Revert "Use canonical hostname instead of ip by default (#16386)" (#17347)
This reverts commit 9459722ebf.
2024-10-16 08:57:24 +05:30
Abhishek Agarwal 2b98facdc4
MSQ WorkerResource: Fix timeout handler for httpGetChannelData. (#17328) (#17330)
The timeout handler should fire if the response has not been handled yet
(i.e. if responseResolved was previously false). However, it erroneously
fires only if the response *was* handled. This causes HTTP 500 errors if
the timeout actually does fire. The timeout is 30 seconds, which can be
hit during pipelined queries, if an earlier stage of the query hasn't
produced its first frame within 30 seconds.

This fixes a regression introduced in #17140.
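
The corrected condition can be illustrated with a small sketch. It assumes the resolved flag is an `AtomicBoolean`, which the message does not confirm; `ResponseGuardSketch` and its method names are hypothetical and not taken from WorkerResource.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard ensuring exactly one of "data" or "timeout" writes the response. */
final class ResponseGuardSketch {
  private final AtomicBoolean responseResolved = new AtomicBoolean(false);

  /** Called when channel data is ready; true means this caller should write the data. */
  boolean tryResolveWithData() {
    return responseResolved.compareAndSet(false, true);
  }

  /** Called by the timeout handler; true only if nothing has been written yet. */
  boolean tryResolveWithTimeout() {
    // compareAndSet returns true only when the previous value was false,
    // i.e. when the response has NOT been handled yet. The bug described
    // above amounts to acting on the opposite outcome.
    return responseResolved.compareAndSet(false, true);
  }
}
```

With this shape, whichever path runs first wins and the other becomes a no-op, so a late timeout cannot write over a response that was already sent.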

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-11 18:31:01 +05:30
Kashif Faraz 1a7f91f0ab
add substituteCombiningFactory implementations for datasketches aggs (#17314) (#17323)
Follow up to #17214, adds implementations for substituteCombiningFactory so that more
datasketches aggs can match projections, along with some projections tests for datasketches.

Co-authored-by: Clint Wylie <cwylie@apache.org>
2024-10-10 19:01:05 +05:30
Karan Kumar 06c1a6a31e
[Backport] Dart backports for (#17312), (#17313), (#17319) (#17322)
* MSQ: Use leaf worker count for stages that have any leaf inputs. (#17312)
(cherry picked from commit b27712933e)

* MSQ: Call "onQueryComplete" after the query is closed. (#17313)
(cherry picked from commit 4092f3fe47)

* Dart: Only use historicals as workers. (#17319)
(cherry picked from commit 074944e02c)

---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-10 16:45:34 +05:30
Zoltan Haindrich f72dbfd9cd
[Backport] Window Functions: Improve performance by comparing Strings in frame bytes without converting them (#17091) (#17311)
(cherry picked from commit b9a4c73e52)

Co-authored-by: Sree Charan Manamala <sree.manamala@imply.io>
2024-10-10 08:52:54 +05:30
Gian Merlino f2df70a34a BaseWorkerClientImpl: Don't attempt to recover from a closed channel. (#17052)
* BaseWorkerClientImpl: Don't attempt to recover from a closed channel.

This patch introduces an exception type "ChannelClosedForWritesException",
which allows the BaseWorkerClientImpl to avoid retrying when the local
channel has been closed. This can happen in cases of cancellation.

* Add some test coverage.

* wip

* Add test coverage.

* Style.

(cherry picked from commit 4dc5942dab)
2024-10-09 22:12:09 +05:30
Gian Merlino 75ac9051cd MSQ: Fix two issues with phase transitions. (#17053)
1) ControllerQueryKernel: Update readyToReadResults to acknowledge that sorting stages can
   go directly from READING_INPUT to RESULTS_READY.

2) WorkerStageKernel: Ignore RESULTS_COMPLETE if work is already finished, which can happen
   if the transition to FINISHED comes early due to a downstream LIMIT.

(cherry picked from commit 654e0b444b)
2024-10-09 22:12:09 +05:30
Abhishek Agarwal 479fdea065
CVE suppression for various dependencies. (#17307) (#17308)
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2024-10-09 20:30:49 +05:30
AmatyaAvadhanula 709c119907
MSQ: Wake up the main controller thread on workerError. (#17075) (#17304)
This isn't necessary when using MSQWorkerTaskLauncher as the WorkerManager
implementation, because in that case, task failure also wakes up the
main thread. However, when using workers that are not task-based, we don't
want to rely on the WorkerManager for this.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 19:09:06 +05:30
Karan Kumar 9b42fb1c65
Upgrade commons-io to 2.17.0 (#17227) (#17302)
(cherry picked from commit 93b5a8326b)

Co-authored-by: Shivam Garg <shigarg@visa.com>
2024-10-09 18:19:37 +05:30
AmatyaAvadhanula a035fb8fa9
DartWorkerContext: Return the correct workerId(). (#17280) (#17306)
Prior to this patch, the workerId() method did not actually return
the worker ID. It returned some other string that had similar information,
but was different.

This caused the /druid/dart-worker/workers API to return an internal
server error. The API is useful for debugging, although it is not used
during actual queries.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 18:09:18 +05:30
AmatyaAvadhanula 9b90d9c3ae
QueryResource: Don't close JSON content on error. (#17034) (#17303)
* QueryResource: Don't close JSON content on error.

Following similar issues fixed in #11685 and #15880, this patch fixes
a bug where QueryResource would write a closing array marker if it
encountered an exception after starting to push results. This makes it
difficult for callers to detect errors.

The prior patches didn't catch this problem because QueryResource uses
the ObjectMapper in a unique way, through writeValuesAsArray, which
doesn't respect the global AUTO_CLOSE_JSON_CONTENT setting.

* Fix usage of customized ObjectMappers.
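
Why the closing array marker matters can be shown with a minimal sketch that deliberately avoids Jackson: `ArrayStreamSketch` is a hypothetical, simplified writer (it does no string escaping), not the QueryResource implementation. The only point it illustrates is that the closing `]` is written on success alone.

```java
import java.util.Iterator;

/** Hypothetical, simplified array streamer; values are not escaped. */
final class ArrayStreamSketch {
  static String stream(Iterator<String> results) {
    StringBuilder out = new StringBuilder("[");
    boolean first = true;
    try {
      while (results.hasNext()) {
        String value = results.next(); // may throw partway through
        if (!first) {
          out.append(',');
        }
        out.append('"').append(value).append('"');
        first = false;
      }
      out.append(']'); // closing marker is written only on success
    } catch (RuntimeException e) {
      // Deliberately leave the array unterminated, so the caller sees
      // invalid JSON instead of a well-formed but silently truncated array.
    }
    return out.toString();
  }
}
```

A caller parsing the truncated output fails to parse it, which is how the error becomes detectable.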

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 18:05:12 +05:30
AmatyaAvadhanula 4ed6cde70a
Fix queries for updated segments on SinkQuerySegmentWalker (#17157) (#17298)
Fix the logic for usage of segment descriptors from queries in SinkQuerySegmentWalker when there are upgraded segments as a result of concurrent replace.

Concurrent append and replace:
With the introduction of concurrent append and replace, for a given interval:
* The same sink can correspond to a base segment V0_x0, and have multiple mappings to higher versions with distinct partition numbers such as V1_x1, ..., Vn_xn.
* The initial segment allocation can happen on version V0, but there can be several allocations during the lifecycle of a task which can have different versions spanning from V0 to Vn.

Changes:
* Maintain a new timeline of overshadowable objects, each holding a SegmentDescriptor.
* Every segment allocation or version upgrade adds the latest segment descriptor to this timeline.
* Iterate this timeline instead of the sinkTimeline to get the segment descriptors in getQueryRunnerForIntervals.
* Also maintain a mapping of the upgraded segment to its base segment.
* When a sink is needed to process the query, find the base segment corresponding to a given descriptor, and then use the sinkTimeline to find its chunk.
2024-10-09 18:04:46 +05:30
Charles Smith 0c805cbc2b
[docs] update tutorial for Theta sketches (#16953) (#17292)
Updates and revisions to theta sketches tutorial
2024-10-09 18:04:25 +05:30
Charles Smith 697907a612
[Docs] Update known issues for window functions (#17097) (#17291)
* draft update to known issues

* Update known issues

Remove addressed known issues. Clarify the issue with SELECT * queries.
2024-10-09 18:04:05 +05:30
AmatyaAvadhanula a5bfb5488c
[docs] update tutorial for Theta sketches (#16953) (#17301)
* from start to step 3 of Ingest data using Theta sketches

* updated up to "Query the Theta sketch column"

* fixed sentence

* another typo

* using sql ingestion instead of batch-sql

* waiting for explanations on DS_THETA

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32.

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32.

* just copy and pasting to where I was

* updated tutorial

* fixing images, and removing unused

* slightly updating explanation

* Update docs/tutorials/tutorial-sketches-theta.md

* Apply suggestions from code review

* addressing comments in review

* made filter clause consistent with other instances

* Apply suggestions from code review

---------

Co-authored-by: Edgar Melendrez <evmelendrez@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-10-09 16:36:38 +05:30
AmatyaAvadhanula 9383f18906
[Docs] Update known issues for window functions (#17097) (#17305)
* draft update to known issues

* Update known issues

Remove addressed known issues. Clarify the issue with SELECT * queries.

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-10-09 16:35:56 +05:30
Vadim Ogievetsky 76ebac336d
[Backport] Web console: backport (#17290) and (#17295) (#17297)
* run npm audit fix (#17290)
* better timing bar styling (#17295)
2024-10-09 13:00:25 +05:30
Karan Kumar ccb7c2edd9
[Backport] Dart and security backports (#17249) (#17278) (#17281) (#17282) (#17283) (#17277) (#17285)
* MSQ: Allow for worker gaps. (#17277)
* DartSqlResource: Sort queries by start time. (#17282)
* DartSqlResource: Add controllerHost to GetQueriesResponse. (#17283)
* DartWorkerModule: Replace en dash with regular dash. (#17281)
* DartSqlResource: Return HTTP 202 on cancellation even if no such query. (#17278)
* Upgraded Protobuf to 3.25.5 (#17249)
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 7d9e6d36fd)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Shivam Garg <shigarg@visa.com>
2024-10-08 19:46:40 +05:30
Kashif Faraz f43964a808
Fail concurrent replace tasks with finer segment granularity than append (#17265) (#17272)
Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
2024-10-08 10:03:57 +05:30
Kashif Faraz b30eab36b9
docs: concurrent append and replace is GA (#17269) (#17273)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-10-08 09:37:43 +05:30
Kashif Faraz e95e95a7a1
Fix batch segment allocation failure with replicas (#17262) (#17267)
Fixes #16587

Streaming ingestion tasks operate by allocating segments before ingesting rows.
These allocations happen across replicas which may send different requests but
must get the same segment id for a given (datasource, interval, version, sequenceName)
across replicas.

This patch fixes the bug by ignoring the previousSegmentId when skipLineageCheck is true.
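
A rough sketch of the idea, under stated assumptions: `AllocationSketch` is a hypothetical in-memory allocator, and the string key and generated ids are illustrative only, not Druid's metadata schema. It shows the previous segment id being left out of the lookup key when `skipLineageCheck` is true, so replicas that send different requests still resolve to the same segment id.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory allocator; not Druid's actual allocation code. */
final class AllocationSketch {
  private final Map<String, String> allocated = new HashMap<>();
  private int nextPartition = 0;

  String allocate(
      String dataSource,
      String interval,
      String version,
      String sequenceName,
      String previousSegmentId,
      boolean skipLineageCheck
  ) {
    // Replicas must agree on (datasource, interval, version, sequenceName).
    String key = String.join("|", dataSource, interval, version, sequenceName);
    if (!skipLineageCheck) {
      // Only when lineage is checked does the previous segment id matter.
      key = key + "|" + previousSegmentId;
    }
    return allocated.computeIfAbsent(
        key,
        k -> dataSource + "_" + interval + "_" + version + "_" + nextPartition++
    );
  }
}
```

Two replicas calling `allocate` with the same first four arguments but different `previousSegmentId` values receive the same id as long as `skipLineageCheck` is true.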

Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
2024-10-08 08:03:51 +05:30
Kashif Faraz f694066965
When removeNullBytes is set, length calculations did not take into account null bytes. (#17232) (#17266)
* When removeNullBytes is set, length calculations did not take into account null bytes.

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2024-10-07 20:51:54 +05:30
Vishesh Garg f7010253da
Fix issues with MSQ Compaction (#17250) (#17263)
The patch makes the following changes:
1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns
2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering.
3. Ensures only string columns are specified as partition dimensions
4. Ensures `rollup` is true if and only if metricsSpec is non-empty
5. Ensures disjoint intervals aren't submitted for compaction
6. Adds `compactionReason` to compaction task context.

(cherry picked from commit 7e35e50052)
2024-10-07 08:42:34 +05:30
Abhishek Agarwal 52441c005c
add druid.expressions.allowVectorizeFallback and default to false (#17248) (#17260)
changes:
* adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until the few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
* adds cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future

Co-authored-by: Clint Wylie <cwylie@apache.org>
2024-10-06 12:29:16 +05:30
Clint Wylie 7b3fc4e768
backport projections (#17257)
* abstract `IncrementalIndex` cursor stuff to prepare for using different "views" of the data based on the cursor build spec (#17064)

* abstract `IncrementalIndex` cursor stuff to prepare to allow for possibility of using different "views" of the data based on the cursor build spec
changes:
* introduce `IncrementalIndexRowSelector` interface to capture how `IncrementalIndexCursor` and `IncrementalIndexColumnSelectorFactory` read data
* `IncrementalIndex` implements `IncrementalIndexRowSelector`
* move `FactsHolder` interface to separate file
* other minor refactorings

* add DataSchema.Builder to tidy stuff up a bit (#17065)

* add DataSchema.Builder to tidy stuff up a bit

* fixes

* fixes

* more style fixes

* review stuff

* Projections prototype (#17214)
2024-10-05 23:03:41 +05:30
Kashif Faraz 1435b9f4bd
Dart: Skip final getCounters, postFinish to idle historicals. (#17255) (#17259)
In a Dart query, all Historicals are given worker IDs, but not all of them
are going to actually be started or receive work orders.

Attempting to send a getCounters or postFinish command to a worker that
never received a work order is not only wasteful, but it causes errors due
to the workers not knowing about that query ID.
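
One way to picture the fix is a controller that remembers which workers actually received a work order and contacts only those at the end of the query. The sketch below is an assumption-laden illustration: `WorkerTrackerSketch` and its methods are hypothetical names, not the Dart controller's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical tracker of workers that were given work for a query. */
final class WorkerTrackerSketch {
  private final Set<String> workersWithWorkOrders = new HashSet<>();

  /** Record that a work order was sent to this worker. */
  void onWorkOrderSent(String workerId) {
    workersWithWorkOrders.add(workerId);
  }

  /** Workers that should receive the final getCounters / postFinish calls. */
  List<String> workersToContact(List<String> allWorkerIds) {
    List<String> result = new ArrayList<>();
    for (String workerId : allWorkerIds) {
      if (workersWithWorkOrders.contains(workerId)) {
        result.add(workerId); // idle workers are skipped
      }
    }
    return result;
  }
}
```

Workers that were assigned an ID but never started are filtered out, so they never see a query ID they do not know about.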

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-05 19:32:21 +05:30
Kashif Faraz d8e3ac89c3
Web console: don't assume that activeTasks is an array (#17254) (#17258)
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2024-10-05 18:44:24 +05:30
Kashif Faraz f27a1dc651
[Backport] Dart: Smoother handling of stage early exit (#17228) (#17069) (#17256)
* MSQ: Properly report errors that occur when starting up RunWorkOrder. (#17069)
* Dart: Smoother handling of stage early-exit. (#17228)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-05 17:29:34 +05:30
Kashif Faraz 10528a6d9e
[Backport] Patches (#17039) (#17173) (#17216) (#17224) (#17230) (#17238) (#17251)
* SQL: Use regular filters for time filtering in subqueries. (#17173)
* RunWorkOrder: Account for two simultaneous statistics collectors. (#17216)
* DartTableInputSpecSlicer: Fix for TLS workers. (#17224)
* Upgrade avro - minor version (#17230)
* SuperSorter: Don't set allDone if it's already set. (#17238)
* Decoupled planning: improve join support (#17039)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
2024-10-05 09:22:57 +05:30
Vadim Ogievetsky 351330b990
Explore view fix spin when applying defaults (#17252) (#17253) 2024-10-05 08:04:41 +05:30
Charles Smith a939dd44fc
Docs: adds MSQ examples to front coded dict. migration (#17236) (#17239) 2024-10-04 10:33:33 -07:00
Clint Wylie 0ffdbaa6eb
read metadata in SimpleQueryableIndex if available to compute segment ordering (#17181) (#17191) 2024-10-04 21:15:27 +05:30