14494 Commits

Author SHA1 Message Date
Clint Wylie
b274723928 [maven-release-plugin] prepare for next development iteration 2024-12-18 14:08:08 -08:00
Clint Wylie
520482cb96 [maven-release-plugin] prepare release druid-31.0.1-rc2 druid-31.0.1-rc2 druid-31.0.1 2024-12-18 14:08:07 -08:00
Clint Wylie
888b873f46
add new fix to 31.0.1 release notes (#17587)
* add new fix to release notes

* Update docs/release-info/release-notes.md

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-12-18 13:26:17 -08:00
Clint Wylie
ca13a309de
[Backport] topn with granularity regression fixes (#17580)
* topn with granularity regression fixes (#17565)

* topn with granularity regression fixes

changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and incorrectly used the pooled algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533

* move defensive check outside of loop

* more test

* extra layer of safety

* move check outside of loop

* fix spelling

* add query context parameter to allow using pooled algorithm for topN when multiple passes are required even when query granularity is not ALL

* add comment, revert IT context changes and add new context flag

* remove unused
2024-12-18 13:34:33 +05:30
Clint Wylie
fe4d7f3353 [maven-release-plugin] prepare for next development iteration 2024-12-03 12:33:33 -08:00
Clint Wylie
91f49cac73 [maven-release-plugin] prepare release druid-31.0.1-rc1 druid-31.0.1-rc1 2024-12-03 12:33:32 -08:00
Clint Wylie
1a162cd735
suppress kafka cve for ranger extension (#17531) (#17532) 2024-12-03 11:59:18 -08:00
Clint Wylie
eccb6bd88a
add dependency updates to 31.0.1 release notes (#17518)
* add dependency updates to release notes

* ignore spelling
2024-11-27 13:35:11 -08:00
Clint Wylie
c2d8ed7bc9
update kafka dependency version to 3.9.0 (#17513) (#17517) 2024-11-27 12:01:35 -08:00
317brian
718ea2a5ea
docs: relnotes cleanup (#17514)
* docs: relnotes cleanup

* Update release-notes.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-11-26 14:07:37 -08:00
Clint Wylie
8cfbf15572
Resolve CVEs: Upgrade jetty version and suppress azure cve (#17385) (#17512) 2024-11-26 12:38:49 -08:00
Clint Wylie
5dcf5f1cff [maven-release-plugin] prepare for next development iteration 2024-11-25 16:53:44 -08:00
Clint Wylie
7e60c8dab3 [maven-release-plugin] prepare release druid-31.0.1-rc1 2024-11-25 16:53:43 -08:00
Clint Wylie
aa761a9f6e bump web-console version to 31.0.1, prepare docker-compose file for 31.0.1 2024-11-25 16:25:10 -08:00
Clint Wylie
2058fb4c0d
31.0.1 release notes (#17510) 2024-11-25 16:15:33 -08:00
Clint Wylie
6c56794ebc
QueryableIndexSegment: Re-use time boundary inspector. (#17397) (#17506)
This patch re-uses timeBoundaryInspector for each cursor holder, which
enables caching of minDataTimestamp and maxDataTimestamp.

Fixes a performance regression introduced in #16533, where these fields
stopped being cached across cursors. Prior to that patch, they were
cached in the QueryableIndexStorageAdapter.
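
The caching effect described above can be sketched as follows. This is a minimal illustration, assuming a single shared object that computes the minimum timestamp once; the class name `CachingTimeBoundary` and its method are hypothetical and are not the Druid TimeBoundaryInspector API.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration only; not the Druid TimeBoundaryInspector API.
final class CachingTimeBoundary {
  private final LongSupplier minScan; // expensive lookup, e.g. reading the time column
  private boolean computed;
  private long cachedMin;

  CachingTimeBoundary(LongSupplier minScan) {
    this.minScan = minScan;
  }

  /** Computes the minimum timestamp on first use and returns the cached value afterwards. */
  synchronized long getMinTime() {
    if (!computed) {
      cachedMin = minScan.getAsLong();
      computed = true;
    }
    return cachedMin;
  }
}
```

Sharing one such instance across cursor holders means the expensive lookup runs once, whereas creating a new instance per cursor would repeat it each time.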

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-11-23 23:07:30 -08:00
Clint Wylie
c9aac1bb40
projection segment merge fixes (#17460) (#17503)
changes:
* fix issue when merging projections from multiple incremental persists, which relied on some 'dim conversion' buffers not being closed when they already were (by the merging iterator). The fix selectively persists these conversion buffers to temp files in the segment write out directory, maps them, and ties them to the segment level closer so that they are available after the lifetime of the parent merger
* modify auto column serializers to use segment write out directory for temp files instead of java.io.tmpdir
* fix queryable index projection to not put the time-like column as a dimension, instead only adding it as __time
* use smoosh for temp files so that any Serializer can be safely written to a temp smoosh
2024-11-22 14:58:35 -08:00
Clint Wylie
2bb2acca6f
use big endian for compressed complex column values to fit object strategy expectations (#17422) (#17502) 2024-11-22 14:37:34 -08:00
Clint Wylie
b1802c4ff3
Web console: fix progress indication for table input (#17334) (#17505)
* fix progress indication for table input

* fix snapshot

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2024-11-22 11:50:38 -08:00
Clint Wylie
c5d0fcde50
Run JDK 21 workflows with 21.0.4. (#17458) (#17504)
* Run JDK 21 workflows with 21.0.4.

To work around #17429, run our JDK 21 workflows with
version 21.0.4. It does not appear to have this problem.

* Undo changes in standard-its.yml

* Add comments.

---------

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
2024-11-22 11:50:15 -08:00
Jill Osborne
e97270664b
Druid 31.0.0 release notes (#17092) 2024-10-29 11:58:45 +05:30
Laksh Singla
3ba7badda3
Backport "Fixes an issue with AppendableMemory that can cause MSQ jobs to fail" (#17369) (#17372)
Fixes an issue with AppendableMemory that can cause MSQ jobs to fail (#17369)
2024-10-18 10:17:07 +05:30
317brian
26d5e95073
[Backport] docs: backport msq autocompact docs [#16681] (#17374)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Vishesh Garg <vishesh.garg@imply.io>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-10-17 18:05:20 -07:00
Pranav
9dac0f7366 Fix pip installation after ubuntu upgrade (#17358) 2024-10-16 09:33:34 +05:30
Abhishek Agarwal
fd487d7ded
Revert "Use canonical hostname instead of ip by default (#16386)" (#17347)
This reverts commit 9459722ebf6565d7161edab671d91588ff2c6e1b.
2024-10-16 08:57:24 +05:30
Abhishek Agarwal
2b98facdc4
MSQ WorkerResource: Fix timeout handler for httpGetChannelData. (#17328) (#17330)
The timeout handler should fire if the response has not been handled yet
(i.e. if responseResolved was previously false). However, it erroneously
fires only if the response *was* handled. This causes HTTP 500 errors if
the timeout actually does fire. The timeout is 30 seconds, which can be
hit during pipelined queries, if an earlier stage of the query hasn't
produced its first frame within 30 seconds.

This fixes a regression introduced in #17140.
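
The corrected condition can be sketched with an atomic flag. This is a minimal illustration, assuming the response state is a single boolean; the class `ResponseGuard` and its methods are hypothetical names and are not taken from the Druid source.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the response state tracked by the handler.
final class ResponseGuard {
  private final AtomicBoolean responseResolved = new AtomicBoolean(false);

  /** Returns true if the caller won the race and should write the normal response. */
  boolean tryResolve() {
    return responseResolved.compareAndSet(false, true);
  }

  /** Returns true only if nothing has handled the response yet, so the timeout should fire. */
  boolean shouldFireTimeout() {
    // Fire when responseResolved was previously false.
    // The bug described above is equivalent to negating this result.
    return responseResolved.compareAndSet(false, true);
  }
}
```

Using compare-and-set for both paths guarantees that exactly one of the normal response and the timeout response is written.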

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-11 18:31:01 +05:30
Kashif Faraz
1a7f91f0ab
add substituteCombiningFactory implementations for datasketches aggs (#17314) (#17323)
Follow up to #17214, adds implementations for substituteCombiningFactory so that more
datasketches aggs can match projections, along with some projections tests for datasketches.

Co-authored-by: Clint Wylie <cwylie@apache.org>
2024-10-10 19:01:05 +05:30
Karan Kumar
06c1a6a31e
[Backport] Dart backports for (#17312), (#17313), (#17319) (#17322)
* MSQ: Use leaf worker count for stages that have any leaf inputs. (#17312)
(cherry picked from commit b27712933e07f1ae2977461b63aa26ddde018395)

* MSQ: Call "onQueryComplete" after the query is closed. (#17313)
(cherry picked from commit 4092f3fe47abb80b2b5bf2fd1e21482b79dc1057)

* Dart: Only use historicals as workers. (#17319)
(cherry picked from commit 074944e02c7fb29b9f4160626ff8dc6fd225b0a9)

---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-10 16:45:34 +05:30
Zoltan Haindrich
f72dbfd9cd
[Backport] Window Functions : Improve performance by comparing Strings in frame bytes without converting them (#17091) (#17311)
(cherry picked from commit b9a4c73e525d7addd9cde078e62490e2943da6e9)

Co-authored-by: Sree Charan Manamala <sree.manamala@imply.io>
2024-10-10 08:52:54 +05:30
Gian Merlino
f2df70a34a BaseWorkerClientImpl: Don't attempt to recover from a closed channel. (#17052)
* BaseWorkerClientImpl: Don't attempt to recover from a closed channel.

This patch introduces an exception type "ChannelClosedForWritesException",
which allows the BaseWorkerClientImpl to avoid retrying when the local
channel has been closed. This can happen in cases of cancellation.

* Add some test coverage.

* wip

* Add test coverage.

* Style.
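
The idea of not retrying after the local channel has closed can be sketched as below. This is a simplified illustration and not the BaseWorkerClientImpl code: the exception class is redefined here as a stand-in, and `RetrySketch.callWithRetry` is a hypothetical helper.

```java
import java.util.function.Supplier;

// Stand-in for illustration; the real exception class lives in the Druid code base.
final class ChannelClosedForWritesException extends RuntimeException {}

final class RetrySketch {
  /** Runs the task up to maxTries times, but never retries once the local channel is closed. */
  static <T> T callWithRetry(Supplier<T> task, int maxTries) {
    if (maxTries < 1) {
      throw new IllegalArgumentException("maxTries must be at least 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxTries; attempt++) {
      try {
        return task.get();
      } catch (ChannelClosedForWritesException e) {
        throw e; // not recoverable: the local reader went away, e.g. on cancellation
      } catch (RuntimeException e) {
        last = e; // treated as transient: try again
      }
    }
    throw last;
  }
}
```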

(cherry picked from commit 4dc5942dabacfc9e458f9c742e510cb94c091c8d)
2024-10-09 22:12:09 +05:30
Gian Merlino
75ac9051cd MSQ: Fix two issues with phase transitions. (#17053)
1) ControllerQueryKernel: Update readyToReadResults to acknowledge that sorting stages can
   go directly from READING_INPUT to RESULTS_READY.

2) WorkerStageKernel: Ignore RESULTS_COMPLETE if work is already finished, which can happen
   if the transition to FINISHED comes early due to a downstream LIMIT.

(cherry picked from commit 654e0b444bfc968ba0ec65a669af6dfef98604cd)
2024-10-09 22:12:09 +05:30
Abhishek Agarwal
479fdea065
CVE suppression for various dependencies. (#17307) (#17308)
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2024-10-09 20:30:49 +05:30
AmatyaAvadhanula
709c119907
MSQ: Wake up the main controller thread on workerError. (#17075) (#17304)
This isn't necessary when using MSQWorkerTaskLauncher as the WorkerManager
implementation, because in that case, task failure also wakes up the
main thread. However, when using workers that are not task-based, we don't
want to rely on the WorkerManager for this.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 19:09:06 +05:30
Karan Kumar
9b42fb1c65
Upgrade commons-io to 2.17.0 (#17227) (#17302)
(cherry picked from commit 93b5a8326bfcde747b43b481064096e01b7a407f)

Co-authored-by: Shivam Garg <shigarg@visa.com>
2024-10-09 18:19:37 +05:30
AmatyaAvadhanula
a035fb8fa9
DartWorkerContext: Return the correct workerId(). (#17280) (#17306)
Prior to this patch, the workerId() method did not actually return
the worker ID. It returned some other string that had similar information,
but was different.

This caused the /druid/dart-worker/workers API to return an internal
server error. The API is useful for debugging, although it is not used
during actual queries.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 18:09:18 +05:30
AmatyaAvadhanula
9b90d9c3ae
QueryResource: Don't close JSON content on error. (#17034) (#17303)
* QueryResource: Don't close JSON content on error.

Following similar issues fixed in #11685 and #15880, this patch fixes
a bug where QueryResource would write a closing array marker if it
encountered an exception after starting to push results. This makes it
difficult for callers to detect errors.

The prior patches didn't catch this problem because QueryResource uses
the ObjectMapper in a unique way, through writeValuesAsArray, which
doesn't respect the global AUTO_CLOSE_JSON_CONTENT setting.

* Fix usage of customized ObjectMappers.
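
Why the closing array marker matters can be shown with a small sketch. It deliberately avoids Jackson and writes to a plain StringBuilder; `ArrayStreamSketch.writeArray` is a hypothetical helper, not the QueryResource implementation.

```java
import java.util.Iterator;

final class ArrayStreamSketch {
  /** Streams results as a JSON array of numbers; leaves the array unterminated on failure. */
  static boolean writeArray(Iterator<Integer> results, StringBuilder out) {
    out.append('[');
    boolean first = true;
    try {
      while (results.hasNext()) {
        Integer value = results.next();
        if (!first) {
          out.append(',');
        }
        out.append(value);
        first = false;
      }
    } catch (RuntimeException e) {
      return false; // no closing ']' is written, so the truncated payload fails to parse
    }
    out.append(']');
    return true;
  }
}
```

If the error path appended the closing bracket, a caller would receive a syntactically valid but incomplete array and could not tell that the query failed midway.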

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-10-09 18:05:12 +05:30
AmatyaAvadhanula
4ed6cde70a
Fix queries for updated segments on SinkQuerySegmentWalker (#17157) (#17298)
Fix the logic for using segment descriptors from queries in SinkQuerySegmentWalker when there are upgraded segments as a result of concurrent replace.

Concurrent append and replace:
With the introduction of concurrent append and replace, for a given interval:

* The same sink can correspond to a base segment V0_x0, and have multiple mappings to higher versions with distinct partition numbers such as V1_x1, ..., Vn_xn.
* The initial segment allocation can happen on version V0, but there can be several allocations during the lifecycle of a task, which can have different versions spanning from V0 to Vn.

Changes:
* Maintain a new timeline of (an overshadowable holding a SegmentDescriptor).
* Every segment allocation or version upgrade adds the latest segment descriptor to this timeline.
* Iterate this timeline instead of the sinkTimeline to get the segment descriptors in getQueryRunnerForIntervals.
* Also maintain a mapping of the upgraded segment to its base segment.
* When a sink is needed to process the query, find the base segment corresponding to a given descriptor, and then use the sinkTimeline to find its chunk.
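
The mapping from upgraded segments back to their base segment can be sketched as a simple lookup table. This is a rough illustration that uses plain strings in place of segment descriptors; `UpgradedSegmentIndex` and its methods are hypothetical and much simpler than the timeline-based logic in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; segment descriptors are reduced to string ids.
final class UpgradedSegmentIndex {
  private final Map<String, String> upgradedToBase = new HashMap<>();

  /** Registers a base segment, which maps to itself. */
  void registerBase(String baseId) {
    upgradedToBase.put(baseId, baseId);
  }

  /** Records that an upgraded segment shares the sink of the given base segment. */
  void registerUpgrade(String upgradedId, String baseId) {
    upgradedToBase.put(upgradedId, baseId);
  }

  /** Returns the base segment whose sink should serve the query, or null if unknown. */
  String baseFor(String descriptorId) {
    return upgradedToBase.get(descriptorId);
  }
}
```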
2024-10-09 18:04:46 +05:30
Charles Smith
0c805cbc2b
[docs] update tutorial for Theta sketches (#16953) (#17292)
Updates and revisions to theta sketches tutorial
2024-10-09 18:04:25 +05:30
Charles Smith
697907a612
[Docs] Update known issues for window functions (#17097) (#17291)
* draft update to known issues

* Update known issues

Remove addressed known issues. Clarify the issue with SELECT * queries.
2024-10-09 18:04:05 +05:30
AmatyaAvadhanula
a5bfb5488c
[docs] update tutorial for Theta sketches (#16953) (#17301)
* from start to step 3 of Ingest data using Theta sketches

* updated up to "Query the Theta sketch column"

* fixed sentence

* another typo

* using sql ingestion instead of batch-sql

* waiting for explanations on DS_THETA

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32608dba55deee9910e295f2391d77e2.

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32608dba55deee9910e295f2391d77e2.

* just copy and pasting to where I was

* updated tutorial

* fixing images, and removing unused

* slightly updating explanation

* Update docs/tutorials/tutorial-sketches-theta.md

* Apply suggestions from code review



* addressing comments in review

* made filter clause consistent with other instances

* Apply suggestions from code review




---------

Co-authored-by: Edgar Melendrez <evmelendrez@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-10-09 16:36:38 +05:30
AmatyaAvadhanula
9383f18906
[Docs] Update known issues for window functions (#17097) (#17305)
* draft update to known issues

* Update known issues

Remove addressed known issues. Clarify the issue with SELECT * queries.

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-10-09 16:35:56 +05:30
Vadim Ogievetsky
76ebac336d
[Backport] Web console: backport (#17290) and (#17295) (#17297)
* run npm audit fix (#17290)
* better timing bar styling (#17295)
2024-10-09 13:00:25 +05:30
Karan Kumar
ccb7c2edd9
[Backport] Dart and security backports (#17249) (#17278) (#17281) (#17282) (#17283) (#17277) (#17285)
* MSQ: Allow for worker gaps. (#17277)
* DartSqlResource: Sort queries by start time. (#17282)
* DartSqlResource: Add controllerHost to GetQueriesResponse. (#17283)
* DartWorkerModule: Replace en dash with regular dash. (#17281)
* DartSqlResource: Return HTTP 202 on cancellation even if no such query. (#17278)
* Upgraded Protobuf to 3.25.5 (#17249)
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 7d9e6d36fddd7893825d1fa2f5da2e20f67c5de8)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Shivam Garg <shigarg@visa.com>
2024-10-08 19:46:40 +05:30
Kashif Faraz
f43964a808
Fail concurrent replace tasks with finer segment granularity than append (#17265) (#17272)
Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
2024-10-08 10:03:57 +05:30
Kashif Faraz
b30eab36b9
docs: concurrent append and replace is GA (#17269) (#17273)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-10-08 09:37:43 +05:30
Kashif Faraz
e95e95a7a1
Fix batch segment allocation failure with replicas (#17262) (#17267)
Fixes #16587

Streaming ingestion tasks operate by allocating segments before ingesting rows.
These allocations happen across replicas which may send different requests but
must get the same segment id for a given (datasource, interval, version, sequenceName)
across replicas.

This patch fixes the bug by ignoring the previousSegmentId when skipLineageCheck is true.
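
One way to picture the fix is as a lookup key that leaves out the previous segment id when lineage checks are skipped. The sketch below is an assumption about the shape of the logic, not the code in the patch; `AllocationKeySketch.key` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;

final class AllocationKeySketch {
  /** Builds the key under which an allocated segment id is looked up. */
  static List<String> key(
      String dataSource,
      String interval,
      String version,
      String sequenceName,
      String previousSegmentId,
      boolean skipLineageCheck
  ) {
    // When lineage checks are skipped, replicas may send different previous ids,
    // so the previous id must not influence the key.
    String lineage = skipLineageCheck ? "" : String.valueOf(previousSegmentId);
    return Arrays.asList(dataSource, interval, version, sequenceName, lineage);
  }
}
```

With this shape, two replicas that send different previousSegmentId values still produce the same key, and therefore resolve to the same segment id.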

Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
2024-10-08 08:03:51 +05:30
Kashif Faraz
f694066965
When removeNullBytes is set, length calculations did not take into account null bytes. (#17232) (#17266)
* When removeNullBytes is set, length calculations did not take into account null bytes.
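
The kind of length calculation involved can be sketched as follows. This is an illustration under the assumption that null bytes are dropped from the UTF-8 encoding before writing; `NullByteLengthSketch.writtenLength` is a hypothetical helper and not the MSQ frame-writing code.

```java
import java.nio.charset.StandardCharsets;

final class NullByteLengthSketch {
  /** Number of bytes that will actually be written for the given string. */
  static int writtenLength(String value, boolean removeNullBytes) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    if (!removeNullBytes) {
      return utf8.length;
    }
    int length = 0;
    for (byte b : utf8) {
      if (b != 0) {
        length++; // only bytes that survive null-byte removal are counted
      }
    }
    return length;
  }
}
```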

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2024-10-07 20:51:54 +05:30
Vishesh Garg
f7010253da
Fix issues with MSQ Compaction (#17250) (#17263)
The patch makes the following changes:
1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns
2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering.
3. Ensures only string columns are specified as partition dimensions
4. Ensures `rollup` is true if and only if metricsSpec is non-empty
5. Ensures disjoint intervals aren't submitted for compaction
6. Adds `compactionReason` to compaction task context.

(cherry picked from commit 7e35e50052ee1b4f4d65222e0d5c4883e9fa26da)
2024-10-07 08:42:34 +05:30
Abhishek Agarwal
52441c005c
add druid.expressions.allowVectorizeFallback and default to false (#17248) (#17260)
changes:

* adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until a few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
* adds cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future

Co-authored-by: Clint Wylie <cwylie@apache.org>
2024-10-06 12:29:16 +05:30
Clint Wylie
7b3fc4e768
backport projections (#17257)
* abstract `IncrementalIndex` cursor stuff to prepare for using different "views" of the data based on the cursor build spec (#17064)

* abstract `IncrementalIndex` cursor stuff to prepare to allow for possibility of using different "views" of the data based on the cursor build spec
changes:
* introduce `IncrementalIndexRowSelector` interface to capture how `IncrementalIndexCursor` and `IncrementalIndexColumnSelectorFactory` read data
* `IncrementalIndex` implements `IncrementalIndexRowSelector`
* move `FactsHolder` interface to separate file
* other minor refactorings

* add DataSchema.Builder to tidy stuff up a bit (#17065)

* add DataSchema.Builder to tidy stuff up a bit

* fixes

* fixes

* more style fixes

* review stuff

* Projections prototype (#17214)
2024-10-05 23:03:41 +05:30