14416 Commits

Author SHA1 Message Date
Gian Merlino a83125e4a0
Track IngestionState more accurately in realtime tasks. (#16934)
Previously, SeekableStreamIndexTaskRunner set ingestion state to
COMPLETED when it finished reading data from Kafka. This is incorrect.
After the changes in this patch, the transitions go:

1) The task stays in BUILD_SEGMENTS after it finishes reading from Kafka,
   while it is building its final set of segments to publish.

2) The task transitions to SEGMENT_AVAILABILITY_WAIT after publishing,
   while waiting for handoff.

3) The task transitions to COMPLETED immediately before exiting, when
   truly done.
2024-08-22 11:43:46 +05:30
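The state transitions described in #16934 can be sketched as a small trace. This is only an illustration in Python, not the code in SeekableStreamIndexTaskRunner: the `run_task` function and its three callback parameters are hypothetical, and only the state names come from the commit message.

```python
from enum import Enum


class IngestionState(Enum):
    # State names taken from the commit message; other states are omitted.
    BUILD_SEGMENTS = "BUILD_SEGMENTS"
    SEGMENT_AVAILABILITY_WAIT = "SEGMENT_AVAILABILITY_WAIT"
    COMPLETED = "COMPLETED"


def run_task(read_records, build_and_publish, wait_for_handoff):
    """Run a hypothetical realtime task and return the state seen after each phase."""
    trace = []

    state = IngestionState.BUILD_SEGMENTS
    read_records()
    # 1) Reading is finished, but the final segments are still being built,
    #    so the state is still BUILD_SEGMENTS (not COMPLETED).
    trace.append(state)

    build_and_publish()
    # 2) After publishing, the task waits for handoff.
    state = IngestionState.SEGMENT_AVAILABILITY_WAIT
    trace.append(state)

    wait_for_handoff()
    # 3) Only immediately before exiting is the task COMPLETED.
    state = IngestionState.COMPLETED
    trace.append(state)
    return trace
```

Calling `run_task` with three no-op callbacks returns the three states in the order listed in the commit message.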
Edgar Melendrez 725695342c
[Docs] Batch07: adding examples to string functions (#16862)
* Lower,Upper,Lpad,Rpad,Parse_long

* up to REGEXP_EXTRACT

* batch 07 ready for review

* updated definitions in scalar

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* rpad and lpad

* addressing comments

* minor fixes

* improving examples based on suggestions

* matched -> matches

* correcting typo

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-08-21 15:08:25 -07:00
Gian Merlino 338da67bc6
Add type coercion and null check to left, right, repeat exprs. (#16480)
* Add type coercion and null check to left, right, repeat exprs.

These exprs shouldn't validate types; they should coerce types. Coercion
is typical behavior for functions because it enables schema evolution.

The functions are also modified to check isNumericNull on the right-hand
argument. This was missing previously, which would erroneously cause
nulls to be treated as zeroes.

* Fix tests.
2024-08-21 15:07:24 -07:00
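The difference between validating and coercing argument types in #16480, together with the added null check, can be illustrated roughly as follows. This is a Python sketch under stated assumptions, not Druid's expression code: `coerce_to_string`, `coerce_to_long`, `left`, and `repeat` are hypothetical helpers, and clamping a negative length to zero is a choice made for this sketch only.

```python
from typing import Optional


def coerce_to_string(value) -> Optional[str]:
    """Coerce any non-null value to a string instead of rejecting its type."""
    return None if value is None else str(value)


def coerce_to_long(value) -> Optional[int]:
    """Coerce to an integer; anything that cannot be coerced is treated as null."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return None


def left(text, length) -> Optional[str]:
    s = coerce_to_string(text)
    n = coerce_to_long(length)
    # The null check on the right-hand argument: a null length yields null,
    # rather than being silently treated as zero.
    if s is None or n is None:
        return None
    return s[:max(n, 0)]  # clamping negatives is an assumption of this sketch


def repeat(text, count) -> Optional[str]:
    s = coerce_to_string(text)
    n = coerce_to_long(count)
    if s is None or n is None:
        return None
    return s * max(n, 0)
```

With coercion, `left(12345, "2")` evaluates to `"12"` instead of failing a type check, and `left("druid", None)` evaluates to `None` instead of an empty string.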
Gian Merlino 090023609b
Loosen case in FrameFileWriterTest. (#16938)
The specific error on a truncated file can vary based on how the final
frame of the truncated file is written. This patch loosens the check so
it passes regardless of how the truncated file is written.
2024-08-21 13:45:01 -07:00
Akshat Jain 97f9502ad2
Enable MSQ WF drill tests which were previously disabled (#16935) 2024-08-21 15:47:50 +05:30
Gian Merlino f6adacf5d6
SuperSorter: Store readOnly output channels. (#16928)
Without the call to readOnly, each output channel retains a 1 MB allocator,
leading to excessive memory use. Fixes regression from #16775.
2024-08-20 23:10:29 -07:00
Akshat Jain 0ce1b6b22f
MSQ window function: Take segment granularity into consideration to fix NPE issues with ingestion (#16854)
This PR changes the logic for window functions to use the resultShuffleSpecFactory for the last window stage.
2024-08-21 10:06:04 +05:30
Gian Merlino 2bd31603de
FrameFile: Improve error messages. (#16912)
* FrameFile: Improve error messages.

1) Include frame file path in error messages.

2) Adhere better to style (no space before brackets).

* Fix test.
2024-08-20 11:56:30 -07:00
benkrug 7b8573ed3d
Update index.md - remove the extra word "does" from one sentence. (#16922)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-08-20 11:06:12 -07:00
Jakub Matyszewski 82d9ff9cc8
Add docs for log audit manager (#16927)
* Add docs for log audit manager

* Adjust descriptions
2024-08-20 15:58:31 +05:30
Rishabh Singh bc4b3a2f91
Filter out tombstone segments from metadata cache (#16890)
* Fix build

* Support segment metadata queries for tombstones

* Filter out tombstone segments from metadata cache

* Revert some changes

* checkstyle

* Update docs
2024-08-20 11:35:02 +05:30
Clint Wylie 518f642028
remove isDescending from Query interface, move to TimeseriesQuery (#16917)
* remove isDescending from Query interface, since it is only actually settable and usable by TimeseriesQuery
2024-08-19 23:02:45 -07:00
Vishesh Garg fb7103ccef
Change dimensionToSchemaMap to dimensionSchemas and override ARRAY_INGEST_MODE to array (#16909)
A follow-up PR for #16864. Just renames dimensionToSchemaMap to dimensionSchemas and always overrides ARRAY_INGEST_MODE context value to array for MSQ compaction.
2024-08-20 10:30:24 +05:30
Kashif Faraz 2198001930
Remove unused cachingCost strategy runtime properties (#16918) 2024-08-19 10:15:03 +05:30
Vadim Ogievetsky 4e33ce2b21
fix collapsing in column tree (#16910) 2024-08-18 15:11:28 -07:00
Akshat Jain a56b5c018d
Propagate TooManyRowsInAWindowFault error message properly to the user (#16906)
* Propagate TooManyRowsInAWindowFault error message properly to the user

* Add TooManyRowsInAWindowFault to MSQFaultSerdeTest
2024-08-18 10:03:45 +05:30
Benedict Jin 688b4cf164
Fix flaky test in ParallelMergeCombiningSequenceTest (#16907) 2024-08-18 10:02:50 +05:30
Gian Merlino 806649f8af
SQL: Fix nullable DATE, TIMESTAMP reduction. (#16915)
Reduction of nullable DATE and TIMESTAMP expressions did not perform
a necessary null check, so would in some cases reduce to
1970-01-01 00:00:00 (epoch) rather than NULL.
2024-08-16 22:41:12 -07:00
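The class of bug fixed in #16915 can be reproduced in a few lines. This Python sketch is only an analogy and does not mirror Druid's SQL planner code: both function names are hypothetical, and the "buggy" variant simply shows how a missing null check turns NULL into the epoch.

```python
from datetime import datetime, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def reduce_timestamp_buggy(millis: Optional[int]) -> datetime:
    # Missing null check: a null input is treated as 0 and becomes the epoch.
    return datetime.fromtimestamp((millis or 0) / 1000, tz=timezone.utc)


def reduce_timestamp(millis: Optional[int]) -> Optional[datetime]:
    # With the null check, a null input reduces to NULL (None here).
    if millis is None:
        return None
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

Here `reduce_timestamp_buggy(None)` returns 1970-01-01 00:00:00 UTC, whereas `reduce_timestamp(None)` returns `None`; a genuine zero input still maps to the epoch in both.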
Vadim Ogievetsky 422183ee70
Web console: expose handoff API (#16586)
* don't start completions on numbers... it makes numbers hard to enter

* add handoff dialog

* fix placeholder

* Update web-console/src/dialogs/supervisor-handoff-dialog/supervisor-handoff-dialog.tsx

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update web-console/src/dialogs/supervisor-handoff-dialog/supervisor-handoff-dialog.tsx

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update web-console/src/dialogs/supervisor-handoff-dialog/supervisor-handoff-dialog.tsx

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* feedback fixes

* update snapshot

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-08-16 14:39:16 -07:00
Edgar Melendrez c968e73171
[Docs] updating transformation during ingestion tutorial (#16845)
* first major revision of tutorial

* more edits

* re-ID the file to reflect new content + redirect

* renaming file

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* addressing suggestions

* adding column names

* Update docs/tutorials/tutorial-transform.md

* Update docs/tutorials/tutorial-transform.md

* Addressing suggestions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* adding trademark logo and moving paragraph

* decided to shorten final paragraph

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-08-16 11:39:57 -07:00
Clint Wylie 4283b270e3
rework cursor creation (#16533)
changes:
* Added `CursorBuildSpec` which captures all of the 'interesting' stuff that goes into producing a cursor as a replacement for the method arguments of `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor`
* added new interface `CursorHolder` and new interface `CursorHolderFactory` as a replacement for `CursorFactory`, with method `makeCursorHolder`, which takes a `CursorBuildSpec` as an argument and replaces `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor`
* `CursorFactory.makeCursors` previously returned a `Sequence<Cursor>` corresponding to the query granularity buckets, with a separate `Cursor` per bucket. `CursorHolder.asCursor` instead returns a single `Cursor` (equivalent to 'ALL' granularity), and a new `CursorGranularizer` has been added for query engines to iterate over the cursor and divide into granularity buckets. This makes the non-vectorized engine behave the same way as the vectorized query engine (with its `VectorCursorGranularizer`), and simplifies a lot of stuff that has to read segments particularly if it does not care about bucketing the results into granularities. 
* Deprecated `CursorFactory`, `CursorFactory.canVectorize`, `CursorFactory.makeCursors`, and `CursorFactory.makeVectorCursor`
* updated all `StorageAdapter` implementations to implement `makeCursorHolder`, transitioned direct `CursorFactory` implementations to instead implement `CursorMakerFactory`. `StorageAdapter` being a `CursorMakerFactory` is intended to be a transitional thing, ideally will not be released in favor of moving `CursorMakerFactory` to be fetched directly from `Segment`, however this PR was already large enough so this will be done in a follow-up.
* updated all query engines to use `makeCursorHolder`, granularity based engines to use `CursorGranularizer`.
2024-08-16 11:34:10 -07:00
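The idea behind `CursorGranularizer` in #16533, a single cursor that the query engine divides into granularity buckets, can be sketched as below. This is a conceptual Python illustration, not Druid's API: the `granularize` function, the `(timestamp_millis, value)` row shape, and the fixed-width bucket are all assumptions made for the example.

```python
from itertools import groupby
from typing import Any, Iterable, Iterator, List, Tuple

Row = Tuple[int, Any]  # (timestamp in millis, value); a hypothetical row shape


def granularize(rows: Iterable[Row], bucket_millis: int) -> Iterator[Tuple[int, List[Any]]]:
    """Walk one time-ordered stream of rows and split it into granularity buckets.

    Stands in for an engine iterating a single 'ALL'-granularity cursor and
    bucketing as it goes, rather than receiving one cursor per bucket.
    """
    def bucket_start(row: Row) -> int:
        timestamp = row[0]
        return timestamp - timestamp % bucket_millis

    # groupby only merges adjacent rows, so the input must be time-ordered.
    for start, group in groupby(rows, key=bucket_start):
        yield start, [value for _, value in group]
```

A caller that does not care about bucketing can consume `rows` directly and never invoke `granularize`, which is the simplification the commit message describes.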
Vishesh Garg e37fe93f09
Add support for a custom `DimensionSchema` in `DataSourceMSQDestination` (#16864)
This PR adds support for passing in a custom DimensionSchema map to MSQ query destination of type DataSourceMSQDestination
2024-08-16 15:24:49 +05:30
Edgar Melendrez 5b94839d9d
[Docs] Batch08: adding examples to string functions (#16871)
* batch08 completed

* reviewing batch08

* apply corrections suggested by @FrankChen021
2024-08-16 10:15:30 +08:00
Hugh Evans e91f680d50
Removed deprecated deep storage properties (#16904) 2024-08-15 11:54:34 -07:00
Hugh Evans 6cfdeb3894
Added a topic listing reserved keywords (#16843) 2024-08-15 10:25:09 -07:00
Hugh Evans 8c030feefc
Migration guide fixes (#16902)
* Fix typo in table header

* Fixed example NVL result
2024-08-15 09:26:34 -07:00
Sree Charan Manamala 964cf47bb5
fix NPE (#16897) 2024-08-15 18:12:22 +08:00
Vadim Ogievetsky 8181ef627a
add useConcurrentLocks toggle (#16899) 2024-08-14 13:44:53 -07:00
Vadim Ogievetsky ca82ecd352
bump axios to 1.7.4 (#16898) 2024-08-14 13:42:26 -07:00
Maytas Monsereenusorn c2ddff399d
Fix Parquet Reader when ingestion need to read columns in filter (#16874) 2024-08-14 12:31:38 -07:00
Laksh Singla 204533cade
Remove Query ID verification check from MSQ workers (#16886)
Upgrading or downgrading between any versions up to and including Druid 30 can fail when the newer version runs a worker task while the older version runs a controller task. This patch removes that verification check until it is safe to add it back.
2024-08-14 10:22:19 +05:30
Abhishek Radhakrishnan acadc2df20
Handle Delta StructType, ArrayType and MapType (#16884)
Handle the following Delta complex types:
a. StructType as JSON
b. ArrayType as Java list
c. MapType as Java map

Generate and add a new Delta table complex-types-table that contains the above complex types for testing.

Update the tests to include a parameterized test with complex-types-table, with the expectations defined in ComplexTypesDeltaTable.java.
2024-08-13 07:50:03 -07:00
Adarsh Sanjeev c6da2f30e8
Add fieldReader for row based frames (#16707)
Add a new fieldReaders#makeRAC for RowBasedFrameRowsAndColumns.
2024-08-13 14:04:41 +05:30
Rishabh Singh f67ff92d07
[bugfix] Run cold schema refresh thread periodically (#16873)
* Fix build

* Run coldSchemaExec thread periodically

* Bugfix: Run cold schema refresh periodically

* Rename metrics for deep storage only segment schema process
2024-08-13 11:44:01 +05:30
Abhishek Radhakrishnan d7dfbebf97
[Docs]: Fix typo and update broadcast rules section (#16882)
* Fix typo in waitUntilSegmentsLoad.

* Add a note on configuring druid.segmentCache.locations for broadcast rules.

* Update docs/operations/rule-configuration.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-08-12 13:55:33 -07:00
Gian Merlino efe0044f9e
Use fuzzy matchers for compaction bytes asserts. (#16870)
* Use fuzzy matchers for compaction bytes asserts.

This still enables us to test that the bytes are zero and nonzero
when they're supposed to be, without having to get them exactly
right. The need to get bytes exactly right makes it difficult to
ensure ITs pass when making changes to default segment metadata.

* Additional fuzziness.
2024-08-12 10:00:33 +08:00
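A fuzzy byte-count check of the kind described in #16870 might look like the following. This is a Python sketch, not the matcher used in Druid's integration tests: the function name `bytes_roughly_equal` and the 10% default tolerance are assumptions chosen for illustration.

```python
def bytes_roughly_equal(actual: int, expected: int, tolerance: float = 0.1) -> bool:
    """Return True if `actual` is close enough to `expected`.

    Zero must match zero exactly, and a nonzero expectation requires a nonzero
    actual value, so the test still distinguishes "no bytes" from "some bytes".
    """
    if expected == 0:
        return actual == 0
    return actual > 0 and abs(actual - expected) <= tolerance * expected
```

With this check, a change to default segment metadata that shifts a size from 1000 to 1050 bytes still passes, while a size that unexpectedly drops to zero does not.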
Rushikesh Bankar 4ef4e75c5d
Fix the issue of missing implementation of IndexerTaskCountStatsProvider for peons (#16875)
Bug description:
Peons fail to start up when `WorkerTaskCountStatsMonitor` is used on MiddleManagers.
This is because MiddleManagers pass on their properties to peons and peons are unable to
find `IndexerTaskCountStatsProvider` as that is bound only for indexer nodes.

Fix:
Check if node is an indexer before trying to get instance of `IndexerTaskCountStatsProvider`.
2024-08-10 14:53:16 +05:30
Vadim Ogievetsky 483a03f26c
Web console: Server context defaults (#16868)
* add server defaults

* null is NULL

* r to d

* add test

* typo
2024-08-09 14:46:59 -07:00
Adithya Chakilam a7dd436a32
Check if supervisor could be idle on startup (#16844)
Fixes #13936 

In cases where a supervisor is idle and the overlord is restarted for some reason, the supervisor would
start spinning tasks again. In clusters where there are many low throughput streams, this would spike
the task count unnecessarily.

This commit compares the latest stream offset with the ones in metadata during the startup of supervisor
and sets it to idle state if they match.
2024-08-09 14:42:48 +05:30
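The startup check described in #16844 amounts to comparing two offset maps. The Python sketch below is a simplification and not the supervisor's actual code: `should_start_idle` and the partition-to-offset dictionaries are hypothetical, and treating an empty offset map as "not idle" is an assumption of this example.

```python
from typing import Dict


def should_start_idle(latest_stream_offsets: Dict[str, int],
                      committed_offsets: Dict[str, int]) -> bool:
    """Decide whether a supervisor can start in the idle state.

    If every partition's latest offset in the stream equals the offset already
    stored in metadata, there is nothing new to read, so no tasks are needed.
    """
    if not latest_stream_offsets:
        # Assumption of this sketch: with no offset information, do not go idle.
        return False
    return latest_stream_offsets == committed_offsets
```

For a low-throughput stream with no new data since the last commit, the two maps match and the supervisor avoids spinning up tasks after an Overlord restart.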
Akshat Jain 3d6cedb25f
Fix IndexOutOfBoundsException for MSQ window function queries with empty RAC (#16865)
* Fix IndexOutOfBoundsException for MSQ window function queries with empty RAC
2024-08-09 11:39:53 +05:30
zachjsh cb09b572e6
Fix Druid table schema resolution when table defined in catalog and has schema manager (#16869)
* SQL syntax error should target USER persona

* * revert change to queryHandler and related tests, based on review comments

* * add test

* Properly handle Druid schema blending with catalog definition and segment metadata

* * add javadocs
2024-08-08 21:21:03 -04:00
Clint Wylie 6cd8c6be22
fix IndexedStringDruidPredicateIndexes to not needlessly lookup index of values (#16860) 2024-08-07 23:29:56 -07:00
Akshat Jain 7f67d26dfa
Reduce logging in RetryableS3OutputStream (#16853)
This PR reduces logging in RetryableS3OutputStream.
2024-08-08 10:42:40 +05:30
Zoltan Haindrich 408702e100
Add ability to run MSQ in Quidem tests (#16798)
* implements some jdbc facade to enable msq usage
* adds an !msqPlan command
* adds more guice usage to testsystem startup
2024-08-08 06:37:06 +02:00
Hardik Bajaj 1cf3f4bebe
Fix Concurrent Task Insertion in pendingCompletionTaskGroups (#16834)
Fix streaming task failures that may arise due to concurrent task insertion in pendingCompletionTaskGroups
2024-08-08 08:37:27 +05:30
aaronm-bi ceed4a0634
Docs: Update list of ingestion types that support concurrent append and replace (#16852) 2024-08-08 08:06:22 +05:30
Vadim Ogievetsky 56c03582cf
support kinesis input format (#16850) 2024-08-07 10:24:28 -07:00
Rishabh Singh c6a7ab005f
Increase query cancellation timeout in the router (#16656)
* Fix build

* Increase query cancellation timeout in router

* Increase cancellation timeout to 5 seconds
2024-08-07 20:29:35 +05:30
Atul Mohan 76ad17fb4c
Add config for http client connect timeout (#16831)
Adds a configuration clientConnectTimeout to our http client config which controls the connection timeout for our http client requests.

It was observed that on busy K8S clusters, the default connect timeout of 500ms is sometimes not enough time to complete syn/acks for a request, and in these cases the requests time out with the error:
exceptionType=java.net.SocketTimeoutException, exceptionMessage=Connect Timeout
This behavior was mostly observed on the router while forwarding queries to the broker.
Having a slightly higher connect timeout helped resolve these issues.
2024-08-07 19:31:10 +05:30
Sree Charan Manamala 84192b11d7
Benchmark for window functions (#16824) 2024-08-07 11:07:11 +02:00