Commit Graph

14446 Commits

Author SHA1 Message Date
Pranav a95397e712
Allow request headers in HttpInputSource in native and MSQ Ingestion (#16974)
Support for adding the request headers in http input source. we can now pass the additional headers as json in both native and MSQ.
2024-09-12 11:18:44 +05:30
Rishabh Singh a18f582ef0
Skip refresh for unused segments in metadata cache (#16990)
* Skip refresh for unused segments in metadata cache

* Cover the condition where a used segment missing schema is marked for refresh

* Fix test
2024-09-12 10:39:59 +05:30
George Shiqi Wu 428f58cf15
Support maxColumnsToMerge in supervisor tuningConfig (#17030)
* support maxColumnsToMerge in supervisor specs

* remove log line

* fix style

* add docs

* fix unit tests
2024-09-11 18:00:13 -04:00
Vadim Ogievetsky 9e1544e9c4
Fix maxRowsInMemory default for streaming (#17028)
* fix maxRowsInMemory

* fix button css
2024-09-11 08:43:00 -07:00
Sébastien 5de84253d8
Web console query view improvements (#16991)
* Made maxNumTaskOptions configurable in the Query view

* Updated the copy for taskAssignment options

* Reordered options in engine menu for msq engine

* fixed snapshot

* maxNumTaskOptions -> maxTasksOptions

* added back select destination item

* fixed duplicate menu item

* snapshot

* Added the ability to hide certain engine menu options

* Added the ability to hide/show more menu items

* -> fn

* -> fn
2024-09-10 11:34:49 -07:00
aho135 2427972c10
Implement segment range threshold for automatic query prioritization (#17009)
Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp:
"interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493
2024-09-10 15:01:52 +05:30
Sree Charan Manamala c7c3307e61
Fix String Frame Readers to read String Arrays correctly (#16885)
While writing to a frame, String arrays are written by setting the multivalue byte.
But while reading, it was hardcoded to false.
2024-09-10 14:20:54 +05:30
Laksh Singla 72fbaf2e56
Non querying tasks shouldn't use processing buffers / merge buffers (#16887)
Tasks that do not support querying or query processing i.e. supportsQueries = false do not require processing threads, processing buffers, and merge buffers.
2024-09-10 11:36:36 +05:30
Abhishek Agarwal 78775ad398
Prepare master for 32.0.0 release (#17022) 2024-09-10 11:01:20 +05:30
Clint Wylie f57cd6f7af
transition away from StorageAdapter (#16985)
* transition away from StorageAdapter
changes:
* CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter
* StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead
* StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector
* added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference
* added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment
* added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment
* Updated all engines and tests to no longer use StorageAdapter
2024-09-09 14:55:29 -07:00
Abhishek Radhakrishnan f4261c0e4d
Add Delta snapshot version to the web-console (#17023)
* Web-console change to add Delta snapshot version.

Web-console change for https://github.com/apache/druid/pull/17004.

* Update web-console/src/druid-models/input-source/input-source.tsx

* Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx
2024-09-09 11:49:53 -07:00
Abhishek Radhakrishnan aa833a711c
Support for reading Delta Lake table snapshots (#17004)
Problem
Currently, the delta input source only supports reading from the latest snapshot of the given Delta Lake table. This is a known documented limitation.

Description
Add support for reading Delta snapshot. By default, the Druid-Delta connector reads the latest snapshot of the Delta table in order to preserve compatibility. Users can specify a snapshotVersion to ingest change data events from Delta tables into Druid.

In the future, we can also add support for time-based snapshot reads. The Delta API to read time-based snapshots is not clear currently.
2024-09-09 14:12:48 +05:30
Sree Charan Manamala 51fe3c08ab
Window Functions : Reject MVDs during window processing (#17002)
This commit aims to reject MVDs in window processing as we do not support them.
Earlier to this commit, query running a window aggregate partitioned by an MVD column would fail with ClassCastException
2024-09-09 12:07:54 +05:30
Rishabh Singh 67f5aa65e7
Set response type `application/json` in CustomExceptionMapper to return correct failure message (#17016)
* Add produces annotation to ParallelIndexSupervisorTask#report

* change to application/json

* Set response type in CustomExceptionMapper instead
2024-09-09 12:07:05 +05:30
Adarsh Sanjeev 616c46c958
Add framework for running MSQ tests with taskSpec instead of SQL (#16970)
* Add framework for running MSQ tests with taskSpec instead of SQL

* Allow configurable datasegment for tests

* Add test

* Revert "Add test"

This reverts commit 79fb241545.

* Revert "Allow configurable datasegment for tests"

This reverts commit caf04ede2b.
2024-09-09 11:38:28 +05:30
Vishesh Garg 37d4174245
Compute `range` partitionsSpec using effective `maxRowsPerSegment` (#16987)
In the compaction config, a range type partitionsSpec supports setting one of maxRowsPerSegment and targetRowsPerSegment. When compaction is run with the native engine, while maxRowsPerSegment = x results in segments of size x, targetRowsPerSegment = y results in segments of size 1.5 * y.

MSQ only supports rowsPerSegment = x as part of its tuning config, the resulting segment size being approx. x -- which is in line with maxRowsPerSegment behaviour in native compaction.

This PR makes the following changes:

use effective maxRowsPerSegment to pass as rowsPerSegment parameter for MSQ
persist rowsPerSegment as maxRowsPerSegment in lastCompactionState for MSQ
Use effective maxRowsPerSegment-based range spec in CompactionStatus check for both Native and MSQ.
2024-09-09 10:53:58 +05:30
Parth Agrawal b7a21a9f67
Revert "[CVE Fixes] Update version of Nimbus.jose.jwt (#16320)" (#16986)
This reverts commit f1d24c868f.

Updating nimbus to version 9+ is causing HTTP ERROR 500 java.lang.NoSuchMethodError: 'net.minidev.json.JSONObject com.nimbusds.jwt.JWTClaimsSet.toJSONObject()'
Refer to SAP/cloud-security-services-integration-library#429 (comment) for more details.

We would need to upgrade other libraries as well for updating nimbus.jose.jwt
2024-09-09 10:11:58 +05:30
Clint Wylie b0f36c1b89
fix bug with CastOperatorConversion with types which cannot be mapped to native druid types (#17011) 2024-09-06 17:07:32 -07:00
Edgar Melendrez 48a758ee08
[docs] reverting changes for sql-functions.md (#17019) 2024-09-06 16:07:32 -07:00
Katya Macedo 94b0705109
Docs - Update the architecture diagram (#17007) 2024-09-06 12:21:27 -07:00
Edgar Melendrez 2d9e92ce78
[docs] Batch11 date and time functions (#16926)
* first draft of functions

* minor improvments

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-scalar.md

* Apply suggestions from code review

Accepted as is

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* applying next round of suggestions

* fixing missing column name

* addressing floor and ceil functions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* re-wording TIMESTAMPADD

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-06 12:20:47 -07:00
Edgar Melendrez ed811262e3
[docs] Batch13 IP functions (#16947)
* new datasource

* reviewing before pr

* Update docs/querying/sql-functions.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Applying suggestions to IPV4_PARSE

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-09-06 12:19:36 -07:00
Adarsh Sanjeev 73ff9f9047
Convert MSQTerminalStageSpecFactory into an interface (#16996)
* Convert MSQTerminalStageSpecFactory into an interface

* Rename class and remove useless constructor
2024-09-06 09:56:35 +05:30
Virushade 476b205efa
Docs: Fix language in Schema Design docs (#17010) 2024-09-06 08:48:00 +05:30
Gian Merlino 175636b28f
Frame writers: Coerce numeric and array types in certain cases. (#16994)
This patch adds "TypeCastSelectors", which is used when writing frames to
perform two coercions:

- When a numeric type is desired and the underlying type is non-numeric or
  unknown, the underlying selector is wrapped, "getObject" is called and the
  result is coerced using "ExprEval.ofType". This differs from the prior
  behavior where the primitive methods like "getLong", "getDouble", etc, would
  be called directly. This fixes an issue where a column would be read as
  all-zeroes when its SQL type is numeric and its physical type is string, which
  can happen when evolving a column's type from string to number.

-  When an array type is desired, the underlying selector is wrapped,
   "getObject" is called, and the result is coerced to Object[]. This coercion
   replaces some earlier logic from #15917.
2024-09-05 17:20:00 -07:00
Edgar Melendrez c49dc83b22
[docs] batch 12: reduction functions (#16930)
* [docs] batch 12: reduction functions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* applying suggestions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-05 17:02:45 -07:00
Vadim Ogievetsky dc5c55a836
Web console: better tooltip when no size is available (#17008)
* better tooltip when no size is available

* better labels for columns

* fix label in segments view
2024-09-05 13:51:03 -07:00
Kashif Faraz ba6f804f48
Fix compaction status API response (#17006)
Description:
#16768 introduces new compaction APIs on the Overlord `/compact/status` and `/compact/progress`.
But the corresponding `OverlordClient` methods do not return an object compatible with the actual
endpoints defined in `OverlordCompactionResource`.

This patch ensures that the objects are compatible.

Changes:
- Add `CompactionStatusResponse` and `CompactionProgressResponse`
- Use these as the return type in `OverlordClient` methods and as the response entity in `OverlordCompactionResource`
- Add `SupervisorCleanupModule` bound on the Coordinator to perform cleanup of supervisors.
Without this module, Coordinator cannot deserialize compaction supervisors.
2024-09-05 23:22:01 +05:30
Jill Osborne b4d83a86c2
Middle Manager wording update in docs (#17005) 2024-09-05 10:25:30 -07:00
Rishabh Singh 40f38f0191
Remove migrated deep storage standard ITs (#16933) 2024-09-05 16:07:33 +05:30
Rishabh Singh 18a9a7570a
Log a small subset of segments to refresh for debugging Coordinator refresh logic (#16998)
* Log a small number of segments to refresh per datasource in the Coordinator

* review comments

* Update log message
2024-09-05 11:00:25 +05:30
Rishabh Singh 39161b0b23
Use vault.centos.org to build Hadoop docker image (#16999)
The Dockerfile for building hadoop image is broken due to Centos 7 EOL.
Fixed it as per https://serverfault.com/a/1161847.
2024-09-05 10:36:55 +05:30
Rishabh Singh 4e02e5b856
Remove alert for pre-existing new columns while merging realtime schema (#16989)
Currently, an alert is thrown while merging datasource schema with realtime
segment schema when the datasource schema already has update columns from the delta schema.

This isn't an error condition since the datasource schema can have those columns from a different segment.

One scenario in which this can occur is when multiple replicas for a task is run.
2024-09-05 07:58:24 +05:30
Hugh Evans 9162339fa8
Replace dsql instructions in example (#16977) 2024-09-04 12:45:58 -07:00
AmatyaAvadhanula bfbd21bce0
Revert "Add integration tests for concurrent append and replace (#16755)" (#17000)
This reverts commit 70bad948e3.
2024-09-04 23:36:49 +05:30
Katya Macedo 03c37b3143
Fix spelling (#17001) 2024-09-04 13:33:17 -04:00
Laksh Singla b698440bfe
suppress cve (#16997) 2024-09-04 19:37:23 +05:30
Vishesh Garg e28424ea25
Enable rollup on multi-value dimensions for compaction with MSQ engine (#16937)
Currently compaction with MSQ engine doesn't work for rollup on multi-value dimensions (MVDs), the reason being the default behaviour of grouping on MVD dimensions to unnest the dimension values; for instance grouping on `[s1,s2]` with aggregate `a` will result in two rows: `<s1,a>` and `<s2,a>`. 

This change enables rollup on MVDs (without unnest) by converting MVDs to Arrays before rollup using virtual columns, and then converting them back to MVDs using post aggregators. If segment schema is available to the compaction task (when it ends up downloading segments to get existing dimensions/metrics/granularity), it selectively does the MVD-Array conversion only for known multi-valued columns; else it conservatively performs this conversion for all `string` columns.
2024-09-04 16:28:04 +05:30
Gian Merlino 76b8c20f4d
Create fewer temporary maps when querying sys.segments. (#16981)
Eliminates two map creations (availableSegmentMetadata, partialSegmentDataMap).
The segmentsAlreadySeen set remains.
2024-09-03 20:04:44 -07:00
Clint Wylie 57bf053dc9
remove compiler warnings about unqualified calls to yield() (#16995) 2024-09-03 20:04:30 -07:00
Gian Merlino 57c4b552d9
Fix logical merge conflict in SuperSorterTest. (#16993)
Logical merge conflict between #16911 and #16914.
2024-09-03 16:14:59 -04:00
Hardik Bajaj 2ef936be40
Update Documentation on meregeBuffer/pendingRequests for Real-time nodes (#16992)
#15025 adds mergeBuffer/pendingRequests metric in QueryCountStatsMonitor. Since real-time nodes also use the same merge buffers for queries and have QueryCountStatsMonitor , the documentation is being updated to include this metric.
2024-09-04 00:25:09 +05:30
Gian Merlino 786c959e9e
MSQ: Add limitHint to global-sort shuffles. (#16911)
* MSQ: Add limitHint to global-sort shuffles.

This allows pushing down limits into the SuperSorter.

* Test fixes.

* Add limitSpec to ScanQueryKit. Fix SuperSorter tracking.
2024-09-03 09:05:29 -07:00
AmatyaAvadhanula 70bad948e3
Add integration tests for concurrent append and replace (#16755)
IT for streaming tasks with concurrent compaction
2024-09-03 14:58:15 +05:30
Sree Charan Manamala 619d8ef964
Window Functions : Numeric Arrays Frame Column Writers - fix class cast exception (#16983)
Fix ClassCastException in ArrayFrameCoulmnWriters
2024-09-03 11:44:52 +05:30
Akshat Jain ca17beb77b
Fix issues with window functions wrt virtual columns when using ARRAY_CONCAT_AGG aggregator (#16971)
* Fix issues with window functions wrt virtual columns when using ARRAY_CONCAT_AGG aggregator

* Address review comment

* Address review comment
2024-09-03 11:41:40 +05:30
Kashif Faraz a7349442e4
Fix computed value of maxSegmentsToMove when coordinator period is very low (#16984)
Bug: When coordinator period is less than 30s, `maxSegmentsToMove` is always computed as 0
irrespective of number of available threads.

Changes:
- Fix lower bound condition and set minimum value to 100.
- Add new test which fails without this fix
2024-09-02 14:35:29 +05:30
Zoltan Haindrich 32e8e074ae
Planning could have failed if UNION ALL operator was completely removed (#16946) 2024-09-02 04:37:10 -04:00
Jonathan Wei 088c33759d
Allow druid.azure.account to be nullable (#16960) 2024-09-02 12:05:51 +05:30
Kashif Faraz fe3d589ff9
Run compaction as a supervisor on Overlord (#16768)
Description
-----------
Auto-compaction currently poses several challenges as it:
1. may get stuck on a failing interval.
2. may get stuck on the latest interval if more data keeps coming into it.
3. always picks the latest interval regardless of the level of compaction in it.
4. may never pick a datasource if its intervals are not very recent.
5. requires setting an explicit period which does not cater to the changing needs of a Druid cluster.

This PR introduces various improvements to compaction scheduling to tackle the above problems.

Change Summary
--------------
1. Run compaction for a datasource as a supervisor of type `autocompact` on Overlord.
2. Make compaction policy extensible and configurable.
3. Track status of recently submitted compaction tasks and pass this info to policy.
4. Add `/simulate` API on both Coordinator and Overlord to run compaction simulations.
5. Redirect compaction status APIs to the Overlord when compaction supervisors are enabled.
2024-09-02 07:53:13 +05:30