14406 Commits

Zoltan Haindrich
a16b75a42c
Window Functions : Context Parameter to Enable Transfer of RACs over wire (#17150) (#17182)
(cherry picked from commit 661614129ea2f85156701c6a419ce79c2b6d04bf)

Co-authored-by: Sree Charan Manamala <sree.manamala@imply.io>
2024-09-30 10:35:14 +05:30
Clint Wylie
51bea56948
add VirtualColumns.findEquivalent and VirtualColumn.EquivalenceKey (#17084) (#17166) 2024-09-27 17:54:06 -07:00
Victoria Lim
bcc3da6f98
[Backport] DIV not implemented in Druid 30 and earlier (#17095)
Co-authored-by: Edgar Melendrez <evmelendrez@gmail.com>
2024-09-26 10:03:57 -07:00
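For context on the doc change above: a minimal Druid SQL sketch of DIV, assuming a Druid version where the function is implemented (newer than Druid 30 per the note above); the literal values are illustrative only.

```sql
-- Hedged example: DIV performs integer division; MOD returns the remainder.
SELECT
  DIV(17, 5) AS quotient,   -- 3
  MOD(17, 5) AS remainder   -- 2
```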
Cece Mei
a0c842e98b
Create a FilterBundle.Builder class and use it to construct FilterBundle. (#17055) (#17159) 2024-09-25 18:04:32 -07:00
Akshat Jain
986bc62b88
MSQ window functions: Fix boost column not being written to the frame in window stage (#17155) 2024-09-25 16:31:37 +05:30
Akshat Jain
627752922c
MSQ window functions: Reject MVDs during window processing (#17036) (#17127)
* MSQ window functions: Reject MVDs during window processing

* MSQ window functions: Reject MVDs during window processing

* Remove parameterization from MSQWindowTest
2024-09-25 15:00:02 +05:30
Kashif Faraz
8059b86c7f
Cleanup Coordinator logs, add duty status API (#16959) (#17154)
Description
-----------
Coordinator logs are fairly noisy and don't give much useful information (see example below).
Even when the Coordinator misbehaves, these logs are not very useful.

Main changes
------------
- Add API `GET /druid/coordinator/v1/duties` that returns a status list of all duty groups currently running on the Coordinator
- Emit metrics `segment/poll/time`, `segment/pollWithSchema/time`, `segment/buildSnapshot/time`
- Remove redundant logs that indicate normal operation of well-tested aspects of the Coordinator

Refactors
---------
- Move some logic from `DutiesRunnable` to `CoordinatorDutyGroup`
- Move stats collection from `CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Minor cleanup of class `DruidCoordinator`
- Clean up class `DruidCoordinatorRuntimeParams`
  - Remove field `coordinatorStartTime`. Maintain start time in `MarkOvershadowedSegmentsAsUnused` instead.
  - Remove field `MetadataRuleManager`. Pass supplier to constructor of applicable duties instead.
  - Make `usedSegmentsNewestFirst` and `datasourcesSnapshot` non-nullable as they are always required.
2024-09-25 14:59:53 +05:30
Clint Wylie
1096728fa4
use CastToObjectVectorProcessor for cast to string (#17148) (#17149) 2024-09-24 21:15:45 -07:00
Kashif Faraz
d06327ab24
[Backport] Allow MSQ engine only for compaction supervisors (#17033) (#17143)
#16768 added the functionality to run compaction as a supervisor on the overlord.
This patch builds on top of that to restrict the MSQ engine to compaction in supervisor mode only.
With these changes, users can no longer set the MSQ engine in a datasource compaction config,
or as the default cluster-level compaction engine, on the Coordinator.

The patch also adds an Overlord runtime property `druid.supervisor.compaction.engine=<msq/native>`
to specify the default engine for compaction supervisors.

Since these updates require major changes to existing MSQ compaction integration tests,
this patch disables MSQ-specific compaction integration tests -- they will be taken up in a follow-up PR.

Key changed/added classes in this patch:
* CompactionSupervisor
* CompactionSupervisorSpec
* CoordinatorCompactionConfigsResource
* OverlordCompactionScheduler

Co-authored-by: Vishesh Garg <gargvishesh@gmail.com>
2024-09-25 09:29:00 +05:30
Clint Wylie
cf00b4cd24
various fixes and improvements to vectorization fallback (#17098) (#17142)
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce the cases that can block vectorization in the expression planner to unknown inputs only (such as unknown multi-valuedness)
* fix array constructor and apply/map expressions so that the actual evaluated type matches the inferred output type
* fix a bug in array_contains where an expression like array_contains([null], 'hello') would return true when the array was numeric, because the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
2024-09-24 16:40:49 -07:00
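To illustrate the array_contains behavior mentioned in the last bullets, a hedged Druid SQL sketch using ARRAY_CONTAINS (the SQL counterpart of the native expression); the literal arrays are illustrative and do not reproduce the exact internal expression from the fix.

```sql
-- With the fix, a string probe against a numeric array no longer coerces to a
-- null numeric and incorrectly reports a match.
SELECT
  ARRAY_CONTAINS(ARRAY[1, 2, 3], 2)        AS has_two,    -- true
  ARRAY_CONTAINS(ARRAY['a', 'b'], 'hello') AS has_hello   -- false
```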
Abhishek Radhakrishnan
0ae9988796
Support Iceberg ingestion from REST based catalogs (#17124) (#17145)
Adds support to the iceberg input source to read from Iceberg REST Catalogs.

Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
2024-09-24 12:09:27 -07:00
Sree Charan Manamala
b7cc0bb343
Window Functions : Remove enable windowing flag (#17087) (#17128)
(cherry picked from commit 67d361c9bfc2b1bf37d5522fa9d9af1e445a03df)
2024-09-24 10:28:11 +02:00
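With the flag removed, a window query like the hedged sketch below should run without setting the enableWindowing query context parameter; the datasource and columns are the hypothetical wikipedia example data.

```sql
-- Previously this required enableWindowing=true in the query context.
SELECT
  channel,
  "user",
  COUNT(*) OVER (PARTITION BY channel) AS edits_per_channel
FROM wikipedia
```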
Abhishek Radhakrishnan
6c0ca77be4
Add Delta snapshot version to the web-console (#17023) (#17119)
Adds snapshot.version to the delta input source in the web-console.
2024-09-23 11:36:24 +05:30
Sree Charan Manamala
0c58f88ded
Add serde for ColumnBasedRowsAndColumns to fix window queries without group by (#16658) (#17111)
Registers a serde for RowsAndColumns so that window operator queries running on leaf operators are transferred properly over the wire. This fixes the empty response returned by window queries without GROUP BY on the native engine.

(cherry picked from commit bb1c3c174944460c22c6dd153579dd18994b1f60)
2024-09-20 11:34:35 +02:00
Laksh Singla
2f13cd2500
Support maxSubqueryBytes for window functions (#16800) (#17085)
Window queries now respect maxSubqueryBytes.
2024-09-19 19:31:17 +05:30
Rishabh Singh
60ed36c89b
Skip tombstone segment refresh in metadata cache (#17025) (#17112)
PR #16890 introduced a change to skip adding tombstone segments to the cache.
As a side effect, tombstone segments appear unavailable in the console, because the availability of a segment in the Broker is determined from the metadata cache.

The fix is to keep tombstone segments in the metadata cache but skip them during refresh.

This doesn't affect any functionality, since the metadata query for a tombstone returns an empty result, which would otherwise cause continuous refresh of those segments.
2024-09-19 14:39:15 +05:30
Sree Charan Manamala
11727af2a6
Fix String Frame Readers to read String Arrays correctly (#16885) (#17103)
While writing to a frame, String arrays are written by setting the multivalue byte.
But while reading, the multivalue byte was hardcoded to false.

(cherry picked from commit c7c3307e6193db8ddc879f48bbf3b9e3d1b41a6c)
2024-09-19 09:02:12 +05:30
Akshat Jain
52929ed24a
Handle memory leaks from Mockito inline mocks (#17104) 2024-09-18 11:36:59 -07:00
Rishabh Singh
a63ac2590a
Skip refresh for unused segments in metadata cache (#16990) (#17079)
* Skip refresh for unused segments in metadata cache

* Cover the condition where a used segment with a missing schema is marked for refresh

* Fix test
2024-09-17 17:18:53 -07:00
Clint Wylie
c462e103b6
transition away from StorageAdapter (#16985) (#17024)
* transition away from StorageAdapter
changes:
* CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter
* StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead
* StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as; the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector
* added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference
* added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment
* added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment
* Updated all engines and tests to no longer use StorageAdapter
2024-09-09 21:43:41 -07:00
abhishekagarwal87
2061c220b8 Prepare the release branch 2024-09-09 20:17:24 +05:30
Abhishek Radhakrishnan
aa833a711c
Support for reading Delta Lake table snapshots (#17004)
Problem
Currently, the delta input source only supports reading from the latest snapshot of the given Delta Lake table. This is a known documented limitation.

Description
Add support for reading a specific Delta snapshot. By default, the Druid-Delta connector reads the latest snapshot of the Delta table in order to preserve compatibility. Users can specify a snapshotVersion to ingest change data events from Delta tables into Druid.

In the future, we can also add support for time-based snapshot reads. The Delta API for reading time-based snapshots is currently unclear.
2024-09-09 14:12:48 +05:30
Sree Charan Manamala
51fe3c08ab
Window Functions : Reject MVDs during window processing (#17002)
This commit rejects MVDs in window processing since we do not support them.
Before this commit, a query running a window aggregate partitioned by an MVD column would fail with a ClassCastException.
2024-09-09 12:07:54 +05:30
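A hedged sketch of the distinction: partitioning a window aggregate by a multi-value dimension is now rejected with a clear error instead of a ClassCastException, while partitioning by a single-value column works as before; the datasource and columns are hypothetical.

```sql
-- "channel" is assumed single-valued, so this query is still supported.
-- Replacing it with a multi-value dimension (e.g. a hypothetical "tags" column) is now rejected.
SELECT
  channel,
  SUM(added) OVER (PARTITION BY channel) AS added_per_channel
FROM wikipedia
```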
Rishabh Singh
67f5aa65e7
Set response type application/json in CustomExceptionMapper to return correct failure message (#17016)
* Add produces annotation to ParallelIndexSupervisorTask#report

* change to application/json

* Set response type in CustomExceptionMapper instead
2024-09-09 12:07:05 +05:30
Adarsh Sanjeev
616c46c958
Add framework for running MSQ tests with taskSpec instead of SQL (#16970)
* Add framework for running MSQ tests with taskSpec instead of SQL

* Allow configurable datasegment for tests

* Add test

* Revert "Add test"

This reverts commit 79fb241545ebce0d136873a4b1045191c40542ae.

* Revert "Allow configurable datasegment for tests"

This reverts commit caf04ede2b2b5b27bfa2ac0712b3a260cc65737e.
2024-09-09 11:38:28 +05:30
Vishesh Garg
37d4174245
Compute range partitionsSpec using effective maxRowsPerSegment (#16987)
In the compaction config, a range type partitionsSpec supports setting one of maxRowsPerSegment and targetRowsPerSegment. When compaction is run with the native engine, while maxRowsPerSegment = x results in segments of size x, targetRowsPerSegment = y results in segments of size 1.5 * y.

MSQ only supports rowsPerSegment = x as part of its tuning config, the resulting segment size being approx. x -- which is in line with maxRowsPerSegment behaviour in native compaction.

This PR makes the following changes:

* Use effective maxRowsPerSegment to pass as the rowsPerSegment parameter for MSQ.
* Persist rowsPerSegment as maxRowsPerSegment in lastCompactionState for MSQ.
* Use an effective maxRowsPerSegment-based range spec in the CompactionStatus check for both native and MSQ.
2024-09-09 10:53:58 +05:30
Parth Agrawal
b7a21a9f67
Revert "[CVE Fixes] Update version of Nimbus.jose.jwt (#16320)" (#16986)
This reverts commit f1d24c868f2cf6b2738c5342b2001fdb7ef2d2a0.

Updating nimbus to version 9+ is causing HTTP ERROR 500 java.lang.NoSuchMethodError: 'net.minidev.json.JSONObject com.nimbusds.jwt.JWTClaimsSet.toJSONObject()'
Refer to SAP/cloud-security-services-integration-library#429 (comment) for more details.

We would also need to upgrade other libraries to update nimbus.jose.jwt.
2024-09-09 10:11:58 +05:30
Clint Wylie
b0f36c1b89
fix bug with CastOperatorConversion with types which cannot be mapped to native druid types (#17011) 2024-09-06 17:07:32 -07:00
Edgar Melendrez
48a758ee08
[docs] reverting changes for sql-functions.md (#17019) 2024-09-06 16:07:32 -07:00
Katya Macedo
94b0705109
Docs - Update the architecture diagram (#17007) 2024-09-06 12:21:27 -07:00
Edgar Melendrez
2d9e92ce78
[docs] Batch11 date and time functions (#16926)
* first draft of functions

* minor improvements

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-scalar.md

* Apply suggestions from code review

Accepted as is

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* applying next round of suggestions

* fixing missing column name

* addressing floor and ceil functions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* re-wording TIMESTAMPADD

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-06 12:20:47 -07:00
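A hedged Druid SQL sketch of the function family this docs batch covers (TIMESTAMPADD plus the floor/ceil functions mentioned above); timestamps and periods are illustrative.

```sql
SELECT
  TIMESTAMPADD(DAY, 7, TIMESTAMP '2024-01-01 00:00:00')  AS one_week_later,
  TIME_FLOOR(TIMESTAMP '2024-01-15 13:37:00', 'PT1H')    AS hour_floor,
  TIME_CEIL(TIMESTAMP '2024-01-15 13:37:00', 'PT1H')     AS hour_ceil
```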
Edgar Melendrez
ed811262e3
[docs] Batch13 IP functions (#16947)
* new datasource

* reviewing before pr

* Update docs/querying/sql-functions.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Applying suggestions to IPV4_PARSE

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-09-06 12:19:36 -07:00
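A hedged Druid SQL sketch of the IP functions this batch documents, including the IPV4_PARSE call mentioned in the last bullet; the addresses are illustrative.

```sql
SELECT
  IPV4_PARSE('192.168.0.1')                   AS ip_as_integer,  -- 3232235521
  IPV4_STRINGIFY(3232235521)                  AS ip_as_string,   -- '192.168.0.1'
  IPV4_MATCH('192.168.0.1', '192.168.0.0/16') AS in_subnet       -- true
```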
Adarsh Sanjeev
73ff9f9047
Convert MSQTerminalStageSpecFactory into an interface (#16996)
* Convert MSQTerminalStageSpecFactory into an interface

* Rename class and remove useless constructor
2024-09-06 09:56:35 +05:30
Virushade
476b205efa
Docs: Fix language in Schema Design docs (#17010) 2024-09-06 08:48:00 +05:30
Gian Merlino
175636b28f
Frame writers: Coerce numeric and array types in certain cases. (#16994)
This patch adds "TypeCastSelectors", which is used when writing frames to
perform two coercions:

- When a numeric type is desired and the underlying type is non-numeric or
  unknown, the underlying selector is wrapped, "getObject" is called and the
  result is coerced using "ExprEval.ofType". This differs from the prior
  behavior where the primitive methods like "getLong", "getDouble", etc, would
  be called directly. This fixes an issue where a column would be read as
  all-zeroes when its SQL type is numeric and its physical type is string, which
  can happen when evolving a column's type from string to number.

-  When an array type is desired, the underlying selector is wrapped,
   "getObject" is called, and the result is coerced to Object[]. This coercion
   replaces some earlier logic from #15917.
2024-09-05 17:20:00 -07:00
Edgar Melendrez
c49dc83b22
[docs] batch 12: reduction functions (#16930)
* [docs] batch 12: reduction functions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* applying suggestions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-05 17:02:45 -07:00
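A hedged Druid SQL sketch of the reduction functions this batch documents (GREATEST and LEAST); the literals are illustrative.

```sql
SELECT
  GREATEST(3, 7, 5)        AS max_value,   -- 7
  LEAST('apple', 'banana') AS min_string   -- 'apple'
```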
Vadim Ogievetsky
dc5c55a836
Web console: better tooltip when no size is available (#17008)
* better tooltip when no size is available

* better labels for columns

* fix label in segments view
2024-09-05 13:51:03 -07:00
Kashif Faraz
ba6f804f48
Fix compaction status API response (#17006)
Description:
#16768 introduces new compaction APIs on the Overlord `/compact/status` and `/compact/progress`.
But the corresponding `OverlordClient` methods do not return an object compatible with the actual
endpoints defined in `OverlordCompactionResource`.

This patch ensures that the objects are compatible.

Changes:
- Add `CompactionStatusResponse` and `CompactionProgressResponse`
- Use these as the return type in `OverlordClient` methods and as the response entity in `OverlordCompactionResource`
- Add `SupervisorCleanupModule` bound on the Coordinator to perform cleanup of supervisors.
Without this module, the Coordinator cannot deserialize compaction supervisors.
2024-09-05 23:22:01 +05:30
Jill Osborne
b4d83a86c2
Middle Manager wording update in docs (#17005) 2024-09-05 10:25:30 -07:00
Rishabh Singh
40f38f0191
Remove migrated deep storage standard ITs (#16933) 2024-09-05 16:07:33 +05:30
Rishabh Singh
18a9a7570a
Log a small subset of segments to refresh for debugging Coordinator refresh logic (#16998)
* Log a small number of segments to refresh per datasource in the Coordinator

* review comments

* Update log message
2024-09-05 11:00:25 +05:30
Rishabh Singh
39161b0b23
Use vault.centos.org to build Hadoop docker image (#16999)
The Dockerfile for building the Hadoop image is broken due to the CentOS 7 EOL.
Fixed it as per https://serverfault.com/a/1161847.
2024-09-05 10:36:55 +05:30
Rishabh Singh
4e02e5b856
Remove alert for pre-existing new columns while merging realtime schema (#16989)
Currently, an alert is thrown while merging a datasource schema with a realtime
segment schema when the datasource schema already contains the updated columns from the delta schema.

This isn't an error condition, since the datasource schema can have those columns from a different segment.

One scenario in which this can occur is when multiple replicas of a task are run.
2024-09-05 07:58:24 +05:30
Hugh Evans
9162339fa8
Replace dsql instructions in example (#16977) 2024-09-04 12:45:58 -07:00
AmatyaAvadhanula
bfbd21bce0
Revert "Add integration tests for concurrent append and replace (#16755)" (#17000)
This reverts commit 70bad948e379dc07911d896bfcd23b5a8a149e32.
2024-09-04 23:36:49 +05:30
Katya Macedo
03c37b3143
Fix spelling (#17001) 2024-09-04 13:33:17 -04:00
Laksh Singla
b698440bfe
suppress cve (#16997) 2024-09-04 19:37:23 +05:30
Vishesh Garg
e28424ea25
Enable rollup on multi-value dimensions for compaction with MSQ engine (#16937)
Currently, compaction with the MSQ engine doesn't work for rollup on multi-value dimensions (MVDs) because grouping on an MVD unnests the dimension values by default; for instance, grouping on `[s1,s2]` with aggregate `a` results in two rows: `<s1,a>` and `<s2,a>`.

This change enables rollup on MVDs (without unnest) by converting MVDs to arrays before rollup using virtual columns, and then converting them back to MVDs using post-aggregators. If the segment schema is available to the compaction task (when it downloads segments to get existing dimensions/metrics/granularity), it selectively does the MVD-to-array conversion only for known multi-valued columns; otherwise it conservatively performs this conversion for all `string` columns.
2024-09-04 16:28:04 +05:30
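A hedged sketch of the default grouping behavior described above: grouping on a multi-value dimension unnests it, so one input row contributes to multiple groups; the datasource and column are hypothetical.

```sql
-- Hypothetical: "tags" is a multi-value string dimension.
-- A row with tags = ['s1', 's2'] contributes to both the 's1' and 's2' groups.
SELECT tags, COUNT(*) AS cnt
FROM events
GROUP BY tags
```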
Gian Merlino
76b8c20f4d
Create fewer temporary maps when querying sys.segments. (#16981)
Eliminates two map creations (availableSegmentMetadata, partialSegmentDataMap).
The segmentsAlreadySeen set remains.
2024-09-03 20:04:44 -07:00
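For reference, a hedged sketch of the kind of sys.segments query served by the code path this change trims; the column subset is illustrative.

```sql
SELECT segment_id, datasource, "size", is_published, is_available
FROM sys.segments
LIMIT 5
```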
Clint Wylie
57bf053dc9
remove compiler warnings about unqualified calls to yield() (#16995) 2024-09-03 20:04:30 -07:00