druid

Commit Graph

Author	SHA1	Message	Date
Pranav	a95397e712	Allow request headers in HttpInputSource in native and MSQ Ingestion (#16974 ) Support for adding the request headers in http input source. we can now pass the additional headers as json in both native and MSQ.	2024-09-12 11:18:44 +05:30
Rishabh Singh	a18f582ef0	Skip refresh for unused segments in metadata cache (#16990 ) * Skip refresh for unused segments in metadata cache * Cover the condition where a used segment missing schema is marked for refresh * Fix test	2024-09-12 10:39:59 +05:30
George Shiqi Wu	428f58cf15	Support maxColumnsToMerge in supervisor tuningConfig (#17030 ) * support maxColumnsToMerge in supervisor specs * remove log line * fix style * add docs * fix unit tests	2024-09-11 18:00:13 -04:00
Vadim Ogievetsky	9e1544e9c4	Fix maxRowsInMemory default for streaming (#17028 ) * fix maxRowsInMemory * fix button css	2024-09-11 08:43:00 -07:00
Sébastien	5de84253d8	Web console query view improvements (#16991 ) * Made maxNumTaskOptions configurable in the Query view * Updated the copy for taskAssignment options * Reordered options in engine menu for msq engine * fixed snapshot * maxNumTaskOptions -> maxTasksOptions * added back select destination item * fixed duplicate menu item * snapshot * Added the ability to hide certain engine menu options * Added the ability to hide/show more menu items * -> fn * -> fn	2024-09-10 11:34:49 -07:00
aho135	2427972c10	Implement segment range threshold for automatic query prioritization (#17009 ) Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp: "interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493	2024-09-10 15:01:52 +05:30
Sree Charan Manamala	c7c3307e61	Fix String Frame Readers to read String Arrays correctly (#16885 ) While writing to a frame, String arrays are written by setting the multivalue byte. But while reading, it was hardcoded to false.	2024-09-10 14:20:54 +05:30
Laksh Singla	72fbaf2e56	Non querying tasks shouldn't use processing buffers / merge buffers (#16887 ) Tasks that do not support querying or query processing i.e. supportsQueries = false do not require processing threads, processing buffers, and merge buffers.	2024-09-10 11:36:36 +05:30
Abhishek Agarwal	78775ad398	Prepare master for 32.0.0 release (#17022 )	2024-09-10 11:01:20 +05:30
Clint Wylie	f57cd6f7af	transition away from StorageAdapter (#16985 ) * transition away from StorageAdapter changes: * CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter * StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead * StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector * added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference * added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment * added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment * Updated all engines and tests to no longer use StorageAdapter	2024-09-09 14:55:29 -07:00
Abhishek Radhakrishnan	f4261c0e4d	Add Delta snapshot version to the web-console (#17023 ) * Web-console change to add Delta snapshot version. Web-console change for https://github.com/apache/druid/pull/17004. * Update web-console/src/druid-models/input-source/input-source.tsx * Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx	2024-09-09 11:49:53 -07:00
Abhishek Radhakrishnan	aa833a711c	Support for reading Delta Lake table snapshots (#17004 ) Problem Currently, the delta input source only supports reading from the latest snapshot of the given Delta Lake table. This is a known documented limitation. Description Add support for reading Delta snapshot. By default, the Druid-Delta connector reads the latest snapshot of the Delta table in order to preserve compatibility. Users can specify a snapshotVersion to ingest change data events from Delta tables into Druid. In the future, we can also add support for time-based snapshot reads. The Delta API to read time-based snapshots is not clear currently.	2024-09-09 14:12:48 +05:30
Sree Charan Manamala	51fe3c08ab	Window Functions : Reject MVDs during window processing (#17002 ) This commit aims to reject MVDs in window processing as we do not support them. Earlier to this commit, query running a window aggregate partitioned by an MVD column would fail with ClassCastException	2024-09-09 12:07:54 +05:30
Rishabh Singh	67f5aa65e7	Set response type `application/json` in CustomExceptionMapper to return correct failure message (#17016 ) * Add produces annotation to ParallelIndexSupervisorTask#report * change to application/json * Set response type in CustomExceptionMapper instead	2024-09-09 12:07:05 +05:30
Adarsh Sanjeev	616c46c958	Add framework for running MSQ tests with taskSpec instead of SQL (#16970 ) * Add framework for running MSQ tests with taskSpec instead of SQL * Allow configurable datasegment for tests * Add test * Revert "Add test" This reverts commit `79fb241545`. * Revert "Allow configurable datasegment for tests" This reverts commit `caf04ede2b`.	2024-09-09 11:38:28 +05:30
Vishesh Garg	37d4174245	Compute `range` partitionsSpec using effective `maxRowsPerSegment` (#16987 ) In the compaction config, a range type partitionsSpec supports setting one of maxRowsPerSegment and targetRowsPerSegment. When compaction is run with the native engine, while maxRowsPerSegment = x results in segments of size x, targetRowsPerSegment = y results in segments of size 1.5 * y. MSQ only supports rowsPerSegment = x as part of its tuning config, the resulting segment size being approx. x -- which is in line with maxRowsPerSegment behaviour in native compaction. This PR makes the following changes: use effective maxRowsPerSegment to pass as rowsPerSegment parameter for MSQ persist rowsPerSegment as maxRowsPerSegment in lastCompactionState for MSQ Use effective maxRowsPerSegment-based range spec in CompactionStatus check for both Native and MSQ.	2024-09-09 10:53:58 +05:30
Parth Agrawal	b7a21a9f67	Revert "[CVE Fixes] Update version of Nimbus.jose.jwt (#16320 )" (#16986 ) This reverts commit `f1d24c868f`. Updating nimbus to version 9+ is causing HTTP ERROR 500 java.lang.NoSuchMethodError: 'net.minidev.json.JSONObject com.nimbusds.jwt.JWTClaimsSet.toJSONObject()' Refer to SAP/cloud-security-services-integration-library#429 (comment) for more details. We would need to upgrade other libraries as well for updating nimbus.jose.jwt	2024-09-09 10:11:58 +05:30
Clint Wylie	b0f36c1b89	fix bug with CastOperatorConversion with types which cannot be mapped to native druid types (#17011 )	2024-09-06 17:07:32 -07:00
Edgar Melendrez	48a758ee08	[docs] reverting changes for sql-functions.md (#17019 )	2024-09-06 16:07:32 -07:00
Katya Macedo	94b0705109	Docs - Update the architecture diagram (#17007 )	2024-09-06 12:21:27 -07:00
Edgar Melendrez	2d9e92ce78	[docs] Batch11 date and time functions (#16926 ) * first draft of functions * minor improvments * Update docs/querying/sql-functions.md * Update docs/querying/sql-scalar.md * Apply suggestions from code review Accepted as is Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * applying next round of suggestions * fixing missing column name * addressing floor and ceil functions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * re-wording TIMESTAMPADD --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-09-06 12:20:47 -07:00
Edgar Melendrez	ed811262e3	[docs] Batch13 IP functions (#16947 ) * new datasource * reviewing before pr * Update docs/querying/sql-functions.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Applying suggestions to IPV4_PARSE --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-09-06 12:19:36 -07:00
Adarsh Sanjeev	73ff9f9047	Convert MSQTerminalStageSpecFactory into an interface (#16996 ) * Convert MSQTerminalStageSpecFactory into an interface * Rename class and remove useless constructor	2024-09-06 09:56:35 +05:30
Virushade	476b205efa	Docs: Fix language in Schema Design docs (#17010 )	2024-09-06 08:48:00 +05:30
Gian Merlino	175636b28f	Frame writers: Coerce numeric and array types in certain cases. (#16994 ) This patch adds "TypeCastSelectors", which is used when writing frames to perform two coercions: - When a numeric type is desired and the underlying type is non-numeric or unknown, the underlying selector is wrapped, "getObject" is called and the result is coerced using "ExprEval.ofType". This differs from the prior behavior where the primitive methods like "getLong", "getDouble", etc, would be called directly. This fixes an issue where a column would be read as all-zeroes when its SQL type is numeric and its physical type is string, which can happen when evolving a column's type from string to number. - When an array type is desired, the underlying selector is wrapped, "getObject" is called, and the result is coerced to Object[]. This coercion replaces some earlier logic from #15917.	2024-09-05 17:20:00 -07:00
Edgar Melendrez	c49dc83b22	[docs] batch 12: reduction functions (#16930 ) * [docs] batch 12: reduction functions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * applying suggestions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-09-05 17:02:45 -07:00
Vadim Ogievetsky	dc5c55a836	Web console: better tooltip when no size is available (#17008 ) * better tooltip when no size is available * better labels for columns * fix label in segments view	2024-09-05 13:51:03 -07:00
Kashif Faraz	ba6f804f48	Fix compaction status API response (#17006 ) Description: #16768 introduces new compaction APIs on the Overlord `/compact/status` and `/compact/progress`. But the corresponding `OverlordClient` methods do not return an object compatible with the actual endpoints defined in `OverlordCompactionResource`. This patch ensures that the objects are compatible. Changes: - Add `CompactionStatusResponse` and `CompactionProgressResponse` - Use these as the return type in `OverlordClient` methods and as the response entity in `OverlordCompactionResource` - Add `SupervisorCleanupModule` bound on the Coordinator to perform cleanup of supervisors. Without this module, Coordinator cannot deserialize compaction supervisors.	2024-09-05 23:22:01 +05:30
Jill Osborne	b4d83a86c2	Middle Manager wording update in docs (#17005 )	2024-09-05 10:25:30 -07:00
Rishabh Singh	40f38f0191	Remove migrated deep storage standard ITs (#16933 )	2024-09-05 16:07:33 +05:30
Rishabh Singh	18a9a7570a	Log a small subset of segments to refresh for debugging Coordinator refresh logic (#16998 ) * Log a small number of segments to refresh per datasource in the Coordinator * review comments * Update log message	2024-09-05 11:00:25 +05:30
Rishabh Singh	39161b0b23	Use vault.centos.org to build Hadoop docker image (#16999 ) The Dockerfile for building hadoop image is broken due to Centos 7 EOL. Fixed it as per https://serverfault.com/a/1161847.	2024-09-05 10:36:55 +05:30
Rishabh Singh	4e02e5b856	Remove alert for pre-existing new columns while merging realtime schema (#16989 ) Currently, an alert is thrown while merging datasource schema with realtime segment schema when the datasource schema already has update columns from the delta schema. This isn't an error condition since the datasource schema can have those columns from a different segment. One scenario in which this can occur is when multiple replicas for a task is run.	2024-09-05 07:58:24 +05:30
Hugh Evans	9162339fa8	Replace dsql instructions in example (#16977 )	2024-09-04 12:45:58 -07:00
AmatyaAvadhanula	bfbd21bce0	Revert "Add integration tests for concurrent append and replace (#16755 )" (#17000 ) This reverts commit `70bad948e3`.	2024-09-04 23:36:49 +05:30
Katya Macedo	03c37b3143	Fix spelling (#17001 )	2024-09-04 13:33:17 -04:00
Laksh Singla	b698440bfe	suppress cve (#16997 )	2024-09-04 19:37:23 +05:30
Vishesh Garg	e28424ea25	Enable rollup on multi-value dimensions for compaction with MSQ engine (#16937 ) Currently compaction with MSQ engine doesn't work for rollup on multi-value dimensions (MVDs), the reason being the default behaviour of grouping on MVD dimensions to unnest the dimension values; for instance grouping on `[s1,s2]` with aggregate `a` will result in two rows: `<s1,a>` and `<s2,a>`. This change enables rollup on MVDs (without unnest) by converting MVDs to Arrays before rollup using virtual columns, and then converting them back to MVDs using post aggregators. If segment schema is available to the compaction task (when it ends up downloading segments to get existing dimensions/metrics/granularity), it selectively does the MVD-Array conversion only for known multi-valued columns; else it conservatively performs this conversion for all `string` columns.	2024-09-04 16:28:04 +05:30
Gian Merlino	76b8c20f4d	Create fewer temporary maps when querying sys.segments. (#16981 ) Eliminates two map creations (availableSegmentMetadata, partialSegmentDataMap). The segmentsAlreadySeen set remains.	2024-09-03 20:04:44 -07:00
Clint Wylie	57bf053dc9	remove compiler warnings about unqualified calls to yield() (#16995 )	2024-09-03 20:04:30 -07:00
Gian Merlino	57c4b552d9	Fix logical merge conflict in SuperSorterTest. (#16993 ) Logical merge conflict between #16911 and #16914.	2024-09-03 16:14:59 -04:00
Hardik Bajaj	2ef936be40	Update Documentation on meregeBuffer/pendingRequests for Real-time nodes (#16992 ) #15025 adds mergeBuffer/pendingRequests metric in QueryCountStatsMonitor. Since real-time nodes also use the same merge buffers for queries and have QueryCountStatsMonitor , the documentation is being updated to include this metric.	2024-09-04 00:25:09 +05:30
Gian Merlino	786c959e9e	MSQ: Add limitHint to global-sort shuffles. (#16911 ) * MSQ: Add limitHint to global-sort shuffles. This allows pushing down limits into the SuperSorter. * Test fixes. * Add limitSpec to ScanQueryKit. Fix SuperSorter tracking.	2024-09-03 09:05:29 -07:00
AmatyaAvadhanula	70bad948e3	Add integration tests for concurrent append and replace (#16755 ) IT for streaming tasks with concurrent compaction	2024-09-03 14:58:15 +05:30
Sree Charan Manamala	619d8ef964	Window Functions : Numeric Arrays Frame Column Writers - fix class cast exception (#16983 ) Fix ClassCastException in ArrayFrameCoulmnWriters	2024-09-03 11:44:52 +05:30
Akshat Jain	ca17beb77b	Fix issues with window functions wrt virtual columns when using ARRAY_CONCAT_AGG aggregator (#16971 ) * Fix issues with window functions wrt virtual columns when using ARRAY_CONCAT_AGG aggregator * Address review comment * Address review comment	2024-09-03 11:41:40 +05:30
Kashif Faraz	a7349442e4	Fix computed value of maxSegmentsToMove when coordinator period is very low (#16984 ) Bug: When coordinator period is less than 30s, `maxSegmentsToMove` is always computed as 0 irrespective of number of available threads. Changes: - Fix lower bound condition and set minimum value to 100. - Add new test which fails without this fix	2024-09-02 14:35:29 +05:30
Zoltan Haindrich	32e8e074ae	Planning could have failed if UNION ALL operator was completely removed (#16946 )	2024-09-02 04:37:10 -04:00
Jonathan Wei	088c33759d	Allow druid.azure.account to be nullable (#16960 )	2024-09-02 12:05:51 +05:30
Kashif Faraz	fe3d589ff9	Run compaction as a supervisor on Overlord (#16768 ) Description ----------- Auto-compaction currently poses several challenges as it: 1. may get stuck on a failing interval. 2. may get stuck on the latest interval if more data keeps coming into it. 3. always picks the latest interval regardless of the level of compaction in it. 4. may never pick a datasource if its intervals are not very recent. 5. requires setting an explicit period which does not cater to the changing needs of a Druid cluster. This PR introduces various improvements to compaction scheduling to tackle the above problems. Change Summary -------------- 1. Run compaction for a datasource as a supervisor of type `autocompact` on Overlord. 2. Make compaction policy extensible and configurable. 3. Track status of recently submitted compaction tasks and pass this info to policy. 4. Add `/simulate` API on both Coordinator and Overlord to run compaction simulations. 5. Redirect compaction status APIs to the Overlord when compaction supervisors are enabled.	2024-09-02 07:53:13 +05:30

1 2 3 4 5 ...

14446 Commits All Branches Search

14446 Commits

All Branches