druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	f57cd6f7af	transition away from StorageAdapter (#16985 ) * transition away from StorageAdapter changes: * CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter * StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead * StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector * added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference * added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment * added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment * Updated all engines and tests to no longer use StorageAdapter	2024-09-09 14:55:29 -07:00
Rishabh Singh	67f5aa65e7	Set response type `application/json` in CustomExceptionMapper to return correct failure message (#17016 ) * Add produces annotation to ParallelIndexSupervisorTask#report * change to application/json * Set response type in CustomExceptionMapper instead	2024-09-09 12:07:05 +05:30
Vishesh Garg	37d4174245	Compute `range` partitionsSpec using effective `maxRowsPerSegment` (#16987 ) In the compaction config, a range type partitionsSpec supports setting one of maxRowsPerSegment and targetRowsPerSegment. When compaction is run with the native engine, while maxRowsPerSegment = x results in segments of size x, targetRowsPerSegment = y results in segments of size 1.5 * y. MSQ only supports rowsPerSegment = x as part of its tuning config, the resulting segment size being approx. x -- which is in line with maxRowsPerSegment behaviour in native compaction. This PR makes the following changes: use effective maxRowsPerSegment to pass as rowsPerSegment parameter for MSQ persist rowsPerSegment as maxRowsPerSegment in lastCompactionState for MSQ Use effective maxRowsPerSegment-based range spec in CompactionStatus check for both Native and MSQ.	2024-09-09 10:53:58 +05:30
Kashif Faraz	ba6f804f48	Fix compaction status API response (#17006 ) Description: #16768 introduces new compaction APIs on the Overlord `/compact/status` and `/compact/progress`. But the corresponding `OverlordClient` methods do not return an object compatible with the actual endpoints defined in `OverlordCompactionResource`. This patch ensures that the objects are compatible. Changes: - Add `CompactionStatusResponse` and `CompactionProgressResponse` - Use these as the return type in `OverlordClient` methods and as the response entity in `OverlordCompactionResource` - Add `SupervisorCleanupModule` bound on the Coordinator to perform cleanup of supervisors. Without this module, Coordinator cannot deserialize compaction supervisors.	2024-09-05 23:22:01 +05:30
Rishabh Singh	18a9a7570a	Log a small subset of segments to refresh for debugging Coordinator refresh logic (#16998 ) * Log a small number of segments to refresh per datasource in the Coordinator * review comments * Update log message	2024-09-05 11:00:25 +05:30
Rishabh Singh	4e02e5b856	Remove alert for pre-existing new columns while merging realtime schema (#16989 ) Currently, an alert is thrown while merging datasource schema with realtime segment schema when the datasource schema already has update columns from the delta schema. This isn't an error condition since the datasource schema can have those columns from a different segment. One scenario in which this can occur is when multiple replicas for a task is run.	2024-09-05 07:58:24 +05:30
Vishesh Garg	e28424ea25	Enable rollup on multi-value dimensions for compaction with MSQ engine (#16937 ) Currently compaction with MSQ engine doesn't work for rollup on multi-value dimensions (MVDs), the reason being the default behaviour of grouping on MVD dimensions to unnest the dimension values; for instance grouping on `[s1,s2]` with aggregate `a` will result in two rows: `<s1,a>` and `<s2,a>`. This change enables rollup on MVDs (without unnest) by converting MVDs to Arrays before rollup using virtual columns, and then converting them back to MVDs using post aggregators. If segment schema is available to the compaction task (when it ends up downloading segments to get existing dimensions/metrics/granularity), it selectively does the MVD-Array conversion only for known multi-valued columns; else it conservatively performs this conversion for all `string` columns.	2024-09-04 16:28:04 +05:30
Gian Merlino	76b8c20f4d	Create fewer temporary maps when querying sys.segments. (#16981 ) Eliminates two map creations (availableSegmentMetadata, partialSegmentDataMap). The segmentsAlreadySeen set remains.	2024-09-03 20:04:44 -07:00
Kashif Faraz	a7349442e4	Fix computed value of maxSegmentsToMove when coordinator period is very low (#16984 ) Bug: When coordinator period is less than 30s, `maxSegmentsToMove` is always computed as 0 irrespective of number of available threads. Changes: - Fix lower bound condition and set minimum value to 100. - Add new test which fails without this fix	2024-09-02 14:35:29 +05:30
Kashif Faraz	fe3d589ff9	Run compaction as a supervisor on Overlord (#16768 ) Description ----------- Auto-compaction currently poses several challenges as it: 1. may get stuck on a failing interval. 2. may get stuck on the latest interval if more data keeps coming into it. 3. always picks the latest interval regardless of the level of compaction in it. 4. may never pick a datasource if its intervals are not very recent. 5. requires setting an explicit period which does not cater to the changing needs of a Druid cluster. This PR introduces various improvements to compaction scheduling to tackle the above problems. Change Summary -------------- 1. Run compaction for a datasource as a supervisor of type `autocompact` on Overlord. 2. Make compaction policy extensible and configurable. 3. Track status of recently submitted compaction tasks and pass this info to policy. 4. Add `/simulate` API on both Coordinator and Overlord to run compaction simulations. 5. Redirect compaction status APIs to the Overlord when compaction supervisors are enabled.	2024-09-02 07:53:13 +05:30
Virushade	0217c8c541	Change Inspection Profile to set "Method is identical to its super method" as error (#16976 ) * Make IntelliJ's MethodIsIdenticalToSuperMethod an error * Change codebase to follow new IntelliJ inspection * Restore non-short-circuit boolean expressions to pass tests	2024-08-31 09:37:34 +05:30
Gian Merlino	5d2ed33b89	Place __time in signatures according to sort order. (#16958 ) * Place __time in signatures according to sort order. Updates a variety of places to put __time in row signatures according to its position in the sort order, rather than always first, including: - InputSourceSampler. - ScanQueryEngine (in the default signature when "columns" is empty). - Various StorageAdapters, which also have the effect of reordering the column order in segmentMetadata queries, and therefore in SQL schemas as well. Follow-up to #16849. * Fix compilation. * Additional fixes. * Fix. * Fix style. * Omit nonexistent columns from the row signature. * Fix tests.	2024-08-26 21:45:51 -07:00
Gian Merlino	0603d5153d	Segments sorted by non-time columns. (#16849 ) * Segments primarily sorted by non-time columns. Currently, segments are always sorted by __time, followed by the sort order provided by the user via dimensionsSpec or CLUSTERED BY. Sorting by __time enables efficient execution of queries involving time-ordering or granularity. Time-ordering is a simple matter of reading the rows in stored order, and granular cursors can be generated in streaming fashion. However, for various workloads, it's better for storage footprint and query performance to sort by arbitrary orders that do not start with __time. With this patch, users can sort segments by such orders. For spec-based ingestion, users add "useExplicitSegmentSortOrder: true" to dimensionsSpec. The "dimensions" list determines the sort order. To define a sort order that includes "__time", users explicitly include a dimension named "__time". For SQL-based ingestion, users set the context parameter "useExplicitSegmentSortOrder: true". The CLUSTERED BY clause is then used as the explicit segment sort order. In both cases, when the new "useExplicitSegmentSortOrder" parameter is false (the default), __time is implicitly prepended to the sort order, as it always was prior to this patch. The new parameter is experimental for two main reasons. First, such segments can cause errors when loaded by older servers, due to violating their expectations that timestamps are always monotonically increasing. Second, even on newer servers, not all queries can run on non-time-sorted segments. Scan queries involving time-ordering and any query involving granularity will not run. (To partially mitigate this, a currently-undocumented SQL feature "sqlUseGranularity" is provided. When set to false the SQL planner avoids using "granularity".) Changes on the write path: 1) DimensionsSpec can now optionally contain a __time dimension, which controls the placement of __time in the sort order. If not present, __time is considered to be first in the sort order, as it has always been. 2) IncrementalIndex and IndexMerger are updated to sort facts more flexibly; not always by time first. 3) Metadata (stored in metadata.drd) gains a "sortOrder" field. 4) MSQ can generate range-based shard specs even when not all columns are singly-valued strings. It merely stops accepting new clustering key fields when it encounters the first one that isn't a singly-valued string. This is useful because it enables range shard specs on "someDim" to be created for clauses like "CLUSTERED BY someDim, __time". Changes on the read path: 1) Add StorageAdapter#getSortOrder so query engines can tell how a segment is sorted. 2) Update QueryableIndexStorageAdapter, IncrementalIndexStorageAdapter, and VectorCursorGranularizer to throw errors when using granularities on non-time-ordered segments. 3) Update ScanQueryEngine to throw an error when using the time-ordering "order" parameter on non-time-ordered segments. 4) Update TimeBoundaryQueryRunnerFactory to perform a segment scan when running on a non-time-ordered segment. 5) Add "sqlUseGranularity" context parameter that causes the SQL planner to avoid using granularities other than ALL. Other changes: 1) Rename DimensionsSpec "hasCustomDimensions" to "hasFixedDimensions" and change the meaning subtly: it now returns true if the DimensionsSpec represents an unchanging list of dimensions, or false if there is some discovery happening. This is what call sites had expected anyway. * Fixups from CI. * Fixes. * Fix missing arg. * Additional changes. * Fix logic. * Fixes. * Fix test. * Adjust test. * Remove throws. * Fix styles. * Fix javadocs. * Cleanup. * Smoother handling of null ordering. * Fix tests. * Missed a spot on the merge. * Fixups. * Avoid needless Filters.and. * Add timeBoundaryInspector to test. * Fix tests. * Fix FrameStorageAdapterTest. * Fix various tests. * Use forceSegmentSortByTime instead of useExplicitSegmentSortOrder. * Pom fix. * Fix doc.	2024-08-23 08:24:43 -07:00
Rishabh Singh	bc4b3a2f91	Filter out tombstone segments from metadata cache (#16890 ) * Fix build * Support segment metadata queries for tombstones * Filter out tombstone segments from metadata cache * Revert some changes * checkstyle * Update docs	2024-08-20 11:35:02 +05:30
Clint Wylie	518f642028	remove isDescending from Query interface, move to TimeseriesQuery (#16917 ) * remove isDescending from Query interface, since it is only actually settable and usable by TimeseriesQuery	2024-08-19 23:02:45 -07:00
Clint Wylie	4283b270e3	rework cursor creation (#16533 ) changes: * Added `CursorBuildSpec` which captures all of the 'interesting' stuff that goes into producing a cursor as a replacement for the method arguments of `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * added new interface `CursorHolder` and new interface `CursorHolderFactory` as a replacement for `CursorFactory`, with method `makeCursorHolder`, which takes a `CursorBuildSpec` as an argument and replaces `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * `CursorFactory.makeCursors` previously returned a `Sequence<Cursor>` corresponding to the query granularity buckets, with a separate `Cursor` per bucket. `CursorHolder.asCursor` instead returns a single `Cursor` (equivalent to 'ALL' granularity), and a new `CursorGranularizer` has been added for query engines to iterate over the cursor and divide into granularity buckets. This makes the non-vectorized engine behave the same way as the vectorized query engine (with its `VectorCursorGranularizer`), and simplifies a lot of stuff that has to read segments particularly if it does not care about bucketing the results into granularities. * Deprecated `CursorFactory`, `CursorFactory.canVectorize`, `CursorFactory.makeCursors`, and `CursorFactory.makeVectorCursor` * updated all `StorageAdapter` implementations to implement `makeCursorHolder`, transitioned direct `CursorFactory` implementations to instead implement `CursorMakerFactory`. `StorageAdapter` being a `CursorMakerFactory` is intended to be a transitional thing, ideally will not be released in favor of moving `CursorMakerFactory` to be fetched directly from `Segment`, however this PR was already large enough so this will be done in a follow-up. * updated all query engines to use `makeCursorHolder`, granularity based engines to use `CursorGranularizer`.	2024-08-16 11:34:10 -07:00
Maytas Monsereenusorn	c2ddff399d	Fix Parquet Reader when ingestion need to read columns in filter (#16874 )	2024-08-14 12:31:38 -07:00
Rishabh Singh	f67ff92d07	[bugfix] Run cold schema refresh thread periodically (#16873 ) * Fix build * Run coldSchemaExec thread periodically * Bugfix: Run cold schema refresh periodically * Rename metrics for deep storage only segment schema process	2024-08-13 11:44:01 +05:30
Rushikesh Bankar	4ef4e75c5d	Fix the issue of missing implementation of IndexerTaskCountStatsProvider for peons (#16875 ) Bug description: Peons to fail to start up when `WorkerTaskCountStatsMonitor` is used on MiddleManagers. This is because MiddleManagers pass on their properties to peons and peons are unable to find `IndexerTaskCountStatsProvider` as that is bound only for indexer nodes. Fix: Check if node is an indexer before trying to get instance of `IndexerTaskCountStatsProvider`.	2024-08-10 14:53:16 +05:30
Adithya Chakilam	a7dd436a32	Check if supervisor could be idle on startup (#16844 ) Fixes #13936 In cases where a supervisor is idle and the overlord is restarted for some reason, the supervisor would start spinning tasks again. In clusters where there are many low throughput streams, this would spike the task count unnecessarily. This commit compares the latest stream offset with the ones in metadata during the startup of supervisor and sets it to idle state if they match.	2024-08-09 14:42:48 +05:30
Zoltan Haindrich	408702e100	Add ability to run MSQ in Quidem tests (#16798 ) * implements some jdbc facade to enable msq usage * adds an !msqPlan command * adds more guice usage to testsystem startup	2024-08-08 06:37:06 +02:00
Atul Mohan	76ad17fb4c	Add config for http client connect timeout (#16831 ) Adds a configuration clientConnectTimeout to our http client config which controls the connection timeout for our http client requests. It was observed that on busy K8S clusters, the default connect timeout of 500ms is sometimes not enough time to complete syn/acks for a request and in these cases, the requests timeout with the error: exceptionType=java.net.SocketTimeoutException, exceptionMessage=Connect Timeout This behavior was mostly observed on the router while forwarding queries to the broker. Having a slightly higher connect timeout helped resolve these issues.	2024-08-07 19:31:10 +05:30
Vishesh Garg	593c3b2150	Do not support non-idempotent aggregator in MSQ compaction (#16846 ) This PR adds checks for verification of DataSourceCompactionConfig and CompactionTask with msq engine to ensure: each aggregator in metricsSpec is idempotent metricsSpec is non-null when rollup is set to true Unit tests and existing compaction ITs have been updated accordingly.	2024-08-06 20:58:08 +05:30
Kashif Faraz	aa49be61ea	Do not create ZK paths if not needed (#16816 ) Background: ZK-based segment loading has been completely disabled in #15705 . ZK `servedSegmentsPath` has been deprecated since Druid 0.7.1, #1182 . This legacy path has been replaced by the `liveSegmentsPath` and is not used in the code anymore. Changes: - Never create ZK loadQueuePath as it is never used. - Never create ZK servedSegmentsPath as it is never used. - Do not create ZK liveSegmentsPath if announcement on ZK is disabled - Fix up tests	2024-08-06 19:29:13 +05:30
Zoltan Haindrich	26e3c44f4b	Quidem record (#16624 ) * enables to launch a fake broker based on test resources (druidtest uri) * could record queries into new testfiles during usage * instead of re-purpose Calcite's Hook migrates to use DruidHook which we can add further keys * added a quidem-ut module which could be the place for tests which could iteract with modules/etc	2024-08-05 14:58:32 +02:00
Rushikesh Bankar	c8323d1a7c	Add indexer task success and failure metrics (#16829 ) This PR adds indexer-level task metrics- "indexer/task/failed/count" "indexer/task/success/count" the current "worker/task/completed/count" metric shows all the tasks completed irrespective of success or failure status so these metrics would help us get more visibility into the status of the completed tasks	2024-08-05 16:21:27 +05:30
Laksh Singla	0411c4e67e	Add metrics for number of rows/bytes materialized while running subqueries (#16835 ) subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.	2024-08-05 14:13:20 +05:30
Abhishek Radhakrishnan	31b43753fb	Add `druid.indexing.formats.stringMultiValueHandlingMode` system config (#16822 ) This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_SET for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_SET is the default). Case insensitive values are allowed. While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs. This fixes a long-standing issue where compaction will honor the configured cluster wide property instead of rewriting it as the default SORTED_ARRAY always, even if the data was originally ingested with ARRAY or SORTED_SET.	2024-08-03 10:23:44 -07:00
Kashif Faraz	9dc2569f22	Track and emit segment loading rate for HttpLoadQueuePeon on Coordinator (#16691 ) Design: The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads. To account for multiple loading threads on a server, we use the concept of a batch to track load times. A batch is a set of segments added by the coordinator to the load queue of a server in one go. Computation: batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server) batchBytes = total bytes successfully loaded in batch avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis) Changes: - Add `LoadingRateTracker` which computes a moving average load rate based on the last few GBs of successful segment loads. - Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may also consider emitting this metric from the historicals themselves. - Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`	2024-08-03 13:14:21 +05:30
Kashif Faraz	954aaafe0c	Refactor: Clean up compaction config classes (#16810 ) Changes: - Rename `CoordinatorCompactionConfig` to `DruidCompactionConfig` - Rename `CompactionConfigUpdateRequest` to `ClusterCompactionConfig` - Refactor methods in `DruidCompactionConfig` - Clean up `DataSourceCompactionConfigHistory` and its tests - Clean up tests and add new tests - Change API path `/druid/coordinator/v1/config/global` to `/druid/coordinator/v1/config/cluster`	2024-07-30 12:17:25 +05:30
AmatyaAvadhanula	92a40d8169	Add API to fetch conflicting task locks (#16799 ) * Add API to fetch conflicting active locks	2024-07-30 11:40:48 +05:30
Vishesh Garg	e9ea243d97	Enable compaction ITs on MSQ engine (#16778 ) Follow-up to #16291, this commit enables a subset of existing native compaction ITs on the MSQ engine. In the process, the following changes have been introduced in the MSQ compaction flow: - Populate `metricsSpec` in `CompactionState` from `querySpec` in `MSQControllerTask` instead of `dataSchema` - Add check for pre-rolled-up segments having `AggregatorFactory` with different input and output column names - Fix passing missing cluster-by clause in scan queries - Add annotation of `CompactionState` to tombstone segments	2024-07-30 09:34:46 +05:30
Kashif Faraz	caedeb66cd	Add API to update compaction engine (#16803 ) Changes: - Add API `/druid/coordinator/v1/config/compaction/global` to update cluster level compaction config - Add class `CompactionConfigUpdateRequest` - Fix bug in `CoordinatorCompactionConfig` which caused compaction engine to not be persisted. Use json field name `engine` instead of `compactionEngine` because JSON field names must align with the getter name. - Update MSQ validation error messages - Complete overhaul of `CoordinatorCompactionConfigResourceTest` to remove unnecessary mocking and add more meaningful tests. - Add `TuningConfigBuilder` to easily build tuning configs for tests. - Add `DatasourceCompactionConfigBuilder`	2024-07-27 09:14:51 +05:30
Abhishek Radhakrishnan	3c493dc3ed	CircularList round-robin iterator for the KillUnusedSegments duty (#16719 ) * Round-robin iterator for datasources to kill. Currently there's a fairness problem in the KillUnusedSegments duty where the duty consistently selects the same set of datasources as discovered from the metadata store or dynamic config params. This is a problem especially when there are multiple unused. In a medium to large cluster, while we can increase the task slots to increase the likelihood of broader coverage. This patch adds a simple round-robin iterator to select datasources and has the following properties: 1. Starts with an initial random cursor position in an ordered list of candidates. 2. Consecutive {@code next()} iterations from {@link #getIterator()} are guaranteed to be deterministic unless the set of candidates change when {@link #updateCandidates(Set)} is called. 3. Guarantees that no duplicate candidates are returned in two consecutive {@code next()} iterations. * Renames in RoundRobinIteratorTest. * Address review comments. 1. Clarify javadocs on the ordered list. Also flesh out the details a bit more. 2. Rename the test hooks to make intent clearer and fix typo. 3. Add NotThreadSafe annotation. 4. Remove one potentially noisy log that's in the path of iteration. * Add null check to input candidates. * More commentary. * Addres review feedback: downgrade some new info logs to debug; invert condition. Remove redundant comments. Remove rendundant variable tracking. * CircularList adjustments. * Updates to CircularList and cleanup RoundRobinInterator. * One more case and add more tests. * Make advanceCursor private for now. * Review comments.	2024-07-26 12:20:49 -07:00
Clint Wylie	302739aa58	more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts, and query cancellation participation (#16748 ) * more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts * wire parallel merge into query cancellation system * oops * style * adjust metrics initialization * fix timeout, fix cleanup to not block * javadocs to clarify why cancellation future and gizmo are split * cancelled -> canceled, simplify QueuePusher since it always takes a ResultBatch, non-static terminal marker to make stuff stop complaining about types, specialize tryOffer to be tryOfferTerminal so it wont be misused, add comments to clarify reason for non-blocking offers that might fail	2024-07-24 14:58:34 +08:00
Clint Wylie	02b8738c00	remove batchProcessingMode from task config, remove AppenderatorImpl (#16765 ) changes: * removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere. * with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed * implify `Appenderator` construction since there are only separate stream and batch versions now * simplify tests since `batchProcessingMode` is gone	2024-07-22 13:56:44 -07:00
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
Kashif Faraz	01d67ae543	Allow CompactionSegmentIterator to have custom priority (#16737 ) Changes: - Break `NewestSegmentFirstIterator` into two parts - `DatasourceCompactibleSegmentIterator` - this contains all the code from `NewestSegmentFirstIterator` but now handles a single datasource and allows a priority to be specified - `PriorityBasedCompactionSegmentIterator` - contains separate iterator for each datasource and combines the results into a single queue to be used by a compaction search policy - Update `NewestSegmentFirstPolicy` to use the above new classes - Cleanup `CompactionStatistics` and `AutoCompactionSnapshot` - Cleanup `CompactSegments` - Remove unused methods from `Tasks` - Remove unneeded `TasksTest` - Move tests from `NewestSegmentFirstIteratorTest` to `CompactionStatusTest` and `DatasourceCompactibleSegmentIteratorTest`	2024-07-16 19:54:49 +05:30
AmatyaAvadhanula	6891866c43	Process retrieval of parent and child segment ids in batches (#16734 )	2024-07-15 18:24:23 +05:30
Rishabh Singh	64104533ac	Enable querying entirely cold datasources (#16676 ) Add ability to query entirely cold datasources.	2024-07-15 15:02:59 +05:30
Laksh Singla	209f8a9546	Deserialize complex dimensions in group by queries to their respective types when reading from spilled files and cached results (#16620 ) Like #16511, but for keys that have been spilled or cached during the grouping process	2024-07-15 15:00:17 +05:30
AmatyaAvadhanula	d6c760f7ce	Do not kill segments with referenced load specs from deep storage (#16667 ) Do not kill segments with referenced load specs from deep storage	2024-07-15 14:07:53 +05:30
Laksh Singla	3a1b437056	Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679 ) Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes: a. We don't touch the subquery sequence till we know that we can materialize the result as frames	2024-07-12 21:49:12 +05:30
Vishesh Garg	197c54f673	Auto-Compaction using Multi-Stage Query Engine (#16291 ) Description: Compaction operations issued by the Coordinator currently run using the native query engine. As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make Compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled by its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on native engine. Main changes: * `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]` with `native` being the default. * if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which could happen in case a fraction of the tasks were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction. * `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field. * `CompactionTask` now has `CompactionRunner` interface instance with its implementations `NativeCompactinRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension. The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. * `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks. * `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context. * Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask` since otherwise, the workers will be unable to determine the controller task's location for communication (as they haven't been launched via the overlord).	2024-07-12 16:40:20 +05:30
Kashif Faraz	616ae631c6	Fix NPE in CompactSegments (#16713 )	2024-07-10 14:51:52 +08:00
Abhishek Radhakrishnan	bf2be938a9	Refactor `SegmentLoadDropHandler` code (#16685 ) Motivation: - Improve code hygeiene - Make `SegmentLoadDropHandler` easily extensible Changes: - Add `SegmentBootstrapper` - Move code for bootstrapping segments already cached on disk and fetched from coordinator to `SegmentBootstrapper`. - No functional change - Use separate executor service in `SegmentBootstrapper` - Bind `SegmentBootstrapper` to `ManageLifecycle` explicitly in `CliBroker`, `CliHistorical` etc.	2024-07-08 09:29:55 +05:30
zachjsh	5e05858ff7	Catalog granularity accepts query format (#16680 ) Previously, the segment granularity for tables in the catalog had to be defined in period format, ie `'PT1H'` , `'P1D'`, etc. This disallows a user from defining segment granularity of `'ALL'` for a table in the catalog, which may be a valid use case. This change makes it so that a user may define the segment granularity of a table in the catalog, as any string that results in a valid granularity using either the `Granularity.fromString(str)` method, or `new PeriodGranularity(new Period(value), null, null)`, and that granularity maps to a standard supported granularity, where `GranularityType.isStandard(granularity)` returns true. As a result a user may who wants to assign a catalog table's segment granularity to be hourly, may assign the segment granularity property of the table to be either `PT1H`, or `HOUR`. These are the same formats accepted at query time.	2024-07-02 12:14:28 -04:00
Rishabh Singh	c96e783750	Fix schema backfill count metric (#16536 ) * Fix build * Fix backfill metric * Address review comment	2024-06-28 11:07:28 +05:30

1 2 3 4 5 ...

4332 Commits