druid

Commit Graph

Author	SHA1	Message	Date
Akshat Jain	ca8f24edd3	Upgrade Guice to 5.1.0 (#17578 ) * Move Guice to 5.1.0 and fix tests * Fix checkstyle * Revert overrideCurrentGuiceModules() and related changes * Fix the tests * Try using maven:3-openjdk-17-slim * Try enabling debugging for mvn command * Use maven:3.9 image * Address review comment: Fix formatting * Address review comment: Add brief javadoc for ExceptionMatcher --------- Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2024-12-19 09:08:20 +05:30
Kashif Faraz	d9a58a7bbd	Move segment update APIs from Coordinator to Overlord (#17545 ) Summary of changes --------------------- - Add `OverlordDataSourcesResource` with APIs to mark segments used/unused - Add corresponding methods to `OverlordClient` - Deprecate Coordinator APIs to update segments - Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally call the corresponding Overlord APIs - If the API call fails, fall back to updating the metadata store directly - Audit these actions only on the Overlord Other minor changes --------------------- - Do not perform null check on `OverlordClient` on the coordinator side `DataSourcesResource`. `OverlordClient` is always non-null in production. - Add new tests, fix existing ones - Complete the implementation of `TestSegmentsMetadataManager` New Overlord APIs ------------------ - Mark all segments of a datasource as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}` - Mark all (non-overshadowed) segments of a datasource as used: `POST /druid/indexer/v1/datasources/{dataSourceName}` - Mark multiple segments as used `POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` - Mark multiple (non-overshadowed) segments as unused `POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` - Mark a single segment as used: `POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` - Mark a single segment as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`	2024-12-19 09:05:00 +05:30
Akshat Jain	6fad11fe57	Revert "Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570 )" (#17584 ) This reverts commit `cd6083fb94`.	2024-12-18 08:29:10 -08:00
George Shiqi Wu	9ff11731c8	Parallelize supervisor stop logic to make it run faster (#17535 ) - Add new method `Supervisor.stopAsync` - Implement `SeekableStreamSupervisor.stopAsync()` to use a shutdown executor - Call `stopAsync` from `SupervisorManager`	2024-12-18 09:19:24 +05:30
Clint Wylie	a44ab109d5	remove druid.expressions.useStrictBooleans in favor of always being true (#17568 )	2024-12-17 18:49:16 -08:00
Akshat Jain	98b960c6ac	Refactor: Replace explicit type arguments with diamond operator (#17567 ) Since we aren't supporting Java 8 anymore, we can switch to diamond operators without specifying explicit type arguments.	2024-12-17 14:37:45 +05:30
Akshat Jain	cd6083fb94	Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570 ) * Add back UnnecessaryFullyQualifiedName rule in pmd ruleset * Fix checkstyle	2024-12-17 12:43:12 +05:30
Akshat Jain	a26e4c0e06	Cleanup unreachable Java 8 code flows (#17559 )	2024-12-13 15:24:21 +01:00
Kashif Faraz	24e5d8a9e8	Refactor: Minor cleanup of segment allocation flow (#17524 ) Changes -------- - Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment - Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null - Miscellaneous cleanup	2024-12-13 07:46:57 +05:30
George Shiqi Wu	aca56d6bb8	reject publishing actions with a retriable error code if an earlier task is still publishing (#17509 ) * Working queuing of publishing * fix style * Add unit tests * add tests * retry within the connector * fix unit tests * Update indexing-service/src/main/java/org/apache/druid/indexing/common/actions/LocalTaskActionClient.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Add comment * fix style * Fix unit tests * style fix --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-12-12 10:37:53 -05:00
zachjsh	3b6a3ae222	Add taskStatus dimension to service/heartbeat metric (#17488 ) * SQL syntax error should target USER persona * * revert change to queryHandler and related tests, based on review comments * * add test * * add taskStatus dimension to `service/heartbeat` metric * * address review comments * * fix compilation error from merge * * improve test coverage * Address review comments * * remove unused import * * address remaining comments	2024-12-06 17:18:59 -05:00
Karan Kumar	0eb8d733d4	Adding leader and not being leader logging on the overlord. (#17519 )	2024-12-03 22:36:53 +05:30
Kashif Faraz	207ad16f07	Reduce metadata IO during segment allocation (#17496 ) Changes --------- - Add Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO` - Setting this flag to true (default value) allows the Overlord to fetch only necessary segment payloads during segment allocation - Setting this flag to false restores original segment allocation behaviour	2024-11-26 11:40:09 +05:30
Akshat Jain	17215cd677	Remove support for Java 8 (#17466 ) All JDK 8 based CI checks have been removed. Images used in Dockerfile(s) have been updated to Java 17 based images. Documentation has been updated accordingly.	2024-11-21 15:33:08 +05:30
Adithya Chakilam	c1d6328249	StreamingTaskRunner: Close the rejection period updater executor service (#17490 )	2024-11-19 12:49:20 -08:00
Adithya Chakilam	6f436301be	supervisor: make rejection periods work with stopTasksCount (#17442 ) * kafka-indexing: Report consumer io time * commit * backward * tests * remove unwanted changes * comments * comments * coverage * change name * fixes * fixes * comments	2024-11-18 13:12:24 -08:00
Nandini Anagondi	32394e55f9	Upgrading org.codehaus to com.fasterxml (#17371 )	2024-11-07 10:55:47 +01:00
Gian Merlino	6a9c050095	DruidOverlord: Move becomeLeader/stopBeingLeader earlier. (#17415 ) * DruidOverlord: Move becomeLeader/stopBeingLeader earlier. On becoming leader, it is helpful for the TaskRunner and TaskQueue to be available when the SupervisorManager starts up, to aid the supervisors in discovering their tasks. On stopping leadership, it is helpful for the TaskRunner and TaskQueue to be available until the SupervisorManager has finished shutting down. They are only available when the TaskMaster is in "leader" mode, so to achieve the above, this patch moves it earlier in the sequence. * Adjust leadership into two phases. * Update test. * Adjustments for coverage. * Stop mirrors start better.	2024-10-28 20:43:13 -07:00
Gian Merlino	c4b513e599	SeekableStreamSupervisor: Don't await task futures in workerExec. (#17403 ) Following #17394, workerExec can get deadlocked with itself, because it waits for task futures and is also used as the connectExec for the task client. To fix this, we need to never await task futures in the workerExec. There are two specific changes: in "verifyAndMergeCheckpoints" and "checkpointTaskGroup", two "coalesceAndAwait" calls that formerly occurred in workerExec are replaced with Futures.transform (using a callback in workerExec). Because this adjustment removes a source of blocking, it may also improve supervisor responsiveness for high task counts. This is not the primary goal, however. The primary goal is to fix the bug introduced by #17394.	2024-10-24 12:07:18 -07:00
Gian Merlino	60daddedf8	SeekableStreamSupervisor: Use workerExec as the client connectExec. (#17394 ) * SeekableStreamSupervisor: Use workerExec as the client connectExec. This patch uses the already-existing per-supervisor workerExec as the connectExec for task clients, rather than using the process-wide default ServiceClientFactory pool. This helps prevent callbacks from backlogging on the process-wide pool. It's especially useful for retries, where callbacks may need to establish new TCP connections or perform TLS handshakes. * Fix compilation, tests. * Fix style.	2024-10-22 20:21:21 -07:00
Vishesh Garg	5da9949992	Fail MSQ compaction if multi-valued partition dimensions are found (#17344 ) MSQ currently supports only single-valued string dimensions as partition keys. This patch adds a check to ensure that partition keys are single-valued in case this info is available by virtue of segment download for schema inference. During compaction, if MSQ finds multi-valued dimensions (MVDs) declared as part of `range` partitionsSpec, it switches partitioning type to dynamic, ending up in repeated compactions of the same interval. To avoid this scenario, the segment download logic is also updated to always download segments if info on multi-valued dimensions is required.	2024-10-19 13:33:33 +05:30
Adithya Chakilam	e834e49290	supervisor/autoscaler: Fix clearing of collected lags on skipped scale actions (#17356 ) * supervisor/autoscaler: Fix clearing of collected lags on skipped scale actions * comments * supervisor/autoscaler: Skip scaling when partitions are less than minTaskCount (#17335) * Fix pip installation after ubuntu upgrade (#17358) * fix tests --------- Co-authored-by: Pranav <pranavbhole@gmail.com>	2024-10-17 11:05:16 -07:00
Adithya Chakilam	c57bd3b438	supervisor/autoscaler: Skip scaling when partitions are less than minTaskCount (#17335 )	2024-10-15 14:12:53 -07:00
Kashif Faraz	3f797c52d0	Fix duplicate compaction task launched by OverlordCompactionScheduler (#17287 ) Description ----------- The `OverlordCompactionScheduler` may sometimes launch a duplicate compaction task for an interval that has just been compacted. This may happen as follows: - Scheduler launches a compaction task for an uncompacted interval. - While the compaction task is running, the `CompactionStatusTracker` does not consider this interval as compactible and returns the `CompactionStatus` as `SKIPPED` for it. - As soon as the compaction task finishes, the `CompactionStatusTracker` starts considering the interval eligible for compaction again. - This interval remains eligible for compaction until the newly published segments are polled from the database. - Once the new segments have been polled, the `CompactionStatus` of the interval changes to `COMPLETE`. Change -------- - Keep track of the `snapshotTime` in `DataSourcesSnapshot`. This time represents the start of the poll. - Use the `snapshotTime` to determine if a poll has happened after a compaction task completed. - If not, then skip the interval to avoid launching duplicate tasks. - For tests, use a future `snapshotTime` to ensure that compaction is always triggered.	2024-10-10 08:44:09 +05:30
AmatyaAvadhanula	f42ecc9f25	Fail concurrent replace tasks with finer segment granularity than append (#17265 )	2024-10-08 07:35:13 +05:30
George Shiqi Wu	5d7c7a87ec	Add maximumCapacity to taskRunner (#17107 ) * Add maximumCapacity to taskRunner * fix tests * pr comments	2024-10-07 15:03:51 -04:00
AmatyaAvadhanula	ff97c67945	Fix batch segment allocation failure with replicas (#17262 ) Fixes #16587 Streaming ingestion tasks operate by allocating segments before ingesting rows. These allocations happen across replicas which may send different requests but must get the same segment id for a given (datasource, interval, version, sequenceName) across replicas. This patch fixes the bug by ignoring the previousSegmentId when skipLineageCheck is true.	2024-10-07 19:52:38 +05:30
Vishesh Garg	7e35e50052	Fix issues with MSQ Compaction (#17250 ) The patch makes the following changes: 1. Fixes a bug causing compaction to fail on array, complex, and other non-primitive-type columns 2. Updates compaction status check to be conscious of partition dimensions when comparing dimension ordering. 3. Ensures only string columns are specified as partition dimensions 4. Ensures `rollup` is true if and only if metricsSpec is non-empty 5. Ensures disjoint intervals aren't submitted for compaction 6. Adds `compactionReason` to compaction task context.	2024-10-06 21:48:26 +05:30
Clint Wylie	0bd13bcd51	Projections prototype (#17214 )	2024-10-05 04:38:57 -07:00
Arun Ramani	e5d027ee1c	Skip generating task context reports for sub tasks (#17219 ) * Skip task context for sub tasks * DRY a little + skip context for live report	2024-10-02 09:32:50 -04:00
Hardik Bajaj	3d56fa6f56	Improve logging to include taskId in segment handoff notifier thread (#17185 )	2024-10-01 15:34:39 +05:30
Shivam Garg	ab361747a8	Migrated commons-lang usages to commons-lang3 (#17156 )	2024-09-28 10:28:11 +02:00
Clint Wylie	d77637344d	log.warn anytime a column is relying on ArrayIngestMode.MVD (#17164 ) * log.warn anytime a column is relying on ArrayIngestMode.MVD	2024-09-26 13:44:37 +05:30
Abhishek Radhakrishnan	9132a65a48	Add `StreamSupervisor` interface (#17151 ) Follow up to #17137. Instead of moving the streaming-only methods to the SeekableStreamSupervisor abstract class, this patch moves them to a separate StreamSupervisor interface. The reason is that the SeekableStreamSupervisor abstract class also has many other abstract methods. The StreamSupervisor interface on the other hand provides a minimal set of functions offering a good middle ground for any custom concrete implementation that doesn't require all the goodies from SeekableStreamSupervisor.	2024-09-25 14:52:39 +05:30
Abhishek Radhakrishnan	83299e9882	Miscellaneous cleanup in the supervisor API flow. (#17144 ) Extracting a few miscellaneous non-functional changes from the batch supervisor branch: - Replace anonymous inner classes with lambda expressions in the SQL supervisor manager layer - Add explicit @Nullable annotations in DynamicConfigProviderUtils to make IDE happy - Small variable renames (copy-paste error perhaps) and fix typos - Add table name for this exception message: Delete the supervisor from the table[%s] in the database... - Prefer CollectionUtils.isEmptyOrNull() over list == null || list.size() > 0. We can change the Precondition checks to throwing DruidException separately for a batch of APIs at a time.	2024-09-24 13:06:23 -07:00
Abhishek Radhakrishnan	5c862f6ed9	Refactor: Move streaming supervisor methods to `SeekableStreamSupervisor` (#17137 ) The current Supervisor interface is primarily focused on streaming use cases. However, as we introduce supervisors for non-streaming use cases, such as the recently added CompactionSupervisor (and the upcoming BatchSupervisor), certain operations like resetting offsets, checkpointing, task group handoff, etc., are not really applicable to non-streaming use cases. So the methods are split between: 1. Supervisor: common methods that are applicable to both streaming and non-streaming use cases 2. SeekableStreamSupervisor: Supervisor + streaming-only operations. The existing streaming-only overrides exist along with the new abstract method public abstract LagStats computeLagStats(), for which custom implementations already exist in the concrete types This PR is primarily a refactoring change with minimal functional adjustments (e.g., throwing an exception in a few places in SupervisorManager when the supervisor isn't the expected SeekableStreamSupervisor type).	2024-09-24 10:46:37 -07:00
Kashif Faraz	9670305669	Cleanup Coordinator logs, add duty status API (#16959 ) Description ----------- Coordinator logs are fairly noisy and don't give much useful information (see example below). Even when the Coordinator misbehaves, these logs are not very useful. Main changes ------------ - Add API `GET /druid/coordinator/v1/duties` that returns a status list of all duty groups currently running on the Coordinator - Emit metrics `segment/poll/time`, `segment/pollWithSchema/time`, `segment/buildSnapshot/time` - Remove redundant logs that indicate normal operation of well-tested aspects of the Coordinator Refactors --------- - Move some logic from `DutiesRunnable` to `CoordinatorDutyGroup` - Move stats collection from `CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues` - Minor cleanup of class `DruidCoordinator` - Clean up class `DruidCoordinatorRuntimeParams` - Remove field `coordinatorStartTime`. Maintain start time in `MarkOvershadowedSegmentsAsUnused` instead. - Remove field `MetadataRuleManager`. Pass supplier to constructor of applicable duties instead. - Make `usedSegmentsNewestFirst` and `datasourcesSnapshot` as non-nullable as they are always required.	2024-09-24 19:46:22 +05:30
Vishesh Garg	f576e299db	Allow MSQ engine only for compaction supervisors (#17033 ) #16768 added the functionality to run compaction as a supervisor on the overlord. This patch builds on top of that to restrict MSQ engine to compaction in the supervisor-mode only. With these changes, users can no longer add MSQ engine as part of datasource compaction config, or as the default cluster-level compaction engine, on the Coordinator. The patch also adds an Overlord runtime property `druid.supervisor.compaction.engine=<msq/native>` to specify the default engine for compaction supervisors. Since these updates require major changes to existing MSQ compaction integration tests, this patch disables MSQ-specific compaction integration tests -- they will be taken up in a follow-up PR. Key changed/added classes in this patch: * CompactionSupervisor * CompactionSupervisorSpec * CoordinatorCompactionConfigsResource * OverlordCompactionScheduler	2024-09-24 17:19:16 +05:30
PANKAJ KUMAR	36dfff4b1a	Adding extra debug logs for the checkpoint logic (#16321 ) Logging to understand checkpointing better in streaming ingestion	2024-09-24 09:38:46 +05:30
Abhishek Radhakrishnan	635e418131	Support to parse numbers in text-based input formats (#17082 ) Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec. To work around this, the web-console and other tools need to further inspect the sample data returned by the Druid sampler API to parse them as numbers. This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.	2024-09-19 13:21:18 -07:00
Clint Wylie	4f137d2700	hard-code compaction tasks to use ARRAY for multi-value handling to preserve order (#17110 )	2024-09-19 11:56:12 -07:00
Misha	6aad9b08dd	Fix low sonatype findings (#17017 ) Fixed vulnerabilities CVE-2021-26291 : Apache Maven is vulnerable to Man-in-the-Middle (MitM) attacks. Various functions across several files, mentioned below, allow for custom repositories to use the insecure HTTP protocol. An attacker can exploit this as part of a Man-in-the-Middle (MitM) attack, taking over or impersonating a repository using the insecure HTTP protocol. Unsuspecting users may then have the compromised repository defined as a dependency in their Project Object Model (pom) file and download potentially malicious files from it. Was fixed by removing outdated tesla-aether library containing vulnerable maven-settings (v3.1.1) package, pull-deps utility updated to use maven resolver instead. sonatype-2020-0244 : The joni package is vulnerable to Man-in-the-Middle (MitM) attacks. This project downloads dependencies over HTTP due to an insecure repository configuration within the .pom file. Consequently, a MitM could intercept requests to the specified repository and replace the requested dependencies with malicious versions, which can execute arbitrary code from the application that was built with them. Was fixed by upgrading joni package to recommended 2.1.34 version	2024-09-16 16:10:25 +05:30
Clint Wylie	aa6336c5cf	add DataSchema.Builder to tidy stuff up a bit (#17065 ) * add DataSchema.Builder to tidy stuff up a bit * fixes * fixes * more style fixes * review stuff	2024-09-15 11:18:34 -07:00
Abhishek Radhakrishnan	5ef94c9dee	Add support for selective loading of broadcast datasources in the task layer (#17027 ) Tasks control the loading of broadcast datasources via BroadcastDatasourceLoadingSpec getBroadcastDatasourceLoadingSpec(). By default, tasks download all broadcast datasources, unless there's an override as with kill and MSQ controller task. The CLIPeon command line option --loadBroadcastSegments is deprecated in favor of --loadBroadcastDatasourceMode. Broadcast datasources can be specified in SQL queries through JOIN and FROM clauses, or obtained from other sources such as lookups. To this effect, we have introduced a BroadcastDatasourceLoadingSpec. Finding the set of broadcast datasources during SQL planning will be done in a follow-up, which will apply only to MSQ tasks, so they load only required broadcast datasources. This PR primarily focuses on the skeletal changes around BroadcastDatasourceLoadingSpec and integrating it from the Task interface via CliPeon to SegmentBootstrapper. Currently, only kill tasks and MSQ controller tasks skip loading broadcast datasources.	2024-09-12 13:30:28 -04:00
Pranav	a95397e712	Allow request headers in HttpInputSource in native and MSQ Ingestion (#16974 ) Support for adding the request headers in http input source. we can now pass the additional headers as json in both native and MSQ.	2024-09-12 11:18:44 +05:30
George Shiqi Wu	428f58cf15	Support maxColumnsToMerge in supervisor tuningConfig (#17030 ) * support maxColumnsToMerge in supervisor specs * remove log line * fix style * add docs * fix unit tests	2024-09-11 18:00:13 -04:00
Laksh Singla	72fbaf2e56	Non querying tasks shouldn't use processing buffers / merge buffers (#16887 ) Tasks that do not support querying or query processing i.e. supportsQueries = false do not require processing threads, processing buffers, and merge buffers.	2024-09-10 11:36:36 +05:30
Abhishek Agarwal	78775ad398	Prepare master for 32.0.0 release (#17022 )	2024-09-10 11:01:20 +05:30
Clint Wylie	f57cd6f7af	transition away from StorageAdapter (#16985 ) * transition away from StorageAdapter changes: * CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter * StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead * StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector * added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference * added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment * added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment * Updated all engines and tests to no longer use StorageAdapter	2024-09-09 14:55:29 -07:00
Kashif Faraz	ba6f804f48	Fix compaction status API response (#17006 ) Description: #16768 introduces new compaction APIs on the Overlord `/compact/status` and `/compact/progress`. But the corresponding `OverlordClient` methods do not return an object compatible with the actual endpoints defined in `OverlordCompactionResource`. This patch ensures that the objects are compatible. Changes: - Add `CompactionStatusResponse` and `CompactionProgressResponse` - Use these as the return type in `OverlordClient` methods and as the response entity in `OverlordCompactionResource` - Add `SupervisorCleanupModule` bound on the Coordinator to perform cleanup of supervisors. Without this module, Coordinator cannot deserialize compaction supervisors.	2024-09-05 23:22:01 +05:30


2318 Commits