druid

Commit Graph

Author	SHA1	Message	Date
Kashif Faraz	e51181957c	Use num cores to determine balancerComputeThreads (#14902 ) Changes: - Determine the default value of balancerComputeThreads based on number of coordinator cpus rather than number of segments. Even if the number of segments is low and we create more balancer threads, it doesn't hurt the system as threads would mostly be idle. - Remove unused field from SegmentLoadQueueManager Expected values: - Clusters with ~1M segments typically work with Coordinators having 16 cores or more. This would give us 8 balancer threads, which is the same as the current maximum. - On small clusters, even a single thread is enough to do the required balancing work.	2023-08-25 08:15:27 +05:30
Clint Wylie	36e659a501	remove group-by v1 (#14866 ) * remove group-by v1 * docs * remove unused configs, fix test * fix test * adjustments * why not * adjust * review stuff	2023-08-23 12:44:06 -07:00
zachjsh	0c76df1c7d	Enable Continuous auto kill (#14831 ) ### Description This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before, were the following: 1. If scheduled at a fast enough rate, the duty would find the same intervals to kill for the same datasources while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work. 2. The task resources used by auto kill were previously unbounded. Each duty run period, if unused segments were found for any datasource, a kill task would be submitted to kill them. This PR solves both of these issues: 1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in an in-memory map. The end time for each datasource, if found, is used as the start time lower bound when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that no longer match datasources in the system or in the whitelist, and also remove a datasource entry if no unused segments are found for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map allows searching for unused segments in the datasource from the beginning of time once again. 2. The unbounded task resource usage can be mitigated with the coordinator dynamic config added as part of `ba957a9b97`. Operators can configure continuous auto kill by providing coordinator runtime properties similar to the following: ``` druid.coordinator.period.indexingPeriod=PT60S druid.coordinator.kill.period=PT60S ``` and providing sensible limits on kill task usage via coordinator dynamic properties.	2023-08-23 09:23:08 -04:00
Adarsh Sanjeev	dfb5a98888	Add coordinator API for unused segments (#14846 ) There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.	2023-08-23 14:51:25 +05:30
Kashif Faraz	9376d8d6e1	Refactor: Move `UpdateCoordinatorStateAndPrepareCluster` duty out of `DruidCoordinator` (#14845 ) Motivation: - Clean up `DruidCoordinator` and move methods to classes where they are most relevant Changes: - No functional change - Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState` - Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster` - Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here - Move reporting of used segment stats and historical capacity stats from `CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues` - Move reporting of unavailable and under-replicated segment stats from `CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty	2023-08-22 19:50:41 +05:30
Tejaswini Bandlamudi	d87056e708	Upgrade guava version to 31.1-jre (#14767 ) Currently, Druid uses Guava version 16.0.1. This upgrade to 31.1-jre fixes the following issue: CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serialization. Despite being false positives, these findings cause red security scans on the Druid distribution. The latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks #14414 (Update google client apis to latest version).	2023-08-22 12:09:53 +05:30
Kashif Faraz	92906059d2	Remove segmentsToBeDropped from SegmentTransactionInsertAction (#14883 ) Motivation: - There is no usage of the `SegmentTransactionInsertAction` which passes a non-null non-empty value of `segmentsToBeDropped`. - This is not really needed either as overshadowed segments are marked as unused by the Coordinator and need not be done in the same transaction as committing segments. - It will also help simplify the changes being made in #14407 Changes: - Remove `segmentsToBeDropped` from the task action and all intermediate methods - Remove related tests which are not needed anymore	2023-08-21 20:08:56 +05:30
Kashif Faraz	c211dcc4b3	Clean up compaction logs on coordinator (#14875 ) Changes: - Move logic of `NewestSegmentFirstIterator.needsCompaction` to `CompactionStatus` to improve testability and readability - Capture the list of checks performed to determine if compaction is needed in a readable manner in `CompactionStatus.CHECKS` - Make `CompactionSegmentIterator` iterate over instances of `SegmentsToCompact` instead of `List<DataSegment>`. This allows use of the `umbrellaInterval` later. - Replace usages of `QueueEntry` with `SegmentsToCompact` - Move `SegmentsToCompact` out of `NewestSegmentFirstIterator` - Simplify `CompactionStatistics` - Reduce level of less important logs to debug - No change made to tests to ensure correctness	2023-08-21 17:30:41 +05:30
Kashif Faraz	07a193a142	Use separate executor for each coordinator duty group (#14869 ) Changes: - Use separate executor for every duty group - This change is thread-safe as every duty group uses its own copy of `DruidCoordinatorRuntimeParams` and does not share any other mutable instances with other duty groups. - With the exception of `HistoricalManagementDuties`, duty groups are typically not very compute intensive and mostly perform database or HTTP I/O. So, coordinator resources would still mostly be available for `HistoricalManagementDuties`.	2023-08-21 15:53:22 +05:30
Abhishek Agarwal	9065ef1aff	Fix a bug in QosFilter (#14859 ) The QoSFilter class parses the timeout as an integer, so a query timeout higher than Integer.MAX_VALUE must be capped at Integer.MAX_VALUE.	2023-08-21 13:00:41 +05:30
Kashif Faraz	097b645005	Clean up after add kill bufferPeriod (#14868 ) Follow up changes to #12599 Changes: - Rename column `used_flag_last_updated` to `used_status_last_updated` - Remove new CLI tool `UpdateTables`. - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too. - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables` which should create tables at the latest version. - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway. - Remove `upgrade-prep.md` and include that info in `metadata-init.md`. - Fix log messages to adhere to Druid style - Use lambdas	2023-08-19 00:00:04 +05:30
Lucas Capistrant	9c124f2cde	Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599 ) * Add new configurable buffer period to create gap between mark unused and kill of segment * Changes after testing * fixes and improvements * changes after initial self review * self review changes * update sql statement that was lacking last_used * shore up some code in SqlMetadataConnector after self review * fix derby compatibility and improve testing/docs * fix checkstyle violations * Fixes post merge with master * add some unit tests to improve coverage * ignore test coverage on new UpdateTools cli tool * another attempt to ignore UpdateTables in coverage check * change column name to used_flag_last_updated * fix a method signature after column name switch * update docs spelling * Update spelling dictionary * Fixing up docs/spelling and integrating altering tasks table with my alteration code * Update NULL values for used_flag_last_updated in the background * Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod * remove unneeded things now that the new column is automatically updated * Test new background row updater method * fix broken tests * fix create table statement * cleanup DDL formatting * Revert adding columns to entry table by default * fix compilation issues after merge with master * discovered and fixed metastore inserts that were breaking integration tests * fixup forgotten insert by using pattern of sharing now timestamp across columns * fix issue introduced by merge * fixup after merge with master * add some directions to docs in the case of segment table validation issues	2023-08-17 19:32:51 -05:00
Abhishek Radhakrishnan	37db5d9b81	Reset offsets supervisor API (#14772 ) * Add supervisor /resetOffsets API. - Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets which accepts DataSourceMetadata as a body parameter. - Update logs, unit tests and docs. * Add a new interface method for backwards compatibility. * Rename * Adjust tests and javadocs. * Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules * UT fix * Doc updates. * remove extraneous debugging logs. * Remove the boolean setting; only ResetHandle() and resetInternal() * Relax constraints and add a new ResetOffsetsNotice; cleanup old logic. * A separate ResetOffsetsNotice and some cleanup. * Minor cleanup * Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers * Add unit tests for the no op implementations for test coverage * CodeQL fix * checkstyle from merge conflict * Doc changes * DOCUSAURUS code tabs fix. Thanks, Brian!	2023-08-17 14:13:10 -07:00
Kashif Faraz	fffb2e4fe7	Speed up SQLMetadataStorageActionHandlerTest (#14856 ) Changes - Reduce test time of `SQLMetadataStorageActionHandlerTest.testMigration` - Slightly modify log messages to adhere to Druid style	2023-08-17 18:02:43 +05:30
Kashif Faraz	5d4ac64178	Adapt maxSegmentsToMove based on cluster skew (#14584 ) Changes: - No change in behaviour if `smartSegmentLoading` is disabled - If `smartSegmentLoading` is enabled - Compute `balancerComputeThreads` based on `numUsedSegments` - Compute `maxSegmentsToMove` based on `balancerComputeThreads` - Compute `segmentsToMoveToFixSkew` based on usage skew - Compute `segmentsToMove = Math.min(maxSegmentsToMove, segmentsToMoveToFixSkew)` Limits: - 1 <= `balancerComputeThreads` <= 8 - `maxSegmentsToMove` <= 20% of total segments - `minSegmentsToMove` = 0.15% of total segments	2023-08-17 11:14:54 +05:30
Clint Wylie	6b14dde50e	deprecate config-magic in favor of json configuration stuff (#14695 ) * json config based processing and broker merge configs to deprecate config-magic	2023-08-16 18:23:57 -07:00
Kashif Faraz	d9221e46e4	Completely disable cachingCost balancer strategy (#14798 ) `cachingCost` has been deprecated in #14484 and is not advised to be used in production clusters as it may cause usage skew across historicals which the coordinator is unable to rectify. This PR completely disables `cachingCost` strategy as it has now been rendered redundant due to recent performance improvements made to `cost` strategy. Changes - Disable `cachingCost` strategy - Add `DisabledCachingCostBalancerStrategyFactory` for the time being so that we can give a proper error message before falling back to `CostBalancerStrategy`. This will be removed in subsequent releases. - Retain `CachingCostBalancerStrategy` for testing/benchmarking purposes. - Add javadocs to `DiskNormalizedCostBalancerStrategy`	2023-08-16 11:43:52 +05:30
AmatyaAvadhanula	e16096735b	Fix 404 when segment is used but not in the Coordinator snapshot (#14762 ) * Fix 404 when used segment has not been updated in the Coordinator snapshot * Add unit test	2023-08-14 13:20:43 +05:30
Kashif Faraz	786e772d26	Remove config `druid.coordinator.compaction.skipLockedIntervals` (#14807 ) The value of `druid.coordinator.compaction.skipLockedIntervals` should always be `true`.	2023-08-14 12:31:15 +05:30
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757 )	2023-08-14 11:50:51 +05:30
zachjsh	82d82dfbd6	Add stats to KillUnusedSegments coordinator duty (#14782 ) ### Description Added the following metrics, which are calculated by the `KillUnusedSegments` coordinator duty: `"killTask/availableSlot/count"`: calculates the number of remaining task slots available for auto kill `"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill `"killTask/task/count"`: calculates the number of tasks submitted by auto kill. #### Release note NEW: metrics added for auto kill `"killTask/availableSlot/count"`: calculates the number of remaining task slots available for auto kill `"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill `"killTask/task/count"`: calculates the number of tasks submitted by auto kill.	2023-08-10 18:36:53 -04:00
Rishabh Singh	4b9846b90f	Improve exception message when DruidLeaderClient doesn't find leader node (#14775 ) The existing exception message, `No known server`, thrown in DruidLeaderClient is unhelpful.	2023-08-10 16:37:37 +05:30
Tejaswini Bandlamudi	550a66d71e	Upgrade jackson-databind to 2.12.7 (#14770 ) The current version of jackson-databind is flagged for vulnerabilities CVE-2020-28491 (although the CBOR format is not used in Druid) and CVE-2020-36518 (which seems genuine, as deeply nested JSON can cause resource exhaustion). Updating the dependency to the latest version, 2.12.7, fixes these vulnerabilities.	2023-08-09 12:22:16 +05:30
zachjsh	660e6cfa01	Allow for task limit on kill tasks spawned by auto kill coordinator duty (#14769 ) ### Description Previously, the `KillUnusedSegments` coordinator duty, in charge of periodically deleting unused segments, could spawn an unlimited number of kill tasks for unused segments. This change adds 2 new coordinator dynamic configs that can be used to limit the tasks spawned by this coordinator duty. `killTaskSlotRatio`: Ratio of total available task slots, including autoscaling if applicable, that will be allowed for kill tasks. This limit only applies to kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is 1, which allows all available task slots to be used, which is the existing behavior. `maxKillTaskSlots`: Maximum number of task slots that will be allowed for kill tasks. This limit only applies to kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is Integer.MAX_VALUE, which essentially allows an unbounded number of tasks, which is the existing behavior. Note that we could effectively get away with just `killTaskSlotRatio`, but following the compaction config, which has similar properties, I thought it was good to have some control over the upper limit regardless of the ratio provided. #### Release note NEW: `killTaskSlotRatio` and `maxKillTaskSlots` coordinator dynamic config properties added that allow control of task resource usage spawned by the `KillUnusedSegments` coordinator duty (auto kill)	2023-08-08 08:40:55 -04:00
Kashif Faraz	2d8e0f28f3	Refactor: Cleanup coordinator duties for metadata cleanup (#14631 ) Changes - Add abstract class `MetadataCleanupDuty` - Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` - Improve log and error messages - Cleanup tests - No functional change	2023-08-05 13:08:23 +05:30
Suneet Saldanha	62ddeaf16f	Additional dimensions for service/heartbeat (#14743 ) * Additional dimensions for service/heartbeat * docs * review * review	2023-08-04 11:01:07 -07:00
Pranav	d31c04c4c6	Fix the bug in getIndexInfo for mysql (#14750 )	2023-08-03 21:45:01 -07:00
zachjsh	ba957a9b97	Add ability to limit the number of segments killed in kill task (#14662 ) ### Description Previously, the `maxSegments` configured for auto kill could be ignored if an interval of data for a given datasource had more than this number of unused segments, causing the kill task spawned to delete the unused segments in that interval to delete more than the configured `maxSegments`. Now each kill task spawned by the auto kill coordinator duty will kill at most `limit` segments. This is done by adding a new config property to the `KillUnusedSegmentTask` which allows users to specify this limit.	2023-08-03 22:17:04 -04:00
imply-cheddar	748874405c	Minimize PostAggregator computations (#14708 ) * Minimize PostAggregator computations Since a change back in 2014, the topN query has been computing all PostAggregators on all intermediate responses from leaf nodes to brokers. This generates significant slowdowns for queries with relatively expensive PostAggregators. This change rewrites the query that is pushed down to only have the minimal set of PostAggregators such that it is impossible for downstream processing to do too much work. The final PostAggregators are applied at the very end.	2023-08-04 00:04:31 +05:30
Kashif Faraz	b27d281b11	Remove unused param in MetadataResource (#14747 )	2023-08-03 19:18:01 +05:30
Kashif Faraz	ee4e0c93b4	Improve alert message for segment assignments (#14696 ) Changes: - Add interface `SegmentDeleteHandler` for marking segments as unused - In `StrategicSegmentAssigner`, collect all segments on which a drop rule applies in a list - Process the list above as a batch delete rather than individual deletes - Improve alert messages when an invalid tier is specified in a load rule - Improve alert message when no rule applies on a segment	2023-08-01 23:33:05 +05:30
Kashif Faraz	10328c0743	Rename metadatacache and serverview metrics (#14716 )	2023-08-01 14:18:20 +05:30
Kashif Faraz	d04521d58f	Improve description field when emitting metric for broadcast failure (#14703 ) Changes: - Emit descriptions such as `Load queue is full`, `No disk space` etc. instead of `Unknown error` - Rewrite `BroadcastDistributionRuleTest`	2023-08-01 10:13:55 +05:30
Jason Koch	44d5c1a15f	split KillUnusedSegmentsTask to processing in smaller chunks (#14642 ) split KillUnusedSegmentsTask to smaller batches Processing in smaller chunks allows the task execution to yield the TaskLockbox lock, which allows the overlord to continue being responsive to other tasks and users while this particular kill task is executing. * introduce KillUnusedSegmentsTask batchSize parameter to control size of batching * provide an explanation for kill task batchSize parameter * add logging details for kill batch progress	2023-07-31 12:56:27 -07:00
Kashif Faraz	844a9c7ffb	Cancel loads of unused segments (#14644 )	2023-07-31 18:01:50 +05:30
Kashif Faraz	e9b4f1e95c	Fix reported replication factor of segment with zero required replicas (#14701 )	2023-07-31 14:51:01 +05:30
Kashif Faraz	22290fd632	Test: Simplify test impl of LoadQueuePeon (#14684 ) Changes - Rename `LoadQueuePeonTester` to `TestLoadQueuePeon` - Simplify `TestLoadQueuePeon` by removing dependency on `CuratorLoadQueuePeon` - Remove usages of mock peons in `LoadRuleTest` and use `TestLoadQueuePeon` instead	2023-07-28 16:14:23 +05:30
TSFenwick	9a9038c7ae	Speed up kill tasks by deleting segments in batch (#14131 ) * allow for batched delete of segments instead of deleting segment data one by one create new batchdelete method in datasegment killer that has default functionality of iterating through all segments and calling delete on them. This will enable a slow rollout of other deepstorage implementations to move to a batched delete on their own time * cleanup batchdelete segments * batch delete with the omni data deleter cleaned up code just need to add tests and docs for this functionality * update java doc to explain how it will try to use batch if function is overwritten * rename killBatch to kill add unit tests * add omniDataSegmentKillerTest for deleting multiple segments at a time. fix checkstyle * explain test peculiarity better * clean up batch kill in s3. * remove unused return value. cleanup comments and fix checkstyle * default to batch delete. more specific java docs. list segments that couldn't be deleted if there was a client error or server error * simplify error handling * add tests where an exception is thrown when killing multiple s3 segments * add test for failing to delete two calls with the s3 client * fix javadoc for kill(List<DataSegment> segments) clean up tests remove feature flag * fix typo in javadocs * fix test failure * fix checkstyle and improve tests * fix intellij inspections issues * address comments, make delete multiple segments not assume same bucket * fix test errors * better grammar and punctuation. fix test. and better logging for exception * remove unused code * avoid extra arraylist instantiation * fix broken test * fix broken test * fix tests to use assert.throws	2023-07-27 15:34:44 -07:00
Gian Merlino	986a271a7d	Merge core CoordinatorClient with MSQ CoordinatorServiceClient. (#14652 ) * Merge core CoordinatorClient with MSQ CoordinatorServiceClient. Continuing the work from #12696, this patch merges the MSQ CoordinatorServiceClient into the core CoordinatorClient, yielding a single interface that serves both needs and is based on the ServiceClient RPC system rather than DruidLeaderClient. Also removes the backwards-compatibility code for the handoff API in CoordinatorBasedSegmentHandoffNotifier, because the new API was added in 0.14.0. That's long enough ago that we don't need backwards compatibility for rolling updates. * Fixups. * Trigger GHA. * Remove unnecessary retrying in DruidInputSource. Add "about an hour" retry policy and h * EasyMock	2023-07-27 13:23:37 -07:00
Kashif Faraz	7634ac896e	Quick fix for SegmentLoadDropHandler bug (#14670 )	2023-07-27 11:53:58 +05:30
Gian Merlino	4a68f8a294	Fix maxCompletedTasks parameter in OverlordClientImpl. (#14667 ) It was sent to the server as "maxCompletedTasks", but the server expects "max". This caused it to be ignored. This bug was introduced in #14581.	2023-07-26 15:12:20 -07:00
Gian Merlino	2f9619a96f	Use OverlordClient for all Overlord RPCs. (#14581 ) * Use OverlordClient for all Overlord RPCs. Continuing the work from #12696, this patch removes HttpIndexingServiceClient and the IndexingService flavor of DruidLeaderClient completely. All remaining usages are migrated to OverlordClient. Supporting changes include: 1) Add a variety of methods to OverlordClient. 2) Update MetadataTaskStorage to skip the complete-task lookup when the caller requests zero completed tasks. This helps performance of the "get active tasks" APIs, which don't want to see complete ones. * Use less forbidden APIs. * Fixes from CI. * Add test coverage. * Two more tests. * Fix test. * Updates from CR. * Remove unthrown exceptions. * Refactor to improve testability and test coverage. * Add isNil tests. * Remove unnecessary "deserialize" methods.	2023-07-24 21:14:27 -07:00
aho135	607f511767	Improve logging in CoordinatorBasedSegmentHandoffNotifier (#14640 )	2023-07-24 18:04:21 +05:30
Jason Koch	54f29fedce	Use PreparedBatch while deleting segments (#14639 ) Related to #14634 Changes: - Update `IndexerSQLMetadataStorageCoordinator.deleteSegments` to use JDBI PreparedBatch instead of issuing single DELETE statements	2023-07-23 22:55:04 +05:30
Abhishek Agarwal	1ddbaa8744	Reserve threads for non-query requests without using laning (#14576 ) This PR uses the QoSFilter available in Jetty to park query requests that exceed a configured limit. This is done so that other HTTP requests, such as health check calls, do not get blocked if the query server is busy serving long-running queries. The same mechanism can also be used in the future to isolate interactive queries from long-running select queries within the same broker. Right now, you can still get that isolation by setting druid.query.scheduler.numThreads to a value lower than druid.server.http.numThreads. That enables total laning, but the side effect is that excess requests are rejected outright instead of being queued, which leads to a bad user experience. Parked requests are timed out after 30 seconds by default. I overrode that to the maxQueryTimeout in this PR.	2023-07-20 15:03:48 +05:30
Clint Wylie	913416c669	add equality, null, and range filter (#14542 ) changes: * new filters that preserve match value typing to better handle filtering different column types * sql planner uses new filters by default in sql compatible null handling mode * remove isFilterable from column capabilities * proper handling of array filtering, add array processor to column processors * javadoc for sql test filter functions * range filter support for arrays, tons more tests, fixes * add dimension selector tests for mixed type roots * support json equality * rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes * add cooler equality index maker, ValueIndexes * fix missing string utf8 index supplier * expression array comparator stuff	2023-07-18 12:15:22 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Kashif Faraz	ab051d9c5e	Add test for ReservoirSegmentSampler (#14591 ) Tests to verify the following behaviour have been added: - Segments from more populous servers are more likely to be picked irrespective of sample size. - Segments from all servers are equally likely to be picked if all servers have equivalent number of segments.	2023-07-17 18:50:02 +05:30
Gian Merlino	95ca43034f	Change default handoffConditionTimeout to 15 minutes. (#14539 ) * Change default handoffConditionTimeout to 15 minutes. Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices: 1) Stop making progress on ingestion, wait for Historicals to load stuff, and keep the waiting-for-handoff segments available on realtime tasks. (handoffConditionTimeout = 0, the current default) 2) Continue making progress on ingestion, by exiting the realtime tasks that were waiting for handoff. Once the Historicals get their act together, the segments will be loaded, as they are still there on deep storage. They will just not be continuously available. (handoffConditionTimeout > 0) I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it. Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors. * Fix tests. * Update docs/development/extensions-core/kafka-supervisor-reference.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 13:17:14 -07:00
Abhishek Radhakrishnan	f4ee58eaa8	Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560 ) * Add aggregatorMergeStrategy property to SegmentMetadataQuery. - Adds a new property aggregatorMergeStrategy to segmentMetadata query. aggregatorMergeStrategy currently supports three types of merge strategies - the legacy strict and lenient strategies, and the new latest strategy. - The latest strategy considers the latest aggregator from the latest segment by time order when there is a conflict while merging aggregators from different segments. - Deprecate the lenientAggregatorMerge property; the API validates that the new and old properties are not both set, and returns an exception if they are. - When merging segments as part of segmentMetadata query, the segments have a more elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to the name format that segments usually contain. Previously it was simply "merged". - Adjust unit tests to test the latest strategy, to assert the returned complete SegmentAnalysis object instead of just the aggregators for completeness. * Don't explicitly set strict strategy in tests * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/segmentmetadataquery.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 12:37:36 -04:00
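
The default for `balancerComputeThreads` introduced in commit e51181957c can be sketched as follows. This is a hypothetical helper, not Druid's code. The class and method names are illustrative. The "half the cores" factor is an assumption inferred from the stated example of 16 cores giving 8 threads. The clamp to between 1 and 8 threads follows the limits listed in commit 5d4ac64178.

```java
/**
 * Hypothetical sketch: derive a default balancer thread count from CPU cores.
 * Assumption: use half the cores; the [1, 8] bounds come from commit 5d4ac64178.
 */
final class BalancerThreadsSketch {
  static final int MIN_THREADS = 1;
  static final int MAX_THREADS = 8;

  static int defaultBalancerComputeThreads(int numCores) {
    // Half the cores, clamped into [MIN_THREADS, MAX_THREADS].
    int proposed = numCores / 2;
    return Math.max(MIN_THREADS, Math.min(proposed, MAX_THREADS));
  }

  static int defaultBalancerComputeThreads() {
    return defaultBalancerComputeThreads(Runtime.getRuntime().availableProcessors());
  }
}
```

The clamp keeps small coordinators at a single thread and stops large ones at the existing maximum. Extra threads that sit idle cost little, which is the rationale the commit gives.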
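
The per-datasource bookkeeping described in commit 0c76df1c7d (continuous auto kill) can be sketched as below. This is an illustration under stated assumptions. `SimpleInterval`, `UnusedIntervalFinder` and `ContinuousKillTracker` are hypothetical names. `Instant.MIN` stands in for "the beginning of time", and the metadata lookup is hidden behind an interface.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical half-open interval [start, end) of unused segments. */
final class SimpleInterval {
  final Instant start;
  final Instant end;

  SimpleInterval(Instant start, Instant end) {
    this.start = start;
    this.end = end;
  }
}

/** Hypothetical metadata lookup: next unused interval at or after lowerBound. */
interface UnusedIntervalFinder {
  Optional<SimpleInterval> find(String dataSource, Instant lowerBound);
}

/** Tracks, per datasource, the end of the last interval submitted for killing. */
final class ContinuousKillTracker {
  private final Map<String, Instant> lastIntervalEnd = new HashMap<>();

  Instant lowerBoundFor(String dataSource) {
    // No entry means "search from the beginning of time".
    return lastIntervalEnd.getOrDefault(dataSource, Instant.MIN);
  }

  Map<String, SimpleInterval> runOnce(Set<String> eligibleDataSources, UnusedIntervalFinder finder) {
    // Drop entries for datasources that no longer exist or are no longer whitelisted.
    lastIntervalEnd.keySet().retainAll(eligibleDataSources);

    Map<String, SimpleInterval> toKill = new HashMap<>();
    for (String dataSource : eligibleDataSources) {
      Optional<SimpleInterval> found = finder.find(dataSource, lowerBoundFor(dataSource));
      if (found.isPresent()) {
        // The next run searches from where this interval ends, so in-flight work is not repeated.
        lastIntervalEnd.put(dataSource, found.get().end);
        toKill.put(dataSource, found.get());
      } else {
        // Nothing unused found: forget the entry so the next search restarts from the beginning.
        lastIntervalEnd.remove(dataSource);
      }
    }
    return toKill;
  }
}
```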
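
The fix in commit 9065ef1aff caps a `long` timeout before it is used as an `int`. A minimal sketch follows. The helper name is hypothetical.

```java
/** Hypothetical helper: convert a long timeout (ms) to an int without overflow. */
final class QosTimeoutSketch {
  static int toIntTimeoutMillis(long timeoutMillis) {
    // Values above Integer.MAX_VALUE are capped instead of overflowing.
    return (int) Math.min(timeoutMillis, Integer.MAX_VALUE);
  }
}
```

`Math.toIntExact` would throw `ArithmeticException` on overflow instead of capping, which is why a plain `Math.min` is used here.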
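
The `segmentsToMove` computation from commit 5d4ac64178 can be sketched as follows, assuming `smartSegmentLoading` is enabled. The commit does not say how the thread-derived maximum or the skew-based count are computed, so this sketch takes both as inputs. Treating the 0.15% minimum as a floor on the skew-based count is also an assumption. Integer arithmetic avoids floating-point rounding on the percentages.

```java
/** Hypothetical sketch of combining the limits listed in commit 5d4ac64178. */
final class SegmentsToMoveSketch {
  /**
   * @param totalSegments  total number of used segments
   * @param maxFromThreads hypothetical cap derived from balancerComputeThreads
   * @param toFixSkew      hypothetical number of moves needed to even out usage skew
   */
  static long segmentsToMove(long totalSegments, long maxFromThreads, long toFixSkew) {
    long upperLimit = totalSegments / 5;                  // maxSegmentsToMove <= 20% of total
    long maxSegmentsToMove = Math.min(maxFromThreads, upperLimit);
    long minSegmentsToMove = totalSegments * 15 / 10_000; // 0.15% of total
    // Assumption: the minimum acts as a floor on the skew-based count.
    long skewBased = Math.max(minSegmentsToMove, toFixSkew);
    // segmentsToMove = Math.min(maxSegmentsToMove, segmentsToMoveToFixSkew)
    return Math.min(maxSegmentsToMove, skewBased);
  }
}
```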
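
The kill task slot limits from commit 660e6cfa01 can be combined as in the sketch below. The helper names are hypothetical. Subtracting the kill tasks already running to get the remaining slots is an assumption. It is consistent with the `killTask/availableSlot/count` metric from commit 82d82dfbd6.

```java
/** Hypothetical sketch of killTaskSlotRatio and maxKillTaskSlots working together. */
final class KillTaskSlotsSketch {
  /** Upper bound on concurrent auto-kill tasks. */
  static int maxKillTaskSlots(int totalWorkerCapacity, double killTaskSlotRatio, int maxKillTaskSlots) {
    int byRatio = (int) Math.floor(totalWorkerCapacity * killTaskSlotRatio);
    return Math.min(byRatio, maxKillTaskSlots);
  }

  /** Assumption: remaining slots = cap minus kill tasks already running, never negative. */
  static int availableKillTaskSlots(int maxSlots, int activeKillTasks) {
    return Math.max(0, maxSlots - activeKillTasks);
  }
}
```

With the defaults (`killTaskSlotRatio` = 1, `maxKillTaskSlots` = Integer.MAX_VALUE), the cap equals the total capacity. That matches the previous unbounded behavior.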
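
Two commits shape a kill task's main loop: ba957a9b97 adds a `limit` on segments killed per kill task, and 44d5c1a15f adds a `batchSize` for processing. Together they suggest the sketch below. It is generic and uses hypothetical names. The comments only indicate the locking. The real task's handling of `limit` and `batchSize` may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: cap the unused segments at `limit`, then process them in batches. */
final class KillBatchingSketch {
  /** Returns the number of batches handed to processBatch. */
  static <T> int killInBatches(List<T> unusedSegments, Integer limit, int batchSize,
                               Consumer<List<T>> processBatch) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    // A null limit means "no cap".
    int total = limit == null ? unusedSegments.size() : Math.min(limit, unusedSegments.size());
    int batches = 0;
    for (int from = 0; from < total; from += batchSize) {
      int to = Math.min(from + batchSize, total);
      // A real task would take the lock for this batch only and release it afterwards,
      // so other work can proceed between batches.
      processBatch.accept(new ArrayList<>(unusedSegments.subList(from, to)));
      batches++;
    }
    return batches;
  }
}
```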
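
Commit 748874405c pushes down only the PostAggregators that downstream processing needs. One way to compute such a minimal set is a transitive dependency walk, sketched below. The map-based representation and the names are hypothetical. Treating a single PostAggregator (for example, a topN ordering metric) as the required root is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: find the PostAggregators transitively needed for one output. */
final class PostAggPruningSketch {
  /**
   * @param postAggDeps maps each post-aggregator name to the field names it reads
   * @param required    the name whose value must be computed
   */
  static Set<String> minimalPostAggs(Map<String, List<String>> postAggDeps, String required) {
    Set<String> needed = new LinkedHashSet<>();
    Deque<String> work = new ArrayDeque<>();
    work.push(required);
    while (!work.isEmpty()) {
      String name = work.pop();
      // Only post-aggregators are collected; plain aggregator names are leaves.
      if (postAggDeps.containsKey(name) && needed.add(name)) {
        for (String dep : postAggDeps.get(name)) {
          work.push(dep);
        }
      }
    }
    return needed;
  }
}
```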
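
The batch delete in commit 9a9038c7ae relies on a default interface method. The default falls back to one-by-one deletion, and deep-storage implementations can override it. The sketch below shows that pattern with hypothetical types (`SegmentRef`, `SegmentKillerSketch`, `BulkKiller`). It is not Druid's actual segment killer interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a segment descriptor. */
final class SegmentRef {
  final String id;

  SegmentRef(String id) {
    this.id = id;
  }
}

/** Hypothetical killer: single delete plus a batch overload with a default. */
interface SegmentKillerSketch {
  void kill(SegmentRef segment);

  /** Default: delete one at a time; implementations may override with a bulk delete. */
  default void kill(List<SegmentRef> segments) {
    for (SegmentRef segment : segments) {
      kill(segment);
    }
  }
}

/** Example override that records one simulated bulk request per call. */
final class BulkKiller implements SegmentKillerSketch {
  final List<List<String>> bulkRequests = new ArrayList<>();

  @Override
  public void kill(SegmentRef segment) {
    kill(List.of(segment));
  }

  @Override
  public void kill(List<SegmentRef> segments) {
    List<String> ids = new ArrayList<>();
    for (SegmentRef s : segments) {
      ids.add(s.id);
    }
    bulkRequests.add(ids);
  }
}
```

Because the batch method has a default, existing implementations keep working unchanged. Each deep-storage extension can adopt a bulk delete on its own schedule, which is the rollout the commit describes.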
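
Commit ab051d9c5e tests that segments from more populous servers are more likely to be picked. A textbook reservoir sampler (Algorithm R) has that property when run over all segments. Every segment is equally likely to be kept, so a server holding more segments is proportionally more likely to supply a sampled one. This is a generic sketch, not Druid's `ReservoirSegmentSampler`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Generic reservoir sampling (Algorithm R): a uniform sample of up to k items in one pass. */
final class ReservoirSampler {
  static <T> List<T> sample(Iterable<T> items, int k, Random random) {
    List<T> reservoir = new ArrayList<>(k);
    int seen = 0;
    for (T item : items) {
      seen++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k items.
        reservoir.add(item);
      } else {
        // Keep the new item with probability k / seen, replacing a random slot.
        int j = random.nextInt(seen); // uniform in [0, seen)
        if (j < k) {
          reservoir.set(j, item);
        }
      }
    }
    return reservoir;
  }
}
```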


4107 Commits