Description:
Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368.
As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs.
- Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments`
which returns the list of segments created by a task.
- The use case is really narrow and no prod clusters really use this information.
- There can be better ways of obtaining this information, such as the metric
`segment/added/bytes` which reports both the segment ID and task ID
when a segment is committed by a task. We could also include committed segment IDs in task reports.
- A task persisting several segments would bloat up the audit logs table putting unnecessary strain
on metadata storage.
Changes:
- Remove `TaskAuditLogConfig`
- Remove method `TaskAction.isAudited()`. No task action is audited anymore.
- Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction`
is the new incarnation which has been in use for a while.
- Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore
but need to be retained for backward compatibility of extensions.
- Do not create `druid_taskLog` metadata table anymore.
Changes:
- Break `NewestSegmentFirstIterator` into two parts
- `DatasourceCompactibleSegmentIterator` - this contains all the code from `NewestSegmentFirstIterator`
but now handles a single datasource and allows a priority to be specified
- `PriorityBasedCompactionSegmentIterator` - contains separate iterator for each datasource and
combines the results into a single queue to be used by a compaction search policy
- Update `NewestSegmentFirstPolicy` to use the above new classes
- Cleanup `CompactionStatistics` and `AutoCompactionSnapshot`
- Cleanup `CompactSegments`
- Remove unused methods from `Tasks`
- Remove unneeded `TasksTest`
- Move tests from `NewestSegmentFirstIteratorTest` to `CompactionStatusTest`
and `DatasourceCompactibleSegmentIteratorTest`
Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes:
a. We don't touch the subquery sequence till we know that we can materialize the result as frames
Description:
Compaction operations issued by the Coordinator currently run using the native query engine.
As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative
that we support compaction on MSQ to make Compaction more robust and possibly faster.
For instance, we have seen OOM errors in native compaction that MSQ could have handled by its
auto-calculation of tuning parameters.
This commit enables compaction on MSQ to remove the dependency on native engine.
Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of
`[native, msq]` with `native` being the default.
* if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the
launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which
could happen in case a fraction of the tasks were allotted and they eventually fell short of the number
of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field.
* `CompactionTask` now has `CompactionRunner` interface instance with its implementations
`NativeCompactinRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension.
The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the
`CompactionRunner` instance that is mapped to the specified type [`native`, `msq`].
* `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in
the segment schema. If present, the task is created with a native group-by query; if not, the task is
issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the
final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask`
since otherwise, the workers will be unable to determine the controller task's location for communication
(as they haven't been launched via the overlord).
Motivation:
- Improve code hygeiene
- Make `SegmentLoadDropHandler` easily extensible
Changes:
- Add `SegmentBootstrapper`
- Move code for bootstrapping segments already cached on disk and fetched from coordinator to
`SegmentBootstrapper`.
- No functional change
- Use separate executor service in `SegmentBootstrapper`
- Bind `SegmentBootstrapper` to `ManageLifecycle` explicitly in `CliBroker`, `CliHistorical` etc.
Previously, the segment granularity for tables in the catalog had to be defined in period format, ie `'PT1H'` , `'P1D'`, etc. This disallows a user from defining segment granularity of `'ALL'` for a table in the catalog, which may be a valid use case. This change makes it so that a user may define the segment granularity of a table in the catalog, as any string that results in a valid granularity using either the `Granularity.fromString(str)` method, or `new PeriodGranularity(new Period(value), null, null)`, and that granularity maps to a standard supported granularity, where `GranularityType.isStandard(granularity)` returns true. As a result a user may who wants to assign a catalog table's segment granularity to be hourly, may assign the segment granularity property of the table to be either `PT1H`, or `HOUR`. These are the same formats accepted at query time.
Changes:
- Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever`
- Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction`
as it is now unused and has been deprecated since #1988
- Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se— just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.
index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.
This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)
ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
* Initial support for bootstrap segments.
- Adds a new API in the coordinator.
- All processes that have storage locations configured (including tasks)
talk to the coordinator if they can, and fetch bootstrap segments from it.
- Then load the segments onto the segment cache as part of startup.
- This addresses the segment bootstrapping logic required by processes before
they can start serving queries or ingesting.
This patch also lays the foundation to speed up upgrades.
* Fail open by default if there are any errors talking to the coordinator.
* Add test for failure scenario and cleanup logs.
* Cleanup and add debug log
* Assert the events so we know the list exactly.
* Revert RunRules test.
The rules aren't evaluated if there are no clusters.
* Revert RunRulesTest too.
* Remove debug info.
* Make the API POST and update log.
* Fix up UTs.
* Throw 503 from MetadataResource; clean up exception handling and DruidException.
* Remove unused logger, add verification of metrics and docs.
* Update error message
* Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Apply suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Adjust test metric expectations with the rename.
* Add BootstrapSegmentResponse container in the response for future extensibility.
* Rename to BootstrapSegmentsInfo for internal consistency.
* Remove unused log.
* Use a member variable for broadcast segments instead of segmentAssigner.
* Minor cleanup
* Add test for loadable bootstrap segments and clarify comment.
* Review suggestions.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Changes:
- Add new task action `RetrieveSegmentsByIdAction`
- Use new task action to retrieve segments irrespective of their visibility
- During rolling upgrades, this task action would fail as Overlord would be on old version
- If new action fails, fall back to just fetching used segments as before
* Fix flakey BrokerClientTest.
The testError() method reliably fails in the IDE. This is because the
the test runner has
<surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> is set to 3, so maven
retries this "flaky test" multiple times and the test code returns a successful response
in the third attempt.
The exception handling in BrokerClientTest was broken:
- All non-2xx errors were being turned as 5xx errors. Remove that block of
code. If we need to handle retries of more specific 5xx error codes, that should be
hanlded explicitly. Or if there's a source of 4xx class error that needs to be 5xx,
fix that in the source of error.
* Fix CodeQL warning for unused parameter.
* Handle null values of centralized schema config in PartialMergeTask
* Fix checkstyle
* Do not pass centralized schema config from supervisor task to sub-tasks
* Do not pass ObjectMapper in constructor of task
* Fix logs
* Fix tests
* Router: Authorize permissionless internal requests.
Router-internal requests like /proxy/enabled and errors for invalid
requests should not require permissions, but they still need to be
authorized in order to satisfy the PreResponseAuthorizationCheckFilter.
This patch adds authorization checks that do not require any particular
permissions.
* Fix tests.
* Add interface method for returning canonical lookup name
* Address review comment
* Add test in LookupReferencesManagerTest for coverage check
* Add test in LookupSerdeModuleTest for coverage check
* Fix task bootstrap locations.
* Remove dependency of SegmentCacheManager from SegmentLoadDropHandler.
- The load drop handler code talks to the local cache manager via
SegmentManager.
* Clean up unused imports and stuff.
* Test fixes.
* Intellij inspections and test bind.
* Clean up dependencies some more
* Extract test load spec and factory to its own class.
* Cleanup test util
* Pull SegmentForTesting out to TestSegmentUtils.
* Fix up.
* Minor changes to infoDir
* Replace server announcer mock and verify that.
* Add tests.
* Update javadocs.
* Address review comments.
* Separate methods for download and bootstrap load
* Clean up return types and exception handling.
* No callback for loadSegment().
* Minor cleanup
* Pull out the test helpers into its own static class so it can have better state control.
* LocalCacheManager stuff
* Fix build.
* Fix build.
* Address some CI warnings.
* Minor updates to javadocs and test code.
* Address some CodeQL test warnings and checkstyle fix.
* Pass a Consumer<DataSegment> instead of boolean & rename variables.
* Small updates
* Remove one test constructor.
* Remove the other constructor that wasn't initializing fully and update usages.
* Cleanup withInfoDir() builder and unnecessary test hooks.
* Remove mocks and elaborate on comments.
* Commentary
* Fix a few Intellij inspection warnings.
* Suppress corePoolSize intellij-inspect warning.
The intellij-inspect tool doesn't seem to correctly inspect
lambda usages. See ScheduledExecutors.
* Update docs and add more tests.
* Use hamcrest for asserting order on expectation.
* Shutdown bootstrap exec.
* Fix checkstyle
* fix issue with auto column grouping
changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping
* fix test
This PR updates CompactionTask to not load any lookups by default, unless transformSpec is present.
If transformSpec is present, we will make the decision based on context values, loading all lookups by default. This is done to ensure backward compatibility since transformSpec can reference lookups.
If transform spec is not present and no context value is passed, we donot load any lookup.
This behavior can be overridden by supplying lookupLoadingMode and lookupsToLoad in the task context.
Changes:
- Remove deprecated `markAsUnused` parameter from `KillUnusedSegmentsTask`
- Allow `kill` task to use `REPLACE` lock when `useConcurrentLocks` is true
- Use `EXCLUSIVE` lock by default