* Add update() in TestDerbyConnectorRule
* use common function.
* fixup build.
* fixup indentations.
* Revert "fixup indentations."
This reverts commit a9d6b73e79.
* fixup indentataions.
* Remove Thread.sleep() by directly calling updateUsedStatusLastUpdated.
* another indentation slip.
* Move common segment initialization to setup().
* Fix for checkstyle.
* review comments: indentation fixes, type.
* Wrapper class for Segments table
* Add KillUnusedSegmentsTaskBuilder in test class
* Remove javadocs for self-explanatory methods.
Changes:
- Remove deprecated `DruidException` (old one) and `EntryExistsException`
- Use newly added comprehensive `DruidException` instead
- Update error message in `SqlMetadataStorageActionHandler` when max packet limit is violated.
- Factor out common code from several faults into `BaseFault`.
- Slightly update javadoc in `DruidException` to render it correctly
- Remove unused classes `SegmentToMove`, `SegmentToDrop`
- Move `ServletResourceUtils` from module `druid-processing` to `druid-server`
- Add utility method to build error Response from `DruidException`.
Changes
- Replace usages of `UnknownSegmentIdsException` with `DruidException`
- Add method `SqlMetadataQuery.retrieveSegments`
- Add new field `used` to `DataSegmentPlus`
* Kill task version support.
Kill tasks by default kill all versions of unused segments in the specified
interval. Users wanting to delete specific versions (for example, data compliance
reasons) and keep rest of the versions can specify the optional version in the
kill task payload.
* Formatting changes.
* Multi version tests in RetrieveSegmentsActionsTest
Sort of like method-level parameterized tests.
* Address review feedback
* Accept a list of versions instead of a single version.
Support multiple versions.
* Tests for multiple versions.
* Update docs
* Cleanup
* Address review comments.
Retain the old interface method and make it default and route it to
the method with nullable versions variant. Update usages to use the
default method where versions doesn't matter.
* Remove versions from retreive used segments action.
* Some updates.
* Apply suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* /s/actual/observed/g
* minor test cleanup
* WIP: Test fixes and updates. Also add test for kill by version with used load spec.
Checkpoint.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Changes:
- Use an actual SqlSegmentsMetadataManager instead of TestSqlSegmentsMetadataManager
- Simplify TestSegmentsMetadataManager
- Add a test for large interval segments.
Changes:
Improve `SqlSegmentsMetadataManager`
- Break the loop in `populateUsedStatusLastUpdated` before going to sleep if there are no more segments to update
- Add comments and clean up logs
Refactor `SqlSegmentsMetadataManagerTest`
- Merge `SqlSegmentsMetadataManagerEmptyTest` into this test
- Add method `testPollEmpty`
- Shave a few seconds off of the tests by reducing poll duration
- Simplify creation of test segments
- Some renames here and there
- Remove unused methods
- Move `TestDerbyConnector.allowLastUsedFlagToBeNull` to this class
Other minor changes
- Add javadoc to `NoneShardSpec`
- Use lambda in `SqlSegmentMetadataPublisher`
While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value.
Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames.
The PR modifies the batching logic to not accumulate the results in an intermediary list
BaseNodeRoleWatcher counts down cacheInitialized after a timeout, but also sets some flag that it was a timed-out initialization. and call nodeViewInitializationTimedOut (new method on listeners) instead of nodeViewInitialized. Then listeners can do what is most appropriate with this information.
* Move retries into DataSegmentPusher implementations.
The individual implementations know better when they should and should
not retry. They can also generate better error messages.
The inspiration for this patch was a situation where EntityTooLarge was
generated by the S3DataSegmentPusher, and retried uselessly by the
retry harness in PartialSegmentMergeTask.
* Fix missing var.
* Adjust imports.
* Tests, comments, style.
* Remove unused import.
Updated the Direct Druid Client so as to make Connection Count Server Selector Strategy work more efficiently.
If creating connection to a node is slow, then that slowness wouldn't be accounted for if we count the open connections after sending the request. So we increment the counter and then send the request.
Currently, while reading results from realtime tasks, requests are sent on a segment level. This is slightly wasteful, as when contacting a data servers, it is possible to transfer results for all segments which it is hosting, instead of only one segment at a time.
One change this PR makes is to group the segments on the basis of servers. This reduces the number of queries to data servers made. Since we don't have access to the number of rows for realtime segments, the grouping is done with a fixed estimated number of rows for each realtime segment.
* All segments stored in the same batch have the same created_date entry.
In the absence of a group_id column, this metadata would allow us to easily
reason about and troubleshoot ingestion-related issues.
* Rename metric name and code references to eligibleUnusedSegments.
Address review comment from https://github.com/apache/druid/pull/15941#discussion_r1503631992
* Kill duty and test improvements.
Initial commit with:
- Bug fixes - auto-kill can throw NPE when there are no datasources present and defaults mismatch.
- Add new stat for candidate segment intervals killed.
- Move a couple of debug logs to info logs for improved visibility (should only log once per kill period).
- Remove redundant checks for code readability.
- Updated tests from using mocks (also the mocks weren't using last updated timestamp) and
add more test coverage for different config parameters.
- Add a couple of unit tests that are ignored for the eternity case to prove that
the kill duty doesn't clean up segments with ALL grain or that end in DateTimes.MAX.
- Migrate Druid exception from user to operator persona.
* Address review comments.
* Remove unused methods.
* fix up format specifier and validate bad config tests.
* Consolidate the helpers a bit more and add another test.
* Update test names. Add javadoc placeholders for slightly involved tests.
* Add docs for metric kill/candidateUnusedSegments/count.
Also, rename to disambiguate.
* Comments.
* Apply logging suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Review comments
- Clarify docs on eligibility.
- Add test for multiple segments in the same interval. Clarify comment.
- Remove log line from test.
- Remove lastUpdatedDate = now.plus(10) from test.
* minor cleanup.
* Clarify javadocs for getUnusedSegmentIntervals().
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together.
This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.
During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimension and metrics (1000+) it was observed that the creation/serialization of incremental segment file format for persistence and persisting the file took a while and it was blocking ingestion of new data. This affected the real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that.
The patch adds a simple configuration parameter to the ingestion tuning configuration to specify number of persistence threads. The default value is 1 if it not specified which makes it the same as it is today.
Proposal #13469
Original PR #14024
A new method is being added in QueryLifecycle class to authorise a query based on authentication result.
This method is required since we authenticate the query by intercepting it in the grpc extension and pass down the authentication result.
### Description
The unusedSegment api response was extended to include the original DataSegment object with the creation time and last used update time added to it. A new object `DataSegmentPlus` was created for this purpose, and the metadata queries used were updated as needed.
example response:
```
[
{
"dataSegment": {
"dataSource": "inline_data",
"interval": "2023-01-02T00:00:00.000Z/2023-01-03T00:00:00.000Z",
"version": "2024-01-25T16:06:42.835Z",
"loadSpec": {
"type": "local",
"path": "/Users/zachsherman/projects/opensrc-druid/distribution/target/apache-druid-30.0.0-SNAPSHOT/var/druid/segments/inline_data/2023-01-02T00:00:00.000Z_2023-01-03T00:00:00.000Z/2024-01-25T16:06:42.835Z/0/index/"
},
"dimensions": "str_dim,double_measure1,double_measure2",
"metrics": "",
"shardSpec": {
"type": "numbered",
"partitionNum": 0,
"partitions": 1
},
"binaryVersion": 9,
"size": 1402,
"identifier": "inline_data_2023-01-02T00:00:00.000Z_2023-01-03T00:00:00.000Z_2024-01-25T16:06:42.835Z"
},
"createdDate": "2024-01-25T16:06:44.196Z",
"usedStatusLastUpdatedDate": "2024-01-25T16:07:34.909Z"
}
]
```
* Merge hydrant runners flatly for realtime queries.
Prior to this patch, we have two layers of mergeRunners for realtime
queries: one for each Sink (a logical segment) and one across all
Sinks. This is done because to keep metrics and results grouped by Sink,
given that each FireHydrant within a Sink has its own separate storage
adapter.
However, it costs extra memory usage due to the extra layer of
materialization. This is especially pronounced for groupBy queries,
which only use their merge buffers at the top layer of merging. The
lower layer of merging materializes ResultRows directly into the heap,
which can cause heap exhaustion if there are enough ResultRows.
This patch changes to a single layer of merging when bySegment: false,
just like Historicals. To accommodate that, segment metrics like
query/segment/time are now per-FireHydrant instead of per-Sink.
Two layers of merging are retained when bySegment: true. This isn't
common, because it's typically only used when segment level caching
is enabled on the Broker, which is off by default.
* Use SinkQueryRunners.
* Remove unused method.
* Kill tasks should honor the buffer period of unused segments.
- The coordinator duty KillUnusedSegments determines an umbrella interval
for each datasource to determine the kill interval. There can be multiple unused
segments in an umbrella interval with different used_status_last_updated timestamps.
For example, consider an unused segment that is 30 days old and one that is 1 hour old. Currently
the kill task after the 30-day mark would kill both the unused segments and not retain the 1-hour
old one.
- However, when a kill task is instantiated with this umbrella interval, it’d kill
all the unused segments regardless of the last updated timestamp. We need kill
tasks and RetrieveUnusedSegmentsAction to honor the bufferPeriod to avoid killing
unused segments in the kill interval prematurely.
* Clarify default behavior in docs.
* test comments
* fix canDutyRun()
* small updates.
* checkstyle
* forbidden api fix
* doc fix, unused import, codeql scan error, and cleanup logs.
* Address review comments
* Rename maxUsedFlagLastUpdatedTime to maxUsedStatusLastUpdatedTime
This is consistent with the column name `used_status_last_updated`.
* Apply suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Make period Duration type
* Remove older variants of runKilLTask() in OverlordClient interface
* Test can now run without waiting for canDutyRun().
* Remove previous variants of retrieveUnusedSegments from internal metadata storage coordinator interface.
Removes the following interface methods in favor of a new method added:
- retrieveUnusedSegmentsForInterval(String, Interval)
- retrieveUnusedSegmentsForInterval(String, Interval, Integer)
* Chain stream operations
* cleanup
* Pass in the lastUpdatedTime to markUnused test function and remove sleep.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Reverse, pull up lookups in the SQL planner.
Adds two new rules:
1) ReverseLookupRule, which eliminates calls to LOOKUP by doing
reverse lookups.
2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above
GROUP BY, when the lookup is injective.
Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether
these rules fire. Both are enabled by default.
To minimize the chance of performance problems due to many keys mapping to
the same value, ReverseLookupRule refrains from reversing a lookup if there
are more keys than `inSubQueryThreshold`. The rationale for using this setting
is that reversal works by generating an IN, and the `inSubQueryThreshold`
describes the largest IN the user wants the planner to create.
* Add additional line.
* Style.
* Remove commented-out lines.
* Fix tests.
* Add test.
* Fix doc link.
* Fix docs.
* Add one more test.
* Fix tests.
* Logic, test updates.
* - Make FilterDecomposeConcatRule more flexible.
- Make CalciteRulesManager apply reduction rules til fixpoint.
* Additional tests, simplify code.
Currently, If 2 tasks are consuming from the same partitions, try to publish the segment and update the metadata, the second task can fail because the end offset stored in the metadata store doesn't match with the start offset of the second task. We can fix this by retrying instead of failing.
AFAIK apart from the above issue, the metadata mismatch can happen in 2 scenarios:
- when we update the input topic name for the data source
- when we run 2 replicas of ingestion tasks(1 replica will publish and 1 will fail as the first replica has already updated the metadata).
Implemented the comparable function to compare the last committed end offset and new Sequence start offset. And return a specific error msg for this.
Add retry logic on indexers to retry for this specific error msg.
Updated the existing test case.
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments.
This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.
This PR enables the flag by default to queue excess query requests in the jetty queue. Still keeping the flag so that it can be turned off if necessary. But the flag will be removed in the future.
* Allow overwriting ServerConnector accept queue size
* Use a single config
* Fix spacing
* fix spacing
* fixed value
* read value from environment
* fix spacing
* Unpack value before reading
* check somaxconn on linux only
* Reverse lookup fixes and enhancements.
1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important
because otherwise the reverse-lookup optimization is done improperly when
the "in" filter appears under a "not", and the lookup extractionFn may return
null for some possible values of the filtered column. The "includeUnknown" test
cases in InDimFilterTest illustrate the difference in behavior.
2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able
to do a reverse lookup in a wider variety of cases.
3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll".
The main reason is that MapLookupExtractor, a common implementation, lacks a
reverse mapping and therefore does a scan of the map for each call to "unapply".
For performance sake these calls need to be batched.
* Remove optimize call from BloomDimFilter.
* Follow the law.
* Fix tests.
* Fix imports.
* Switch function.
* Fix tests.
* More tests.
* Minor fixes
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>