Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly
- Audit these actions only on the Overlord
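The delegate-then-fall-back flow in the list above could look roughly like the following minimal sketch. `OverlordApi`, `MetadataStore`, and `markSegmentAsUnused` are hypothetical stand-ins chosen for illustration; they are not the actual `OverlordClient` or metadata manager signatures.

```java
final class FallbackSketch {
    /** Hypothetical stand-in for the Overlord-facing client. */
    interface OverlordApi {
        boolean markSegmentAsUnused(String segmentId) throws Exception;
    }

    /** Hypothetical stand-in for direct access to the metadata store. */
    interface MetadataStore {
        boolean markSegmentAsUnused(String segmentId);
    }

    /** Tries the Overlord first; if that call fails, updates the metadata store directly. */
    static boolean markUnused(OverlordApi overlord, MetadataStore store, String segmentId) {
        try {
            return overlord.markSegmentAsUnused(segmentId);
        } catch (Exception e) {
            // Assumption: any failure of the Overlord call triggers the fallback.
            return store.markSegmentAsUnused(segmentId);
        }
    }

    public static void main(String[] args) {
        MetadataStore store = id -> true;
        OverlordApi failing = id -> {
            throw new IllegalStateException("Overlord unavailable");
        };
        // The Overlord call throws, so the result comes from the metadata store.
        System.out.println(markUnused(failing, store, "segment-1"));
    }
}
```

In this sketch the fallback path does no auditing of its own, which is one way to read "audit these actions only on the Overlord".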
Other minor changes
---------------------
- Do not perform a null check on `OverlordClient` in the Coordinator-side `DataSourcesResource`.
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`
New Overlord APIs
------------------
- Mark all segments of a datasource as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all (non-overshadowed) segments of a datasource as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple segments as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple (non-overshadowed) segments as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
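As an illustration of the single-segment endpoints listed above, here is a minimal sketch that builds the two requests with the JDK's `java.net.http` API. The class and method names are hypothetical, the Overlord address `http://localhost:8090` and the segment ID are assumptions for a local setup, and authentication is left out. The requests are only constructed and printed, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class OverlordSegmentRequests {
    private static final String BASE_PATH = "/druid/indexer/v1/datasources/";

    /** Builds the URI of a single segment under a datasource. */
    private static URI segmentUri(String overlordUrl, String dataSource, String segmentId) {
        // URLEncoder performs form encoding; this is adequate here only because
        // datasource names and segment IDs are assumed to contain no spaces.
        return URI.create(
            overlordUrl + BASE_PATH
                + URLEncoder.encode(dataSource, StandardCharsets.UTF_8)
                + "/segments/"
                + URLEncoder.encode(segmentId, StandardCharsets.UTF_8));
    }

    /** POST on the segment path marks the segment as used. */
    static HttpRequest markSegmentUsed(String overlordUrl, String dataSource, String segmentId) {
        return HttpRequest.newBuilder(segmentUri(overlordUrl, dataSource, segmentId))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    /** DELETE on the segment path marks the segment as unused. */
    static HttpRequest markSegmentUnused(String overlordUrl, String dataSource, String segmentId) {
        return HttpRequest.newBuilder(segmentUri(overlordUrl, dataSource, segmentId))
            .DELETE()
            .build();
    }

    public static void main(String[] args) {
        // Assumed local Overlord address and a made-up segment ID.
        HttpRequest used = markSegmentUsed("http://localhost:8090", "wikipedia", "wikipedia_seg_1");
        HttpRequest unused = markSegmentUnused("http://localhost:8090", "wikipedia", "wikipedia_seg_1");
        System.out.println(used.method() + " " + used.uri());
        System.out.println(unused.method() + " " + unused.uri());
    }
}
```

Sending either request would additionally need a `java.net.http.HttpClient` and whatever credentials the cluster requires.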
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
* Emit aggregate segment processing metrics per sink instead of firehydrant
* add docs
* minor change
* checkstyle
* Fix DefaultQueryMetricsTest
* Minor changes in SinkMetricsEmittingQueryRunner
* spotbugs
* Address review comments
* Use ImmutableSet and ImmutableMap
* Create a helper class for saving state of StubServiceEmitter
* Add SinkQuerySegmentWalkerBenchmark
* Create SegmentMetrics class for tracking segment metrics
---------
Co-authored-by: Akshat Jain <akjn11@gmail.com>
- Add new method `Supervisor.stopAsync`
- Implement `SeekableStreamSupervisor.stopAsync()` to use a shutdown executor
- Call `stopAsync` from `SupervisorManager`
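The idea behind `stopAsync` can be sketched as follows. This is a simplified illustration, not the actual Druid code: the `Supervisor` interface here is a hypothetical reduction, `CompletableFuture` is used as an assumed return type, and the default implementation stopping non-gracefully is also an assumption.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StopAsyncSketch {
    /** Simplified, hypothetical supervisor interface. */
    interface Supervisor {
        void stop(boolean stopGracefully);

        /** Default: stop on the calling thread and return a completed future. */
        default CompletableFuture<Void> stopAsync() {
            stop(false);
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Supervisor whose stop may be slow, so it runs on a dedicated shutdown executor. */
    static final class SlowSupervisor implements Supervisor {
        private final ExecutorService shutdownExec = Executors.newSingleThreadExecutor();
        private volatile boolean stopped;

        @Override
        public void stop(boolean stopGracefully) {
            stopped = true;
        }

        boolean isStopped() {
            return stopped;
        }

        /** The sketch assumes this is called at most once per supervisor. */
        @Override
        public CompletableFuture<Void> stopAsync() {
            CompletableFuture<Void> future =
                CompletableFuture.runAsync(() -> stop(false), shutdownExec);
            // Already-submitted work still runs; the thread is released afterwards.
            shutdownExec.shutdown();
            return future;
        }
    }

    /** Manager-side helper: triggers all stops without waiting for each one in turn. */
    static CompletableFuture<Void> stopAll(List<? extends Supervisor> supervisors) {
        CompletableFuture<?>[] futures = supervisors.stream()
            .map(Supervisor::stopAsync)
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures);
    }

    public static void main(String[] args) {
        SlowSupervisor first = new SlowSupervisor();
        SlowSupervisor second = new SlowSupervisor();
        stopAll(List.of(first, second)).join();
        System.out.println(first.isStopped() && second.isStopped());
    }
}
```

The point of the design is that the caller is not blocked by one slow supervisor while stopping the others.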
* topn with granularity regression fixes
changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and incorrectly used the pooled algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533
* move defensive check outside of loop
* more test
* extra layer of safety
* move check outside of loop
* fix spelling
* add query context parameter to allow using the pooled algorithm for topN when multiple passes are required, even when query granularity is not ALL
* add comment, revert IT context changes and add new context flag
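One reading of the algorithm-selection rule described in these changes is sketched below. It is an interpretation of the commit notes, not the engine's code: the parameter names (including `allowMultiPassPooled` for the new context flag) are hypothetical, and other preconditions for the pooled algorithm are deliberately ignored.

```java
final class TopNAlgorithmChoiceSketch {
    enum Algorithm { POOLED, HEAP }

    /**
     * Picks an algorithm from three inputs:
     * - granularityIsAll: the query granularity is ALL
     * - fitsInSinglePass: the pooled algorithm can process all values in one pass
     * - allowMultiPassPooled: hypothetical name for the new query context flag
     */
    static Algorithm choose(boolean granularityIsAll,
                            boolean fitsInSinglePass,
                            boolean allowMultiPassPooled) {
        if (fitsInSinglePass) {
            // A single pass works regardless of query granularity.
            return Algorithm.POOLED;
        }
        if (granularityIsAll || allowMultiPassPooled) {
            // Multiple passes: acceptable for ALL granularity, or when opted in.
            return Algorithm.POOLED;
        }
        // Multiple passes with a finer granularity and no opt-in: use the heap algorithm.
        return Algorithm.HEAP;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true, false));
        System.out.println(choose(false, false, false));
        System.out.println(choose(false, false, true));
    }
}
```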
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
* Docs: improve druid.coordinator.kill.on description
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update description for durationToRetain
* Update docs/configuration/index.md
* Update after review
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Description
Migrate the initial 3 sets of SQL tests to quidem. These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset. They aim to exercise SQL queries in a more comprehensive way:
Each scalar function is exercised in 3 different query shapes:
- simple query
- subquery
- group by query
Each query covers all operators in its predicates.
All queries are select count(*) queries. They are designed to all return the same result for easy maintenance and debugging.
These are the initial sets of tests. More tests to cover the rest of the scalar and aggregation functions will come later.
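To make the three query shapes concrete, the following sketch generates one `count(*)` query per shape for a given predicate. It is only an illustration of the shapes: the helper, the example predicate, and the `session_length` column are assumptions, the actual tests are quidem files rather than Java, and this sketch makes no attempt to have all three queries return the same count.

```java
import java.util.List;

final class QueryShapesSketch {
    /** Returns the simple, subquery, and group by variants for one predicate. */
    static List<String> shapes(String predicate, String groupColumn) {
        String simple = "SELECT COUNT(*) FROM kttm WHERE " + predicate;
        String subquery =
            "SELECT COUNT(*) FROM (SELECT * FROM kttm WHERE " + predicate + ")";
        String groupBy =
            "SELECT COUNT(*) FROM (SELECT " + groupColumn + " FROM kttm WHERE "
                + predicate + " GROUP BY " + groupColumn + ")";
        return List.of(simple, subquery, groupBy);
    }

    public static void main(String[] args) {
        // Assumed column name and scalar function, for illustration only.
        for (String sql : shapes("ABS(session_length) >= 0", "session_length")) {
            System.out.println(sql);
        }
    }
}
```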
* [Docs] Improve Bloom filter topic
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update spelling file
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* snapshot column capabilities for realtime cursors
changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake where expression selectors for realtime queries with no null values could not use dictionary encoded selectors
* hmm
* test changes
* cleanup
* add test coverage
* fix test
* fixes
* cleanup
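The `CursorBuildSpec.getPhysicalColumns()` contract described in the change list above (a null set means all columns are required) can be illustrated with a small sketch. Everything here is a hypothetical simplification: capabilities are reduced to a single "has multiple values" boolean per column, and `snapshot` is an illustrative helper, not a Druid method.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class PhysicalColumnsSketch {
    /**
     * Copies the current capability of each required column.
     * A null physicalColumns set is treated as "all columns" (for example, a full scan).
     */
    static Map<String, Boolean> snapshot(Map<String, Boolean> liveHasMultipleValues,
                                         Set<String> physicalColumns) {
        Map<String, Boolean> snapshot = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> entry : liveHasMultipleValues.entrySet()) {
            if (physicalColumns == null || physicalColumns.contains(entry.getKey())) {
                snapshot.put(entry.getKey(), entry.getValue());
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        Map<String, Boolean> live = new HashMap<>();
        live.put("page", false);
        live.put("tags", true);

        Map<String, Boolean> forQuery = snapshot(live, Set.of("page"));
        // Later ingestion changes the live state; the snapshot taken for the cursor does not move.
        live.put("page", true);
        System.out.println(forQuery);
    }
}
```

Because the snapshot is a copy taken when the cursor is built, a column that was single-valued at that moment can be treated as single-valued for the lifetime of the cursor in this sketch.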