* topn with granularity regression fixes
changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533
* move defensive check outside of loop
* more test
* extra layer of safety
* move check outside of loop
* fix spelling
* add query context parameter to allow using pooled algorithm for topN when multi-passes is required even wihen query granularity is not all
* add comment, revert IT context changes and add new context flag
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
* Docs: improve druid.coordinator.kill.on description
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update description for durationToRetain
* Update docs/configuration/index.md
* Update after review
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Description
Migrate the initial 3 sets of SQL tests to quidem. These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset. They aim to exercise SQL queries in a more comprehensive way:
Each scalar function is exercised in 3 different query shapes:
simple query
subquery
group by query
Each query covers all operators in its predicates.
All queries are select count(*) queries. They are designed to all return the same result for easy maintenance and debugging.
These are the initial sets of tests. More tests to cover the rest of the scalar and aggregation functions will come later.
* [Docs] Improve Bloom filter topic
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update spelling file
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* snapshot column capabilities for realtime cursors
changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake that expression selectors for realtime queries with no null values could not use dictionary encoded selectors
* hmm
* test changes
* cleanup
* add test coverage
* fix test
* fixes
* cleanup
This PR contains non-functional / refactoring changes of the following classes in the sql module:
1. Move ExplainPlan and ExplainAttributes fromsql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself which provides all the clients as well.
This is done so that we can reuse the said classes in #17353 without brining in Calcite and other dependencies to the Overlord.
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move decoupled related setting from PlannerConfig to QueryContexts
This method was missing some required synchronization. This patch also
adds GuardedBy annotations to historicalServers and realtimeServers, which
would have caught it.
This patch supports sorting segments by non-time columns (added in #16849) to MSQ compaction.
Specifically, if `forceSegmentSortByTime` is set in the data schema, either via the user-supplied
compaction config or in the inferred schema, the following steps are taken:
- Skip adding `__time` explicitly as the first column to the dimension schema since it already comes
as part of the schema
- Ensure column mappings propagate `__time` in the order specified by the schema
- Set `forceSegmentSortByTime` in the MSQ context.
Changes
---------
- Add Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO`
- Setting this flag to true (default value) allows the Overlord to fetch only necessary segment
payloads during segment allocation
- Setting this flag to false restores original segment allocation behaviour