In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.
This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.
* Update to Calcite 1.35.0
* Update from.ftl for Calcite 1.35.0.
* Fixed tests in Calcite upgrade by doing the following:
1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules
2. Refactored the CorrelateUnnestRule
3. Updated CorrelateUnnestRel accordingly
4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column
5. Additional test cases for fixes in 2,3,4
6. Update to StringListAggregator to fail a query if separators are not propagated appropriately
* Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects
* Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing an useless literal dimension with a post agg. Additionally a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35
* Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal
* Added a wrapper segment ref to use for unnest and filter segment reference
### Description
Added the following metrics, which are calculated from the `KillUnusedSegments` coordinatorDuty
`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill.
#### Release note
NEW: metrics added for auto kill
`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill.
The Azure connector is introduced and MSQ's fault tolerance and durable storage can now be used with Microsoft Azure's blob storage. Also, the results of newly introduced queries from deep storage can now store and fetch the results from Azure's blob storage.
### Description
Previously, the `KillUnusedSegments` coordinator duty, in charge of periodically deleting unused segments, could spawn an unlimited number of kill tasks for unused segments. This change adds 2 new coordinator dynamic configs that can be used to control the limit of tasks spawned by this coordinator duty
`killTaskSlotRatio`: Ratio of total available task slots, including autoscaling if applicable that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is 1, which allows all available tasks to be used, which is the existing behavior
`maxKillTaskSlots`: Maximum number of tasks that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is INT.MAX, which essentially allows for unbounded number of tasks, which is the existing behavior.
Realize that we can effectively get away with just the one `killTaskSlotRatio`, but following similarly to the compaction config, which has similar properties; I thought it was good to have some control of the upper limit regardless of ratio provided.
#### Release note
NEW: `killTaskSlotRatio` and `maxKillTaskSlots` coordinator dynamic config properties added that allow control of task resource usage spawned by `KillUnusedSegments` coordinator task (auto kill)
### Description
Previously, the `maxSegments` configured for auto kill could be ignored if an interval of data for a given datasource had more than this number of unused segments, causing the kill task spawned with the task of deleting unused segments in that given interval of data to delete more than the `maxSegments` configured. Now each kill task spawned by the auto kill coordinator duty, will kill at most `limit` segments. This is done by adding a new config property to the `KillUnusedSegmentTask` which allows users to specify this limit.
* Remove chatAsync parameter, so chat is always async.
chatAsync has been made default in Druid 26. I have seen good
battle-testing of it in production, and am comfortable removing the
older sync client.
This was the last remaining usage of IndexTaskClient, so this patch
deletes all that stuff too.
* Remove unthrown exception.
* Remove unthrown exception.
* No more TimeoutException.
split KillUnusedSegmentsTask to smaller batches
Processing in smaller chunks allows the task execution to yield the TaskLockbox lock,
which allows the overlord to continue being responsive to other tasks and users while
this particular kill task is executing.
* introduce KillUnusedSegmentsTask batchSize parameter to control size of batching
* provide an explanation for kill task batchSize parameter
* add logging details for kill batch progress
* Add ingest/input/bytes metric and Kafka consumer metrics.
New metrics:
1) ingest/input/bytes. Equivalent to processedBytes in the task reports.
2) kafka/consumer/bytesConsumed: Equivalent to the Kafka consumer
metric "bytes-consumed-total". Only emitted for Kafka tasks.
3) kafka/consumer/recordsConsumed: Equivalent to the Kafka consumer
metric "records-consumed-total". Only emitted for Kafka tasks.
* Fix anchor.
* Fix KafkaConsumerMonitor.
* Interface updates.
* Doc changes.
* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Add EARLIEST aggregator merge strategy.
- More unit tests.
- Include the aggregators analysis type by default in tests.
* Docs.
* Some comments and a test
* Collapse into individual code blocks.
This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location.
Two important dependencies associated with Apache Iceberg tables are:
Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet.
Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.
* Change default handoffConditionTimeout to 15 minutes.
Most of the time, when handoff is taking this long, it's because something
is preventing Historicals from loading new data. In this case, we have
two choices:
1) Stop making progress on ingestion, wait for Historicals to load stuff,
and keep the waiting-for-handoff segments available on realtime tasks.
(handoffConditionTimeout = 0, the current default)
2) Continue making progress on ingestion, by exiting the realtime tasks
that were waiting for handoff. Once the Historicals get their act
together, the segments will be loaded, as they are still there on
deep storage. They will just not be continuously available.
(handoffConditionTimeout > 0)
I believe most users would prefer [2], because [1] risks ingestion falling
behind the stream, which causes many other problems. It can cause data loss
if the stream ages-out data before we have a chance to ingest it.
Due to the way tuningConfigs are serialized -- defaults are baked into the
serialized form that is written to the database -- this default change will
not change anyone's existing supervisors. It will take effect for newly
created supervisors.
* Fix tests.
* Update docs/development/extensions-core/kafka-supervisor-reference.md
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/development/extensions-core/kinesis-ingestion.md
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
---------
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.
- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.
* Don't explicitly set strict strategy in tests
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/querying/segmentmetadataquery.md
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
---------
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Add ZooKeeper connection state alerts and metrics.
- New metric "zk/connected" is an indicator showing 1 when connected,
0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.
* Use right GuardedBy.
* Test fixes, coverage.
* Adjustment.
* Fix tests.
* Fix ITs.
* Improved injection.
* Adjust metric name, add tests.
Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.