druid/extensions-core
AmatyaAvadhanula d294404924
Kinesis ingestion with empty shards (#12792)
Kinesis ingestion requires every shard to have at least one record at the required position in Druid.
Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens, UNREAD_TRIM_HORIZON and UNREAD_LATEST, to indicate that a shard has not been read from and that it needs to be read from the start or the end, respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use the token as a marker to obtain the relevant shardIterator, and use this shardIterator to obtain a valid sequence number. Until a valid sequence number is obtained, continue storing the token as the offset.
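The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the class and method names are hypothetical, the tokens are represented as plain strings, and using AT_SEQUENCE_NUMBER for an ordinary offset is an assumption. TRIM_HORIZON, LATEST and AT_SEQUENCE_NUMBER are Kinesis shard iterator type names.

```java
// Hypothetical sketch: names and string representation are assumptions.
class UnreadTokenSketch {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    // True when the stored offset is a token rather than a real sequence number.
    static boolean isUnreadToken(String offset) {
        return UNREAD_TRIM_HORIZON.equals(offset) || UNREAD_LATEST.equals(offset);
    }

    // Pick the Kinesis shard iterator type that matches the stored offset.
    static String shardIteratorTypeFor(String offset) {
        if (UNREAD_TRIM_HORIZON.equals(offset)) {
            return "TRIM_HORIZON"; // read from the start of the shard
        }
        if (UNREAD_LATEST.equals(offset)) {
            return "LATEST"; // read from the end of the shard
        }
        return "AT_SEQUENCE_NUMBER"; // assumption: resume at a known sequence number
    }

    // Keep storing the current offset (possibly a token) until a record yields
    // a real sequence number; fetchedSequenceNumber is null when no record was read.
    static String nextStoredOffset(String currentOffset, String fetchedSequenceNumber) {
        return fetchedSequenceNumber != null ? fetchedSequenceNumber : currentOffset;
    }
}
```

With this shape, an empty shard simply keeps its token as the stored offset across polls, and the offset switches to a numeric sequence number the first time a record arrives.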

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.
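
One way to picture this ordering is a comparator that handles the tokens before falling back to big-integer comparison. The sketch below is hypothetical (class and method names are invented for illustration), and treating the two UNREAD tokens as equal to each other is an assumption, since the description above only says they precede every valid sequence number.

```java
import java.math.BigInteger;

// Hypothetical sketch of the ordering; not the comparator from this commit.
class SequenceOrderSketch {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    static boolean isUnread(String sequence) {
        return UNREAD_TRIM_HORIZON.equals(sequence) || UNREAD_LATEST.equals(sequence);
    }

    // Negative if a sorts before b, zero if equal, positive if a sorts after b.
    static int compare(String a, String b) {
        boolean unreadA = isUnread(a);
        boolean unreadB = isUnread(b);
        if (unreadA && unreadB) {
            return 0; // assumption: the two tokens are not ordered relative to each other
        }
        if (unreadA) {
            return -1; // a token is earlier than any valid sequence number
        }
        if (unreadB) {
            return 1;
        }
        // Valid Kinesis sequence numbers are compared as big integers.
        return new BigInteger(a).compareTo(new BigInteger(b));
    }
}
```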

However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is not before the earliest available sequence in the shard. However, when the current token is an UNREAD token, any sequence number in the shard is valid (despite the ordering).

Kinesis sequence numbers are inclusive, i.e., if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
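
The two adjustments above might look roughly like this. It is a sketch under stated assumptions, not the code from this commit: the class and method names are hypothetical, the availability check is read as "the earliest available sequence is at or before the current offset", and an UNREAD current token is taken to be the trigger for the exclusive equality check.

```java
import java.math.BigInteger;

// Hypothetical sketch of the two validation rules; names are assumptions.
class SequenceValidationSketch {
    static boolean isUnread(String sequence) {
        return "UNREAD_TRIM_HORIZON".equals(sequence) || "UNREAD_LATEST".equals(sequence);
    }

    // UNREAD tokens sort before every valid (big integer) sequence number.
    static int compare(String a, String b) {
        boolean unreadA = isUnread(a);
        boolean unreadB = isUnread(b);
        if (unreadA || unreadB) {
            return unreadA && unreadB ? 0 : (unreadA ? -1 : 1);
        }
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    // Availability: an UNREAD current token accepts any sequence in the shard,
    // even though it sorts before the earliest available one.
    static boolean isAvailable(String current, String earliestAvailable) {
        if (isUnread(current)) {
            return true;
        }
        return compare(earliestAvailable, current) <= 0;
    }

    // End check: inclusive for numeric sequence numbers (current == end means the
    // record at end is still to be read), exclusive when current is an UNREAD token.
    static boolean hasMoreToRead(String current, String end) {
        int cmp = compare(current, end);
        return isUnread(current) ? cmp < 0 : cmp <= 0;
    }
}
```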
2022-08-05 22:38:58 +05:30
..
avro-extensions Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
azure-extensions Able to filter Cloud objects with glob notation. (#12659) 2022-06-24 11:40:08 +05:30
datasketches Use datasketches version 3.2.0 (#12509) 2022-05-13 11:28:15 +05:30
druid-aws-rds-extensions Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
druid-basic-security Improve build performance of modules (#12486) 2022-05-01 22:43:11 +08:00
druid-bloom-filter Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
druid-kerberos Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
druid-pac4j Mark specific nimbus.lang.tag.version. (#12751) 2022-07-07 09:58:35 +05:30
druid-ranger-security Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
ec2-extensions Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
google-extensions Able to filter Cloud objects with glob notation. (#12659) 2022-06-24 11:40:08 +05:30
hdfs-storage Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030) 2022-05-02 08:40:44 -05:00
histogram Free ByteBuffers in tests and fix some bugs. (#12521) 2022-05-19 07:42:29 -07:00
kafka-extraction-namespace Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
kafka-indexing-service Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
kinesis-indexing-service Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
kubernetes-extensions Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
lookups-cached-global Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
lookups-cached-single Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
mysql-metadata-storage deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
orc-extensions Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
parquet-extensions Perform lazy initialization of parquet extensions module (#12827) 2022-08-02 13:41:12 +05:30
postgresql-metadata-storage Optimize overlord GET /tasks memory usage (#12404) 2022-06-16 22:30:37 +05:30
protobuf-extensions Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
s3-extensions Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00
simple-client-sslcontext Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00
stats Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
testing-tools Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00