Apache Druid: a high performance real-time analytics database.
AmatyaAvadhanula d294404924
Kinesis ingestion with empty shards (#12792)
Kinesis ingestion requires all shards to have at least one record at the required position in Druid.
Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.
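The fallback can be sketched in a few lines. This is a minimal illustration, not Druid's implementation: the class and method names are hypothetical, and it only assumes that an UNREAD token maps to the matching Kinesis shard iterator type (TRIM_HORIZON or LATEST), and that the stored offset stays as the token until a fetched record supplies a real sequence number. The use of AT_SEQUENCE_NUMBER for numeric offsets is also an assumption of this sketch.

```java
import java.util.Optional;

// Hypothetical sketch: class and method names are illustrative, not Druid's actual API.
public class UnreadTokenResolver {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    // Maps a stored offset to the Kinesis shard iterator type used to position the reader.
    static String iteratorTypeFor(String storedOffset) {
        switch (storedOffset) {
            case UNREAD_TRIM_HORIZON:
                return "TRIM_HORIZON";       // never read: start at the oldest retained record
            case UNREAD_LATEST:
                return "LATEST";             // never read: start just after the newest record
            default:
                return "AT_SEQUENCE_NUMBER"; // assumption: resume at a known sequence number
        }
    }

    // Keeps the UNREAD token as the offset until a record supplies a real sequence number.
    static String nextOffset(String storedOffset, Optional<String> fetchedSequence) {
        return fetchedSequence.orElse(storedOffset);
    }

    public static void main(String[] args) {
        System.out.println(iteratorTypeFor(UNREAD_TRIM_HORIZON));            // TRIM_HORIZON
        System.out.println(nextOffset(UNREAD_LATEST, Optional.empty()));     // UNREAD_LATEST
        System.out.println(nextOffset(UNREAD_LATEST, Optional.of("12345"))); // 12345
    }
}
```

An empty shard therefore never blocks progress: the token itself is checkpointed, and it is replaced only once a record actually arrives.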

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.
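One way to express this ordering is a comparator over offset strings in which either token sorts before any numeric sequence number, and numeric sequence numbers compare as big integers. This is a hypothetical sketch; in particular, treating the two tokens as equal to each other is an assumption made here for simplicity.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the offset ordering; names are illustrative.
public class KinesisOffsetOrdering {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    static boolean isUnread(String offset) {
        return UNREAD_TRIM_HORIZON.equals(offset) || UNREAD_LATEST.equals(offset);
    }

    // UNREAD tokens order before every numeric sequence number; numbers compare as big integers.
    static int compare(String a, String b) {
        boolean aUnread = isUnread(a);
        boolean bUnread = isUnread(b);
        if (aUnread && bUnread) {
            return 0; // assumption: the two tokens are not ordered relative to each other
        }
        if (aUnread) {
            return -1;
        }
        if (bUnread) {
            return 1;
        }
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        List<String> offsets = new ArrayList<>(
                Arrays.asList("900", UNREAD_LATEST, "12", "100000000000000000000"));
        offsets.sort(KinesisOffsetOrdering::compare);
        System.out.println(offsets); // [UNREAD_LATEST, 12, 900, 100000000000000000000]
    }
}
```

BigInteger is used because Kinesis sequence numbers can exceed the range of a long.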

However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is not earlier than the earliest available sequence number in the shard. However, if the current offset is an UNREAD token, any sequence number in the shard is valid, even though the token is ordered before it.
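A minimal sketch of such a check, assuming that a numeric offset counts as available only when the shard's earliest retained sequence number is at or before it, and that an UNREAD offset is always available. The method name `isOffsetStillAvailable` is hypothetical, and the sketch assumes the earliest available value is always numeric.

```java
import java.math.BigInteger;

// Hypothetical sketch of the availability rule; names are illustrative.
public class OffsetAvailabilityCheck {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    static boolean isUnread(String offset) {
        return UNREAD_TRIM_HORIZON.equals(offset) || UNREAD_LATEST.equals(offset);
    }

    // Returns true if reading can resume from currentOffset without skipping trimmed records.
    // Assumes earliestAvailable is a numeric sequence number reported by the shard.
    static boolean isOffsetStillAvailable(String currentOffset, String earliestAvailable) {
        if (isUnread(currentOffset)) {
            // Nothing has been read yet, so any record still in the shard is acceptable,
            // even though the token orders before every sequence number.
            return true;
        }
        // A numeric offset is available only if the shard has not trimmed past it.
        return new BigInteger(earliestAvailable).compareTo(new BigInteger(currentOffset)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isOffsetStillAvailable("500", "400"));               // true
        System.out.println(isOffsetStillAvailable("300", "400"));               // false
        System.out.println(isOffsetStillAvailable(UNREAD_TRIM_HORIZON, "400")); // true
    }
}
```

Without the early return, a plain ordering comparison would wrongly report every UNREAD offset as unavailable, since the tokens sort before all sequence numbers.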

Kinesis sequence numbers are inclusive, i.e., if the current sequence number equals the end sequence number, there are still records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
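The two tie-breaking rules can be combined into a single "more to read" predicate. This is a hypothetical sketch that reuses the ordering in which UNREAD tokens precede every numeric sequence number; the method name `hasMoreToRead` is illustrative, and treating any tie between UNREAD tokens as exclusive is an assumption drawn from the description above.

```java
import java.math.BigInteger;

// Hypothetical sketch of the end-offset check; names are illustrative.
public class EndOffsetCheck {
    static final String UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON";
    static final String UNREAD_LATEST = "UNREAD_LATEST";

    static boolean isUnread(String offset) {
        return UNREAD_TRIM_HORIZON.equals(offset) || UNREAD_LATEST.equals(offset);
    }

    // UNREAD tokens order before every numeric sequence number.
    static int compare(String a, String b) {
        boolean aUnread = isUnread(a);
        boolean bUnread = isUnread(b);
        if (aUnread && bUnread) {
            return 0;
        }
        if (aUnread) {
            return -1;
        }
        if (bUnread) {
            return 1;
        }
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    // Returns true if records remain between current and end.
    static boolean hasMoreToRead(String current, String end) {
        int cmp = compare(current, end);
        if (cmp != 0) {
            return cmp < 0;
        }
        // Tie: inclusive for real sequence numbers (the end record is still unread),
        // exclusive when both sides are UNREAD tokens.
        return !isUnread(current);
    }

    public static void main(String[] args) {
        System.out.println(hasMoreToRead("100", "100"));              // true
        System.out.println(hasMoreToRead(UNREAD_LATEST, UNREAD_LATEST)); // false
        System.out.println(hasMoreToRead(UNREAD_LATEST, "100"));      // true
        System.out.println(hasMoreToRead("101", "100"));              // false
    }
}
```

A tie can only occur between two numeric values or two tokens, so checking `current` alone is enough to pick the inclusive or exclusive rule.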
2022-08-05 22:38:58 +05:30
.github Lock hadoop dependencies to 2.8.5 (#11583) 2021-08-12 15:16:47 +05:30
.idea Poison stupid pool (#12646) 2022-07-03 14:36:22 -07:00
benchmarks add NumericRangeIndex interface and BoundFilter support (#12830) 2022-07-29 18:58:49 -07:00
cloud Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00
codestyle Frame processing and channels. (#12848) 2022-08-04 21:29:04 -07:00
core Frame processing and channels. (#12848) 2022-08-04 21:29:04 -07:00
dev Add some debug tips for debugging peons (#12697) 2022-07-09 01:47:25 -07:00
distribution Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
docs Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
examples Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
extendedset Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
extensions-contrib Upgrade prometheus version, add more labels to PrometheusEmitter (#12769) 2022-07-15 14:43:12 +05:30
extensions-core Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
helm/druid replaces hard-coded probe delays with helm values (#12805) 2022-07-26 14:04:06 +05:30
hll Free ByteBuffers in tests and fix some bugs. (#12521) 2022-05-19 07:42:29 -07:00
hooks Git hooks should fail on errors; pass args to git hooks (#12322) 2022-03-10 09:07:50 +09:00
indexing-hadoop Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030) 2022-05-02 08:40:44 -05:00
indexing-service Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
integration-tests Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
licenses Blueprint 4 (#12391) 2022-04-04 10:34:22 -07:00
processing Frame processing and channels. (#12848) 2022-08-04 21:29:04 -07:00
publications De-incubation cleanup in code, docs, packaging (#9108) 2020-01-03 12:33:19 -05:00
server Add check for eternity time segment to SqlSegmentsMetadataQuery (#12844) 2022-08-04 22:33:08 -07:00
services Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00
sql Tidy up construction of the Guice Injectors (#12816) 2022-08-04 00:05:07 -07:00
web-console Kinesis ingestion with empty shards (#12792) 2022-08-05 22:38:58 +05:30
website Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
.asf.yaml Add .asf.yaml. (#9083) 2019-12-20 16:45:38 -08:00
.backportrc.json Add 0.18.0 to .backportrc.json to facilitate backport. (#9661) 2020-04-11 13:49:04 -07:00
.codecov.yml Use Codecov (#8388) 2019-08-28 08:49:30 -07:00
.dockerignore Add docker container for druid (#6896) 2019-02-08 12:12:28 +00:00
.gitignore Frame processing and channels. (#12848) 2022-08-04 21:29:04 -07:00
.lgtm.yml Suppress LGTM warnings about stack trace exposure (#9631) 2020-04-09 17:31:03 -07:00
.travis.yml Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
CONTRIBUTING.md Fix numbered list formatting in markdown. (#9664) 2020-04-21 20:18:12 -07:00
LABELS Add plain text README.txt, use relative link from README.md to build.md (#7611) 2019-05-09 21:29:26 -07:00
LICENSE support Aliyun OSS service as deep storage (#9898) 2020-07-01 22:20:53 -07:00
NOTICE license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885) 2021-03-10 12:59:25 -08:00
README.md Readme - link fix to build guide (#12849) 2022-08-03 19:32:37 +08:00
README.template De-incubation cleanup in code, docs, packaging (#9108) 2020-01-03 12:33:19 -05:00
check_test_suite.py run web-console e2e tests for java changes too (#12776) 2022-07-13 16:12:57 -07:00
check_test_suite_test.py run web-console e2e tests for java changes too (#12776) 2022-07-13 16:12:57 -07:00
licenses.yaml Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
owasp-dependency-check-suppressions.xml Suppress some false alarm CVEs (#12812) 2022-07-22 22:27:31 +05:30
pom.xml Improved Java 17 support and Java runtime docs. (#12839) 2022-08-03 23:16:05 -07:00
upload.sh Adding licenses and enable apache-rat-plugin. (#6215) 2018-09-18 08:39:26 -07:00




Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download


Apache Druid

Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid with our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in web console.
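As a small illustration of the HTTP API, the sketch below builds (but does not send) a Druid SQL request using the JDK's `java.net.http` client. It assumes a local quickstart cluster whose router listens on `http://localhost:8888` and posts a JSON body with a `query` field to the `/druid/v2/sql` endpoint; the class and helper names are hypothetical, and the JSON escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; host, port, and names are assumptions for a local quickstart.
public class DruidSqlRequest {
    // Minimal JSON encoding: escapes only backslashes and double quotes.
    static String jsonBody(String sql) {
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\": \"" + escaped + "\"}";
    }

    // Builds a POST request to Druid's SQL endpoint without sending it.
    static HttpRequest buildSqlRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/druid/v2/sql"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(sql)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildSqlRequest("http://localhost:8888", "SELECT COUNT(*) FROM sys.segments");
        System.out.println(req.method() + " " + req.uri()); // POST http://localhost:8888/druid/v2/sql
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```

This requires Java 11 or later for `java.net.http`; JDBC access goes through a separate driver and is not shown here.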

Load data


Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster


Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. Every view is powered by SQL system tables, so you can see the underlying query for each one.

Issue queries


Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most out of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real-time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

See the latest build guide for instructions on building Apache Druid from source.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0