druid

Apache Druid: a high performance real-time analytics database.

AmatyaAvadhanula d294404924 Kinesis ingestion with empty shards (#12792)	2022-08-05 22:38:58 +05:30

Kinesis ingestion requires all shards to have at least 1 record at the required position in Druid. Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens, UNREAD_TRIM_HORIZON and UNREAD_LATEST, to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively. These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number. However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, the current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering).

Kinesis sequence numbers are inclusive, i.e. if current sequence == end sequence, there are more records left to read. However, the equality check is exclusive when dealing with UNREAD tokens.
.github	Lock hadoop dependencies to 2.8.5 (#11583 )	2021-08-12 15:16:47 +05:30
.idea	Poison stupid pool (#12646 )	2022-07-03 14:36:22 -07:00
benchmarks	add NumericRangeIndex interface and BoundFilter support (#12830 )	2022-07-29 18:58:49 -07:00
cloud	Tidy up construction of the Guice Injectors (#12816 )	2022-08-04 00:05:07 -07:00
codestyle	Frame processing and channels. (#12848 )	2022-08-04 21:29:04 -07:00
core	Frame processing and channels. (#12848 )	2022-08-04 21:29:04 -07:00
dev	Add some debug tips for debugging peons (#12697 )	2022-07-09 01:47:25 -07:00
distribution	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
docs	Kinesis ingestion with empty shards (#12792 )	2022-08-05 22:38:58 +05:30
examples	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
extendedset	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
extensions-contrib	Upgrade prometheus version, add more labels to PrometheusEmitter (#12769 )	2022-07-15 14:43:12 +05:30
extensions-core	Kinesis ingestion with empty shards (#12792 )	2022-08-05 22:38:58 +05:30
helm/druid	replaces hard-coded probe delays with helm values (#12805 )	2022-07-26 14:04:06 +05:30
hll	Free ByteBuffers in tests and fix some bugs. (#12521 )	2022-05-19 07:42:29 -07:00
hooks	Git hooks should fail on errors; pass args to git hooks (#12322 )	2022-03-10 09:07:50 +09:00
indexing-hadoop	Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030 )	2022-05-02 08:40:44 -05:00
indexing-service	Kinesis ingestion with empty shards (#12792 )	2022-08-05 22:38:58 +05:30
integration-tests	Kinesis ingestion with empty shards (#12792 )	2022-08-05 22:38:58 +05:30
licenses	Blueprint 4 (#12391 )	2022-04-04 10:34:22 -07:00
processing	Frame processing and channels. (#12848 )	2022-08-04 21:29:04 -07:00
publications	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
server	Add check for eternity time segment to SqlSegmentsMetadataQuery (#12844 )	2022-08-04 22:33:08 -07:00
services	Tidy up construction of the Guice Injectors (#12816 )	2022-08-04 00:05:07 -07:00
sql	Tidy up construction of the Guice Injectors (#12816 )	2022-08-04 00:05:07 -07:00
web-console	Kinesis ingestion with empty shards (#12792 )	2022-08-05 22:38:58 +05:30
website	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
.asf.yaml	Add .asf.yaml. (#9083 )	2019-12-20 16:45:38 -08:00
.backportrc.json	Add 0.18.0 to .backportrc.json to facilitate backport. (#9661 )	2020-04-11 13:49:04 -07:00
.codecov.yml	Use Codecov (#8388 )	2019-08-28 08:49:30 -07:00
.dockerignore	Add docker container for druid (#6896 )	2019-02-08 12:12:28 +00:00
.gitignore	Frame processing and channels. (#12848 )	2022-08-04 21:29:04 -07:00
.lgtm.yml	Suppress LGTM warnings about stack trace exposure (#9631 )	2020-04-09 17:31:03 -07:00
.travis.yml	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
CONTRIBUTING.md	Fix numbered list formatting in markdown. (#9664 )	2020-04-21 20:18:12 -07:00
LABELS	Add plain text README.txt, use relative link from README.md to build.md (#7611 )	2019-05-09 21:29:26 -07:00
LICENSE	support Aliyun OSS service as deep storage (#9898 )	2020-07-01 22:20:53 -07:00
NOTICE	license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885 )	2021-03-10 12:59:25 -08:00
README.md	Readme - link fix to build guide (#12849 )	2022-08-03 19:32:37 +08:00
README.template	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
check_test_suite.py	run web-console e2e tests for java changes too (#12776 )	2022-07-13 16:12:57 -07:00
check_test_suite_test.py	run web-console e2e tests for java changes too (#12776 )	2022-07-13 16:12:57 -07:00
licenses.yaml	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
owasp-dependency-check-suppressions.xml	Suppress some false alarm CVEs (#12812 )	2022-07-22 22:27:31 +05:30
pom.xml	Improved Java 17 support and Java runtime docs. (#12839 )	2022-08-03 23:16:05 -07:00
upload.sh	Adding licenses and enable apache-rat-plugin. (#6215 )	2018-09-18 08:39:26 -07:00
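
The ordering rules described in the commit message for #12792 can be illustrated with a minimal sketch. This is not Druid's implementation: the function names `compare_sequence` and `has_more_to_read` are hypothetical, sequence numbers are modelled as decimal strings, and treating the two UNREAD tokens as equal to each other is an assumption the commit message does not state.

```python
# Hypothetical sketch of the token ordering described in #12792.
UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON"
UNREAD_LATEST = "UNREAD_LATEST"
UNREAD_TOKENS = {UNREAD_TRIM_HORIZON, UNREAD_LATEST}


def is_unread(token):
    """Return True if the offset is a marker rather than a real sequence number."""
    return token in UNREAD_TOKENS


def compare_sequence(a, b):
    """Return a negative, zero, or positive int, like a comparator.

    UNREAD tokens sort before every valid sequence number.
    Assumption: two UNREAD tokens compare as equal.
    """
    a_unread, b_unread = is_unread(a), is_unread(b)
    if a_unread and b_unread:
        return 0
    if a_unread:
        return -1
    if b_unread:
        return 1
    # Real sequence numbers are big integers; Python ints are arbitrary precision.
    x, y = int(a), int(b)
    return (x > y) - (x < y)


def has_more_to_read(current, end):
    """Inclusive end check for real sequence numbers, exclusive for UNREAD tokens."""
    if is_unread(current) or is_unread(end):
        return compare_sequence(current, end) < 0
    return compare_sequence(current, end) <= 0
```

Under these assumptions, a shard whose stored offset is still an UNREAD token always sorts ahead of any real sequence number, while two equal real sequence numbers still count as having a record left to read.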

README.md

Apache Druid

Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid with our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
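
As a rough illustration of the HTTP API, the sketch below posts a SQL query to Druid's SQL endpoint (`/druid/v2/sql`) using only the Python standard library. The base URL `http://localhost:8888` and the `wikipedia` datasource are assumptions (they match a typical local quickstart setup), and the helper names `build_sql_request` and `run_sql` are hypothetical.

```python
import json
import urllib.request


def build_sql_request(base_url, sql):
    """Build a POST request for Druid's SQL endpoint without sending it."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url, sql):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_sql_request(base_url, sql)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes a Druid router at localhost:8888 and a datasource named "wikipedia".
    rows = run_sql("http://localhost:8888", "SELECT COUNT(*) AS cnt FROM wikipedia")
    print(rows)
```

Separating request construction from sending keeps the sketch easy to inspect without a running cluster.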

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. These views are all powered by SQL system tables, so you can see the underlying query for each view.

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most out of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

See the latest build guide for instructions on building Apache Druid from source.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0