druid

Apache Druid: a high performance real-time analytics database.

Gian Merlino 5b6727f319 Enable vectorized virtual column processing by default. (#12520)		2022-05-16 15:43:53 +05:30

In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down.

And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue

.github	Lock hadoop dependencies to 2.8.5 (#11583 )	2021-08-12 15:16:47 +05:30
.idea	Use ExecutorService variables to assign ExecutorService Instances (#11373 )	2021-06-25 16:56:34 -07:00
benchmarks	Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634 )	2022-05-11 22:06:20 -07:00
cloud	add aws-java-sdk-sts to aws-common classpath (#12482 )	2022-05-03 12:25:51 -07:00
codestyle	GroupBy: Reduce allocations by reusing entry and key holders. (#12474 )	2022-04-28 23:21:13 -07:00
core	Allow coordinator to be configured to kill segments in future (#10877 )	2022-05-11 07:35:15 +05:30
dev	Add git hooks that can run multiple scripts (#12300 )	2022-03-09 07:16:47 +09:00
distribution	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
docs	Enable vectorized virtual column processing by default. (#12520 )	2022-05-16 15:43:53 +05:30
examples	Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248 )	2022-04-11 20:28:09 +05:30
extendedset	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
extensions-contrib	Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484 )	2022-05-09 10:40:17 -07:00
extensions-core	Use datasketches version 3.2.0 (#12509 )	2022-05-13 11:28:15 +05:30
helm/druid	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
hll	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
hooks	Git hooks should fail on errors; pass args to git hooks (#12322 )	2022-03-10 09:07:50 +09:00
indexing-hadoop	Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030 )	2022-05-02 08:40:44 -05:00
indexing-service	Enforce console logging for peon process (#12067 )	2022-05-16 15:07:21 +05:30
integration-tests	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
licenses	Blueprint 4 (#12391 )	2022-04-04 10:34:22 -07:00
processing	Enable vectorized virtual column processing by default. (#12520 )	2022-05-16 15:43:53 +05:30
publications	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
server	Allow coordinator to be configured to kill segments in future (#10877 )	2022-05-11 07:35:15 +05:30
services	remake column indexes and query processing of filters (#12388 )	2022-05-11 11:57:08 +05:30
sql	Add replace statement to sql parser (#12386 )	2022-05-13 10:56:40 +05:30
web-console	Add daily stats to console (#12329 )	2022-05-05 15:31:21 -07:00
website	Enforce console logging for peon process (#12067 )	2022-05-16 15:07:21 +05:30
.asf.yaml	Add .asf.yaml. (#9083 )	2019-12-20 16:45:38 -08:00
.backportrc.json	Add 0.18.0 to .backportrc.json to facilitate backport. (#9661 )	2020-04-11 13:49:04 -07:00
.codecov.yml	Use Codecov (#8388 )	2019-08-28 08:49:30 -07:00
.dockerignore	Add docker container for druid (#6896 )	2019-02-08 12:12:28 +00:00
.gitignore	Refactor ResponseContext (#11828 )	2021-12-06 17:03:12 -08:00
.lgtm.yml	Suppress LGTM warnings about stack trace exposure (#9631 )	2020-04-09 17:31:03 -07:00
.travis.yml	Enable Arm builds (#12451 )	2022-04-26 20:14:40 +05:30
CONTRIBUTING.md	Fix numbered list formatting in markdown. (#9664 )	2020-04-21 20:18:12 -07:00
LABELS	Add plain text README.txt, use relative link from README.md to build.md (#7611 )	2019-05-09 21:29:26 -07:00
LICENSE	support Aliyun OSS service as deep storage (#9898 )	2020-07-01 22:20:53 -07:00
NOTICE	license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885 )	2021-03-10 12:59:25 -08:00
README.md	Add JDK 11 (#12333 )	2022-03-16 15:03:04 -07:00
README.template	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
check_test_suite.py	suppress false positive cve (#11699 )	2021-09-13 20:45:38 -07:00
check_test_suite_test.py	suppress false positive cve (#11699 )	2021-09-13 20:45:38 -07:00
licenses.yaml	Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634 )	2022-05-11 22:06:20 -07:00
owasp-dependency-check-suppressions.xml	Supress CVE 2022 26612 (#12463 )	2022-04-21 08:48:20 -07:00
pom.xml	Use datasketches version 3.2.0 (#12509 )	2022-05-13 11:28:15 +05:30
upload.sh	Adding licenses and enable apache-rat-plugin. (#6215 )	2018-09-18 08:39:26 -07:00

README.md

Apache Druid

Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid with our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (described below).
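
As a rough illustration of the HTTP route, the sketch below builds a Druid SQL request in Python. It assumes a local quickstart deployment with the Router reachable at localhost:8888 and uses the /druid/v2/sql endpoint; the host, port, and helper names (build_sql_request, run_sql) are assumptions made for this example, not part of the README.

```python
import json
import urllib.request

# Assumption: a local quickstart with the Router listening on localhost:8888.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Build (but do not send) a POST request for Druid's HTTP SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, url=DRUID_SQL_URL):
    """Send the query and return the decoded JSON rows (needs a running cluster)."""
    with urllib.request.urlopen(build_sql_request(sql, url)) as response:
        return json.load(response)
```

With a cluster running and a datasource loaded, a call such as run_sql('SELECT COUNT(*) AS n FROM "my_datasource"') should return a list of row objects; my_datasource is a placeholder name.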

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. Each view is powered by SQL system tables, so you can see the underlying query behind it.
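
To give a feel for what such system-table queries can look like, here is a small Python sketch that pairs a few view names with SQL over the sys schema (sys.segments, sys.tasks, sys.servers). The queries are illustrative assumptions, not the statements the console actually issues, and the names SYSTEM_TABLE_QUERIES and payload_for are hypothetical.

```python
import json

# Illustrative queries only: the console's real queries may differ.
SYSTEM_TABLE_QUERIES = {
    "datasources": (
        'SELECT datasource, COUNT(*) AS num_segments, SUM("size") AS total_size '
        "FROM sys.segments GROUP BY datasource"
    ),
    "tasks": 'SELECT task_id, "type", datasource, status FROM sys.tasks',
    "services": "SELECT server, server_type, tier FROM sys.servers",
}


def payload_for(view):
    """Return the JSON body that would carry the query for the given view."""
    if view not in SYSTEM_TABLE_QUERIES:
        raise KeyError(f"no example query defined for view: {view}")
    return json.dumps({"query": SYSTEM_TABLE_QUERIES[view]})
```

Each payload can be posted to the SQL endpoint or the query text pasted into the console's query view to inspect the same metadata by hand.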

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

For instructions on building Druid from source, see docs/development/build.md.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0