druid

Apache Druid: a high performance real-time analytics database.

Laksh Singla 7c17341caa Return empty result when a group by gets optimized to a timeseries query (#12065)	2022-01-07 21:53:48 +05:30

Related to #11188.

PR #11188 allowed timeseries queries to return a default result when queries such as select count() from table where dim1="_not_present_dim_" were executed. Before that PR, such a query returned no row; after it, the query returns a row with count() equal to 0 (as expected by the SQL standard and by other databases).

In Grouping#applyProject, a groupBy query can sometimes be optimized to a timeseries query (when the groupBy keys are constants, as generated by automated tools). For example, in select count() from table where dim1="_present_dim_" group by "dummy_key", the group by clause can be removed. However, when the filter matches nothing, as in select count() from table where dim1="_not_present_dim_" group by "dummy_key", general databases return nothing, while Druid (due to the above change) returns a default row. This PR fixes that divergence in behavior.

Example cases:

select count() from table where dim1="_not_present_dim_" group by "dummy_key"
CURRENT: Returns a row with count() = 0
EXPECTED: Return no row

select 'A', dim1 from foo where m1 = 123123 and dim1 = '_not_present_again_' group by dim1
CURRENT: Returns a row with ('A', 'wat')
EXPECTED: Return no row

To do this, a boolean droppedDimensionsWhileApplyingProject has been added to Grouping; it is true whenever the optimization changes the original shape of the query. If a timeseries query has a grouping with this flag set to true, skipEmptyBuckets=true is set in the query context (i.e. do not return any row).
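The difference between the two query shapes in the commit message above can be reproduced on an ordinary SQL engine. The following is a minimal sketch using Python's built-in sqlite3 module, not Druid itself; the table contents and the helper name run_examples are made up for illustration, and count(*) with single-quoted string literals is used in place of the count() and double-quoted forms in the commit message.

```python
import sqlite3


def run_examples():
    """Run the same filtered count with and without a constant GROUP BY key.

    Illustration only: this uses SQLite, not Druid, and the rows in the
    hypothetical table foo are invented for the example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (dim1 TEXT, m1 REAL)")
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?)",
        [("wat", 1.0), ("abc", 2.0)],
    )

    missing = "_not_present_dim_"  # a value that matches no row

    # Plain aggregate: an aggregate without GROUP BY always yields one row.
    plain = conn.execute(
        "SELECT count(*) FROM foo WHERE dim1 = ?", (missing,)
    ).fetchall()

    # Same aggregate grouped by a constant key: no input rows, so no groups.
    grouped = conn.execute(
        "SELECT count(*) FROM foo WHERE dim1 = ? GROUP BY 'dummy_key'",
        (missing,),
    ).fetchall()

    conn.close()
    return plain, grouped


plain_rows, grouped_rows = run_examples()
print("without GROUP BY:", plain_rows)
print("with GROUP BY:   ", grouped_rows)
```

On SQLite the ungrouped query yields a single row holding 0, while the grouped query yields no rows at all, which is the behavior the commit describes as expected from general databases.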
.github	Lock hadoop dependencies to 2.8.5 (#11583 )	2021-08-12 15:16:47 +05:30
.idea	Use ExecutorService variables to assign ExecutorService Instances (#11373 )	2021-06-25 16:56:34 -07:00
benchmarks	Add interface for external schema provider to Druid SQL (#12043 )	2021-12-22 22:17:57 +05:30
cloud	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
codestyle	Remove use of deprecated PMD ruleset (#12044 )	2021-12-09 13:04:27 -08:00
core	Segment pruning for multi-dim partitioning given query domain (#12046 )	2021-12-17 12:44:43 +05:30
dev	chore: fix case of GitHub (#10928 )	2021-05-07 01:15:43 -07:00
distribution	Use Druid's extension loading for integration test instead of maven (#12095 )	2022-01-05 23:33:04 -08:00
docs	Standardizing SQL function docs (#12091 )	2022-01-06 23:57:03 -08:00
examples	Replace source call to make scripts more portable (#12014 )	2021-12-06 13:41:25 +05:30
extendedset	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
extensions-contrib	Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961 )	2021-12-09 15:42:55 -06:00
extensions-core	MySqlFirehoseDatabaseConnector uses configured driver class name (#12049 )	2021-12-09 20:58:55 -08:00
helm/druid	update Druid Chart README doc and removes unnecessary lock file (#11945 )	2021-11-22 21:34:26 +08:00
hll	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
hooks	Add thread count to pre-push hook to speed up checking (#11808 )	2021-11-22 21:33:01 +08:00
indexing-hadoop	fix IncrementalIndex performance regression (#12048 )	2021-12-09 22:04:32 -08:00
indexing-service	Lock count guardrail for parallel single phase/sequential task (#12052 )	2021-12-15 11:12:21 -06:00
integration-tests	Use Druid's extension loading for integration test instead of maven (#12095 )	2022-01-05 23:33:04 -08:00
licenses	Web console: new Ace, diff view, and cleanup. Decorating the console for the holidays ✨ 🎁 (#12085 )	2021-12-22 16:31:17 -08:00
processing	Removing unused processing threadpool on broker (#12070 )	2021-12-21 13:07:53 -08:00
publications	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
server	Add http response status code to org.eclipse.jetty.server.RequestLog (#12116 )	2022-01-06 20:10:01 +08:00
services	Removing unused processing threadpool on broker (#12070 )	2021-12-21 13:07:53 -08:00
sql	Return empty result when a group by gets optimized to a timeseries query (#12065 )	2022-01-07 21:53:48 +05:30
web-console	Web console: remove console.log (#12094 )	2021-12-22 19:31:23 -08:00
website	Removing unused processing threadpool on broker (#12070 )	2021-12-21 13:07:53 -08:00
.asf.yaml	Add .asf.yaml. (#9083 )	2019-12-20 16:45:38 -08:00
.backportrc.json	Add 0.18.0 to .backportrc.json to facilitate backport. (#9661 )	2020-04-11 13:49:04 -07:00
.codecov.yml	Use Codecov (#8388 )	2019-08-28 08:49:30 -07:00
.dockerignore	Add docker container for druid (#6896 )	2019-02-08 12:12:28 +00:00
.gitignore	Refactor ResponseContext (#11828 )	2021-12-06 17:03:12 -08:00
.lgtm.yml	Suppress LGTM warnings about stack trace exposure (#9631 )	2020-04-09 17:31:03 -07:00
.travis.yml	Use Druid's extension loading for integration test instead of maven (#12095 )	2022-01-05 23:33:04 -08:00
CONTRIBUTING.md	Fix numbered list formatting in markdown. (#9664 )	2020-04-21 20:18:12 -07:00
LABELS	Add plain text README.txt, use relative link from README.md to build.md (#7611 )	2019-05-09 21:29:26 -07:00
LICENSE	support Aliyun OSS service as deep storage (#9898 )	2020-07-01 22:20:53 -07:00
NOTICE	license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885 )	2021-03-10 12:59:25 -08:00
README.md	Fix travis' link behind build badge (#11858 )	2021-11-01 07:26:30 -07:00
README.template	De-incubation cleanup in code, docs, packaging (#9108 )	2020-01-03 12:33:19 -05:00
check_test_suite.py	suppress false positive cve (#11699 )	2021-09-13 20:45:38 -07:00
check_test_suite_test.py	suppress false positive cve (#11699 )	2021-09-13 20:45:38 -07:00
licenses.yaml	Update log4j2 to 2.17.1 (#12106 )	2021-12-30 19:18:16 -06:00
owasp-dependency-check-suppressions.xml	Support for hadoop 3 via maven profiles (#11794 )	2021-10-30 22:46:24 +05:30
pom.xml	Update log4j2 to 2.17.1 (#12106 )	2021-12-30 19:18:16 -06:00
setup-hooks.sh	Add git pre-commit hook to source control (#9554 )	2020-06-05 11:19:42 -10:00
upload.sh	Adding licenses and enable apache-rat-plugin. (#6215 )	2018-09-18 08:39:26 -07:00

README.md

Apache Druid

Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid with our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
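As a rough illustration of the HTTP route, the sketch below builds (but does not send) a request for Druid's SQL endpoint, /druid/v2/sql, using only the Python standard library. The base URL http://localhost:8888, the datasource name wikipedia, and the helper name build_sql_request are assumptions for the example; adjust them to your own deployment.

```python
import json
import urllib.request


def build_sql_request(sql, base_url="http://localhost:8888", context=None):
    """Build a POST request carrying a Druid SQL query as a JSON body.

    base_url is an assumed address, not something this README specifies.
    context is an optional dict of query context parameters.
    """
    payload = {"query": sql}
    if context:
        payload["context"] = context

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "wikipedia" is a hypothetical datasource name used only for illustration.
request = build_sql_request("SELECT COUNT(*) AS cnt FROM wikipedia")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually run the query against a reachable cluster:
#     with urllib.request.urlopen(request) as response:
#         rows = json.loads(response.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real cluster.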

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. Every view is powered by SQL system tables, so you can see the underlying query for each one.

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most out of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real time on the #druid channel in the Apache Slack team. Use this invitation link to join the ASF Slack, then go to the #druid channel.

Building from source

Please note that JDK 8 is required to build Druid.

For instructions on building Druid from source, see docs/development/build.md.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0