---
id: druid-vs-key-value
title: "Apache Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Druid is highly optimized for scans and aggregations, and it supports arbitrarily deep drill-downs into data sets. This same functionality
is supported in key/value stores in two ways:

1. Pre-compute all permutations of possible user queries
2. Range scans on event data

When pre-computing results, the key is the exact parameters of the query, and the value is the result of the query.
Such queries return extremely quickly, but at the cost of flexibility: truly ad-hoc exploratory queries would require
pre-computing every possible query permutation. Pre-computing all permutations of all possible queries leads to result sets
that grow exponentially with the number of columns in a data set, and pre-computing queries for complex real-world data sets
can require hours of pre-processing time.
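
The sketch below (plain Python, not tied to any particular key/value store API) illustrates this pre-computation approach and
where the exponential blow-up comes from: every combination of dimension values that a user might filter on becomes its own key.

```python
from itertools import combinations

# A minimal sketch of the pre-computation approach: every combination of
# dimension values a user might filter on becomes a key, and the aggregated
# measure becomes the value.

events = [
    {"country": "US", "device": "phone", "clicks": 3},
    {"country": "US", "device": "tablet", "clicks": 1},
    {"country": "FR", "device": "phone", "clicks": 2},
]
dimensions = ["country", "device"]

precomputed = {}
# Enumerate every subset of dimensions (2^N subsets), then every combination of
# values within each subset -- this is where the exponential growth comes from.
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        for event in events:
            key = tuple((d, event[d]) for d in dims)
            precomputed[key] = precomputed.get(key, 0) + event["clicks"]

# A "query" is now just a key lookup: fast, but only for keys computed ahead of time.
print(precomputed[(("country", "US"),)])   # 4
print(precomputed[()])                     # 6 (grand total)
```
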
The other approach to using key/value stores for aggregations is to use the dimensions of an event as the key and the event measures as the value.
Aggregations are done by issuing range scans on this data. Timeseries-specific databases such as OpenTSDB use this approach.
One of the limitations here is that the key/value storage model does not have indexes for any kind of filtering other than prefix ranges,
which can be used to filter a query down to a metric and time range, but cannot resolve complex predicates to narrow the exact data to scan.
When the number of rows to scan gets large, this limitation can greatly reduce performance. It is also harder to achieve good
locality with key/value stores because most don't support pushing down aggregates to the storage layer.
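
As a rough illustration (loosely modeled on an OpenTSDB-style key layout, not its actual API), the sketch below shows how a
sorted key of metric, timestamp, and tags turns a metric-plus-time-range query into a contiguous scan, while any other
predicate can only be applied after each row has already been read.

```python
import bisect

# A minimal sketch of the range-scan approach: the key is (metric, timestamp, tags)
# and the value is the measure. Keys are kept sorted so that a metric + time range
# maps to one contiguous slice of the data.

rows = sorted([
    (("clicks", 1000, "country=US"), 3),
    (("clicks", 1060, "country=FR"), 2),
    (("clicks", 1120, "country=US"), 1),
    (("errors", 1000, "country=US"), 7),
])
keys = [k for k, _ in rows]

def scan_sum(metric, start_ts, end_ts, tag_filter=None):
    # Prefix range scan: binary-search the boundaries of the (metric, time range) slice...
    lo = bisect.bisect_left(keys, (metric, start_ts, ""))
    hi = bisect.bisect_left(keys, (metric, end_ts, ""))
    total = 0
    for key, value in rows[lo:hi]:
        # ...but a non-prefix predicate (e.g. a tag filter) can only be checked
        # after the row has already been read, which is the limitation described above.
        if tag_filter is None or tag_filter in key[2]:
            total += value
    return total

print(scan_sum("clicks", 1000, 1200))                # 6
print(scan_sum("clicks", 1000, 1200, "country=US"))  # 4
```
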
For arbitrary exploration of data (flexible data filtering), Druid's custom column format enables ad-hoc queries without pre-computation. The format
also enables fast scans on columns, which is important for good aggregation performance.
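
As a conceptual sketch (not Druid's actual storage code), the example below shows why a column format with per-value
bitmap indexes supports this kind of ad-hoc filtering: arbitrary predicates are resolved against the indexes first, and
only the surviving rows of a single measure column are scanned and aggregated.

```python
# A conceptual sketch of columnar storage with bitmap indexes. Each dimension is
# stored as its own column, and each distinct value maps to the set of row ids
# containing it, so arbitrary filters reduce to set intersections.

country = ["US", "US", "FR", "US"]            # dimension column
device  = ["phone", "tablet", "phone", "phone"]
clicks  = [3, 1, 2, 5]                        # measure column

def bitmap_index(column):
    """Map each distinct value to the set of row ids that contain it."""
    index = {}
    for row_id, value in enumerate(column):
        index.setdefault(value, set()).add(row_id)
    return index

country_idx = bitmap_index(country)
device_idx = bitmap_index(device)

# Ad-hoc filter: country = 'US' AND device = 'phone' -- intersect the bitmaps,
# with no pre-computation of query results and no scan of the raw events.
matching_rows = country_idx["US"] & device_idx["phone"]
print(sum(clicks[r] for r in matching_rows))  # 8
```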