---
id: caching
title: "Query caching"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

Apache Druid (incubating) supports query result caching at both the segment level and the whole-query result level.
Cache data can be stored in the local JVM heap or in an external distributed key/value store. In all cases, the Druid
cache is a query result cache. The only difference is whether the cached result is a _partial result_ for a particular
segment or the result for an entire query. In both cases, the cache is invalidated as soon as any underlying data
changes; it never returns a stale result.

Segment-level caching allows the cache to be used even when some of the underlying segments are mutable and
undergoing real-time ingestion. In this case, Druid can cache query results for immutable historical segments while
recomputing results for the real-time segments on each query. Whole-query result level caching is not useful in this
scenario, since it would be continuously invalidated.

Segment-level caching does require Druid to merge the per-segment results on each query, even when they are served
from the cache. For this reason, whole-query result level caching can be more efficient if invalidation due to real-time
ingestion is not an issue.
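
The trade-off between the two cache levels can be illustrated with a minimal Python sketch of segment-level caching. This is a toy model, not Druid's implementation: the `Segment` class, the `run_query` function, and the use of a row count as the "query" are all hypothetical. It shows cached partial results being reused for immutable segments, real-time segments being recomputed, and the merge step running on every query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a Druid segment."""
    segment_id: str
    rows: tuple       # the rows held by this segment
    immutable: bool   # False while the segment is still being ingested


def run_query(segments, cache, stats):
    """Count rows across segments, caching partial results per immutable segment."""
    partials = []
    for seg in segments:
        if seg.immutable and seg.segment_id in cache:
            stats["hits"] += 1
            partials.append(cache[seg.segment_id])
            continue
        partial = len(seg.rows)  # compute the per-segment partial result
        stats["computed"] += 1
        if seg.immutable:
            cache[seg.segment_id] = partial  # only immutable segments are cached
        partials.append(partial)
    # The merge runs on every query, even when every partial came from the cache.
    return sum(partials)
```

In this model, running the same query twice over one immutable segment and one real-time segment recomputes only the real-time partial on the second run, but still performs the merge both times.
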

## Query caching on Brokers

Brokers support both segment-level and whole-query result level caching. Segment-level caching is controlled by the
query context parameters `useCache` and `populateCache`. Whole-query result level caching is controlled by the query
context parameters `useResultLevelCache` and `populateResultLevelCache`. Both are also governed by the
[runtime properties](../configuration/index.md) `druid.broker.cache.*`.

For small clusters, enabling segment-level caching on the Broker can yield faster results than enabling query caches on
Historicals. This is the recommended setup for smaller production clusters (fewer than 5 servers). Populating
segment-level caches on the Broker is _not_ recommended for large production clusters. When the property
`druid.broker.cache.populateCache` is set to `true` (and the query context parameter `populateCache` is _not_ set to
`false`), Historicals return results on a per-segment basis and cannot do any local result merging. This impairs the
ability of the Druid cluster to scale well.
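
As an illustration, the four context parameters named in this section can be set per query. The sketch below builds a native timeseries query as a Python dict and serializes it to JSON. The datasource name `wikipedia`, the interval, and the helper `with_cache_context` are hypothetical, and the code only constructs a request body; it does not send anything to a Broker.

```python
import json


def with_cache_context(query, segment_cache=True, result_cache=False):
    """Return a copy of a native query with cache-related context parameters set."""
    context = dict(query.get("context", {}))
    context.update({
        "useCache": segment_cache,
        "populateCache": segment_cache,
        "useResultLevelCache": result_cache,
        "populateResultLevelCache": result_cache,
    })
    return {**query, "context": context}


# Hypothetical query; the datasource and interval are placeholders.
query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "granularity": "all",
    "intervals": ["2019-01-01/2019-01-02"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

# Prefer the whole-query result cache and skip the segment-level cache.
cached_query = with_cache_context(query, segment_cache=False, result_cache=True)
body = json.dumps(cached_query, indent=2)
```

The helper copies the query rather than mutating it, so the same base query can be reused with different cache settings.
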

## Query caching on Historicals

Historicals only support segment-level caching. Segment-level caching is controlled by the query context
parameters `useCache` and `populateCache` and the [runtime properties](../configuration/index.md)
`druid.historical.cache.*`.

Larger production clusters should enable segment-level cache population on Historicals only (not on Brokers) to avoid
having to use Brokers to merge all query results. Enabling cache population on the Historicals instead of the Brokers
lets the Historicals do their own local result merging and puts less strain on the Brokers.
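
The interaction between a process-level runtime property and the `populateCache` query context parameter can be sketched as follows. This is an illustrative model only. It assumes that population happens when the runtime property is `true` and the context parameter is not explicitly `false`, as the Broker section describes for `druid.broker.cache.populateCache`. The helper `should_populate_segment_cache` is hypothetical, not a Druid API.

```python
def should_populate_segment_cache(runtime_populate_cache, query_context):
    """Model of the populate decision for a single query.

    runtime_populate_cache: value of the process-level populateCache property.
    query_context: dict of query context parameters (may omit populateCache).
    """
    # Assumption: an absent context parameter does not disable population.
    context_value = query_context.get("populateCache", True)
    return bool(runtime_populate_cache) and context_value is not False


# Property enabled, context silent: the cache is populated.
populated_by_default = should_populate_segment_cache(True, {})
# Property enabled, query opts out: the cache is not populated.
opted_out = should_populate_segment_cache(True, {"populateCache": False})
# Property disabled: the query context cannot turn population on.
disabled = should_populate_segment_cache(False, {"populateCache": True})
```
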

## Query caching on Ingestion Tasks

Task executor processes such as the Peon or the experimental Indexer only support segment-level caching. Segment-level
caching is controlled by the query context parameters `useCache` and `populateCache`
and the [runtime properties](../configuration/index.md) `druid.realtime.cache.*`.

Larger production clusters should enable segment-level cache population on task executor processes only
(not on Brokers) to avoid having to use Brokers to merge all query results. Enabling cache population on the
task executor processes instead of the Brokers lets the task executor processes do their own local
result merging and puts less strain on the Brokers.

Note that the task executor processes only support caches that keep their data locally, such as the `caffeine` cache.
This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
remote cache types such as `memcached` will be ignored by task executor processes.