druid/docs/content/querying/query-context.md

---
layout: doc_page
title: "Query Context"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

# Query Context

The query context is used for various query configuration parameters. The following parameters apply to all queries.

|property         |default                                 | description          |
|-----------------|----------------------------------------|----------------------|
|timeout          | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout`. To set the default timeout, see [Broker configuration](../configuration/index.html#broker) |
|priority         | `0`                                    | Query Priority. Queries with higher priority get precedence for computational resources.|
|queryId          | auto-generated                         | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
|useCache         | `true`                                 | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid (incubating) uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache |
|populateCache    | `true`                                 | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache |
|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache |
|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache |
|bySegment        | `false`                                | Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
|finalize         | `true`                                 | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
|chunkPeriod      | `P0D` (off)                            | At the Broker process level, long interval queries (of any type) may be broken into shorter interval queries to parallelize merging more than normal. Broken up queries will use a larger share of cluster resources, but, if you use groupBy "v1, it may be able to complete faster as a result. Use ISO 8601 periods. For example, if this property is set to `P1M` (one month), then a query covering a year would be broken into 12 smaller queries. The broker uses its query processing executor service to initiate processing for query chunks, so make sure "druid.processing.numThreads" is configured appropriately on the broker. [groupBy queries](groupbyquery.html) do not support chunkPeriod by default, although they do if using the legacy "v1" engine. This context is deprecated since it's only useful for groupBy "v1", and will be removed in the future releases.|
|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.html#broker) for more details.|
|maxQueuedBytes       | `druid.broker.http.maxQueuedBytes`        | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
|serializeDateTimeAsLong| `false`       | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
|serializeDateTimeAsLongInner| `false`  | If true, DateTime is serialized as long in the data transportation between Broker and compute process|

In addition, some query types offer context parameters specific to that query type.

### TopN queries

|property         |default              | description          |
|-----------------|---------------------|----------------------|
|minTopNThreshold | `1000`              | The top minTopNThreshold local results from each segment are returned for merging to determine the global topN. |

### Timeseries queries

|property         |default              | description          |
|-----------------|---------------------|----------------------|
|skipEmptyBuckets | `false`             | Disable timeseries zero-filling behavior, so only buckets with results will be returned. |

### GroupBy queries

See [GroupBy query context](groupbyquery.html#query-context).

### Vectorizable queries

The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing
batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following
requirements:

- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These
include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
- All filters in filtered aggregators must offer vectorized row-matchers.
- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum",
"hyperUnique", and "filtered".
- No virtual columns.
- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
- For GroupBy: No multi-value dimensions.
- For Timeseries: No "descending" order.
- Only immutable segments (not real-time).

Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without
vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`.

Vectorization is an alpha-quality feature as of Druid #{DRUIDVERSION}. We heartily welcome any feedback and testing
from the community as we work to battle-test it.

|property|default| description|
|--------|-------|------------|
|vectorize|`false`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail).|
|vectorSize|`512`|Sets the row batching size for a particular query.|
-												Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733)


											
										
										
											2018-12-13 14:47:20 -05:00
+								---
 								layout: doc_page
 								title: "Query Context"
 								---
-												add missing license headers, in particular to MD files; clean up RAT … (#6563)

* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg

											
										
										
											2018-11-13 12:38:37 -05:00
+								<!--
 								  ~ Licensed to the Apache Software Foundation (ASF) under one
 								  ~ or more contributor license agreements.  See the NOTICE file
 								  ~ distributed with this work for additional information
 								  ~ regarding copyright ownership.  The ASF licenses this file
 								  ~ to you under the Apache License, Version 2.0 (the
 								  ~ "License"); you may not use this file except in compliance
 								  ~ with the License.  You may obtain a copy of the License at
 								  ~
 								  ~   http://www.apache.org/licenses/LICENSE-2.0
 								  ~
 								  ~ Unless required by applicable law or agreed to in writing,
 								  ~ software distributed under the License is distributed on an
 								  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 								  ~ KIND, either express or implied.  See the License for the
 								  ~ specific language governing permissions and limitations
 								  ~ under the License.
 								  -->
-												Added titles and harmonized docs to improve usability and SEO (#6731)

* added titles and harmonized docs

* manually fixed some titles

											
										
										
											2018-12-12 23:42:12 -05:00
+								# Query Context
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
+								The query context is used for various query configuration parameters. The following parameters apply to all queries.
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|property         |default                                 | description          |
 								|-----------------|----------------------------------------|----------------------|
-												Add master/data/query server concepts to docs/packaging (#6916)

* Add master/data/query server concepts to docs/packaging

* PR comments

* TOC and markdown fix

* Update image legend

* PR comment

* More PR comments

											
										
										
											2019-01-30 22:41:07 -05:00
+								|timeout          | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout`. To set the default timeout, see [Broker configuration](../configuration/index.html#broker) |
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|priority         | `0`                                    | Query Priority. Queries with higher priority get precedence for computational resources.|
 								|queryId          | auto-generated                         | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
-												Add more Apache branding to docs (#7515)


											
										
										
											2019-04-19 18:52:26 -04:00
+								|useCache         | `true`                                 | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid (incubating) uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache |
-												Updating the description of useCache (#4200)

Updating the description of useCache

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment
											
										
										
											2017-04-25 13:26:15 -04:00
+								|populateCache    | `true`                                 | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache |
-												Further improve caching documentation. (#7236)

Follow-up to #7223 that fixes a doc bug (a result-level cache property
was misspelled), changes the recommended "small cluster" threshold from
20 to 5 servers, and clarifies behavior of the various caching options.
											
										
										
											2019-03-11 20:57:00 -04:00
+								|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache |
 								|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache |
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|bySegment        | `false`                                | Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
 								|finalize         | `true`                                 | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
-												Reword 'node' to 'process' (#7172)


											
										
										
											2019-02-28 21:10:39 -05:00
+								|chunkPeriod      | `P0D` (off)                            | At the Broker process level, long interval queries (of any type) may be broken into shorter interval queries to parallelize merging more than normal. Broken up queries will use a larger share of cluster resources, but, if you use groupBy "v1, it may be able to complete faster as a result. Use ISO 8601 periods. For example, if this property is set to `P1M` (one month), then a query covering a year would be broken into 12 smaller queries. The broker uses its query processing executor service to initiate processing for query chunks, so make sure "druid.processing.numThreads" is configured appropriately on the broker. [groupBy queries](groupbyquery.html) do not support chunkPeriod by default, although they do if using the legacy "v1" engine. This context is deprecated since it's only useful for groupBy "v1", and will be removed in the future releases.|
 								|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.html#broker) for more details.|
-												Broker backpressure. (#6313)

* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.

* Fix query context doc.

											
										
										
											2018-09-10 12:33:29 -04:00
+								|maxQueuedBytes       | `druid.broker.http.maxQueuedBytes`        | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
-												Reword 'node' to 'process' (#7172)


											
										
										
											2019-02-28 21:10:39 -05:00
+								|serializeDateTimeAsLong| `false`       | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
 								|serializeDateTimeAsLongInner| `false`  | If true, DateTime is serialized as long in the data transportation between Broker and compute process|
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
 								In addition, some query types offer context parameters specific to that query type.
 								### TopN queries
 								|property         |default              | description          |
 								|-----------------|---------------------|----------------------|
-												add doc

											
										
										
											2016-01-11 22:25:11 -05:00
+								|minTopNThreshold | `1000`              | The top minTopNThreshold local results from each segment are returned for merging to determine the global topN. |
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
 								### Timeseries queries
 								|property         |default              | description          |
 								|-----------------|---------------------|----------------------|
 								|skipEmptyBuckets | `false`             | Disable timeseries zero-filling behavior, so only buckets with results will be returned. |
 								### GroupBy queries
 								See [GroupBy query context](groupbyquery.html#query-context).
-												Query vectorization. (#6794)

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.

											
										
										
											2019-07-12 15:54:07 -04:00
 								### Vectorizable queries
 								The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing
 								batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following
 								requirements:
 								- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These
 								include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
 								- All filters in filtered aggregators must offer vectorized row-matchers.
 								- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum",
 								"hyperUnique", and "filtered".
 								- No virtual columns.
 								- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
 								- For GroupBy: No multi-value dimensions.
 								- For Timeseries: No "descending" order.
 								- Only immutable segments (not real-time).
 								Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without
 								vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`.
 								Vectorization is an alpha-quality feature as of Druid #{DRUIDVERSION}. We heartily welcome any feedback and testing
 								from the community as we work to battle-test it.
 								|property|default| description|
 								|--------|-------|------------|
 								|vectorize|`false`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail).|
 								|vectorSize|`512`|Sets the row batching size for a particular query.|