druid/docs/querying/query-context.md

---
id: query-context
title: "Query context"
sidebar_label: "Query context"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

The query context is used for various query configuration parameters. Query context parameters can be specified in
the following ways:

- For [Druid SQL](sql-api.md), context parameters are provided either as a JSON object named `context` to the
HTTP POST API, or as properties to the JDBC connection.
- For [native queries](querying.md), context parameters are provided as a JSON object named `context`.

Note that setting query context will override both the default value and the runtime properties value in the format of
`druid.query.default.context.{property_key}` (if set). 

## General parameters

Unless otherwise noted, the following parameters apply to all query types.

|property         |default                                 | description          |
|-----------------|----------------------------------------|----------------------|
|timeout          | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout` (up to the server-side maximum query timeout, `druid.server.http.maxQueryTimeout`). To set the default timeout and maximum timeout, see [Broker configuration](../configuration/index.md#broker) |
|priority         | `0`                                    | Query Priority. Queries with higher priority get precedence for computational resources.|
|lane             | `null`                                 | Query lane, used to control usage limits on classes of queries. See [Broker configuration](../configuration/index.md#broker) for more details.|
|queryId          | auto-generated                         | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
|brokerService    | `null`                                 | Broker service to which this query should be routed. This parameter is honored only by a broker selector strategy of type *manual*. See [Router strategies](../design/router.md#router-strategies) for more details.|
|useCache         | `true`                                 | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses `druid.broker.cache.useCache` or `druid.historical.cache.useCache` to determine whether or not to read from the query cache |
|populateCache    | `true`                                 | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateCache` or `druid.historical.cache.populateCache` to determine whether or not to save the results of this query to the query cache |
|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses `druid.broker.cache.useResultLevelCache` to determine whether or not to read from the result-level query cache |
|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateResultLevelCache` to determine whether or not to save the results of this query to the result-level query cache |
|bySegment        | `false`                                | Native queries only. Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
|finalize         | `true`                                 | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.md#broker) for more details.|
|maxQueuedBytes       | `druid.broker.http.maxQueuedBytes`        | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
|serializeDateTimeAsLong| `false`       | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
|serializeDateTimeAsLongInner| `false`  | If true, DateTime is serialized as long in the data transportation between Broker and compute process|
|enableParallelMerge|`true`|Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
|enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
|debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
|setProcessingThreadNames|`true`| Whether processing thread names will be set to `queryType_dataSource_intervals` while processing a query. This aids in interpreting thread dumps, and is on by default. Query overhead can be reduced slightly by setting this to `false`. This has a tiny effect in most scenarios, but can be meaningful in high-QPS, low-per-segment-processing-time scenarios. |
|maxNumericInFilters|`-1`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates only to an [OR](../querying/filters.md#or) of [Bound filter](../querying/filters.md#bound-filter). By default, Druid does not restrict the amount of of numeric Bound Filters on String columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. This value cannot exceed the set system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|
|inSubQueryThreshold|`2147483647`| Threshold for minimum number of values in an IN clause to convert the query to a JOIN operation on an inlined table rather than a predicate. A threshold of 0 forces usage of an inline table in all cases; a threshold of [Integer.MAX_VALUE] forces usage of OR in all cases. |

## Druid SQL parameters

See [SQL query context](sql-query-context.md) for query context parameters specific to Druid SQL queries.

## Parameters by query type

Some query types offer context parameters specific to that query type.

### TopN

|property         |default              | description          |
|-----------------|---------------------|----------------------|
|minTopNThreshold | `1000`              | The top minTopNThreshold local results from each segment are returned for merging to determine the global topN. |

### Timeseries

|property         |default              | description          |
|-----------------|---------------------|----------------------|
|skipEmptyBuckets | `false`             | Disable timeseries zero-filling behavior, so only buckets with results will be returned. |

### Join filter

|property         |default              | description          |
|-----------------|---------------------|----------------------|
|enableJoinFilterPushDown | `true` | Controls whether a join query will attempt filter push down, which reduces the number of rows that have to be compared in a join operation.|
|enableJoinFilterRewrite | `true` | Controls whether filter clauses that reference non-base table columns will be rewritten into filters on base table columns.|
|enableJoinFilterRewriteValueColumnFilters | `false` | Controls whether Druid rewrites non-base table filters on non-key columns in the non-base table. Requires a scan of the non-base table.|
|enableRewriteJoinToFilter | `true` | Controls whether a join can be pushed partial or fully to the base table as a filter at runtime.|
|joinFilterRewriteMaxSize | `10000` | The maximum size of the correlated value set used for filter rewrites. Set this limit to prevent excessive memory use.| 

### GroupBy

See the list of [GroupBy query context](groupbyquery.md#advanced-configurations) parameters available on the groupBy
query page.

## Vectorization parameters

The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing
batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following
requirements:

- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These
include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
- All filters in filtered aggregators must offer vectorized row-matchers.
- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
 "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
 "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
- All virtual columns must offer vectorized implementations. Currently for expression virtual columns, support for vectorization is decided on a per expression basis, depending on the type of input and the functions used by the expression. See the currently supported list in the [expression documentation](../misc/math-expr.md#vectorization-support).
- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
- For GroupBy: No multi-value dimensions.
- For Timeseries: No "descending" order.
- Only immutable segments (not real-time).
- Only [table datasources](datasource.md#table) (not joins, subqueries, lookups, or inline datasources).

Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without
vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`.

|property|default| description|
|--------|-------|------------|
|vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
|vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|
-												Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733)


											
										
										
											2018-12-13 14:47:20 -05:00
+								---
-												Docusaurus build framework + ingestion doc refresh. (#8311)

* Docusaurus build framework + ingestion doc refresh.

* stick to npm instead of yarn

* fix typos

* restore some _bin

* Adjustments.

* detect and fix redirect anchors

* update anchor lint

* Web-console: remove specific column filters (#8343)

* add clear filter

* update tool kit

* remove usless check

* auto run

* add %

* Fix resource leak (#8337)

* Fix resource leak

* Patch comments

* Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234)

* Fixes from PR review.

* Fix more anchors.

* Preamble nix.

* Fix more anchors, headers

* clean up placeholder page

* add to website lint to travis config

* better broken link checking

* travis fix

* Fixed more broken links

* better redirects

* unfancy catch

* fix LGTM error

* link fixes

* fix md issues

* Addl fixes

											
										
										
											2019-08-21 00:48:59 -04:00
+								id: query-context
 								title: "Query context"
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								sidebar_label: "Query context"
-												Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733)


											
										
										
											2018-12-13 14:47:20 -05:00
+								---
-												add missing license headers, in particular to MD files; clean up RAT … (#6563)

* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg

											
										
										
											2018-11-13 12:38:37 -05:00
+								<!--
 								  ~ Licensed to the Apache Software Foundation (ASF) under one
 								  ~ or more contributor license agreements.  See the NOTICE file
 								  ~ distributed with this work for additional information
 								  ~ regarding copyright ownership.  The ASF licenses this file
 								  ~ to you under the Apache License, Version 2.0 (the
 								  ~ "License"); you may not use this file except in compliance
 								  ~ with the License.  You may obtain a copy of the License at
 								  ~
 								  ~   http://www.apache.org/licenses/LICENSE-2.0
 								  ~
 								  ~ Unless required by applicable law or agreed to in writing,
 								  ~ software distributed under the License is distributed on an
 								  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 								  ~ KIND, either express or implied.  See the License for the
 								  ~ specific language governing permissions and limitations
 								  ~ under the License.
 								  -->
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								The query context is used for various query configuration parameters. Query context parameters can be specified in
 								the following ways:
-												Refactor SQL docs (#12239)

* refactor and link fixes

* add sql docs to left nav

* code format for needle

* updated web console script

* link fixes

* update earliest/latest functions

* edits for grammar and style

* more link fixes

* another link

* update with #12226

* update .spelling file
											
										
										
											2022-02-11 17:43:30 -05:00
+								- For [Druid SQL](sql-api.md), context parameters are provided either as a JSON object named `context` to the
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								HTTP POST API, or as properties to the JDBC connection.
 								- For [native queries](querying.md), context parameters are provided as a JSON object named `context`.
-												Cluster wide default query context setting (#10208)

* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
											
										
										
											2020-07-29 18:19:18 -04:00
+								Note that setting query context will override both the default value and the runtime properties value in the format of
 								`druid.query.default.context.{property_key}` (if set).
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								## General parameters
-												clarify bySegment is native only (#11331)


											
										
										
											2021-06-11 16:48:17 -04:00
+								Unless otherwise noted, the following parameters apply to all query types.
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|property         |default                                 | description          |
 								|-----------------|----------------------------------------|----------------------|
-												Fix druid client timeout zero (#12023)

* fix bug where queries fail immediately when timeout is 0 instead of using default timeout

* fix to use serverside max

* more better

* less flaky test

* oops
											
										
										
											2021-12-07 15:41:01 -05:00
+								|timeout          | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout` (up to the server-side maximum query timeout, `druid.server.http.maxQueryTimeout`). To set the default timeout and maximum timeout, see [Broker configuration](../configuration/index.md#broker) |
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|priority         | `0`                                    | Query Priority. Queries with higher priority get precedence for computational resources.|
-												cleaning up and fixing links (#10528)

* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
											
										
										
											2020-12-17 16:37:43 -05:00
+								|lane             | `null`                                 | Query lane, used to control usage limits on classes of queries. See [Broker configuration](../configuration/index.md#broker) for more details.|
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|queryId          | auto-generated                         | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
-												Select broker based on query context parameter `brokerService` (#11495)

This change allows the selection of a specific broker service (or broker tier) by the Router.

The newly added ManualTieredBrokerSelectorStrategy works as follows:

Check for the parameter brokerService in the query context. If this is a valid broker service, use it.
Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it.
Move on to the next strategy
											
										
										
											2021-07-27 11:26:05 -04:00
+								|brokerService    | `null`                                 | Broker service to which this query should be routed. This parameter is honored only by a broker selector strategy of type *manual*. See [Router strategies](../design/router.md#router-strategies) for more details.|
-												De-incubation cleanup in code, docs, packaging (#9108)

* De-incubation cleanup in code, docs, packaging

* remove unused docs script

											
										
										
											2020-01-03 12:33:19 -05:00
+								|useCache         | `true`                                 | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses `druid.broker.cache.useCache` or `druid.historical.cache.useCache` to determine whether or not to read from the query cache |
-												fix spelling errors triggered by another doc PR (#8653)


											
										
										
											2019-10-09 02:43:58 -04:00
+								|populateCache    | `true`                                 | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateCache` or `druid.historical.cache.populateCache` to determine whether or not to save the results of this query to the query cache |
 								|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses `druid.broker.cache.useResultLevelCache` to determine whether or not to read from the result-level query cache |
 								|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateResultLevelCache` to determine whether or not to save the results of this query to the result-level query cache |
-												clarify bySegment is native only (#11331)


											
										
										
											2021-06-11 16:48:17 -04:00
+								|bySegment        | `false`                                | Native queries only. Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
-												Make timeout behavior consistent to document (#4134)

* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis

											
										
										
											2017-04-18 20:47:53 -04:00
+								|finalize         | `true`                                 | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
-												cleaning up and fixing links (#10528)

* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
											
										
										
											2020-12-17 16:37:43 -05:00
+								|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.md#broker) for more details.|
-												Broker backpressure. (#6313)

* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.

* Fix query context doc.

											
										
										
											2018-09-10 12:33:29 -04:00
+								|maxQueuedBytes       | `druid.broker.http.maxQueuedBytes`        | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
-												Reword 'node' to 'process' (#7172)


											
										
										
											2019-02-28 21:10:39 -05:00
+								|serializeDateTimeAsLong| `false`       | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
 								|serializeDateTimeAsLongInner| `false`  | If true, DateTime is serialized as long in the data transportation between Broker and compute process|
-												cleaning up and fixing links (#10528)

* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
											
										
										
											2020-12-17 16:37:43 -05:00
+								|enableParallelMerge|`true`|Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See [Broker configuration](../configuration/index.md#broker) for more details.|
 								|parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
 								|parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.md#broker) for more details.|
 								|parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
-												document useFilterCNF query context parameter (#9647)

* document useFilterCNF query context parameter

* move context key to QueryContexts

* Update .spelling
											
										
										
											2020-04-17 01:12:20 -04:00
+								|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
-												Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)

* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
											
										
										
											2020-09-24 19:32:56 -04:00
+								|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
-												Add flag in SQL to disable left base filter optimization for joins (#10947)

* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
											
										
										
											2021-03-09 16:07:34 -05:00
+								|enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
-												Improve query error logging (#11519)

* Improve query error logging

* add docs

* address comments

* address comments
											
										
										
											2021-08-05 11:51:09 -04:00
+								|debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-												Add setProcessingThreadNames context parameter. (#12514)

setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
											
										
										
											2022-05-16 04:12:00 -04:00
+								|setProcessingThreadNames|`true`| Whether processing thread names will be set to `queryType_dataSource_intervals` while processing a query. This aids in interpreting thread dumps, and is on by default. Query overhead can be reduced slightly by setting this to `false`. This has a tiny effect in most scenarios, but can be meaningful in high-QPS, low-per-segment-processing-time scenarios. |
-												Moving in filter check to broker (#12195)

* Moving in filter check to broker

* Adding more unit tests, making error message meaningful

* Spelling and doc changes

* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100

* Removing upper limit of 100, updated docs

* Making documentation more meaningful

* Moving check outside to PlannerConfig, updating test cases and adding back max limit

* Updated with some additional code comments

* Missed removing one line during the checkin

* Addressing doc changes and one forbidden API correction

* Final doc change

* Adding a speling exception, correcting a testcase

* Reading entire filter tree to address combinations of ANDs and ORs

* Specifying in docs that, this case works only for ORs

* Revert "Reading entire filter tree to address combinations of ANDs and ORs"

This reverts commit 81ca8f8496777eec41907899957b39ca99ccbada.

* Covering a class cast exception and updating docs

* Counting changed

Co-authored-by: Jihoon Son <jihoonson@apache.org>
											
										
										
											2022-02-15 23:45:07 -05:00
+								|maxNumericInFilters|`-1`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates only to an [OR](../querying/filters.md#or) of [Bound filter](../querying/filters.md#bound-filter). By default, Druid does not restrict the amount of of numeric Bound Filters on String columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. This value cannot exceed the set system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|
-												Convert inQueryThreshold into query context parameter. (#12357)

Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.
											
										
										
											2022-03-22 09:03:57 -04:00
+								|inSubQueryThreshold|`2147483647`| Threshold for minimum number of values in an IN clause to convert the query to a JOIN operation on an inlined table rather than a predicate. A threshold of 0 forces usage of an inline table in all cases; a threshold of [Integer.MAX_VALUE] forces usage of OR in all cases. |
-												updated docs for sql query context (#12406)


											
										
										
											2022-04-21 14:19:39 -04:00
 								## Druid SQL parameters
 								See [SQL query context](sql-query-context.md) for query context parameters specific to Druid SQL queries.
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								## Parameters by query type
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								Some query types offer context parameters specific to that query type.
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								### TopN
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
 								|property         |default              | description          |
 								|-----------------|---------------------|----------------------|
-												add doc

											
										
										
											2016-01-11 22:25:11 -05:00
+								|minTopNThreshold | `1000`              | The top minTopNThreshold local results from each segment are returned for merging to determine the global topN. |
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								### Timeseries
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
 								|property         |default              | description          |
 								|-----------------|---------------------|----------------------|
 								|skipEmptyBuckets | `false`             | Disable timeseries zero-filling behavior, so only buckets with results will be returned. |
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								### Join filter
 								|property         |default              | description          |
 								|-----------------|---------------------|----------------------|
 								|enableJoinFilterPushDown | `true` | Controls whether a join query will attempt filter push down, which reduces the number of rows that have to be compared in a join operation.|
 								|enableJoinFilterRewrite | `true` | Controls whether filter clauses that reference non-base table columns will be rewritten into filters on base table columns.|
 								|enableJoinFilterRewriteValueColumnFilters | `false` | Controls whether Druid rewrites non-base table filters on non-key columns in the non-base table. Requires a scan of the non-base table.|
-												Enable conversion of join to filter by default (#12868)


											
										
										
											2022-08-13 11:07:43 -04:00
+								|enableRewriteJoinToFilter | `true` | Controls whether a join can be pushed partial or fully to the base table as a filter at runtime.|
-												Document query context parameters related to join filters (#12057)

* docs update for query context and filters

* updates from review

* Update docs/querying/filters.md

											
										
										
											2021-12-13 20:47:21 -05:00
+								|joinFilterRewriteMaxSize | `10000` | The maximum size of the correlated value set used for filter rewrites. Set this limit to prevent excessive memory use.|
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								### GroupBy
-												SQL: Add context and contextual functions to planner. (#3919)

* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.

											
										
										
											2017-02-15 17:09:14 -05:00
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								See the list of [GroupBy query context](groupbyquery.md#advanced-configurations) parameters available on the groupBy
 								query page.
-												Query vectorization. (#6794)

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.

											
										
										
											2019-07-12 15:54:07 -04:00
-												Refresh query docs. (#9704)

* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
											
										
										
											2020-04-15 19:12:20 -04:00
+								## Vectorization parameters
-												Query vectorization. (#6794)

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.

											
										
										
											2019-07-12 15:54:07 -04:00
 								The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing
 								batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following
 								requirements:
 								- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These
 								include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
 								- All filters in filtered aggregators must offer vectorized row-matchers.
-												Add vectorization support for the longMin aggregator. (#10211)

* Fix minor formatting in docs.

* Add Nullhandling initialization for test to run from IDE.

* Vectorize longMin aggregator.

- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.

* Add longSum to the supported vectorized aggregator implementations.

* Add MIN() long min to calcite query test that can vectorize.

* Add simple long aggregations test.

* Fixup formatting per checkstyle guide.

* fixup and add more tests for long min aggregator.

* Override test for groupBy since timestamps are handled differently.

* Null compatibility check in test.

* Review comment: Add a test case to LongMinAggregationTest.
											
										
										
											2020-08-01 18:32:09 -04:00
+								- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
-												Vectorized ANY aggregators (#10338)

* WIP vectorized ANY aggregators

* tests

* fix aggs

* cleanup

* code review + tests

* docs

* use NilVectorSelector when needed

* fix spellcheck

* dont instantiate vectors

* cleanup
											
										
										
											2020-09-14 22:44:58 -04:00
+								 "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
 								 "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input).
-												add vectorizeVirtualColumns query context parameter (#10432)

* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
											
										
										
											2020-09-28 21:48:34 -04:00
+								- All virtual columns must offer vectorized implementations. Currently for expression virtual columns, support for vectorization is decided on a per expression basis, depending on the type of input and the functions used by the expression. See the currently supported list in the [expression documentation](../misc/math-expr.md#vectorization-support).
-												Query vectorization. (#6794)

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.

											
										
										
											2019-07-12 15:54:07 -04:00
+								- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
 								- For GroupBy: No multi-value dimensions.
 								- For Timeseries: No "descending" order.
 								- Only immutable segments (not real-time).
-												cleaning up and fixing links (#10528)

* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
											
										
										
											2020-12-17 16:37:43 -05:00
+								- Only [table datasources](datasource.md#table) (not joins, subqueries, lookups, or inline datasources).
-												Query vectorization. (#6794)

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.

											
										
										
											2019-07-12 15:54:07 -04:00
 								Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without
 								vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`.
 								|property|default| description|
 								|--------|-------|------------|
-												Cluster wide default query context setting (#10208)

* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
											
										
										
											2020-07-29 18:19:18 -04:00
+								|vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 								|vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
-												Enable vectorized virtual column processing by default. (#12520)

In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue 
											
										
										
											2022-05-16 06:13:53 -04:00
+								|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|