---
id: sql-query-context
title: SQL query context
sidebar_label: SQL query context
---

:::info
Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language.
:::

Druid supports query context parameters which affect SQL query planning. See Query context for general query context parameters for all query types.

## SQL query context parameters

Configure Druid SQL query planning using the parameters in the table below.

|Parameter|Description|Default value|
|---------|-----------|-------------|
|`sqlQueryId`|Unique identifier given to this SQL query. For the HTTP client, it is returned in the `X-Druid-SQL-Query-Id` header.<br/><br/>To specify a unique identifier for a SQL query, use `sqlQueryId` instead of `queryId`. Setting `queryId` for a SQL request has no effect. All native queries underlying SQL use an auto-generated `queryId`.|auto-generated|
|`sqlTimeZone`|Sets the time zone for this connection, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or an offset like "-08:00".|`druid.sql.planner.sqlTimeZone` on the Broker (default: UTC)|
|`sqlStringifyArrays`|When set to true, result columns which return array values are serialized into a JSON string in the response instead of as an array.|true, except for JDBC connections, where it is always false|
|`useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|`druid.sql.planner.useApproximateCountDistinct` on the Broker (default: true)|
|`useGroupingSetForExactDistinct`|Whether to use grouping sets to execute queries with multiple exact distinct aggregations.|`druid.sql.planner.useGroupingSetForExactDistinct` on the Broker (default: false)|
|`useApproximateTopN`|Whether to use approximate TopN queries when a SQL query could be expressed as such. If false, exact GroupBy queries are used instead.|`druid.sql.planner.useApproximateTopN` on the Broker (default: true)|
|`enableTimeBoundaryPlanning`|If true, SQL queries are converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculations on the `__time` column of a datasource.|`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` returns the explain plan as a JSON representation of the equivalent native query(s); otherwise it returns the original version of the explain plan generated by Calcite.<br/><br/>This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br/><br/>This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner uses equality, null, and range filters instead of selector and bound filters. This value must be set to false for correct behavior when filtering ARRAY typed values.|Defaults to the same value as `druid.generic.useDefaultValueForNull`, which is false|
|`sqlReverseLookup`|Whether to consider the reverse-lookup rewrite of the `LOOKUP` function during SQL planning.<br/><br/>Calls to `LOOKUP` are only reversed when the number of matching keys is lower than both `inSubQueryThreshold` and `sqlReverseLookupThreshold`.|true|
|`sqlReverseLookupThreshold`|Maximum size of `IN` filter to create when applying a reverse-lookup rewrite. If a `LOOKUP` call matches more keys than this threshold, it is left as-is.<br/><br/>If `inSubQueryThreshold` is lower than `sqlReverseLookupThreshold`, the `inSubQueryThreshold` is used as the threshold instead.|10000|
|`sqlPullUpLookup`|Whether to consider the pull-up rewrite of the `LOOKUP` function during SQL planning.|true|
|`enableJoinLeftTableScanDirect`|This flag applies to queries which have joins. For joins where the left child is a simple scan with a filter, by default Druid runs the scan as a query and joins the results to the right child on the Broker. Setting this flag to true overrides that behavior, and Druid attempts to push the join down to data servers instead. Note that the flag can apply to a query even if there is no explicit join, since queries can be internally translated into a join by the SQL planner.|false|
|`maxNumericInFilters`|Maximum number of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates only to an OR of Bound filters. By default, Druid does not restrict the number of numeric Bound filters on String columns, although this situation may block other queries from running. Set this parameter to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the WHERE clause instead of numbers. For example, `WHERE someString IN ('123', '456')`. This value cannot exceed the system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|-1|
|`inFunctionThreshold`|At or beyond this threshold number of values, SQL `IN` is converted to `SCALAR_IN_ARRAY`. A threshold of 0 forces this conversion in all cases. A threshold of 2147483647 (`Integer.MAX_VALUE`) disables this conversion. The converted function is eligible for fewer planning-time optimizations, which speeds up planning but may prevent certain planning-time optimizations.|100|
|`inFunctionExprThreshold`|At or beyond this threshold number of values, SQL `IN` is eligible for execution using the native function `scalar_in_array` rather than an `\|\|` of `==`, even if the number of values is below `inFunctionThreshold`. This property only affects translation of SQL `IN` to a native expression. It does not affect translation of SQL `IN` to a native filter. This property is provided for backwards compatibility purposes, and may be removed in a future release.|2|
|`inSubQueryThreshold`|At or beyond this threshold number of values, SQL `IN` is converted to a `JOIN` on an inline table. `inFunctionThreshold` takes priority over this setting. A threshold of 0 forces usage of an inline table in all cases where the size of the SQL `IN` is larger than `inFunctionThreshold`. A threshold of 2147483647 disables the rewrite of SQL `IN` to `JOIN`.|2147483647|

## Setting the query context

The query context parameters can be specified as a "context" object in the JSON API or as a JDBC connection properties object. See examples for each option below.

### Example using JSON API

```json
{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'",
  "context" : {
    "sqlTimeZone" : "America/Los_Angeles"
  }
}
```
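
To submit such a request programmatically, you can POST the JSON body to the Druid SQL endpoint on the Broker or Router. The sketch below uses Java's built-in HTTP client and is only an illustration of the pattern: the host and port (`localhost:8888`), the class name, and the datasource in the query are assumptions, not fixed values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SqlContextHttpExample
{
  public static void main(String[] args) throws Exception
  {
    // Assumption: a Router (or Broker) is listening at localhost:8888.
    String url = "http://localhost:8888/druid/v2/sql/";

    // Same request body as the JSON API example above: the "context" object
    // carries the SQL query context parameters.
    String body =
        "{\n"
        + "  \"query\" : \"SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'\",\n"
        + "  \"context\" : {\n"
        + "    \"sqlTimeZone\" : \"America/Los_Angeles\"\n"
        + "  }\n"
        + "}";

    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // The auto-generated (or caller-supplied) sqlQueryId is echoed in this header.
    System.out.println("X-Druid-SQL-Query-Id: "
        + response.headers().firstValue("X-Druid-SQL-Query-Id").orElse("<none>"));
    System.out.println(response.body());
  }
}
```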

### Example using JDBC

```java
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";

// Set any query context parameters you need here.
Properties connectionProperties = new Properties();
connectionProperties.setProperty("sqlTimeZone", "America/Los_Angeles");
connectionProperties.setProperty("useCache", "false");

try (Connection connection = DriverManager.getConnection(url, connectionProperties)) {
  // create and execute statements, process result sets, etc
}
```
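
Filling in the placeholder comment above, a complete round trip might look like the following minimal sketch. It assumes the Avatica JDBC driver is on the classpath; the class name, the Broker port (8082), and the datasource and column names carried over from the earlier example are illustrative, not required values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class SqlContextJdbcExample
{
  public static void main(String[] args) throws Exception
  {
    String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";

    // Query context parameters are passed as connection properties.
    Properties connectionProperties = new Properties();
    connectionProperties.setProperty("sqlTimeZone", "America/Los_Angeles");

    String sql = "SELECT COUNT(*) AS cnt FROM data_source "
        + "WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'";

    try (Connection connection = DriverManager.getConnection(url, connectionProperties);
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(sql)) {
      while (resultSet.next()) {
        // Time functions and timestamp literals in the query are interpreted
        // using the sqlTimeZone set on the connection above.
        System.out.println("count = " + resultSet.getLong("cnt"));
      }
    }
  }
}
```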