---
id: sql-query-context
title: "SQL query context"
sidebar_label: "SQL query context"
---

Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language.

Druid supports query context parameters that affect SQL query planning. See Query context for general parameters that apply to all query types.

## SQL query context parameters

Configure Druid SQL query planning using the parameters in the table below.

|Parameter|Description|Default value|
|---------|-----------|-------------|
|`sqlQueryId`|Unique identifier given to this SQL query. For HTTP client, it will be returned in the `X-Druid-SQL-Query-Id` header.<br/><br/>To specify a unique identifier for a SQL query, use `sqlQueryId` instead of `queryId`. Setting `queryId` for a SQL request has no effect. All native queries underlying SQL use an auto-generated `queryId`.|auto-generated|
|`sqlTimeZone`|Sets the time zone for this connection, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|`druid.sql.planner.sqlTimeZone` on the Broker (default: UTC)|
|`sqlStringifyArrays`|When set to true, result columns which return array values will be serialized into a JSON string in the response instead of as an array.|true, except for JDBC connections, where it is always false|
|`useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|`druid.sql.planner.useApproximateCountDistinct` on the Broker (default: true)|
|`useGroupingSetForExactDistinct`|Whether to use grouping sets to execute queries with multiple exact distinct aggregations.|`druid.sql.planner.useGroupingSetForExactDistinct` on the Broker (default: false)|
|`useApproximateTopN`|Whether to use approximate TopN queries when a SQL query could be expressed as such. If false, exact GroupBy queries will be used instead.|`druid.sql.planner.useApproximateTopN` on the Broker (default: true)|
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on the `__time` column in a datasource.|`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br/><br/>This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br/><br/>This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`. An example of setting this parameter appears after this table.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
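
For example, the following request is a minimal sketch of setting `sqlFinalizeOuterSketches`: it restores the pre-25.0.0 behavior so that `DS_THETA` at the outer level returns a finalized distinct-count estimate instead of a serialized sketch. The column `user_id` is hypothetical, and `data_source` is the same placeholder datasource used in the examples below.

```json
{
  "query" : "SELECT DS_THETA(user_id) AS user_count FROM data_source",
  "context" : {
    "sqlFinalizeOuterSketches" : true
  }
}
```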

## Setting the query context

The query context parameters can be specified as a "context" object in the JSON API or as a JDBC connection properties object. See examples for each option below.

### Example using JSON API

```json
{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'",
  "context" : {
    "sqlTimeZone" : "America/Los_Angeles"
  }
}
```

### Example using JDBC

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";

// Set any query context parameters you need here.
Properties connectionProperties = new Properties();
connectionProperties.setProperty("sqlTimeZone", "America/Los_Angeles");
connectionProperties.setProperty("useCache", "false");

try (Connection connection = DriverManager.getConnection(url, connectionProperties)) {
  // Create and execute statements, process result sets, etc.
}
```
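
As a minimal usage sketch (not part of the original example), the block below shows one way to run a query on that connection. It assumes `java.sql.Statement` and `java.sql.ResultSet` are also imported and reuses the placeholder datasource `data_source` from the JSON example above. Context parameters set in `connectionProperties` apply to the statements created from the connection.

```java
try (Connection connection = DriverManager.getConnection(url, connectionProperties);
     Statement statement = connection.createStatement()) {
  // This query runs with sqlTimeZone = America/Los_Angeles, set via the connection properties.
  // "data_source" is a placeholder datasource name.
  try (ResultSet resultSet = statement.executeQuery(
      "SELECT COUNT(*) FROM data_source WHERE __time > TIMESTAMP '2000-01-01 00:00:00'")) {
    while (resultSet.next()) {
      System.out.println("row count: " + resultSet.getLong(1));
    }
  }
}
```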