---
id: sql-translation
title: "SQL query translation"
sidebar_label: "SQL query translation"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.

Druid SQL translates SQL queries to [native queries](querying.md) before running them, and understanding how this
translation works is key to getting good performance.

## Best practices

Consider this (non-exhaustive) list of things to watch for when investigating the performance implications of
how your SQL queries are translated to native queries.

1. If you wrote a filter on the primary time column `__time`, make sure it is being correctly translated to an
`"intervals"` filter, as described in the [Time filters](#time-filters) section below. If not, you may need to change
the way you write the filter.

2. Try to avoid subqueries underneath joins: they affect both performance and scalability. This includes implicit
subqueries generated by conditions on mismatched types, and implicit subqueries generated by conditions that use
expressions to refer to the right-hand side.

3. Currently, Druid does not support pushing down predicates (conditions and filters) past a join, that is, into the
join's children. Druid only supports pushing predicates into the join if they originated above the join. Hence, the
placement of predicates and filters in your Druid SQL is very important. For the same reason, avoid comma joins.

4. Read through the [Query execution](query-execution.md) page to understand how various types of native queries
will be executed.

5. Be careful when interpreting EXPLAIN PLAN output, and use request logging if in doubt. Request logs will show the
exact native query that was run. See the [next section](#interpreting-explain-plan-output) for more details.

6. If you encounter a query that could be planned better, feel free to
[raise an issue on GitHub](https://github.com/apache/druid/issues/new/choose). A reproducible test case is always
appreciated.

## Interpreting EXPLAIN PLAN output

The [EXPLAIN PLAN](sql-syntax.md#explain-plan) functionality can help you understand how a given SQL query will
be translated to native. For simple queries that do not involve subqueries or joins, the output of EXPLAIN PLAN
is easy to interpret. The native query that will run is embedded as JSON inside a "DruidQueryRel" line:

```
> EXPLAIN PLAN FOR SELECT COUNT(*) FROM wikipedia

DruidQueryRel(query=[{"queryType":"timeseries","dataSource":"wikipedia","intervals":"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z","granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{a0:LONG}])
```

For more complex queries that do involve subqueries or joins, EXPLAIN PLAN is somewhat more difficult to interpret.
For example, consider this query:

```
> EXPLAIN PLAN FOR
> SELECT
>   channel,
>   COUNT(*)
> FROM wikipedia
> WHERE channel IN (SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10)
> GROUP BY channel

DruidJoinQueryRel(condition=[=($1, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","dimensions":["channel"],"aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING, a0:LONG}])
  DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"wikipedia"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"resultFormat":"compactedList","columns":["__time","channel","page"],"granularity":"all"}], signature=[{__time:LONG, channel:STRING, page:STRING}])
  DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"dimension":"page","metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING}])
```

Here, there is a join with two inputs. The way to read this is to consider each line of the EXPLAIN PLAN output as
something that might become a query, or might just become a simple datasource. The `query` field they all have is
called a "partial query" and represents what query would be run on the datasource represented by that line, if that
line ran by itself. In some cases, like the "scan" query in the second line of this example, the query does not
actually run, and it ends up being translated to a simple table datasource. See the [Join translation](#joins) section
for more details about how this works.

We can see this for ourselves using Druid's [request logging](../configuration/index.md#request-logging) feature. After
enabling logging and running this query, we can see that it actually runs as the following native query.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "join",
    "left": "wikipedia",
    "right": {
      "type": "query",
      "query": {
        "queryType": "topN",
        "dataSource": "wikipedia",
        "dimension": {"type": "default", "dimension": "page", "outputName": "d0"},
        "metric": {"type": "numeric", "metric": "a0"},
        "threshold": 10,
        "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
        "granularity": "all",
        "aggregations": [
          { "type": "count", "name": "a0"}
        ]
      }
    },
    "rightPrefix": "j0.",
    "condition": "(\"page\" == \"j0.d0\")",
    "joinType": "INNER"
  },
  "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
  "granularity": "all",
  "dimensions": [
    {"type": "default", "dimension": "channel", "outputName": "d0"}
  ],
  "aggregations": [
    { "type": "count", "name": "a0"}
  ]
}
```

## Query types

Druid SQL uses four different native query types.

- [Scan](scan-query.md) is used for queries that do not aggregate (no GROUP BY, no DISTINCT).

- [Timeseries](timeseriesquery.md) is used for queries that GROUP BY `FLOOR(__time TO unit)` or `TIME_FLOOR(__time,
period)`, have no other grouping expressions, no HAVING or LIMIT clauses, no nesting, and either no ORDER BY, or an
ORDER BY that orders by the same expression present in the GROUP BY. Druid SQL also uses Timeseries for "grand total"
queries that have aggregation functions but no GROUP BY. This query type takes advantage of the fact that Druid
segments are sorted by time.

- [TopN](topnquery.md) is used by default for queries that group by a single expression, do have ORDER BY and LIMIT
clauses, do not have HAVING clauses, and are not nested. However, the TopN query type will deliver approximate ranking
and results in some cases; if you want to avoid this, set "useApproximateTopN" to "false". TopN results are always
computed in memory. See the TopN documentation for more details.

- [GroupBy](groupbyquery.md) is used for all other aggregations, including any nested aggregation queries. Druid's
GroupBy is a traditional aggregation engine: it delivers exact results and rankings and supports a wide variety of
features. GroupBy aggregates in memory if it can, but it may spill to disk if it doesn't have enough memory to complete
your query. Results are streamed back from data processes through the Broker if you ORDER BY the same expressions in
your GROUP BY clause, or if you don't have an ORDER BY at all. If your query has an ORDER BY referencing expressions
that don't appear in the GROUP BY clause (like aggregation functions) then the Broker will materialize a list of
results in memory, up to a max of your LIMIT, if any. See the GroupBy documentation for details about tuning
performance and memory use.

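To make the mapping concrete, here is a sketch of three queries that would typically plan to each aggregating query
type, assuming default context settings and the example `wikipedia` datasource used elsewhere on this page. You can
verify the chosen type on your own cluster with EXPLAIN PLAN.

```sql
-- Groups only on FLOOR(__time TO unit): typically plans to a Timeseries query.
SELECT FLOOR(__time TO HOUR) AS "hour", COUNT(*) AS cnt
FROM wikipedia
GROUP BY FLOOR(__time TO HOUR);

-- Single grouping expression plus ORDER BY and LIMIT: typically plans to a TopN query.
SELECT channel, COUNT(*) AS cnt
FROM wikipedia
GROUP BY channel
ORDER BY cnt DESC
LIMIT 10;

-- More than one grouping expression: plans to a GroupBy query.
SELECT channel, page, COUNT(*) AS cnt
FROM wikipedia
GROUP BY channel, page;
```
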
## Time filters

For all native query types, filters on the `__time` column will be translated into top-level query "intervals" whenever
possible, which allows Druid to use its global time index to quickly prune the set of data that must be scanned.
Consider this (non-exhaustive) list of time filters that will be recognized and translated to "intervals":

- `__time >= TIMESTAMP '2000-01-01 00:00:00'` (comparison to absolute time)
- `__time >= CURRENT_TIMESTAMP - INTERVAL '8' HOUR` (comparison to relative time)
- `FLOOR(__time TO DAY) = TIMESTAMP '2000-01-01 00:00:00'` (specific day)

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that time filters are being translated as you expect.

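For example, here is a quick sketch of confirming that the first filter above is recognized (using the `wikipedia`
datasource; the exact output format varies by Druid version):

```sql
-- If the filter is recognized, the native query printed by EXPLAIN PLAN carries it
-- in its "intervals" field (starting at 2000-01-01) rather than in a "filter" field.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM wikipedia
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00';
```
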
## Joins

SQL join operators are translated to native join datasources as follows:

1. Joins that the native layer can handle directly are translated literally, to a [join datasource](datasource.md#join)
whose `left`, `right`, and `condition` are faithful translations of the original SQL. This includes any SQL join where
the right-hand side is a lookup or subquery, and where the condition is an equality in which one side is an expression
based on the left-hand table, the other side is a simple column reference to the right-hand table, and both sides of
the equality are the same data type.

2. If a join cannot be handled directly by a native [join datasource](datasource.md#join) as written, Druid SQL
will insert subqueries to make it runnable. For example, `foo INNER JOIN bar ON foo.abc = LOWER(bar.def)` cannot be
directly translated, because there is an expression on the right-hand side instead of a simple column access. A
subquery will be inserted that effectively transforms this clause to
`foo INNER JOIN (SELECT LOWER(def) AS def FROM bar) t ON foo.abc = t.def`, as shown in the sketch after this list.

3. Druid SQL does not currently reorder joins to optimize queries.

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that joins are being translated as you expect.

Refer to the [Query execution](query-execution.md#join) page for information about how joins are executed.

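As a sketch of point 2 above, using hypothetical tables `foo(abc)` and `bar(def)`, you can watch the implicit
subquery appear in EXPLAIN PLAN output:

```sql
-- Because LOWER(bar.def) is an expression rather than a simple column reference,
-- Druid SQL plans an implicit subquery over bar; in the EXPLAIN PLAN output this
-- shows up as an extra query-backed datasource feeding the join.
EXPLAIN PLAN FOR
SELECT foo.abc
FROM foo
INNER JOIN bar ON foo.abc = LOWER(bar.def);
```
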
## Subqueries

Subqueries in SQL are generally translated to native query datasources. Refer to the
[Query execution](query-execution.md#query) page for information about how subqueries are executed.

> Note: Subqueries in the WHERE clause, like `WHERE col1 IN (SELECT foo FROM ...)`, are translated to inner joins.

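As a sketch of the note above, the filtered aggregation from the EXPLAIN PLAN section is effectively planned as if
the join had been written out by hand:

```sql
-- Original form: implicit subquery in the WHERE clause.
SELECT channel, COUNT(*)
FROM wikipedia
WHERE channel IN (SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10)
GROUP BY channel;

-- Roughly the inner-join form the planner produces internally.
SELECT w.channel, COUNT(*)
FROM wikipedia w
INNER JOIN (
  SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10
) t ON w.channel = t.page
GROUP BY w.channel;
```
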
## Approximations

Druid SQL will use approximate algorithms in some situations:

- The `COUNT(DISTINCT col)` aggregation function by default uses a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf), a fast approximate distinct counting
algorithm. Druid SQL will switch to exact distinct counts if you set "useApproximateCountDistinct" to "false", either
through query context or through Broker configuration.

- GROUP BY queries over a single column with ORDER BY and LIMIT may be executed using the TopN engine, which uses an
approximate algorithm. Druid SQL will switch to an exact grouping algorithm if you set "useApproximateTopN" to "false",
either through query context or through Broker configuration.

- Aggregation functions that are labeled as using sketches or approximations, such as APPROX_COUNT_DISTINCT, are always
approximate, regardless of configuration.

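As a sketch of switching both of the configurable approximations off through the query context, here is what a
request to the SQL HTTP API might look like (the query itself is just an example):

```json
{
  "query": "SELECT channel, COUNT(DISTINCT \"user\") FROM wikipedia GROUP BY channel ORDER BY 2 DESC LIMIT 10",
  "context": {
    "useApproximateCountDistinct": false,
    "useApproximateTopN": false
  }
}
```
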
**A known issue with approximate functions based on data sketches**

The `APPROX_QUANTILE_DS` and `DS_QUANTILES_SKETCH` functions can fail with an `IllegalStateException` if one of the
sketches for the query hits `maxStreamLength`: the maximum number of items to store in each sketch.
See [GitHub issue 11544](https://github.com/apache/druid/issues/11544) for more details.
To work around the issue, increase the maximum stream length with the `approxQuantileDsMaxStreamLength` parameter
in the query context. Since it is set to 1,000,000,000 by default, you don't need to override it in most cases.
See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for how many bytes are required per stream length.
This query context parameter is a temporary solution to avoid the known issue. It may be removed in a future release after the bug is fixed.
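
If you do hit the issue, here is a sketch of the override in a SQL API request (the quantile query and the doubled
limit are illustrative only):

```json
{
  "query": "SELECT APPROX_QUANTILE_DS(\"added\", 0.98) FROM wikipedia",
  "context": {
    "approxQuantileDsMaxStreamLength": 2000000000
  }
}
```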