---
id: sql-translation
title: "SQL query translation"
sidebar_label: "SQL query translation"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.

Druid SQL translates SQL queries to [native queries](querying.md) before running them, and understanding how this
translation works is key to getting good performance.

## Best practices

Consider this (non-exhaustive) list of things to watch for when investigating the performance implications of
how your SQL queries are translated to native queries.

1. If you wrote a filter on the primary time column `__time`, make sure it is being correctly translated to an
`"intervals"` filter, as described in the [Time filters](#time-filters) section below. If not, you may need to change
the way you write the filter.

2. Try to avoid subqueries underneath joins: they affect both performance and scalability. This includes implicit
subqueries generated by conditions on mismatched types, and implicit subqueries generated by conditions that use
expressions to refer to the right-hand side.

3. Currently, Druid does not support pushing down predicates (conditions and filters) past a join, that is, into the
join's children. Druid only supports pushing predicates into the join if they originated above the join. Hence, the
placement of predicates and filters in your Druid SQL is very important. For the same reason, avoid comma joins.

4. Read through the [Query execution](query-execution.md) page to understand how various types of native queries
will be executed.

5. Be careful when interpreting EXPLAIN PLAN output, and use request logging if in doubt. Request logs will show the
exact native query that was run. See the [next section](#interpreting-explain-plan-output) for more details.

6. If you encounter a query that could be planned better, feel free to
[raise an issue on GitHub](https://github.com/apache/druid/issues/new/choose). A reproducible test case is always
appreciated.

## Interpreting EXPLAIN PLAN output

The [EXPLAIN PLAN](sql-syntax.md#explain-plan) functionality can help you understand how a given SQL query will
be translated to native. For simple queries that do not involve subqueries or joins, the output of EXPLAIN PLAN
is easy to interpret. The native query that will run is embedded as JSON inside a "DruidQueryRel" line:

```
> EXPLAIN PLAN FOR SELECT COUNT(*) FROM wikipedia

DruidQueryRel(query=[{"queryType":"timeseries","dataSource":"wikipedia","intervals":"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z","granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{a0:LONG}])
```

For more complex queries that do involve subqueries or joins, EXPLAIN PLAN is somewhat more difficult to interpret.
For example, consider this query:

```
> EXPLAIN PLAN FOR
> SELECT
>   channel,
>   COUNT(*)
> FROM wikipedia
> WHERE channel IN (SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10)
> GROUP BY channel

DruidJoinQueryRel(condition=[=($1, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","dimensions":["channel"],"aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING, a0:LONG}])
  DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"wikipedia"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"resultFormat":"compactedList","columns":["__time","channel","page"],"granularity":"all"}], signature=[{__time:LONG, channel:STRING, page:STRING}])
  DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"dimension":"page","metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING}])
```

Here, there is a join with two inputs. The way to read this is to consider each line of the EXPLAIN PLAN output as
something that might become a query, or might just become a simple datasource. The `query` field they all have is
called a "partial query" and represents what query would be run on the datasource represented by that line, if that
line ran by itself. In some cases, like the "scan" query in the second line of this example, the query does not
actually run, and it ends up being translated to a simple table datasource. See the [Join translation](#joins) section
for more details about how this works.

We can see this for ourselves using Druid's [request logging](../configuration/index.md#request-logging) feature. After
enabling logging and running this query, we can see that it actually runs as the following native query.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "join",
    "left": "wikipedia",
    "right": {
      "type": "query",
      "query": {
        "queryType": "topN",
        "dataSource": "wikipedia",
        "dimension": {"type": "default", "dimension": "page", "outputName": "d0"},
        "metric": {"type": "numeric", "metric": "a0"},
        "threshold": 10,
        "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
        "granularity": "all",
        "aggregations": [
          { "type": "count", "name": "a0"}
        ]
      }
    },
    "rightPrefix": "j0.",
    "condition": "(\"page\" == \"j0.d0\")",
    "joinType": "INNER"
  },
  "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
  "granularity": "all",
  "dimensions": [
    {"type": "default", "dimension": "channel", "outputName": "d0"}
  ],
  "aggregations": [
    { "type": "count", "name": "a0"}
  ]
}
```

## Query types

Druid SQL uses four different native query types.

- [Scan](scan-query.md) is used for queries that do not aggregate (no GROUP BY, no DISTINCT).

- [Timeseries](timeseriesquery.md) is used for queries that GROUP BY `FLOOR(__time TO unit)` or `TIME_FLOOR(__time,
period)`, have no other grouping expressions, no HAVING or LIMIT clauses, no nesting, and either no ORDER BY, or an
ORDER BY that orders by the same expression present in the GROUP BY. Druid SQL also uses Timeseries for "grand total"
queries that have aggregation functions but no GROUP BY. This query type takes advantage of the fact that Druid
segments are sorted by time.

- [TopN](topnquery.md) is used by default for queries that group by a single expression, do have ORDER BY and LIMIT
clauses, do not have HAVING clauses, and are not nested. However, the TopN query type will deliver approximate ranking
and results in some cases; if you want to avoid this, set "useApproximateTopN" to "false". TopN results are always
computed in memory. See the TopN documentation for more details.

- [GroupBy](groupbyquery.md) is used for all other aggregations, including any nested aggregation queries. Druid's
GroupBy is a traditional aggregation engine: it delivers exact results and rankings and supports a wide variety of
features. GroupBy aggregates in memory if it can, but it may spill to disk if it doesn't have enough memory to complete
your query. Results are streamed back from data processes through the Broker if you ORDER BY the same expressions in
your GROUP BY clause, or if you don't have an ORDER BY at all. If your query has an ORDER BY referencing expressions
that don't appear in the GROUP BY clause (like aggregation functions) then the Broker will materialize a list of
results in memory, up to a max of your LIMIT, if any. See the GroupBy documentation for details about tuning
performance and memory use.

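To make the mapping concrete, here is a sketch of three queries that would typically plan to each aggregating query
type, assuming default context settings and the example `wikipedia` datasource used elsewhere on this page. You can
verify the chosen type on your own cluster with EXPLAIN PLAN.

```sql
-- Groups only on FLOOR(__time TO unit): typically plans to a Timeseries query.
SELECT FLOOR(__time TO HOUR) AS "hour", COUNT(*) AS cnt
FROM wikipedia
GROUP BY FLOOR(__time TO HOUR);

-- Single grouping expression plus ORDER BY and LIMIT: typically plans to a TopN query.
SELECT channel, COUNT(*) AS cnt
FROM wikipedia
GROUP BY channel
ORDER BY cnt DESC
LIMIT 10;

-- More than one grouping expression: plans to a GroupBy query.
SELECT channel, page, COUNT(*) AS cnt
FROM wikipedia
GROUP BY channel, page;
```
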
## Time filters

For all native query types, filters on the `__time` column will be translated into top-level query "intervals" whenever
possible, which allows Druid to use its global time index to quickly prune the set of data that must be scanned.
Consider this (non-exhaustive) list of time filters that will be recognized and translated to "intervals":

- `__time >= TIMESTAMP '2000-01-01 00:00:00'` (comparison to absolute time)
- `__time >= CURRENT_TIMESTAMP - INTERVAL '8' HOUR` (comparison to relative time)
- `FLOOR(__time TO DAY) = TIMESTAMP '2000-01-01 00:00:00'` (specific day)

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that time filters are being translated as you expect.

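For example, here is a quick sketch of confirming that the first filter above is recognized (using the `wikipedia`
datasource; the exact output format varies by Druid version):

```sql
-- If the filter is recognized, the native query printed by EXPLAIN PLAN carries it
-- in its "intervals" field (starting at 2000-01-01) rather than in a "filter" field.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM wikipedia
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00';
```
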
## Joins

SQL join operators are translated to native join datasources as follows:

1. Joins that the native layer can handle directly are translated literally, to a [join datasource](datasource.md#join)
whose `left`, `right`, and `condition` are faithful translations of the original SQL. This includes any SQL join where
the right-hand side is a lookup or subquery, and where the condition is an equality in which one side is an expression
based on the left-hand table, the other side is a simple column reference to the right-hand table, and both sides of
the equality are the same data type.

2. If a join cannot be handled directly by a native [join datasource](datasource.md#join) as written, Druid SQL
will insert subqueries to make it runnable. For example, `foo INNER JOIN bar ON foo.abc = LOWER(bar.def)` cannot be
directly translated, because there is an expression on the right-hand side instead of a simple column access. A
subquery will be inserted that effectively transforms this clause to
`foo INNER JOIN (SELECT LOWER(def) AS def FROM bar) t ON foo.abc = t.def`, as shown in the sketch after this list.

3. Druid SQL does not currently reorder joins to optimize queries.

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that joins are being translated as you expect.

Refer to the [Query execution](query-execution.md#join) page for information about how joins are executed.

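As a sketch of point 2 above, using hypothetical tables `foo(abc)` and `bar(def)`, you can watch the implicit
subquery appear in EXPLAIN PLAN output:

```sql
-- Because LOWER(bar.def) is an expression rather than a simple column reference,
-- Druid SQL plans an implicit subquery over bar; in the EXPLAIN PLAN output this
-- shows up as an extra query-backed datasource feeding the join.
EXPLAIN PLAN FOR
SELECT foo.abc
FROM foo
INNER JOIN bar ON foo.abc = LOWER(bar.def);
```
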
## Subqueries

Subqueries in SQL are generally translated to native query datasources. Refer to the
[Query execution](query-execution.md#query) page for information about how subqueries are executed.

> Note: Subqueries in the WHERE clause, like `WHERE col1 IN (SELECT foo FROM ...)`, are translated to inner joins.

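As a sketch of the note above, the filtered aggregation from the EXPLAIN PLAN section is effectively planned as if
the join had been written out by hand:

```sql
-- Original form: implicit subquery in the WHERE clause.
SELECT channel, COUNT(*)
FROM wikipedia
WHERE channel IN (SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10)
GROUP BY channel;

-- Roughly the inner-join form the planner produces internally.
SELECT w.channel, COUNT(*)
FROM wikipedia w
INNER JOIN (
  SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10
) t ON w.channel = t.page
GROUP BY w.channel;
```
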
## Approximations

Druid SQL will use approximate algorithms in some situations:

- The `COUNT(DISTINCT col)` aggregation function by default uses a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf), a fast approximate distinct counting
algorithm. Druid SQL will switch to exact distinct counts if you set "useApproximateCountDistinct" to "false", either
through query context or through Broker configuration.

- GROUP BY queries over a single column with ORDER BY and LIMIT may be executed using the TopN engine, which uses an
approximate algorithm. Druid SQL will switch to an exact grouping algorithm if you set "useApproximateTopN" to "false",
either through query context or through Broker configuration.

- Aggregation functions that are labeled as using sketches or approximations, such as APPROX_COUNT_DISTINCT, are always
approximate, regardless of configuration.

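As a sketch of switching both of the configurable approximations off through the query context, here is what a
request to the SQL HTTP API might look like (the query itself is just an example):

```json
{
  "query": "SELECT channel, COUNT(DISTINCT \"user\") FROM wikipedia GROUP BY channel ORDER BY 2 DESC LIMIT 10",
  "context": {
    "useApproximateCountDistinct": false,
    "useApproximateTopN": false
  }
}
```
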
**A known issue with approximate functions based on data sketches**

The `APPROX_QUANTILE_DS` and `DS_QUANTILES_SKETCH` functions can fail with an `IllegalStateException` if one of the
sketches for the query hits `maxStreamLength`: the maximum number of items to store in each sketch.
See [GitHub issue 11544](https://github.com/apache/druid/issues/11544) for more details.
To work around the issue, increase the maximum stream length with the `approxQuantileDsMaxStreamLength` parameter
in the query context. Since it is set to 1,000,000,000 by default, you don't need to override it in most cases.
See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for how many bytes are required per stream length.
This query context parameter is a temporary solution to avoid the known issue. It may be removed in a future release after the bug is fixed.
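
If you do hit the issue, here is a sketch of the override in a SQL API request (the quantile query and the doubled
limit are illustrative only):

```json
{
  "query": "SELECT APPROX_QUANTILE_DS(\"added\", 0.98) FROM wikipedia",
  "context": {
    "approxQuantileDsMaxStreamLength": 2000000000
  }
}
```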