druid/docs/querying/sql-data-types.md

102 lines
6.5 KiB
Markdown
Raw Normal View History

---
id: sql-data-types
title: "SQL data types"
sidebar_label: "SQL data types"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
Columns in Druid are associated with a specific data type. This topic describes supported data types in Druid SQL.
## Standard types
Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit
float) "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like
hyperUnique and approxHistogram columns).
Timestamps (including the `__time` column) are treated by Druid as longs, with the value being the number of
milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
timezone information, but only carry information about the exact moment in time they represent. See the
[Time functions](sql-scalar.md#date-and-time-functions) section for more information about timestamp handling.
The following table describes how Druid maps SQL types onto native types at query runtime. Casts between two SQL types
that have the same Druid runtime type will have no effect, other than exceptions noted in the table. Casts between two
SQL types that have different Druid runtime types will generate a runtime cast in Druid. If a value cannot be properly
cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will substitute a default value. NULL values cast
to non-nullable types will also be substituted with a default value (for example, nulls cast to numbers will be
converted to zeroes).
|SQL type|Druid runtime type|Default value|Notes|
|--------|------------------|-------------|-----|
|CHAR|STRING|`''`||
|VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR. Can include [multi-value strings](#multi-value-strings) as well.|
|DECIMAL|DOUBLE|`0.0`|DECIMAL uses floating point, not fixed point math|
|FLOAT|FLOAT|`0.0`|Druid FLOAT columns are reported as FLOAT|
|REAL|DOUBLE|`0.0`||
|DOUBLE|DOUBLE|`0.0`|Druid DOUBLE columns are reported as DOUBLE|
|BOOLEAN|LONG|`false`||
|TINYINT|LONG|`0`||
|SMALLINT|LONG|`0`||
|INTEGER|LONG|`0`||
|BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
## Multi-value strings
Druid's native type system allows strings to potentially have multiple values. These
[multi-value string dimensions](multi-value-dimensions.md) will be reported in SQL as `VARCHAR` typed, and can be
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be
applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
[multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations.
Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to
the `UNNEST` functionality available in some other SQL dialects. Refer to the documentation on
[multi-value string dimensions](multi-value-dimensions.md) for additional details.
> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
> they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` will be optimized to
> `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
> SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
> in native queries.
## NULL values
The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.md#sql-compatible-null-handling)
controls Druid's NULL handling mode.
In the default value mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL
standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and
`col = ''` are equivalent, and both will evaluate to true if `col` contains an empty string. Similarly, the expression
`COALESCE(col1, col2)` will return `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows,
the `COUNT(expr)` aggregator will count the number of rows where `expr` is neither null nor the empty string. Numeric
columns in this mode are not nullable; any null or missing values will be treated as zeroes.
In SQL compatible mode (`false`), NULLs are treated more closely to the SQL standard. The property affects both storage
and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion
time and query time. There is some overhead associated with the ability to handle NULLs; see
the [segment internals](../design/segments.md#sql-compatible-null-handling) documentation for more details.