mirror of https://github.com/apache/druid.git
102 lines
6.5 KiB
Markdown
102 lines
6.5 KiB
Markdown
|
---
|
||
|
id: sql-data-types
|
||
|
title: "SQL data types"
|
||
|
sidebar_label: "SQL data types"
|
||
|
---
|
||
|
|
||
|
<!--
|
||
|
~ Licensed to the Apache Software Foundation (ASF) under one
|
||
|
~ or more contributor license agreements. See the NOTICE file
|
||
|
~ distributed with this work for additional information
|
||
|
~ regarding copyright ownership. The ASF licenses this file
|
||
|
~ to you under the Apache License, Version 2.0 (the
|
||
|
~ "License"); you may not use this file except in compliance
|
||
|
~ with the License. You may obtain a copy of the License at
|
||
|
~
|
||
|
~ http://www.apache.org/licenses/LICENSE-2.0
|
||
|
~
|
||
|
~ Unless required by applicable law or agreed to in writing,
|
||
|
~ software distributed under the License is distributed on an
|
||
|
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||
|
~ KIND, either express or implied. See the License for the
|
||
|
~ specific language governing permissions and limitations
|
||
|
~ under the License.
|
||
|
-->
|
||
|
|
||
|
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
|
||
|
> This document describes the SQL language.
|
||
|
|
||
|
|
||
|
Columns in Druid are associated with a specific data type. This topic describes supported data types in Druid SQL.
|
||
|
|
||
|
## Standard types
|
||
|
|
||
|
Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit
|
||
|
float) "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like
|
||
|
hyperUnique and approxHistogram columns).
|
||
|
|
||
|
Timestamps (including the `__time` column) are treated by Druid as longs, with the value being the number of
|
||
|
milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
|
||
|
timezone information, but only carry information about the exact moment in time they represent. See the
|
||
|
[Time functions](sql-scalar.md#date-and-time-functions) section for more information about timestamp handling.
|
||
|
|
||
|
The following table describes how Druid maps SQL types onto native types at query runtime. Casts between two SQL types
|
||
|
that have the same Druid runtime type will have no effect, other than exceptions noted in the table. Casts between two
|
||
|
SQL types that have different Druid runtime types will generate a runtime cast in Druid. If a value cannot be properly
|
||
|
cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will substitute a default value. NULL values cast
|
||
|
to non-nullable types will also be substituted with a default value (for example, nulls cast to numbers will be
|
||
|
converted to zeroes).
|
||
|
|
||
|
|SQL type|Druid runtime type|Default value|Notes|
|
||
|
|--------|------------------|-------------|-----|
|
||
|
|CHAR|STRING|`''`||
|
||
|
|VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR. Can include [multi-value strings](#multi-value-strings) as well.|
|
||
|
|DECIMAL|DOUBLE|`0.0`|DECIMAL uses floating point, not fixed point math|
|
||
|
|FLOAT|FLOAT|`0.0`|Druid FLOAT columns are reported as FLOAT|
|
||
|
|REAL|DOUBLE|`0.0`||
|
||
|
|DOUBLE|DOUBLE|`0.0`|Druid DOUBLE columns are reported as DOUBLE|
|
||
|
|BOOLEAN|LONG|`false`||
|
||
|
|TINYINT|LONG|`0`||
|
||
|
|SMALLINT|LONG|`0`||
|
||
|
|INTEGER|LONG|`0`||
|
||
|
|BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
|
||
|
|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
|
||
|
|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
|
||
|
|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
|
||
|
|
||
|
## Multi-value strings
|
||
|
|
||
|
Druid's native type system allows strings to potentially have multiple values. These
|
||
|
[multi-value string dimensions](multi-value-dimensions.md) will be reported in SQL as `VARCHAR` typed, and can be
|
||
|
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be
|
||
|
applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
|
||
|
[multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations.
|
||
|
|
||
|
Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to
|
||
|
the `UNNEST` functionality available in some other SQL dialects. Refer to the documentation on
|
||
|
[multi-value string dimensions](multi-value-dimensions.md) for additional details.
|
||
|
|
||
|
> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
|
||
|
> they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
|
||
|
> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` will be optimized to
|
||
|
> `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
|
||
|
> SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
|
||
|
> in native queries.
|
||
|
|
||
|
## NULL values
|
||
|
|
||
|
The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.md#sql-compatible-null-handling)
|
||
|
controls Druid's NULL handling mode.
|
||
|
|
||
|
In the default value mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL
|
||
|
standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and
|
||
|
`col = ''` are equivalent, and both will evaluate to true if `col` contains an empty string. Similarly, the expression
|
||
|
`COALESCE(col1, col2)` will return `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows,
|
||
|
the `COUNT(expr)` aggregator will count the number of rows where `expr` is neither null nor the empty string. Numeric
|
||
|
columns in this mode are not nullable; any null or missing values will be treated as zeroes.
|
||
|
|
||
|
In SQL compatible mode (`false`), NULLs are treated more closely to the SQL standard. The property affects both storage
|
||
|
and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion
|
||
|
time and query time. There is some overhead associated with the ability to handle NULLs; see
|
||
|
the [segment internals](../design/segments.md#sql-compatible-null-handling) documentation for more details.
|