mirror of https://github.com/apache/druid.git
SQL: Improve docs around casts. (#13466)
Main change: clarify that the "default value" for casts only applies if druid.generic.useDefaultValueForNull = true. Secondary change: adjust a bunch of wording from future to present tense.
parent 668d1fad6b
commit 7f3c117e3a
@@ -40,14 +40,17 @@ milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore
 timezone information, but only carry information about the exact moment in time they represent. See the
 [Time functions](sql-scalar.md#date-and-time-functions) section for more information about timestamp handling.
 
-The following table describes how Druid maps SQL types onto native types at query runtime. Casts between two SQL types
-that have the same Druid runtime type will have no effect, other than exceptions noted in the table. Casts between two
-SQL types that have different Druid runtime types will generate a runtime cast in Druid. If a value cannot be properly
-cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will substitute a default value. NULL values cast
-to non-nullable types will also be substituted with a default value (for example, nulls cast to numbers will be
-converted to zeroes).
+Casts between two SQL types with the same Druid runtime type (see below table) have no effect, other than exceptions
+noted in the table. Casts between two SQL types that have different Druid runtime types generate a runtime cast in
+Druid. If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`, Druid either substitutes a default
+value (when `druid.generic.useDefaultValueForNull = true`, the default mode), or substitutes [NULL](#null-values) (when
+`druid.generic.useDefaultValueForNull = false`). NULL values cast to non-nullable types are also substituted with a
+default value. For example, if `druid.generic.useDefaultValueForNull = true`, a null VARCHAR cast to BIGINT is converted
+to a zero.
 
-|SQL type|Druid runtime type|Default value|Notes|
+The following table describes how Druid maps SQL types onto native types when running queries.
+
+|SQL type|Druid runtime type|Default value<sup>*</sup>|Notes|
 |--------|------------------|-------------|-----|
 |CHAR|STRING|`''`||
 |VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR. Can include [multi-value strings](#multi-value-strings) as well.|
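The cast rules in the updated paragraph above can be modeled as a small function. This is a hypothetical Python sketch, not Druid code: `cast_varchar_to_bigint` and its `use_default_value_for_null` flag are illustrative names (the flag stands in for `druid.generic.useDefaultValueForNull`), and Python's `int()` is an assumed stand-in for Druid's string-to-number parsing, which may accept inputs (such as decimals) that this sketch rejects.

```python
from typing import Optional


def cast_varchar_to_bigint(
    value: Optional[str], use_default_value_for_null: bool = True
) -> Optional[int]:
    """Hypothetical model of CAST(<VARCHAR> AS BIGINT) in Druid SQL.

    use_default_value_for_null mirrors druid.generic.useDefaultValueForNull.
    When True (the default mode), uncastable and NULL inputs yield the BIGINT
    default value 0; when False, they yield NULL (represented here as None).
    """
    fallback = 0 if use_default_value_for_null else None
    if value is None:
        # NULL input: default value in the default mode, NULL otherwise.
        return fallback
    try:
        # Simplification: int() only accepts plain integer strings.
        return int(value)
    except ValueError:
        # Uncastable input such as 'foo'.
        return fallback


if __name__ == "__main__":
    for mode in (True, False):
        print(mode, [cast_varchar_to_bigint(v, mode) for v in ("42", "foo", None)])
```

In the default mode, both `'foo'` and NULL come back as `0`; with `useDefaultValueForNull = false`, both come back as NULL, while valid input such as `'42'` casts the same way in either mode.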
@@ -64,21 +67,23 @@ converted to zeroes).
 |DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
 |OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
 
+<sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull = true` (the default mode). Otherwise, the default value is `NULL` for all types.
+
 ## Multi-value strings
 
 Druid's native type system allows strings to potentially have multiple values. These
-[multi-value string dimensions](multi-value-dimensions.md) will be reported in SQL as `VARCHAR` typed, and can be
-syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be
+[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as `VARCHAR` typed, and can be
+syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
 applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
 [multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations.
 
-Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to
+Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
 the `UNNEST` functionality available in some other SQL dialects. Refer to the documentation on
 [multi-value string dimensions](multi-value-dimensions.md) for additional details.
 
 > Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
 > they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
-> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` will be optimized to
+> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
 > `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
 > SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
 > in native queries.
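The multi-value notes above can be illustrated with a small model of the native behavior. This hypothetical Python sketch assumes two native semantics: an equality filter matches a row if any of its values match, and grouping emits one group per value, so a single row can contribute to several groups (the `UNNEST`-like behavior). `ROWS`, `matches`, and `group_counts` are illustrative names, not Druid APIs, and rows with no values are not modeled.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical rows: each entry is the list of values of multi_val_dim for one row.
ROWS: List[List[str]] = [["a", "b"], ["a"], ["c"]]


def matches(row: List[str], target: str) -> bool:
    """Assumed native semantics: `dim = target` matches if any value equals target."""
    return any(v == target for v in row)


def group_counts(rows: List[List[str]]) -> Dict[str, int]:
    """Assumed native grouping: each value of a row forms its own group, counted once per row."""
    counts: Counter = Counter()
    for row in rows:
        for value in row:
            counts[value] += 1
    return dict(counts)


if __name__ == "__main__":
    # The row ["a", "b"] satisfies both predicates, so folding
    # `multi_val_dim = 'a' AND multi_val_dim = 'b'` to false would drop it.
    both = [row for row in ROWS if matches(row, "a") and matches(row, "b")]
    print(both)                # [['a', 'b']]
    print(group_counts(ROWS))  # {'a': 2, 'b': 1, 'c': 1}
```

Under this model the conjunction is satisfiable by a single row, which is why treating `multi_val_dim` as a scalar `VARCHAR` and simplifying the predicate to `false` gives a different answer than the native query would.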
@@ -90,11 +95,11 @@ runtime property controls Druid's NULL handling mode. For the most SQL compliant
 
 When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid treats NULLs and empty strings
 interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs.
-For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both will evaluate to true if `col`
-contains an empty string. Similarly, the expression `COALESCE(col1, col2)` will return `col2` if `col1` is an empty
-string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator will count the number of rows
+For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both evaluate to true if `col`
+contains an empty string. Similarly, the expression `COALESCE(col1, col2)` returns `col2` if `col1` is an empty
+string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator counts the number of rows
 where `expr` is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing
-values will be treated as zeroes.
+values are treated as zeroes.
 
 When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more closely to the SQL standard. In this mode,
 numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property
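The string NULL-handling differences described in the hunk above can be sketched as follows. This hypothetical Python model treats `None` as SQL NULL and folds the empty string into NULL only in the default mode; `is_null`, `coalesce`, and `count_expr` are illustrative names, the flag stands in for `druid.generic.useDefaultValueForNull`, and numeric columns are not modeled.

```python
from typing import Iterable, Optional


def is_null(value: Optional[str], use_default_value_for_null: bool) -> bool:
    """`value IS NULL`: in the default mode, '' also counts as NULL."""
    return value is None or (use_default_value_for_null and value == "")


def coalesce(
    a: Optional[str], b: Optional[str], use_default_value_for_null: bool
) -> Optional[str]:
    """`COALESCE(a, b)`: returns b when a is NULL under the active mode."""
    return b if is_null(a, use_default_value_for_null) else a


def count_expr(values: Iterable[Optional[str]], use_default_value_for_null: bool) -> int:
    """`COUNT(expr)`: counts rows where expr is not NULL under the active mode."""
    return sum(1 for v in values if not is_null(v, use_default_value_for_null))


if __name__ == "__main__":
    col = ["x", "", None]
    for mode in (True, False):
        print(mode, [is_null(v, mode) for v in col],
              repr(coalesce("", "fallback", mode)), count_expr(col, mode))
```

With the default mode, `''` behaves as NULL, so `COALESCE('', 'fallback')` yields `'fallback'` and `COUNT(expr)` skips it; with `useDefaultValueForNull = false`, the empty string is an ordinary value that `COALESCE` returns and `COUNT(expr)` counts.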