SQL: Improve docs around casts. (#13466)

Main change: clarify that the "default value" for casts only applies if
druid.generic.useDefaultValueForNull = true.

Secondary change: adjust a bunch of wording from future to present tense.
This commit is contained in:
Gian Merlino 2022-12-15 15:01:40 -08:00 committed by GitHub
parent 668d1fad6b
commit 7f3c117e3a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 20 additions and 15 deletions

View File

@ -40,14 +40,17 @@ milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore
timezone information, but only carry information about the exact moment in time they represent. See the
[Time functions](sql-scalar.md#date-and-time-functions) section for more information about timestamp handling.
The following table describes how Druid maps SQL types onto native types at query runtime. Casts between two SQL types
that have the same Druid runtime type will have no effect, other than exceptions noted in the table. Casts between two
SQL types that have different Druid runtime types will generate a runtime cast in Druid. If a value cannot be properly
cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will substitute a default value. NULL values cast
to non-nullable types will also be substituted with a default value (for example, nulls cast to numbers will be
converted to zeroes).
Casts between two SQL types with the same Druid runtime type (see below table) have no effect, other than exceptions
noted in the table. Casts between two SQL types that have different Druid runtime types generate a runtime cast in
Druid. If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`, Druid either substitutes a default
value (when `druid.generic.useDefaultValueForNull = true`, the default mode), or substitutes [NULL](#null-values) (when
`druid.generic.useDefaultValueForNull = false`). NULL values cast to non-nullable types are also substituted with a
default value. For example, if `druid.generic.useDefaultValueForNull = true`, a null VARCHAR cast to BIGINT is converted
to a zero.
|SQL type|Druid runtime type|Default value|Notes|
The following table describes how Druid maps SQL types onto native types when running queries.
|SQL type|Druid runtime type|Default value<sup>*</sup>|Notes|
|--------|------------------|-------------|-----|
|CHAR|STRING|`''`||
|VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR. Can include [multi-value strings](#multi-value-strings) as well.|
@ -64,21 +67,23 @@ converted to zeroes).
|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
<sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull = true` (the default mode). Otherwise, the default value is `NULL` for all types.
## Multi-value strings
Druid's native type system allows strings to potentially have multiple values. These
[multi-value string dimensions](multi-value-dimensions.md) will be reported in SQL as `VARCHAR` typed, and can be
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be
[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as `VARCHAR` typed, and can be
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
[multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations.
Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to
Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
the `UNNEST` functionality available in some other SQL dialects. Refer to the documentation on
[multi-value string dimensions](multi-value-dimensions.md) for additional details.
> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
> they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` will be optimized to
> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
> `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
> SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
> in native queries.
@ -90,11 +95,11 @@ runtime property controls Druid's NULL handling mode. For the most SQL compliant
When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid treats NULLs and empty strings
interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs.
For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both will evaluate to true if `col`
contains an empty string. Similarly, the expression `COALESCE(col1, col2)` will return `col2` if `col1` is an empty
string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator will count the number of rows
For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both evaluate to true if `col`
contains an empty string. Similarly, the expression `COALESCE(col1, col2)` returns `col2` if `col1` is an empty
string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator counts the number of rows
where `expr` is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing
values will be treated as zeroes.
values are treated as zeroes.
When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more closely to the SQL standard. In this mode,
numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property