mirror of https://github.com/apache/druid.git
add SQL docs for multi-value string dimensions (#8011)
* add SQL docs for multi-value string dimensions
* formatting consistency
* fix typo
* adjust

This commit is contained in:
parent c556d44a19
commit 3b84246cd6
@@ -54,14 +54,14 @@ begin with a digit. To escape other special characters, you can quote it with do

For logical operators, a number is true if and only if it is positive (a zero or negative value means false). For the string
type, it's the evaluation result of 'Boolean.valueOf(string)'.

[Multi-value string dimensions](../querying/multi-value-dimensions.html) are supported and may be treated as either
scalar or array typed values. When treated as a scalar type, an expression will automatically be transformed to apply
the scalar operation across all values of the multi-valued type, to mimic Druid's native behavior. Values that result in
arrays will be coerced back into the native Druid string type for aggregation. Druid aggregations on multi-value string
dimensions operate on the individual values, _not_ the 'array', behaving similarly to the `UNNEST` operator available in many SQL
dialects. However, by using the `array_to_string` function, aggregations may be done on a stringified version of the
complete array, allowing the complete row to be preserved. Using `string_to_array` in an expression post-aggregator
allows transforming the stringified dimension back into the true native array type.
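As a quick illustration of the scalar transformation described above, here is a minimal sketch using the Druid SQL counterparts documented in the SQL section later in this diff; the datasource `my_datasource` and the multi-value column `tags` are hypothetical names, not taken from these docs.

```sql
-- A scalar string operation applied to a multi-value column is mapped across
-- every value of the row, mimicking the native expression behavior.
SELECT CONCAT(tags, '-suffix') AS decorated_tags
FROM my_datasource
LIMIT 5;
```

Aggregating on `tags` itself would still operate on the individual values; stringifying with `array_to_string` (or `MV_TO_STRING` in SQL) is the way to aggregate on the complete array.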

The following built-in functions are available.
@@ -168,31 +168,30 @@ See javadoc of java.lang.Math for detailed explanation for each function.

| function | description |
| --- | --- |
| array(expr1,expr ...) | constructs an array from the expression arguments, using the type of the first argument as the output array type |
| array_length(arr) | returns length of array expression |
| array_offset(arr,long) | returns the array element at the 0 based index supplied, or null for an out of range index |
| array_ordinal(arr,long) | returns the array element at the 1 based index supplied, or null for an out of range index |
| array_contains(arr,expr) | returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0 |
| array_overlap(arr1,arr2) | returns 1 if arr1 and arr2 have any elements in common, else 0 |
| array_offset_of(arr,expr) | returns the 0 based index of the first occurrence of expr in the array; if no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false` |
| array_ordinal_of(arr,expr) | returns the 1 based index of the first occurrence of expr in the array; if no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false` |
| array_prepend(expr,arr) | adds expr to arr at the beginning, the resulting array type determined by the type of the array |
| array_append(arr1,expr) | appends expr to arr, the resulting array type determined by the type of the first array |
| array_concat(arr1,arr2) | concatenates 2 arrays, the resulting array type determined by the type of the first array |
| array_slice(arr,start,end) | returns the subarray of arr from the 0 based index start (inclusive) to end (exclusive), or `null` if start is less than 0, greater than the length of arr, or greater than end |
| array_to_string(arr,str) | joins all elements of arr by the delimiter specified by str |
| string_to_array(str1,str2) | splits str1 into an array on the delimiter specified by str2 |

## Apply Functions

| function | description |
| --- | --- |
| map(lambda,arr) | applies a transform specified by a single argument lambda expression to all elements of arr, returning a new array |
| cartesian_map(lambda,arr1,arr2,...) | applies a transform specified by a multi argument lambda expression to all elements of the cartesian product of all input arrays, returning a new array; the number of lambda arguments and array inputs must be the same |
| filter(lambda,arr) | filters arr by a single argument lambda, returning a new array with all matching elements, or null if no elements match |
| fold(lambda,arr) | folds a 2 argument lambda across arr. The first argument of the lambda is the array element and the second is the accumulator, returning a single accumulated value. |
| cartesian_fold(lambda,arr1,arr2,...) | folds a multi argument lambda across the cartesian product of all input arrays. The first arguments of the lambda are the array elements and the last is the accumulator, returning a single accumulated value. |
| any(lambda,arr) | returns 1 if any element in the array matches the lambda expression, else 0 |
| all(lambda,arr) | returns 1 if all elements in the array match the lambda expression, else 0 |

@@ -24,12 +24,15 @@ title: "Multi-value dimensions"

# Multi-value dimensions

Apache Druid (incubating) supports "multi-value" string dimensions. These are generated when an input field contains an
array of values instead of a single value (e.g. JSON arrays, or a TSV field containing one or more `listDelimiter`
characters).

This document describes the behavior of groupBy (topN has similar behavior) queries on multi-value dimensions when they
are used as a dimension being grouped by. See the section on multi-value columns in
[segments](../design/segments.html#multi-value-columns) for internal representation details. Examples in this document
are in the form of [native Druid queries](querying.html). Refer to the [Druid SQL documentation](sql.html) for details
about using multi-value string dimensions in SQL.

## Querying multi-value dimensions

@@ -109,9 +112,10 @@ This "selector" filter would match row4 of the dataset above:

### Grouping

topN and groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. This can be thought of as the equivalent of the
`UNNEST` operator used on an `ARRAY` type that many SQL dialects support. This means it's possible for a query to return
more groups than there are rows. For example, a topN on the dimension `tags` with filter `"t1" AND "t3"` would match
only row1, and generate a result with three groups: `t1`, `t2`, and `t3`. If you only need to include values that match
your filter, you can use a [filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also
improve performance.
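For readers who prefer SQL, a hedged sketch of the same grouping behavior follows; the datasource `my_datasource` and column `tags` are illustrative names, not from this page's examples.

```sql
-- Grouping on a multi-value VARCHAR column yields one group per individual
-- value, so the result can contain more groups than there are rows.
SELECT tags, COUNT(*) AS cnt
FROM my_datasource
GROUP BY tags
ORDER BY cnt DESC;
```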
@@ -280,11 +284,15 @@ returns following result.
]
```

You might be surprised to see the inclusion of "t1", "t2", "t4" and "t5" in the results. It happens because the query filter is
applied on the row before explosion. For multi-value dimensions, a selector filter for "t3" would match row1 and row2,
after which exploding is done. For multi-value dimensions, a query filter matches a row if any individual value inside
the multiple values matches the query filter.
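The same effect is visible in Druid SQL; a hedged sketch with illustrative names:

```sql
-- The WHERE clause matches whole rows whose "tags" contains 't3' (row1 and
-- row2 above), but grouping still explodes all values of those rows, so
-- "t1", "t2", "t4" and "t5" appear alongside "t3".
SELECT tags, COUNT(*) AS cnt
FROM my_datasource
WHERE tags = 't3'
GROUP BY tags;
```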

### Example: GroupBy query with a selector query filter and additional filter in "dimensions" attributes

To solve the problem above and to get only rows for "t3" returned, you would have to use a "filtered dimension spec" as
in the query below.

See the section on filtered dimensionSpecs in [dimensionSpecs](dimensionspecs.html#filtered-dimensionspecs) for details.

@@ -337,4 +345,6 @@ returns the following result.
]
```

Note that, for groupBy queries, you could get a similar result with a [having spec](having.html), but using a filtered
dimensionSpec is much more efficient because it gets applied at the lowest level in the query processing pipeline.
Having specs are applied at the outermost level of groupBy query processing.
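In Druid SQL, the having-spec style workaround might look like the sketch below (illustrative names); per the note above, the native filtered dimensionSpec remains the more efficient option.

```sql
-- Filter rows containing 't3', then keep only the 't3' group after grouping,
-- analogous to a having spec applied at the outermost level.
SELECT tags, COUNT(*) AS cnt
FROM my_datasource
WHERE tags = 't3'
GROUP BY tags
HAVING tags = 't3';
```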
@@ -42,6 +42,61 @@ queries on the query Broker (the first process you query), which are then passed

queries. Other than the (slight) overhead of translating SQL on the Broker, there isn't an additional performance
penalty versus native queries.

## Data types and casts

Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit
float), "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like
hyperUnique and approxHistogram columns).

Timestamps (including the `__time` column) are treated by Druid as longs, with the value being the number of
milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
timezone information, but only carry information about the exact moment in time they represent. See the
[Time functions](#time-functions) section for more information about timestamp handling.

Druid generally treats NULLs and empty strings interchangeably, rather than according to the SQL standard. As such,
Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and `col = ''` are equivalent,
and both will evaluate to true if `col` contains an empty string. Similarly, the expression `COALESCE(col1, col2)` will
return `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)`
aggregator will count the number of rows where expr is neither null nor the empty string. String columns in Druid are
NULLable. Numeric columns are NOT NULL; if you query a numeric column that is not present in all segments of your Druid
datasource, then it will be treated as zero for rows from those segments.
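A hedged sketch of this NULL and empty-string behavior; the datasource `my_datasource` and string column `comment` are illustrative names.

```sql
-- COUNT(*) counts every row; COUNT(comment) skips rows where "comment" is
-- NULL or ''; the IS NULL filter also matches empty strings.
SELECT
  COUNT(*) AS all_rows,
  COUNT(comment) AS rows_with_comment,
  COUNT(*) FILTER (WHERE comment IS NULL) AS rows_without_comment
FROM my_datasource;
```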

For mathematical operations, Druid SQL will use integer math if all operands involved in an expression are integers.
Otherwise, Druid will switch to floating point math. You can force floating point math by casting one of your operands
to FLOAT. At runtime, Druid may widen 32-bit floats to 64-bit for certain operators, like SUM aggregators.
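For example, a hedged sketch (`my_datasource` is an illustrative name):

```sql
-- With only integer operands Druid uses integer math; casting one operand
-- to FLOAT forces floating point math.
SELECT
  3 / 2                 AS integer_division,  -- 1
  CAST(3 AS FLOAT) / 2  AS float_division     -- 1.5
FROM my_datasource
LIMIT 1;
```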

Druid [multi-value string dimensions](multi-value-dimensions.html) will appear in the table schema as `VARCHAR` typed,
and may be interacted with in expressions as such. Additionally, they can be treated as `ARRAY`-like via a handful of
special multi-value operators. Expressions against multi-value string dimensions will apply the expression to all values
of the row. The caveat, however, is that aggregations on these multi-value string columns will observe the native Druid
multi-value aggregation behavior, which is equivalent to the `UNNEST` function available in many dialects.
Refer to the [multi-value string dimensions](multi-value-dimensions.html) and
[Druid expressions](../misc/math-expr.html) documentation for additional details.
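A hedged sketch of the `MV_TO_STRING`/`STRING_TO_MV` pattern described in those docs, which sidesteps the UNNEST-like aggregation behavior (illustrative names):

```sql
-- Group on the stringified array so each row's complete value set is
-- preserved, then convert the grouped string back to a multi-value string.
SELECT
  STRING_TO_MV(MV_TO_STRING(tags, ','), ',') AS tags,
  COUNT(*) AS cnt
FROM my_datasource
GROUP BY MV_TO_STRING(tags, ',');
```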

The following table describes how SQL types map onto Druid types during query runtime. Casts between two SQL types
that have the same Druid runtime type will have no effect, other than exceptions noted in the table. Casts between two
SQL types that have different Druid runtime types will generate a runtime cast in Druid. If a value cannot be properly
cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will substitute a default value. NULL values cast
to non-nullable types will also be substituted with a default value (for example, nulls cast to numbers will be
converted to zeroes).

|SQL type|Druid runtime type|Default value|Notes|
|--------|------------------|-------------|-----|
|CHAR|STRING|`''`||
|VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR|
|DECIMAL|DOUBLE|`0.0`|DECIMAL uses floating point, not fixed point math|
|FLOAT|FLOAT|`0.0`|Druid FLOAT columns are reported as FLOAT|
|REAL|DOUBLE|`0.0`||
|DOUBLE|DOUBLE|`0.0`|Druid DOUBLE columns are reported as DOUBLE|
|BOOLEAN|LONG|`false`||
|TINYINT|LONG|`0`||
|SMALLINT|LONG|`0`||
|INTEGER|LONG|`0`||
|BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](#time-functions)|
|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](#time-functions)|
|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
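A hedged illustration of these casting rules (illustrative names):

```sql
-- Failed casts fall back to the Druid type's default value instead of erroring.
SELECT
  CAST('foo' AS BIGINT)                    AS failed_cast,   -- 0, the BIGINT default
  CAST('2000-01-02 03:04:05' AS TIMESTAMP) AS sql_style_ts,  -- standard SQL format, not ISO8601
  CAST(__time AS DATE)                     AS event_date     -- rounds down to the day
FROM my_datasource
LIMIT 1;
```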

## Query syntax

Each Druid datasource appears as a table in the "druid" schema. This is also the default schema, so Druid datasources
@@ -269,6 +324,27 @@ simplest way to write literal timestamps in other time zones is to use TIME_PARS

|`x OR y`|Boolean OR.|
|`NOT x`|Boolean NOT.|

### Multi-value string functions

All 'array' references in the multi-value string function documentation can refer to multi-value string columns or
`ARRAY` literals.

|Function|Notes|
|--------|-----|
|`ARRAY(expr1,expr ...)`|Constructs an SQL ARRAY literal from the expression arguments, using the type of the first argument as the output array type|
|`MV_LENGTH(arr)`|Returns length of array expression|
|`MV_OFFSET(arr,long)`|Returns the array element at the 0 based index supplied, or null for an out of range index|
|`MV_ORDINAL(arr,long)`|Returns the array element at the 1 based index supplied, or null for an out of range index|
|`MV_CONTAINS(arr,expr)`|Returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0|
|`MV_OVERLAP(arr1,arr2)`|Returns 1 if arr1 and arr2 have any elements in common, else 0|
|`MV_OFFSET_OF(arr,expr)`|Returns the 0 based index of the first occurrence of expr in the array; if no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`|
|`MV_ORDINAL_OF(arr,expr)`|Returns the 1 based index of the first occurrence of expr in the array; if no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`|
|`MV_PREPEND(expr,arr)`|Adds expr to arr at the beginning, the resulting array type determined by the type of the array|
|`MV_APPEND(arr1,expr)`|Appends expr to arr, the resulting array type determined by the type of the first array|
|`MV_CONCAT(arr1,arr2)`|Concatenates 2 arrays, the resulting array type determined by the type of the first array|
|`MV_SLICE(arr,start,end)`|Returns the subarray of arr from the 0 based index start (inclusive) to end (exclusive), or `null` if start is less than 0, greater than the length of arr, or greater than end|
|`MV_TO_STRING(arr,str)`|Joins all elements of arr by the delimiter specified by str|
|`STRING_TO_MV(str1,str2)`|Splits str1 into an array on the delimiter specified by str2|
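For example, a hedged sketch combining a few of these functions; the datasource `my_datasource` and multi-value column `tags` are illustrative names.

```sql
-- Filter to rows whose "tags" contains 't3', then inspect the row's values.
SELECT
  MV_LENGTH(tags)         AS tag_count,
  MV_OFFSET(tags, 0)      AS first_tag,
  MV_TO_STRING(tags, ',') AS all_tags
FROM my_datasource
WHERE MV_CONTAINS(tags, 't3')
LIMIT 10;
```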

### Other functions

|Function|Notes|
@@ -280,6 +356,7 @@ simplest way to write literal timestamps in other time zones is to use TIME_PARS

|`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.|
|`NVL(expr,expr-for-null)`|Returns 'expr-for-null' if 'expr' is null (or empty string for string type).|
|`BLOOM_FILTER_TEST(<expr>, <serialized-filter>)`|Returns true if the value is contained in the base64 serialized bloom filter. See [bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.|

### Unsupported features

Druid does not support all SQL features, including:
@@ -291,57 +368,10 @@ Druid does not support all SQL features, including:

Additionally, some Druid features are not supported by the SQL language. Some unsupported Druid features include:

- [Set operations on DataSketches aggregators](../development/extensions-core/datasketches-extension.html).
- [Spatial filters](../development/geo.html).
- [Query cancellation](querying.html#query-cancellation).

## Query execution