153 lines
6.9 KiB
Plaintext
153 lines
6.9 KiB
Plaintext
[role="xpack"]
|
|
[testenv="basic"]
|
|
[[sql-limitations]]
|
|
== SQL Limitations
|
|
|
|
[float]
|
|
[[large-parsing-trees]]
|
|
=== Large queries may throw `ParsingExpection`
|
|
|
|
Extremely large queries can consume too much memory during the parsing phase, in which case the {es-sql} engine will
|
|
abort parsing and throw an error. In such cases, consider reducing the query to a smaller size by potentially
|
|
simplifying it or splitting it into smaller queries.
|
|
|
|
[float]
|
|
[[sys-columns-describe-table-nested-fields]]
|
|
=== Nested fields in `SYS COLUMNS` and `DESCRIBE TABLE`
|
|
|
|
{es} has a special type of relationship fields called `nested` fields. In {es-sql} they can be used by referencing their inner
|
|
sub-fields. Even though `SYS COLUMNS` in non-driver mode (in the CLI and in REST calls) and `DESCRIBE TABLE` will still display
|
|
them as having the type `NESTED`, they cannot be used in a query. One can only reference its sub-fields in the form:
|
|
|
|
[source, sql]
|
|
--------------------------------------------------
|
|
[nested_field_name].[sub_field_name]
|
|
--------------------------------------------------
|
|
|
|
For example:
|
|
|
|
[source, sql]
|
|
--------------------------------------------------
|
|
SELECT dep.dep_name.keyword FROM test_emp GROUP BY languages;
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
=== Multi-nested fields
|
|
|
|
{es-sql} doesn't support multi-nested documents, so a query cannot reference more than one nested field in an index.
|
|
This applies to multi-level nested fields, but also multiple nested fields defined on the same level. For example, for this index:
|
|
|
|
[source, sql]
|
|
----------------------------------------------------
|
|
column | type | mapping
|
|
----------------------+---------------+-------------
|
|
nested_A |STRUCT |NESTED
|
|
nested_A.nested_X |STRUCT |NESTED
|
|
nested_A.nested_X.text|VARCHAR |KEYWORD
|
|
nested_A.text |VARCHAR |KEYWORD
|
|
nested_B |STRUCT |NESTED
|
|
nested_B.text |VARCHAR |KEYWORD
|
|
----------------------------------------------------
|
|
|
|
`nested_A` and `nested_B` cannot be used at the same time, nor `nested_A`/`nested_B` and `nested_A.nested_X` combination.
|
|
For such situations, {es-sql} will display an error message.
|
|
|
|
[float]
|
|
=== Paginating nested inner hits
|
|
|
|
When SELECTing a nested field, pagination will not work as expected, {es-sql} will return __at least__ the page size records.
|
|
This is because of the way nested queries work in {es}: the root nested field will be returned and it's matching inner nested fields as well,
|
|
pagination taking place on the **root nested document and not on its inner hits**.
|
|
|
|
[float]
|
|
[[normalized-keyword-fields]]
|
|
=== Normalized `keyword` fields
|
|
|
|
`keyword` fields in {es} can be normalized by defining a `normalizer`. Such fields are not supported in {es-sql}.
|
|
|
|
[float]
|
|
=== Array type of fields
|
|
|
|
Array fields are not supported due to the "invisible" way in which {es} handles an array of values: the mapping doesn't indicate whether
|
|
a field is an array (has multiple values) or not, so without reading all the data, {es-sql} cannot know whether a field is a single or multi value.
|
|
When multiple values are returned for a field, by default, {es-sql} will throw an exception. However, it is possible to change this behavior through `field_multi_value_leniency` parameter in REST (disabled by default) or
|
|
`field.multi.value.leniency` in drivers (enabled by default).
|
|
|
|
[float]
|
|
=== Sorting by aggregation
|
|
|
|
When doing aggregations (`GROUP BY`) {es-sql} relies on {es}'s `composite` aggregation for its support for paginating results.
|
|
However this type of aggregation does come with a limitation: sorting can only be applied on the key used for the aggregation's buckets.
|
|
{es-sql} overcomes this limitation by doing client-side sorting however as a safety measure, allows only up to *512* rows.
|
|
|
|
It is recommended to use `LIMIT` for queries that use sorting by aggregation, essentially indicating the top N results that are desired:
|
|
|
|
[source, sql]
|
|
--------------------------------------------------
|
|
SELECT * FROM test GROUP BY age ORDER BY COUNT(*) LIMIT 100;
|
|
--------------------------------------------------
|
|
|
|
It is possible to run the same queries without a `LIMIT` however in that case if the maximum size (*512*) is passed, an exception will be
|
|
returned as {es-sql} is unable to track (and sort) all the results returned.
|
|
|
|
[float]
|
|
=== Using aggregation functions on top of scalar functions
|
|
|
|
Aggregation functions like <<sql-functions-aggs-min,`MIN`>>, <<sql-functions-aggs-max,`MAX`>>, etc. can only be used
|
|
directly on fields, and so queries like `SELECT MAX(abs(age)) FROM test` are not possible.
|
|
|
|
[float]
|
|
=== Using a sub-select
|
|
|
|
Using sub-selects (`SELECT X FROM (SELECT Y)`) is **supported to a small degree**: any sub-select that can be "flattened" into a single
|
|
`SELECT` is possible with {es-sql}. For example:
|
|
|
|
["source","sql",subs="attributes,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{sql-specs}/docs/docs.csv-spec[limitationSubSelect]
|
|
--------------------------------------------------
|
|
|
|
The query above is possible because it is equivalent with:
|
|
|
|
["source","sql",subs="attributes,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{sql-specs}/docs/docs.csv-spec[limitationSubSelectRewritten]
|
|
--------------------------------------------------
|
|
|
|
But, if the sub-select would include a `GROUP BY` or `HAVING` or the enclosing `SELECT` would be more complex than `SELECT X
|
|
FROM (SELECT ...) WHERE [simple_condition]`, this is currently **un-supported**.
|
|
|
|
[float]
|
|
[[first-last-agg-functions-having-clause]]
|
|
=== Using <<sql-functions-aggs-first, `FIRST`>>/<<sql-functions-aggs-last,`LAST`>> aggregation functions in `HAVING` clause
|
|
|
|
Using `FIRST` and `LAST` in the `HAVING` clause is not supported. The same applies to
|
|
<<sql-functions-aggs-min,`MIN`>> and <<sql-functions-aggs-max,`MAX`>> when their target column
|
|
is of type <<keyword, `keyword`>> as they are internally translated to `FIRST` and `LAST`.
|
|
|
|
[float]
|
|
[[group-by-time]]
|
|
=== Using TIME data type in GROUP BY or <<sql-functions-grouping-histogram>>
|
|
|
|
Using `TIME` data type as a grouping key is currently not supported. For example:
|
|
|
|
[source, sql]
|
|
-------------------------------------------------------------
|
|
SELECT count(*) FROM test GROUP BY CAST(date_created AS TIME);
|
|
-------------------------------------------------------------
|
|
|
|
On the other hand, it can still be used if it's wrapped with a scalar function that returns another data type,
|
|
for example:
|
|
|
|
[source, sql]
|
|
-------------------------------------------------------------
|
|
SELECT count(*) FROM test GROUP BY MINUTE((CAST(date_created AS TIME));
|
|
-------------------------------------------------------------
|
|
|
|
`TIME` data type is also currently not supported in histogram grouping function. For example:
|
|
|
|
[source, sql]
|
|
-------------------------------------------------------------
|
|
SELECT HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES) as h, COUNT(*) FROM t GROUP BY h
|
|
-------------------------------------------------------------
|