druid/docs/querying
Gian Merlino 01eec4a55e
New handling for COALESCE, SEARCH, and filter optimization. (#15609)
* New handling for COALESCE, SEARCH, and filter optimization.

COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code, we end up undoing it in our CaseOperatorConversion.
This patch has a different approach:

1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
   the Volcano planner runs.

2) Add FilterDecomposeCoalesceRule to decompose calls like
   "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
   This helps use indexes when available on x and y.

3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.

4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.
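
The decomposition in item 2 can be checked with a small sketch. This is not Druid's code. `original_filter` and `decomposed_filter` are hypothetical names, and `None` stands in for SQL NULL. Predicates are assumed to return plain booleans, so SQL's three-valued logic is not modeled. The `coalesce` helper follows the usual COALESCE semantics (first non-null argument), which is what a 3+ argument native function would need to provide.

```python
from typing import Callable, Optional

# None stands in for SQL NULL in this sketch; predicates are assumed to return plain
# booleans, so SQL's three-valued logic (UNKNOWN) is not modeled.
Predicate = Callable[[Optional[str]], bool]


def coalesce(*args: Optional[str]) -> Optional[str]:
    """Return the first non-null argument, or None if every argument is null."""
    for arg in args:
        if arg is not None:
            return arg
    return None


def original_filter(f: Predicate, x: Optional[str], y: Optional[str]) -> bool:
    # f(COALESCE(x, y)): f only sees the combined value, so an index on x or y
    # cannot be used directly.
    return f(coalesce(x, y))


def decomposed_filter(f: Predicate, x: Optional[str], y: Optional[str]) -> bool:
    # (x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y)): each branch applies f to a
    # single column, which an index on that column could serve.
    return (x is not None and f(x)) or (x is None and f(y))


if __name__ == "__main__":
    values = [None, "a", "b"]
    is_a: Predicate = lambda v: v == "a"
    for x in values:
        for y in values:
            assert original_filter(is_a, x, y) == decomposed_filter(is_a, x, y)
    print(coalesce(None, None, "c", "d"))  # prints c
```

The two branches are mutually exclusive on `x IS NULL`, so exactly one of them can contribute for any row.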

The advantage of this approach is that by undoing the CASE-to-COALESCE
conversion earlier, we have the flexibility to do more with
COALESCE (like decomposition and pushing into LOOKUP).

SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:

1) Expand NOT points (point "holes" in the range set) from SEARCH as
   `!(a || b)` rather than `!a && !b`, which makes it possible to convert
   them to a "not" of "in" filter later.

2) Generate those nice conversions for NOT points even if the SEARCH
   is not composed of 100% NOT points. Without this change, a SEARCH
   for "x NOT IN ('a', 'b') AND x < 'm'" would be converted to
   "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".
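
The two expansions of the example above can be compared with a sketch. It is not Calcite's or Druid's code: the function names are hypothetical. Values are compared with Python's lexicographic string ordering. The sketch assumes at least one excluded point and that every point sorts below `upper`. Both evaluators accept the same rows, but the second form keeps the holes together as `!(a || b)`, which a later pass can turn into a single NOT IN.

```python
from typing import List


def expand_as_ranges(points: List[str], upper: str) -> str:
    """Old-style expansion: one open interval per gap between excluded points."""
    pts = sorted(points)
    clauses = [f"x < '{pts[0]}'"]
    for lo, hi in zip(pts, pts[1:] + [upper]):
        clauses.append(f"(x > '{lo}' AND x < '{hi}')")
    return " OR ".join(clauses)


def expand_with_not_points(points: List[str], upper: str) -> str:
    """Keep the holes together as !(a || b), which a later pass can turn into NOT IN."""
    holes = " OR ".join(f"x = '{p}'" for p in sorted(points))
    return f"NOT ({holes}) AND x < '{upper}'"


def eval_ranges(x: str, points: List[str], upper: str) -> bool:
    # Row-level evaluation of the interval form.
    pts = sorted(points)
    return x < pts[0] or any(lo < x < hi for lo, hi in zip(pts, pts[1:] + [upper]))


def eval_not_points(x: str, points: List[str], upper: str) -> bool:
    # Row-level evaluation of the NOT-points form.
    return x not in points and x < upper


if __name__ == "__main__":
    # x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')
    print(expand_as_ranges(["a", "b"], "m"))
    # NOT (x = 'a' OR x = 'b') AND x < 'm'
    print(expand_with_not_points(["a", "b"], "m"))
    for x in ["", "a", "aa", "b", "c", "m", "z"]:
        assert eval_ranges(x, ["a", "b"], "m") == eval_not_points(x, ["a", "b"], "m")
```

The interval form grows with the number of excluded points and hides the fact that they are a simple set. The NOT-points form stays a single set-membership test plus the remaining range.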

One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:

1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
   convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".

2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
   ORs with lots of children by adjusting the logic to avoid calling
   "indexOf" and "remove" on an ArrayList.

3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
   handling for "selector" and "equals" filters.
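
Item 1 can be illustrated with a sketch that uses a made-up filter representation, not Druid's actual filter classes:

- Each OR child is a `frozenset` of ANDed atoms.
- A selector is written `("eq", column, value)`.
- The rewritten filter is written `("in", column, values)`.

Children are grouped in one pass with a dictionary keyed by their remaining (common) predicates and column. This is in the spirit of item 2's avoidance of repeated `indexOf`/`remove` calls on a list, although Druid's real data structures may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation for this sketch only: an AND is a frozenset of atoms,
# where an atom is either an opaque string such as "a" or a tuple
# ("eq", column, value) / ("in", column, values).


def convert_selectors_to_ins(or_children: List[FrozenSet]) -> List[FrozenSet]:
    """Rewrite OR(AND(common, col = v1), AND(common, col = v2), ...)
    into AND(common, col IN (v1, v2, ...)), preserving first-seen order."""
    groups: Dict[Tuple[FrozenSet, str], List[str]] = {}
    order: List[Tuple[str, object]] = []  # ("keep", child) or ("group", key)
    for conj in or_children:
        eqs = [a for a in conj if isinstance(a, tuple) and a[0] == "eq"]
        if len(eqs) != 1:
            # Only children with exactly one selector are candidates in this sketch.
            order.append(("keep", conj))
            continue
        _, column, value = eqs[0]
        key = (conj - {eqs[0]}, column)  # common predicates + selector column
        if key not in groups:
            groups[key] = []
            order.append(("group", key))
        groups[key].append(value)

    out: List[FrozenSet] = []
    for kind, item in order:
        if kind == "keep":
            out.append(item)
            continue
        common, column = item
        values = sorted(set(groups[item]))
        if len(values) == 1:
            out.append(common | {("eq", column, values[0])})
        else:
            out.append(common | {("in", column, tuple(values))})
    return out


if __name__ == "__main__":
    # (a && x = 'b') || (a && x = 'c')  ->  a && x IN ('b', 'c')
    a_and_b = frozenset({"a", ("eq", "x", "b")})
    a_and_c = frozenset({"a", ("eq", "x", "c")})
    print(convert_selectors_to_ins([a_and_b, a_and_c]))
```

Keying on the common predicates means that children whose other conjuncts differ are left in separate groups rather than merged incorrectly.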

* Not so final.

* Fixes.

* Fix test.

* Fix test.
2024-01-03 08:56:22 -08:00
aggregations.md Native doc update (#15456) 2023-11-30 10:37:23 +05:30
arrays.md explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245) 2023-11-02 00:31:37 -07:00
caching.md remove group-by v1 (#14866) 2023-08-23 12:44:06 -07:00
datasource.md SQL: Plan non-equijoin conditions as cross join followed by filter (#15302) 2023-11-29 13:46:11 +05:30
datasourcemetadataquery.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
dimensionspecs.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
filters.md document arrayContainsElement filter (#15455) 2023-12-07 00:14:00 -08:00
geo.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
granularities.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
groupbyquery.md Fix dictionarySize overrides in tests (#15354) 2023-11-28 18:49:09 +05:30
having.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
hll-old.md De-incubation cleanup in code, docs, packaging (#9108) 2020-01-03 12:33:19 -05:00
joins.md Sort-merge join and hash shuffles for MSQ. (#13506) 2023-03-08 14:19:39 -08:00
limitspec.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
lookups.md Exposing optional replaceMissingValueWith in lookup function and macros (#14956) 2023-10-02 17:09:23 -07:00
math-expr.md New handling for COALESCE, SEARCH, and filter optimization. (#15609) 2024-01-03 08:56:22 -08:00
multi-value-dimensions.md explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245) 2023-11-02 00:31:37 -07:00
multitenancy.md Docs: Fix some typos. (#14663) 2023-07-26 21:24:18 +05:30
nested-columns.md consolidate json and auto indexers, remove v4 nested column serializer (#14456) 2023-08-22 18:50:11 -07:00
post-aggregations.md explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245) 2023-11-02 00:31:37 -07:00
query-context.md Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
query-execution.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
query-from-deep-storage.md Query from deep storage doc fixes. (#15382) 2023-11-16 14:05:20 +05:30
query-processing.md Revamp design page (#15486) 2023-12-08 11:40:24 -08:00
querying.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
scan-query.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
searchquery.md remove search auto strategy, estimateSelectivity of BitmapColumnIndex (#15550) 2023-12-13 16:30:01 -08:00
segmentmetadataquery.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
select-query.md Add "offset" parameter to the Scan query. (#10233) 2020-08-13 14:56:24 -07:00
sorting-orders.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
sql-aggregations.md Enabling aggregateMultipleValues in all StringAnyAggregators (#15434) 2023-11-29 14:32:49 -08:00
sql-array-functions.md fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
sql-data-types.md fix redirect for api docs and misc array-related typos (#15387) 2023-11-16 13:29:19 -08:00
sql-functions.md minor doc adjustments (#15531) 2023-12-11 18:22:44 -08:00
sql-json-functions.md minor doc adjustments (#15531) 2023-12-11 18:22:44 -08:00
sql-metadata-tables.md enable sql compatible null handling mode by default (#14792) 2023-08-21 20:07:13 -07:00
sql-multivalue-string-functions.md fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
sql-operators.md Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976) 2023-09-20 10:44:32 -07:00
sql-query-context.md enable sql compatible null handling mode by default (#14792) 2023-08-21 20:07:13 -07:00
sql-scalar.md [Docs] Document decode_base64_complex and decode_base64_utf8 functions (#15444) 2023-12-11 09:12:06 -08:00
sql-translation.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
sql-window-functions.md window functions docs (#14739) 2023-11-06 11:34:42 -08:00
sql.md explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245) 2023-11-02 00:31:37 -07:00
timeboundaryquery.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
timeseriesquery.md update timeseries to reflect NULL filling (#15512) 2023-12-07 14:41:27 -08:00
tips-good-queries.md remove references to Jupyter notebooks within the Druid repo (#15143) 2023-11-01 13:17:06 -07:00
topnmetricspec.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
topnquery.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00
troubleshooting.md remove group-by v1 (#14866) 2023-08-23 12:44:06 -07:00
using-caching.md Update Ingestion section (#14023) 2023-05-19 09:42:27 -07:00
virtual-columns.md Docusaurus2 upgrade for master (#14411) 2023-08-16 19:01:21 -07:00