druid/docs
Gian Merlino 01eec4a55e
New handling for COALESCE, SEARCH, and filter optimization. (#15609)
* New handling for COALESCE, SEARCH, and filter optimization.

COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code we end up un-doing it in our CaseOperatorConversion.
This patch has a different approach:

1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
   the Volcano planner runs.

2) Add FilterDecomposeCoalesceRule to decompose calls like
   "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
   This helps use indexes when available on x and y.

3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.

4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.

The advantage of this approach is that by un-doing the CASE to COALESCE
conversion earlier, we have the flexibility to apply further transformations
to COALESCE (like decomposition and pushing into LOOKUP).
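
To illustrate why the decomposition in (2) is safe, here is a minimal Python
sketch (not Druid's Java implementation) that checks "f(COALESCE(x, y))"
against "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))" over a few inputs.
The predicate `f`, the helper names, and the sample values are all
hypothetical; NULL is modeled as `None` and a NULL input is assumed never to
match the filter.

```python
from itertools import product


def coalesce(*args):
    """Return the first non-None argument, or None if all are None."""
    for a in args:
        if a is not None:
            return a
    return None


def f(v):
    # Hypothetical predicate standing in for any filter.
    # Assumption: a NULL input never matches.
    return v is not None and v.startswith("a")


def original(x, y):
    # f(COALESCE(x, y))
    return f(coalesce(x, y))


def decomposed(x, y):
    # (x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))
    return (x is not None and f(x)) or (x is None and f(y))


values = [None, "apple", "banana"]
mismatches = [
    (x, y)
    for x, y in product(values, repeat=2)
    if original(x, y) != decomposed(x, y)
]
```

In the decomposed form, each branch refers to x or y directly, which is what
lets a per-column index be used for that branch.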

SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:

1) Expand NOT points (point "holes" in the range set) from SEARCH as
   `!(a || b)` rather than `!a && !b`, which makes it possible to convert
   them to a "not" of "in" filter later.

2) Generate those nice conversions for NOT points even if the SEARCH
   is not composed of 100% NOT points. Without this change, a SEARCH
   for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like
   "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".
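
As a rough illustration of the idea (a Python sketch under simplifying
assumptions, not the actual Calcite or Druid code), the range set for
"x NOT IN ('a', 'b') AND x < 'm'" can be split into one merged range plus a
list of excluded points. The tuple representation of ranges and the function
names below are hypothetical, and only open, sorted ranges are handled.

```python
# An open interval is a (lower, upper) tuple; None means unbounded on that side.


def matches_ranges(x, ranges):
    """True if x falls strictly inside any of the open ranges."""
    return any(
        (lo is None or x > lo) and (hi is None or x < hi) for lo, hi in ranges
    )


def split_holes(ranges):
    """Merge sorted open ranges that meet at a single excluded point.

    Returns (merged_ranges, holes), where holes are the excluded points.
    """
    merged = [ranges[0]]
    holes = []
    for lo, hi in ranges[1:]:
        prev_lo, prev_hi = merged[-1]
        if prev_hi is not None and prev_hi == lo:
            # Two open ranges touching at `lo` leave a point "hole" there.
            holes.append(lo)
            merged[-1] = (prev_lo, hi)
        else:
            merged.append((lo, hi))
    return merged, holes


def matches_rewritten(x, merged, holes):
    # Shape: (range filter) && !(x == 'a' || x == 'b'), i.e. a "not" of "in".
    return matches_ranges(x, merged) and not (x in holes)


# x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')
search_ranges = [(None, "a"), ("a", "b"), ("b", "m")]
merged, holes = split_holes(search_ranges)

samples = ["", "a", "aa", "b", "c", "m", "z"]
agree = all(
    matches_ranges(s, search_ranges) == matches_rewritten(s, merged, holes)
    for s in samples
)
```

Here `merged` corresponds to "x < 'm'" and `holes` to the NOT IN list, so the
three-way OR collapses to a single bound combined with a negated "in" filter.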

One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:

1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
   convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".

2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
   ORs with lots of children by adjusting the logic to avoid calling
   "indexOf" and "remove" on an ArrayList.

3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
   handling for "selector" and "equals" filters.
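
The extraction in (1) can be sketched as follows. This is a simplified Python
model, not the ConvertSelectorsToIns code itself: filters are represented as
plain tuples and strings, and the function name is hypothetical. It pulls out
the predicates shared by every branch of an OR and, if what remains in each
branch is a single equality on the same column, folds those into one IN.

```python
def extract_common_and_convert(or_children):
    """Rewrite an OR of ANDs by factoring out shared predicates.

    or_children: list of AND-lists. Each predicate is hashable: either
    ("eq", column, value) or an opaque string such as "a".
    """
    # Predicates present in every branch of the OR.
    common = set(or_children[0])
    for child in or_children[1:]:
        common &= set(child)

    # What is left in each branch once the common part is removed.
    remainders = [[p for p in child if p not in common] for child in or_children]

    # Convert only when each remainder is one equality, all on the same column.
    if all(
        len(r) == 1 and isinstance(r[0], tuple) and r[0][0] == "eq"
        for r in remainders
    ):
        columns = {r[0][1] for r in remainders}
        if len(columns) == 1:
            column = columns.pop()
            values = tuple(sorted(r[0][2] for r in remainders))
            kept = [p for p in or_children[0] if p in common]
            return ("and", kept + [("in", column, values)])

    # Otherwise leave the OR as it was.
    return ("or", [("and", child) for child in or_children])


# (a && x = 'b') || (a && x = 'c')
rewritten = extract_common_and_convert(
    [["a", ("eq", "x", "b")], ["a", ("eq", "x", "c")]]
)
# No shared predicate and different columns: nothing to extract.
unchanged = extract_common_and_convert(
    [["a", ("eq", "x", "b")], ["b", ("eq", "y", "c")]]
)
```

Using set intersection here also avoids repeated list scans, in the same
spirit as (2), though the real change is specific to the Java code.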

* Not so final.

* Fixes.

* Fix test.

* Fix test.
2024-01-03 08:56:22 -08:00
_bin De-incubation cleanup in code, docs, packaging (#9108) 2020-01-03 12:33:19 -05:00
api-reference Add api for Retrieving unused segments (#15415) 2023-12-11 16:32:18 -05:00
assets Revamp design page (#15486) 2023-12-08 11:40:24 -08:00
comparisons remove ref to plywood repo (#12809) 2022-07-26 10:12:13 +08:00
configuration Revamp design page (#15486) 2023-12-08 11:40:24 -08:00
data-management Re-arranging sections for append and replace docs. (#15497) 2023-12-06 13:13:05 -08:00
design Clean up duty for non-overlapping eternity tombstones (#15281) 2023-12-11 08:57:15 -08:00
development Prometheus config property doc fixup (#15613) 2024-01-02 16:28:42 -08:00
ingestion Revamp design page (#15486) 2023-12-08 11:40:24 -08:00
misc Update Ingestion section (#14023) 2023-05-19 09:42:27 -07:00
multi-stage-query Allow empty inserts and replaces in MSQ. (#15495) 2024-01-02 13:05:51 -08:00
operations Add MSQ Durable Storage Connector for Google Cloud Storage and change current Google Cloud Storage client library (#15398) 2023-12-14 07:34:49 +05:30
querying New handling for COALESCE, SEARCH, and filter optimization. (#15609) 2024-01-03 08:56:22 -08:00
release-info Fix `used_flag_last_updated` to `used_status_last_updated` in upgrade-notes.md (#15601) 2024-01-03 11:48:07 +08:00
tutorials Doc fixes for query from deep storage and MSQ (#15313) 2023-11-03 10:52:20 +05:30