mirror of https://github.com/apache/druid.git
01eec4a55e
* New handling for COALESCE, SEARCH, and filter optimization.

  COALESCE is converted by Calcite's parser to CASE, which is largely counterproductive for us, because it ends up duplicating expressions. In the current code we end up un-doing it in our CaseOperatorConversion. This patch has a different approach:

  1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before the Volcano planner runs.
  2) Add FilterDecomposeCoalesceRule to decompose calls like "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))". This helps use indexes when available on x and y.
  3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.
  4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.

  The advantage of this approach is that by un-doing the CASE to COALESCE conversion earlier, we have flexibility to do more stuff with COALESCE (like decomposition and pushing into LOOKUP).

  SEARCH is an operator used internally by Calcite to represent matching an argument against some set of ranges. This patch improves our handling of SEARCH in two ways:

  1) Expand NOT points (point "holes" in the range set) from SEARCH as `!(a || b)` rather than `!a && !b`, which makes it possible to convert them to a "not" of "in" filter later.
  2) Generate those nice conversions for NOT points even if the SEARCH is not composed of 100% NOT points. Without this change, a SEARCH for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".

  One of the steps we take when generating Druid queries from Calcite plans is to optimize native filters. This patch improves this step:

  1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".
  2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on ORs with lots of children by adjusting the logic to avoid calling "indexOf" and "remove" on an ArrayList.
  3) Refactor ConvertSelectorsToIns to reduce duplicated code between the handling for "selector" and "equals" filters.

* Not so final.
* Fixes.
* Fix test.
* Fix test.

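
The filter-optimization step described in the commit message, turning "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')", can be illustrated with a small sketch. This is not Druid's Java implementation of ConvertSelectorsToIns. The classes `Raw`, `Eq`, `In`, `And`, `Or` and the function `extract_common_and_merge` are hypothetical names invented for this example, and the sketch handles only the simple case where every OR branch reduces to one equality on the same column once the shared predicates are removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raw:
    """Opaque predicate, standing in for "a" in the commit message."""
    name: str


@dataclass(frozen=True)
class Eq:
    """column = value"""
    column: str
    value: str


@dataclass(frozen=True)
class In:
    """column IN (values)"""
    column: str
    values: frozenset


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


def extract_common_and_merge(or_filter):
    """Pull predicates shared by every OR branch out in front, then merge
    the leftover equalities on one column into a single IN filter.
    Returns the input unchanged when the simple pattern does not apply."""
    if not or_filter.children:
        return or_filter

    # View each OR branch as a set of ANDed conjuncts.
    conjunct_sets = [
        set(child.children) if isinstance(child, And) else {child}
        for child in or_filter.children
    ]
    common = set.intersection(*conjunct_sets)
    if not common:
        return or_filter

    # After removing the shared part, each branch must be exactly one Eq.
    leftovers = []
    for conjuncts in conjunct_sets:
        rest = conjuncts - common
        if len(rest) != 1:
            return or_filter
        (only,) = rest
        if not isinstance(only, Eq):
            return or_filter
        leftovers.append(only)

    # All equalities must target the same column to become one IN.
    if len({eq.column for eq in leftovers}) != 1:
        return or_filter

    merged = In(leftovers[0].column, frozenset(eq.value for eq in leftovers))
    # Sort the shared predicates by repr only to make the output deterministic.
    return And(tuple(sorted(common, key=repr)) + (merged,))


a = Raw("a")
before = Or((And((a, Eq("x", "b"))), And((a, Eq("x", "c")))))
after = extract_common_and_merge(before)
print(after)
```

Collecting the shared conjuncts with set operations, rather than repeatedly searching and removing from a list, is one way to keep the work manageable on ORs with many children. Whether the actual patch takes this route is not stated in the commit message.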
||
---|---|---|
aggregations.md | ||
arrays.md | ||
caching.md | ||
datasource.md | ||
datasourcemetadataquery.md | ||
dimensionspecs.md | ||
filters.md | ||
geo.md | ||
granularities.md | ||
groupbyquery.md | ||
having.md | ||
hll-old.md | ||
joins.md | ||
limitspec.md | ||
lookups.md | ||
math-expr.md | ||
multi-value-dimensions.md | ||
multitenancy.md | ||
nested-columns.md | ||
post-aggregations.md | ||
query-context.md | ||
query-execution.md | ||
query-from-deep-storage.md | ||
query-processing.md | ||
querying.md | ||
scan-query.md | ||
searchquery.md | ||
segmentmetadataquery.md | ||
select-query.md | ||
sorting-orders.md | ||
sql-aggregations.md | ||
sql-array-functions.md | ||
sql-data-types.md | ||
sql-functions.md | ||
sql-json-functions.md | ||
sql-metadata-tables.md | ||
sql-multivalue-string-functions.md | ||
sql-operators.md | ||
sql-query-context.md | ||
sql-scalar.md | ||
sql-translation.md | ||
sql-window-functions.md | ||
sql.md | ||
timeboundaryquery.md | ||
timeseriesquery.md | ||
tips-good-queries.md | ||
topnmetricspec.md | ||
topnquery.md | ||
troubleshooting.md | ||
using-caching.md | ||
virtual-columns.md |