Changes:
- Add `LookupLoadingSpec` to support 3 modes of lookup loading: ALL, NONE, ONLY_REQUIRED
- Add method `Task.getLookupLoadingSpec()`
- Do not load any lookups for `KillUnusedSegmentsTask`
Issue: #14989
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information.
This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.
* Additional short circuiting knowledge in filter bundles.
Three updates:
1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed
"applyRowCount", and redefined as an upper bound on rows remaining
after short-circuiting (rather than number of rows selected so far).
This definition works better for OR filters, which pass through the
FALSE set rather than the TRUE set to the next subfilter.
2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather
than using selectionRowCount for the first subfilter and indexIntersectionSize
for each filter thereafter. This improves accuracy when the incoming
applyRowCount is smaller than the row count from the first few indexes.
3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather
than applyRowCount for subfilters. This allows an OR filter to pass
information about short-circuiting to its subfilters.
To help write tests for this, the patch also moves the sampled
wikiticker data file from sql to processing.
* Forbidden APIs.
* Forbidden APIs.
* Better comments.
* Fix inspection.
* Adjustments to tests.
Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines
* Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach.
* Corrected the queries where date time values are directly provided
* marked 2 cases failing with resultset casting issues
* SQL tests: avoid mixing skip and cannot vectorize.
skipVectorize switches off vectorization tests completely, and
cannotVectorize turns vectorization tests into negative tests. It doesn't
make sense to use them together, so this patch makes it an error to do so,
and cleans up cases where both are mentioned.
This patch also has the effect of changing various tests from skipVectorize
to cannotVectorize, because in the past when both were mentioned,
skipVectorize would take priority.
* Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt.
* Fix tests.
This PR aims to introduce Window functions on MSQ by doing the following:
Introduce a Window querykit for handling window queries along with its factory and a processor for window queries
If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage
In presence of empty OVER() clause lets all operators loose on a single rac
In presence of no empty OVER() clause, breaks down each window into individual stages
Associated machinery to handle window functions in MSQ
Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators
Guardrails around materialization
Comprehensive UTs
This commit allows to use the MV_FILTER_ONLY & MV_FILTER_NONE functions
with a non literal argument.
Currently `select mv_filter_only('mvd_dim', 'array_dim') from 'table'`
returns a `Unhandled Query Planning Failure`
This is being tackled and also considered for the cases where the `array_dim`
having null & empty values.
Changed classes:
* `MultiValueStringOperatorConversions`
* `ApplyFunction`
* `CalciteMultiValueStringQueryTest`
* Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference.
* rewrite `LAST_VALUE(x) OVER (ORDER BY y)` to `LAG(x,0) OVER (ORDER BY y)`
* not directly to `x` because some queries get unplannable that way
* restrict `NTILE` from framing - as its not supported
* add test to ensure that all of the `KNOWN_WINDOW_FNS`'s framing is accounted for
* checkstyle/etc
* add test
* apidoc
* add assume to avoid MSQ fail
changes:
* adds TypedInFilter which preserves matching sets in the native match value type
* SQL planner uses new TypedInFilter when druid.generic.useDefaultValueForNull=false (the default)
* SortMerge join support for IS NOT DISTINCT FROM.
The patch adds a "requiredNonNullKeyParts" field to the sortMerge
processor, which has the list of key parts that must be nonnull for
an equijoin condition to match. Conditions with SQL "=" are present in
the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from
the list.
* Fix test.
* Update javadoc.
* Update Calcite*Test to use junit5
* change the way temp dirs are handled
* add openrewrite workflow to safeguard upgrade
* replace junitparamrunner with standard junit5 parametered tests
* update a few rules to junit5 api
* lots of boring changes
* cleanup QueryLogHook
* cleanup
* fix compile error: ARRAYS_DATASOURCE
* fix test
* remove enclosed
* empty
+TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest
* use @Nested
* add rule to remove enclosed; upgrade surefire
* remove enclosed
* cleanup
* add comment about surefire exclude
changes:
* fix issues with array_contains and array_overlap with null left side arguments
* modify singleThreaded stuff to allow optimizing Function similar to how we do for ExprMacro - removed SingleThreadSpecializable in favor of default impl of asSingleThreaded on Expr with clear javadocs that most callers shouldn't be calling it directly and should be using Expr.singleThreaded static method which uses a shuttle and delegates to asSingleThreaded instead
* add optimized 'singleThreaded' versions of array_contains and array_overlap
* add mv_harmonize_nulls native expression to use with MV_CONTAINS and MV_OVERLAP to allow them to behave consistently with filter rewrites, coercing null and [] into [null]
* fix bug with casting rhs argument for native array_contains and array_overlap expressions
* MSQ: Validate that strings and string arrays are not mixed.
When multi-value strings and string arrays coexist in the same column,
it causes problems with "classic MVD" style queries such as:
select * from wikipedia -- fails at runtime
select count(*) from wikipedia where flags = 'B' -- fails at planning time
select flags, count(*) from wikipedia group by 1 -- fails at runtime
To avoid these problems, this patch adds type verification for INSERT
and REPLACE. It is targeted: the only type changes that are blocked are
string-to-array and array-to-string. There is also a way to exclude
certain columns from the type checks, if the user really knows what
they're doing.
* Fixes.
* Tests and docs and error messages.
* More docs.
* Adjustments.
* Adjust message.
* Fix tests.
* Fix test in DV mode.
* MSQ: Plan without implicit sorting.
This patch adds an EngineFeature "GROUPBY_IMPLICITLY_SORTS" and sets
it true for native, false for MSQ. It's useful for two reasons:
1) In the future we'll likely want MSQ to hash-partition for GROUP BY
instead of using a global sort, which would mean MSQ would not
implicitly ORDER BY when there is a GROUP BY.
2) When doing REPLACE with MSQ, CLUSTERED BY is transformed to ORDER BY.
We should retain that ORDER BY, as it may be a subset of the GROUP BY,
and it is important to remember which fields the user wanted to include in
range shard specs.
* Fix tests.
* Fix tests for real.
* Fix test.
* Pull up literals in InputAccessor
* pull up literals in `InputAccessor`
* remove the need to pass `constants` of `Window` operator
Fixes#15353
* update test
* enable relax_nulls
Handling array with boolean literals like ARRAY[true, false]
Druid appears to be able to convert an array with boolean expressions like this array[added=deleted, added=delta] into a numeric array of 0 and 1: select array[added=deleted, added=delta] from wikipedia
However, select array[true, false] from wikipedia doesn't work.
This PR fixes this.
Recently this test started other tests from executing by triggering a bug somewhere in surefire.
This patch disables the testcases in case of non-sql compat mode.
While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value.
Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames.
The PR modifies the batching logic to not accumulate the results in an intermediary list
* plan join(s) in decoupled mode
* configure DecoupledPlanningCalciteJoinQueryTest
the test has 593 cases; however there are quite a few parameterized
from the 107 methods annotated with @Test - 42 is not yet working
* replace the isRoot hack in DruidQueryGenerator with a logic that instead looks ahead for the next node; and doesn't let the previous node do the Project - this makes it plan more likely than the existing planner
allow a hashjoin result to be converted to RowsAndColumns
added StorageAdapterRowsAndColumns
fix incorrect isConcrete() return values during early phase of planning
The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together.
This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.
* Fix up typos, inaccuracies and clean up code related to PARTITIONED BY.
* Remove wrapper function and update tests to use DruidExceptionMatcher.
* Checkstyle and Intellij inspection fixes.
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, Refactoring the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the calcite validator. The validation that is implemented as part of this pr, is only that for the source node, and some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest time query in this case.
* Globally disable AUTO_CLOSE_JSON_CONTENT.
This JsonGenerator feature is on by default. It causes problems with code
like this:
try (JsonGenerator jg = ...) {
jg.writeStartArray();
for (x : xs) {
jg.writeObject(x);
}
jg.writeEndArray();
}
If a jg.writeObject call fails due to some problem with the data it's
reading, the JsonGenerator will write the end array marker automatically
when closed as part of the try-with-resources. If the generator is writing
to a stream where the reader does not have some other mechanism to realize
that an exception was thrown, this leads the reader to believe that the
array is complete when it actually isn't.
Prior to this patch, we disabled AUTO_CLOSE_JSON_CONTENT for JSON-wrapped
SQL result formats in #11685, which fixed an issue where such results
could be erroneously interpreted as complete. This patch fixes a similar
issue with task reports, and all similar issues that may exist elsewhere,
by disabling the feature globally.
* Update test.
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, extending the PARTITION BY clause to accept string literals for the time partitioning
Executing single value correlated queries will throw an exception today since single_value function is not available in druid.
With these added classes, this provides druid, the capability to plan and run such queries.
Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
If lots of keys map to the same value, reversing a LOOKUP call can slow
things down unacceptably. To protect against this, this patch introduces
a parameter sqlReverseLookupThreshold representing the maximum size of an
IN filter that will be created as part of lookup reversal.
If inSubQueryThreshold is set to a smaller value than
sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead.
This allows users to use that single parameter to control IN sizes if they
wish.
* Identify not range filters without negating subexpressions
Earlier betweenish (range/bounds) filters were identified thru
a process of negating the subexpressions which may have not performed that well.
(it could have dominated the runtime in some cases)
This patch makes that unnecessary as its able to create the negate expression directly.
* add test;fix for multiple intervals
introduce checks to ensure that window frame is supported
added check to ensure that no expressions are set as bounds
added logic to detect following/following like cases - described in Window function fails to demarcate if 2 following are used #15739
currently RANGE frames are only supported correctly if both endpoints are unbounded or current row Offset based window range support #15767
added windowingStrictValidation context key to provide a way to override the check
Fixes a bug introduced in #15609, where queries involving filters on
TIME_FLOOR could encounter ClassCastException when comparing RangeValue
in CombineAndSimplifyBounds.
Prior to #15609, CombineAndSimplifyBounds would remove, rebuild, and
re-add all numeric range filters as part of consolidating numeric range
filters for the same column under the least restrictive type. #15609
included a change to only rebuild numeric range filters when a consolidation
opportunity actually arises. The bug was introduced because the unconditional
rebuild, as a side effect, masked the fact that in some cases range filters
would be created with string match values and a LONG match value type.
This patch changes the fixup to happen at the time the range filter is
initially created, rather than in CombineAndSimplifyBounds.
A low value of inSubQueryThreshold can cause queries with IN filter to plan as joins more commonly. However, some of these join queries may not get planned as IN filter on data nodes and causes significant perf regression.
* support groups windowing mode; which is a close relative of ranges (but not in the standard)
* all windows with range expressions will be executed wit it groups
* it will be 100% correct in case for both bounds its true that: isCurrentRow() || isUnBounded()
* this covers OVER ( ORDER BY COL )
* for other cases it will have some chances of getting correct results...
* Add ImmutableLookupMap for static lookups.
This patch adds a new ImmutableLookupMap, which comes with an
ImmutableLookupExtractor. It uses a fastutil open hashmap plus two
lists to store its data in such a way that forward and reverse
lookups can both be done quickly. I also observed footprint to be
somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million
row lookup.
The main advantage, though, is that reverse lookups can be done much
more quickly than MapLookupExtractor (which iterates the entire map
for each call to unapplyAll). This speeds up the recently added
ReverseLookupRule (#15626) during SQL planning with very large lookups.
* Use in one more test.
* Fix benchmark.
* Object2ObjectOpenHashMap
* Fixes, and LookupExtractor interface update to have asMap.
* Remove commented-out code.
* Fix style.
* Fix import order.
* Add fastutil.
* Avoid storing Map entries.
* Reverse, pull up lookups in the SQL planner.
Adds two new rules:
1) ReverseLookupRule, which eliminates calls to LOOKUP by doing
reverse lookups.
2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above
GROUP BY, when the lookup is injective.
Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether
these rules fire. Both are enabled by default.
To minimize the chance of performance problems due to many keys mapping to
the same value, ReverseLookupRule refrains from reversing a lookup if there
are more keys than `inSubQueryThreshold`. The rationale for using this setting
is that reversal works by generating an IN, and the `inSubQueryThreshold`
describes the largest IN the user wants the planner to create.
* Add additional line.
* Style.
* Remove commented-out lines.
* Fix tests.
* Add test.
* Fix doc link.
* Fix docs.
* Add one more test.
* Fix tests.
* Logic, test updates.
* - Make FilterDecomposeConcatRule more flexible.
- Make CalciteRulesManager apply reduction rules til fixpoint.
* Additional tests, simplify code.
* CONCAT flattening, filter decomposition.
Flattening: CONCAT(CONCAT(x, y), z) is flattened to CONCAT(x, y, z). This
is especially useful for the || operator, which is a binary operator and
leads to non-flat CONCAT calls.
Filter decomposition: transforms CONCAT(x, '-', y) = 'a-b' into
x = 'a' AND y = 'b'.
* One more test.
* Fix two tests.
* Adjustments from review.
* Fix empty string problem, add tests.
I was looking into adding a rule to do this, and found that it was already
happening as part of Calcite's RexSimplify. So this patch simply adds some
tests to ensure that it continues to happen.
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments.
This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.
This PR enables the flag by default to queue excess query requests in the jetty queue. Still keeping the flag so that it can be turned off if necessary. But the flag will be removed in the future.
changes:
* ColumnIndexSelector now extends ColumnSelector. The only real implementation of ColumnIndexSelector, ColumnSelectorColumnIndexSelector, already has a ColumnSelector, so this isn't very disruptive
* removed getColumnNames from ColumnSelector since it was not used
* VirtualColumns and VirtualColumn getIndexSupplier method now needs argument of ColumnIndexSelector instead of ColumnSelector, which allows expression virtual columns to correctly recognize other virtual columns, fixing an issue which would incorrectly handle other virtual columns as non-existent columns instead
* fixed a bug with sql planner incorrectly not using expression filter for equality filters on columns with extractionFn and no virtual column registry
This logic error causes sarg expansion to happen twice for IN or NOT IN points.
It doesn't affect the final generated native query, because the
redundant expansions gets combined. But it slows down planning, especially
for large NOT IN.
FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs.
PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302
- Rename ExprType to BaseType in CollectComparisons, since ExprType is a thing
that exists elsewhere.
- Remove unused "notInRexNodes" from SearchOperatorConversion.
* New handling for COALESCE, SEARCH, and filter optimization.
COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code we end up un-doing it in our CaseOperatorConversion.
This patch has a different approach:
1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
the Volcano planner runs, using CaseToCoalesceRule.
2) Add FilterDecomposeCoalesceRule to decompose calls like
"f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
This helps use indexes when available on x and y.
3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.
4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.
The advantage of this approach is that by un-doing the CASE to COALESCE
conversion earlier, we have flexibility to do more stuff with
COALESCE (like decomposition and pushing into LOOKUP).
SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:
1) Expand NOT points (point "holes" in the range set) from SEARCH as
`!(a || b)` rather than `!a && !b`, which makes it possible to convert
them to a "not" of "in" filter later.
2) Generate those nice conversions for NOT points even if the SEARCH
is not composed of 100% NOT points. Without this change, a SEARCH
for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like
"x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".
One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:
1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".
2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
ORs with lots of children by adjusting the logic to avoid calling
"indexOf" and "remove" on an ArrayList.
3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
handling for "selector" and "equals" filters.
* Not so final.
* Fixes.
* Fix test.
* Fix test.
Fixes#15072
Before this modification , the third parameter (timezone) require to be a Literal, it will throw a error when this parameter is column Identifier.