1036 Commits

Author SHA1 Message Date
Zoltan Haindrich 8252d72e2a
Pull up literals in InputAccessor (#16033)
* Pull up literals in InputAccessor

* pull up literals in `InputAccessor`
* remove the need to pass `constants` of `Window` operator

Fixes #15353

* update test

* enable relax_nulls
2024-03-12 09:14:31 -07:00
Sree Charan Manamala ef9637eef1
Handling array with boolean literals (#16093)
Handling array with boolean literals like ARRAY[true, false]

Druid appears to be able to convert an array with boolean expressions like this array[added=deleted, added=delta] into a numeric array of 0 and 1: select array[added=deleted, added=delta] from wikipedia

However, select array[true, false] from wikipedia doesn't work.
This PR fixes this.
2024-03-12 12:28:16 +05:30
Soumyava 85ee775390
Handling latest_by and earliest_by on numeric columns correctly (#15939)
* Handling latest_by and earliest_by on numeric columns correctly

* Adding test
2024-03-11 13:49:21 -07:00
Zoltan Haindrich 2eb7d7a89b
Calcite tests remove expected exception (#16046)
* Calcite tests remove expected exception

* update testcases using `expectedException` to utilize `assertThrows` instead
* remove `BaseCalciteQueryTest#expectedException`
* fix `cannotVectorize` so that it no longer stops further processing
* `msqIncompatible` no longer toggles a boolean; it's an `Assume` instead

Fixes #15423

* cleanup

* move msqIncompat

* update test

* cleanup

* remove comment

* empty-commit

* empty-commit
2024-03-11 13:23:57 +05:30
Zoltan Haindrich aaa64832fd
Disable DecoupledPlanningCalciteJoinQueryTest until it gets fixed (#16070)
Recently this test started preventing other tests from executing by triggering a bug somewhere in surefire.
This patch disables the test cases in non-SQL-compatible mode.
2024-03-07 12:55:48 -08:00
Laksh Singla 5f588fa45c
Fix bug while materializing scan's result to frames (#15987)
While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value.

Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames.

The PR modifies the batching logic so that it no longer accumulates the results in an intermediary list.
2024-03-07 17:11:44 +05:30
Vishesh Garg cf9bc507f6
Fix compilation failure due to missing constant MISSING_JOIN_CONVERSION (#16050)
* Reintroduce variable MISSING_JOIN_CONVERSION

* Remove redundant constant MISSING_JOIN_CONVERSION2

* Correct fix to address failing tests
2024-03-06 15:34:39 +08:00
Zoltan Haindrich 65c3b4d31a
Support join in decoupled mode (#15957)
* plan join(s) in decoupled mode
* configure DecoupledPlanningCalciteJoinQueryTest
        the test has 593 cases, many of them parameterized from the 107 methods
        annotated with @Test; 42 are not yet working
 * replace the isRoot hack in DruidQueryGenerator with logic that instead looks ahead to the next node and doesn't let the previous node do the Project; this makes its plans more like those of the existing planner
2024-03-05 19:10:13 -06:00
Zoltan Haindrich bb882727c0
Fix Windowing/scanAndSort query issues on top of Joins. (#15996)
allow a hashjoin result to be converted to RowsAndColumns
added StorageAdapterRowsAndColumns
fix incorrect isConcrete() return values during early phase of planning
2024-03-05 15:05:31 +05:30
Zoltan Haindrich e469b7ed34
Make setting QUERY_CONTEXT_DEFAULT explicit in tests (#16010) 2024-03-05 10:54:16 +05:30
Adarsh Sanjeev 93eeb05eaf
Revert explain attributes change to old behaviour. (#16004)
* Revert explain attributes change

* Fix tests

* Fix tests

* Rename function
2024-03-04 15:56:02 +05:30
Zoltan Haindrich bf0995f846
Introduce dynamic table append (#15897) 2024-03-01 04:31:57 -05:00
Laksh Singla 17e4f3ac60
Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559)
The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together.
This works well for the primitive dimensions, because they are Comparable; however, it falls apart when the dimensions can be arrays (or, in future scenarios, complex columns). In cases where the dimensions are not comparable, Druid resorts to the wrapper types ComparableStringArray and ComparableList, which are Comparable, based on the list comparator.
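To make the wrapper idea concrete, here is a minimal sketch of a list wrapper that is Comparable through element-by-element comparison. The class name `ComparableListSketch` is hypothetical and this is not Druid's `ComparableList`; it also assumes non-null elements.

```java
import java.util.List;

// Hypothetical wrapper for illustration only; not Druid's ComparableList.
// Assumes the wrapped list contains no null elements.
final class ComparableListSketch<T extends Comparable<T>>
    implements Comparable<ComparableListSketch<T>> {
  private final List<T> values;

  ComparableListSketch(List<T> values) {
    this.values = values;
  }

  // Lexicographic order: compare element by element, then fall back to length.
  @Override
  public int compareTo(ComparableListSketch<T> other) {
    int n = Math.min(values.size(), other.values.size());
    for (int i = 0; i < n; i++) {
      int c = values.get(i).compareTo(other.values.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(values.size(), other.values.size());
  }
}
```

A sorting or grouping engine that only knows how to call `compareTo` can then treat array-valued dimensions like any other key.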
2024-02-27 11:39:29 +05:30
Soumyava 51cc729fd1
Enforcing type checking for flatten concat (#15903) 2024-02-26 21:53:49 -08:00
Abhishek Radhakrishnan 67a6224d91
Fix up incorrect `PARTITIONED BY` error messages (#15961)
* Fix up typos, inaccuracies and clean up code related to PARTITIONED BY.

* Remove wrapper function and update tests to use DruidExceptionMatcher.

* Checkstyle and Intellij inspection fixes.
2024-02-26 14:17:53 -05:00
Zoltan Haindrich 06deda9415
ScanAndSort query fails with NPE for simple queries (#15914)
* some stuff

* add dummy fields

* draft-fix

* rename test

* cleanup

* add null

* cleanup

* cleanup

* add test

* updates

* move check to constructor

* cleanup

* updates/etc

* fix some more

* add rowSignatureMode

* checkstyle/etc

* override

* missing msqIncompat

* fix test

* fixes

* undo

* updates

* remove param
2024-02-24 15:33:50 -08:00
zachjsh 8ebf237576
Move INSERT & REPLACE validation to the Calcite validator (#15908)
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner, https://github.com/apache/druid/pull/13686 from @paul-rogers. It refactors the IngestHandler and its subclasses to produce a validated SqlInsert node instead of the previous Insert source node. The SqlInsert node is then validated in the Calcite validator. The validation implemented as part of this PR covers only the source node, plus some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and in that case can be omitted from the ingest-time query.
2024-02-22 14:01:59 -05:00
Zoltan Haindrich bcce0806d7
Support Union in decoupled mode (#15870) 2024-02-21 10:54:50 -05:00
Gian Merlino 9c41827dba
Globally disable AUTO_CLOSE_JSON_CONTENT. (#15880)
* Globally disable AUTO_CLOSE_JSON_CONTENT.

This JsonGenerator feature is on by default. It causes problems with code
like this:

  try (JsonGenerator jg = ...) {
    jg.writeStartArray();
    for (Object x : xs) {
      jg.writeObject(x);
    }
    jg.writeEndArray();
  }

If a jg.writeObject call fails due to some problem with the data it's
reading, the JsonGenerator will write the end array marker automatically
when closed as part of the try-with-resources. If the generator is writing
to a stream where the reader does not have some other mechanism to realize
that an exception was thrown, this leads the reader to believe that the
array is complete when it actually isn't.

Prior to this patch, we disabled AUTO_CLOSE_JSON_CONTENT for JSON-wrapped
SQL result formats in #11685, which fixed an issue where such results
could be erroneously interpreted as complete. This patch fixes a similar
issue with task reports, and all similar issues that may exist elsewhere,
by disabling the feature globally.

* Update test.
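The failure mode described above can be reproduced with a toy writer. This is a sketch only: `ToyArrayWriter` is a hypothetical stand-in that mimics auto-closing on `close()`; it is not Jackson's `JsonGenerator`, and the "negative numbers are bad data" rule is an assumption made to trigger a mid-stream failure.

```java
// Toy stand-in for a streaming JSON array writer (NOT Jackson's JsonGenerator).
// With autoClose enabled, close() appends the missing ']' just like the
// AUTO_CLOSE_JSON_CONTENT behaviour described in the commit message.
final class ToyArrayWriter implements AutoCloseable {
  private final StringBuilder out;
  private final boolean autoClose;
  private boolean arrayOpen = false;
  private boolean first = true;

  ToyArrayWriter(StringBuilder out, boolean autoClose) {
    this.out = out;
    this.autoClose = autoClose;
  }

  void writeStartArray() {
    out.append('[');
    arrayOpen = true;
  }

  // Assumption for the demo: negative values stand in for unreadable data.
  void writeNumber(int v) {
    if (v < 0) {
      throw new IllegalArgumentException("bad value: " + v);
    }
    if (!first) {
      out.append(',');
    }
    out.append(v);
    first = false;
  }

  void writeEndArray() {
    out.append(']');
    arrayOpen = false;
  }

  @Override
  public void close() {
    if (autoClose && arrayOpen) {
      out.append(']');
      arrayOpen = false;
    }
  }

  // Writes the values and returns whatever a downstream reader would see.
  static String render(int[] values, boolean autoClose) {
    StringBuilder out = new StringBuilder();
    try (ToyArrayWriter w = new ToyArrayWriter(out, autoClose)) {
      w.writeStartArray();
      for (int v : values) {
        w.writeNumber(v);
      }
      w.writeEndArray();
    } catch (IllegalArgumentException e) {
      // The writer failed midway; the reader never learns about this exception.
    }
    return out.toString();
  }
}
```

For the input `{1, 2, -1, 4}`, `render(..., true)` yields `[1,2]`, which looks like a complete array, while `render(..., false)` yields `[1,2`, which a reader can detect as truncated.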
2024-02-16 08:52:48 -08:00
Clint Wylie fe2ba8cc28
fix return type inference of parse_long, which can also be null if string is not parseable into a long (#15909)
* fix return type inference of parse_long, which can also be null if string is not parseable into a long

* fix msq test
2024-02-15 08:45:34 -08:00
zachjsh f9ee2c353b
Extend the PARTITIONED BY clause to accept string literals for the time partitioning (#15836)
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner, https://github.com/apache/druid/pull/13686 from @paul-rogers. It extends the PARTITIONED BY clause to accept string literals for the time partitioning.
2024-02-09 11:45:38 -05:00
Sree Charan Manamala 57e12df352
Sql Single Value Aggregator for scalar queries (#15700)
Executing single-value correlated queries throws an exception today, since the single_value function is not available in Druid.
The added classes give Druid the capability to plan and run such queries.
2024-02-08 19:20:30 +05:30
Soumyava f3996b96ff
Fixes for safe_divide with vectorize and datatypes (#15839)
* Fix for safe_divide with vectorize

* More fixes

* Update to use expr.eval(null) for both cases when denominator is 0
2024-02-08 14:40:42 +05:30
Adarsh Sanjeev 514b3b4d01
Add export capabilities to MSQ with SQL syntax (#15689)
* Add test

* Parser changes to support export statements

* Fix builds

* Address comments

* Add frame processor

* Address review comments

* Fix builds

* Update syntax

* Webconsole workaround

* Refactor

* Refactor

* Change export file path

* Update docs

* Remove webconsole changes

* Fix spelling mistake

* Parser changes, add tests

* Parser changes, resolve build warnings

* Fix failing test

* Fix failing test

* Fix IT tests

* Add tests

* Cleanup

* Fix unparse

* Fix forbidden API

* Update docs

* Update docs

* Address review comments

* Address review comments

* Fix tests

* Address review comments

* Fix insert unparse

* Add external write resource action

* Fix tests

* Add resource check to overlord resource

* Fix tests

* Add IT

* Update syntax

* Update tests

* Update permission

* Address review comments

* Address review comments

* Address review comments

* Add tests

* Add check for runtime parameter for bucket and path

* Add check for runtime parameter for bucket and path

* Add tests

* Update docs

* Fix NPE

* Update docs, remove deadcode

* Fix formatting
2024-02-07 22:08:50 +05:30
Clint Wylie 23d4fade90
use NullFilter for SQL rewrite of MV_CONTAINS and MV_OVERLAP for null array elements (#15855)
Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
2024-02-07 19:40:41 +05:30
Zoltan Haindrich fdc7cec271
Support Window operators in decoupled planning (#15815) 2024-02-07 04:09:48 -05:00
Gian Merlino 54b30646f3
Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832)
If lots of keys map to the same value, reversing a LOOKUP call can slow
things down unacceptably. To protect against this, this patch introduces
a parameter sqlReverseLookupThreshold representing the maximum size of an
IN filter that will be created as part of lookup reversal.

If inSubQueryThreshold is set to a smaller value than
sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead.
This allows users to use that single parameter to control IN sizes if they
wish.
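The interaction between the two thresholds can be sketched as follows. This is a sketch under assumptions: the class and method names are hypothetical, the lookup is modelled as a plain `Map`, and "declining" is modelled as an empty `Optional`; none of this is Druid's actual rule code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of capping the size of the IN filter produced
// by lookup reversal; not the actual ReverseLookupRule implementation.
final class ReverseLookupSketch {
  // The smaller of the two settings wins, as the commit message describes.
  static int effectiveThreshold(int sqlReverseLookupThreshold, int inSubQueryThreshold) {
    return Math.min(sqlReverseLookupThreshold, inSubQueryThreshold);
  }

  // Collects the keys that map to `value`. Returns empty when the resulting
  // IN list would exceed `threshold`, meaning the reversal is skipped.
  static Optional<List<String>> reverse(Map<String, String> lookup, String value, int threshold) {
    List<String> keys = new ArrayList<>();
    for (Map.Entry<String, String> e : lookup.entrySet()) {
      if (value.equals(e.getValue())) {
        keys.add(e.getKey());
        if (keys.size() > threshold) {
          return Optional.empty();
        }
      }
    }
    return Optional.of(keys);
  }
}
```

With a non-empty result, a filter such as `LOOKUP(k, 'name') = 'v'` could be rewritten as `k IN (...)` over the returned keys; with an empty result the original filter is left alone.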
2024-02-06 16:32:05 +05:30
Soumyava b86f31f2c0
Addressing shapeshifting issues with window functions (#15807)
Addressing shapeshifting issues with window functions
2024-02-06 11:12:20 +05:30
Zoltan Haindrich 392d585ff8
Identify not range filters without negating subexpressions (#15766)
* Identify not range filters without negating subexpressions

Earlier, betweenish (range/bounds) filters were identified through
a process of negating the subexpressions, which may not have performed that well
(it could have dominated the runtime in some cases).
This patch makes that unnecessary, as it's able to create the negated expression directly.

* add test; fix for multiple intervals
2024-02-05 19:12:58 -08:00
Zoltan Haindrich 8f5b7522c7
Strict window frame checks (#15746)
introduce checks to ensure that the window frame is supported
added a check to ensure that no expressions are set as bounds
added logic to detect following/following-like cases, as described in #15739 ("Window function fails to demarcate if 2 following are used")
currently RANGE frames are only supported correctly if both endpoints are unbounded or current row; see #15767 ("Offset based window range support")
added the windowingStrictValidation context key to provide a way to override the check
2024-02-02 16:21:53 +05:30
Laksh Singla 7d65caf0c5
Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670) 2024-02-01 10:24:43 +05:30
Zoltan Haindrich f701197224
Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735) 2024-01-31 02:36:58 -05:00
Gian Merlino 38a1e827ab
Fix up value types when creating range filters. (#15778)
Fixes a bug introduced in #15609, where queries involving filters on
TIME_FLOOR could encounter ClassCastException when comparing RangeValue
in CombineAndSimplifyBounds.

Prior to #15609, CombineAndSimplifyBounds would remove, rebuild, and
re-add all numeric range filters as part of consolidating numeric range
filters for the same column under the least restrictive type. #15609
included a change to only rebuild numeric range filters when a consolidation
opportunity actually arises. The bug was introduced because the unconditional
rebuild, as a side effect, masked the fact that in some cases range filters
would be created with string match values and a LONG match value type.

This patch changes the fixup to happen at the time the range filter is
initially created, rather than in CombineAndSimplifyBounds.
2024-01-29 13:30:47 -08:00
Abhishek Agarwal 989a8f7874
Better error message for date_trunc operators (#15759)
IAEs are not bubbled up; instead they show up to the user as a runtime failure, which is not helpful. See https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1706185796975109 for one such example. This change fixes that.
2024-01-27 11:22:39 +05:30
Karan Kumar c4990f56d6
Prepare main branch for next 30.0.0 release. (#15707) 2024-01-23 15:55:54 +05:30
Zoltan Haindrich d6a12c4389
Add ability to enable ResultCache in tests (#15465) 2024-01-22 09:02:59 -05:00
Pranav 45b30dc07d
Revert "Change default inSubQueryThreshold (#15336)" (#15722)
A low value of inSubQueryThreshold can cause queries with an IN filter to be planned as joins more often. However, some of these join queries may not get planned as IN filters on data nodes, which causes a significant performance regression.
2024-01-22 11:34:39 +05:30
Zoltan Haindrich 8a43db9395
Range support in window expressions (support them as groups) (#15365)
* support groups windowing mode, which is a close relative of ranges (but not in the standard)
* all windows with range expressions will be executed with groups
* it will be 100% correct when, for both bounds, isCurrentRow() || isUnBounded() holds
  * this covers OVER ( ORDER BY COL )
* for other cases it will have some chance of getting correct results...
2024-01-17 00:05:21 -06:00
Gian Merlino 500681d0cb
Add ImmutableLookupMap for static lookups. (#15675)
* Add ImmutableLookupMap for static lookups.

This patch adds a new ImmutableLookupMap, which comes with an
ImmutableLookupExtractor. It uses a fastutil open hashmap plus two
lists to store its data in such a way that forward and reverse
lookups can both be done quickly. I also observed footprint to be
somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million
row lookup.

The main advantage, though, is that reverse lookups can be done much
more quickly than MapLookupExtractor (which iterates the entire map
for each call to unapplyAll). This speeds up the recently added
ReverseLookupRule (#15626) during SQL planning with very large lookups.

* Use in one more test.

* Fix benchmark.

* Object2ObjectOpenHashMap

* Fixes, and LookupExtractor interface update to have asMap.

* Remove commented-out code.

* Fix style.

* Fix import order.

* Add fastutil.

* Avoid storing Map entries.
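The performance argument above (a precomputed reverse structure versus iterating the entire map for every reverse call) can be illustrated with a small sketch. The class name and the use of `HashMap` are assumptions for illustration; the real ImmutableLookupMap is described as using a fastutil open hashmap plus two lists, whose exact layout is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build a value -> keys index once, so that reverse
// lookups avoid scanning every entry. Not the actual ImmutableLookupMap layout.
final class ReverseIndexSketch {
  private final Map<String, String> forward;
  private final Map<String, List<String>> reverse = new HashMap<>();

  ReverseIndexSketch(Map<String, String> entries) {
    this.forward = new HashMap<>(entries);
    for (Map.Entry<String, String> e : entries.entrySet()) {
      reverse.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
    }
  }

  // Forward lookup: key -> value, or null if the key is absent.
  String apply(String key) {
    return forward.get(key);
  }

  // Reverse lookup: value -> all keys mapping to it, without a full scan.
  List<String> unapply(String value) {
    return reverse.getOrDefault(value, Collections.emptyList());
  }
}
```

The trade-off in this sketch is extra memory for the reverse index in exchange for reverse lookups that no longer scan the whole map.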
2024-01-13 13:14:01 -08:00
Gian Merlino 866fe1cda6
Fix some naming related to AggregatePullUpLookupRule. (#15677)
It was called "split" rather than "pull up" in some places. This patch
standardizes on "pull up".
2024-01-12 15:41:58 -08:00
Gian Merlino cccf13ea82
Reverse, pull up lookups in the SQL planner. (#15626)
* Reverse, pull up lookups in the SQL planner.

Adds two new rules:

1) ReverseLookupRule, which eliminates calls to LOOKUP by doing
   reverse lookups.

2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above
   GROUP BY, when the lookup is injective.

Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether
these rules fire. Both are enabled by default.

To minimize the chance of performance problems due to many keys mapping to
the same value, ReverseLookupRule refrains from reversing a lookup if there
are more keys than `inSubQueryThreshold`. The rationale for using this setting
is that reversal works by generating an IN, and the `inSubQueryThreshold`
describes the largest IN the user wants the planner to create.

* Add additional line.

* Style.

* Remove commented-out lines.

* Fix tests.

* Add test.

* Fix doc link.

* Fix docs.

* Add one more test.

* Fix tests.

* Logic, test updates.

* - Make FilterDecomposeConcatRule more flexible.

- Make CalciteRulesManager apply reduction rules til fixpoint.

* Additional tests, simplify code.
2024-01-12 00:06:31 -08:00
Zoltan Haindrich e597cc2949
Remove UnaryFunctionOperatorConversion and RoundOperatorConversion (#15566)
* get rid of round op conv

* cleanup

* use DirectOperatorConversion instead of unary

* import order
2024-01-12 10:06:23 +05:30
Gian Merlino 6c18434028
CONCAT flattening, filter decomposition. (#15634)
* CONCAT flattening, filter decomposition.

Flattening: CONCAT(CONCAT(x, y), z) is flattened to CONCAT(x, y, z). This
is especially useful for the || operator, which is a binary operator and
leads to non-flat CONCAT calls.

Filter decomposition: transforms CONCAT(x, '-', y) = 'a-b' into
x = 'a' AND y = 'b'.

* One more test.

* Fix two tests.

* Adjustments from review.

* Fix empty string problem, add tests.
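The flattening step can be sketched on a tiny expression tree. This is an illustration under assumptions: `Expr` and `flatten` are hypothetical names, and the real rule operates on Calcite nodes rather than on this toy structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of CONCAT flattening on a toy expression tree.
final class ConcatFlattenSketch {
  // A node is either a leaf (no args) or a function call with arguments.
  static final class Expr {
    final String name;
    final List<Expr> args;

    Expr(String name, Expr... args) {
      this.name = name;
      this.args = Arrays.asList(args);
    }

    boolean isConcat() {
      return name.equals("CONCAT") && !args.isEmpty();
    }

    @Override
    public String toString() {
      if (args.isEmpty()) {
        return name;
      }
      StringBuilder sb = new StringBuilder(name).append('(');
      for (int i = 0; i < args.size(); i++) {
        if (i > 0) {
          sb.append(", ");
        }
        sb.append(args.get(i));
      }
      return sb.append(')').toString();
    }
  }

  // Recursively inlines the arguments of nested CONCAT calls into their
  // parent CONCAT; all other calls keep their shape.
  static Expr flatten(Expr e) {
    if (e.args.isEmpty()) {
      return e;
    }
    List<Expr> out = new ArrayList<>();
    for (Expr arg : e.args) {
      Expr f = flatten(arg);
      if (e.isConcat() && f.isConcat()) {
        out.addAll(f.args);
      } else {
        out.add(f);
      }
    }
    return new Expr(e.name, out.toArray(new Expr[0]));
  }
}
```

Flattening `CONCAT(CONCAT(x, y), z)`, which is the shape that `x || y || z` produces, gives `CONCAT(x, y, z)`.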
2024-01-11 11:18:50 -08:00
Gian Merlino ee77fa7fb3
Add tests for CASE decomposition. (#15639)
I was looking into adding a rule to do this, and found that it was already
happening as part of Calcite's RexSimplify. So this patch simply adds some
tests to ensure that it continues to happen.
2024-01-10 13:24:24 -08:00
Ankit Kothari 355c2f5da0
Add sql + ingestion compatibility for first/last on numeric values (#15607)
SQL compatibility for numeric last and first column types.
Ingestion UI now provides option for first and last aggregation as well.
2024-01-10 12:59:38 +05:30
Rishabh Singh 71f5307277
Eliminate Periodic Realtime Segment Metadata Queries: Tasks Now Publish Schema for Seamless Coordinator Updates (#15475)
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments.

This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.
2024-01-10 08:55:56 +05:30
Abhishek Agarwal 468b99e608
Enable query request queuing by default when total laning is turned on. (#15440)
This PR enables the flag by default to queue excess query requests in the jetty queue. Still keeping the flag so that it can be turned off if necessary. But the flag will be removed in the future.
2024-01-09 07:54:26 +05:30
Clint Wylie df5bcd1367
fix bugs with expression virtual column indexes for expression virtual columns which refer to other virtual columns (#15633)
changes:
* ColumnIndexSelector now extends ColumnSelector. The only real implementation of ColumnIndexSelector, ColumnSelectorColumnIndexSelector, already has a ColumnSelector, so this isn't very disruptive
* removed getColumnNames from ColumnSelector since it was not used
* VirtualColumns and VirtualColumn getIndexSupplier method now needs argument of ColumnIndexSelector instead of ColumnSelector, which allows expression virtual columns to correctly recognize other virtual columns, fixing an issue which would incorrectly handle other virtual columns as non-existent columns instead
* fixed a bug with sql planner incorrectly not using expression filter for equality filters on columns with extractionFn and no virtual column registry
2024-01-08 13:10:11 -08:00
Jonathan Wei 5d1e66b8f9
Allow broker to use catalog for datasource schemas for SQL queries (#15469)
* Allow broker to use catalog for datasource schemas

* More PR comments

* PR comments
2024-01-08 13:46:08 -06:00
Gian Merlino 0422d9d507
Fix redundant expansion in SearchOperatorConversion. (#15625)
This logic error causes sarg expansion to happen twice for IN or NOT IN points.
It doesn't affect the final generated native query, because the
redundant expansions get combined. But it slows down planning, especially
for large NOT IN.
2024-01-05 12:42:12 -08:00
Zoltan Haindrich b9679d0884
Run filter-into-join rule early for subqueries and disable project-filter rule (#15511)
FILTER_INTO_JOIN is mainly run along with the other rules in the Volcano planner; however, if the query starts out highly underdefined (join conditions in the WHERE clause), that generic query can give the other rules a lot of room to play around with. The early run is only enabled when the join uses subqueries for its inputs.

The PROJECT_FILTER rule is not that useful, and it could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302
2024-01-04 15:33:45 +05:30
Gian Merlino 5c3391a084
Follow-ups to SEARCH and IN from #15609. (#15623)
- Rename ExprType to BaseType in CollectComparisons, since ExprType is a thing
  that exists elsewhere.
- Remove unused "notInRexNodes" from SearchOperatorConversion.
2024-01-03 22:38:12 -08:00
Clint Wylie f19ece146f
expression virtual column indexes (#15585)
* ExpressionVirtualColumn + indexes = bff. Expression virtual columns can now use indexes of the underlying columns, similar to how expression filters do
2024-01-03 21:00:39 -08:00
Gian Merlino 01eec4a55e
New handling for COALESCE, SEARCH, and filter optimization. (#15609)
* New handling for COALESCE, SEARCH, and filter optimization.

COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code we end up un-doing it in our CaseOperatorConversion.
This patch has a different approach:

1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
   the Volcano planner runs.

2) Add FilterDecomposeCoalesceRule to decompose calls like
   "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
   This helps use indexes when available on x and y.

3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.

4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.

The advantage of this approach is that by un-doing the CASE to COALESCE
conversion earlier, we have flexibility to do more stuff with
COALESCE (like decomposition and pushing into LOOKUP).

SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:

1) Expand NOT points (point "holes" in the range set) from SEARCH as
   `!(a || b)` rather than `!a && !b`, which makes it possible to convert
   them to a "not" of "in" filter later.

2) Generate those nice conversions for NOT points even if the SEARCH
   is not composed of 100% NOT points. Without this change, a SEARCH
   for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like
   "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".

One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:

1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
   convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".

2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
   ORs with lots of children by adjusting the logic to avoid calling
   "indexOf" and "remove" on an ArrayList.

3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
   handling for "selector" and "equals" filters.

* Not so final.

* Fixes.

* Fix test.

* Fix test.
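The COALESCE decomposition above rests on a logical equivalence that can be checked by brute force over a small domain. The sketch below uses plain Java booleans and `null` as a stand-in for SQL NULL; it does not model SQL three-valued logic, and all names in it are hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical check that f(COALESCE(x, y)) matches
// (x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y)) for a null-safe f.
final class CoalesceDecomposeSketch {
  static String coalesce(String x, String y) {
    return x != null ? x : y;
  }

  static boolean original(Predicate<String> f, String x, String y) {
    return f.test(coalesce(x, y));
  }

  static boolean decomposed(Predicate<String> f, String x, String y) {
    return (x != null && f.test(x)) || (x == null && f.test(y));
  }

  // Compares both forms for every (x, y) pair drawn from the domain.
  static boolean agreeOnAll(Predicate<String> f, String[] domain) {
    for (String x : domain) {
      for (String y : domain) {
        if (original(f, x, y) != decomposed(f, x, y)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

The decomposed form refers to `x` and `y` directly, which is what lets per-column indexes be used when they are available.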
2024-01-03 08:56:22 -08:00
AlbericByte a2e65e6a89
Support to pass dynamic values to timestamp Extract function (#15586)
Fixes #15072

Before this modification, the third parameter (timezone) was required to be a literal; an error was thrown when this parameter was a column identifier.
2023-12-21 11:57:52 +05:30
Clint Wylie e373f62692
fix expression post aggregator array handling when grouping wrapper types leak (#15543)
* fix expression post aggregator array handling when grouping wrapper types leak
* more consistent expression function error messaging
2023-12-15 21:43:27 -08:00
Soumyava 3e15522d6b
Round works correctly on system metadata columns (#15554) 2023-12-13 17:23:14 -08:00
Soumyava 38f3cf9e65
Fixing a case where datatype mismatch was happening in join (#15541) 2023-12-12 12:50:32 -08:00
Clint Wylie 42f2496b7d
fix bug with nested empty array fields (#15532) 2023-12-09 12:20:21 -08:00
Clint Wylie e7c8f2e208
lift restriction of array_to_mv to only support direct column access (#15528) 2023-12-08 16:27:17 -08:00
Soumyava ca4ecdf7d0
Fixing NPE with virtual expression with unnest (#15513)
* Fixing NPE with virtual expression with unnest

* Fixing a comment
2023-12-08 10:51:56 -08:00
Clint Wylie e64b92eb35
add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521) 2023-12-08 05:28:46 -08:00
Adarsh Sanjeev 2e45eadc08
Add better error messages for using OVERWRITE with INSERT statements (#15517)
* Add better error messages for using OVERWRITE with INSERT statements
2023-12-08 15:33:46 +05:30
Zoltan Haindrich c353ccfdef
Windowed min aggregates null-s as 0 (#15371) 2023-12-08 01:41:16 -08:00
sb89594 5fda8613ad
Feature: Add IPv6 Match Function (#15212) 2023-12-07 23:09:06 -08:00
Clint Wylie c241c6980c
store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505) 2023-12-07 03:31:43 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Zoltan Haindrich a1aa4340d0
Changing the queryFrameWork in Calcite*Tests may have side effects (#15428)
changes how it's configured a bit to use an annotation instead of methods
2023-12-04 00:38:01 +05:30
Clint Wylie 5ce4aab3b8
update ARRAY_OVERLAP to plan with ArrayContainsElement for ARRAY columns (#15451)
Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in #15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
2023-11-30 10:05:20 +05:30
Pranav 93cd638645
Enabling aggregateMultipleValues in all StringAnyAggregators (#15434)
* Enabling aggregateMultipleValues in all StringAnyAggregators

* Adding more tests

* More validation

* fix warning

* updating asserts in decoupled mode

* fix intellij inspection

* Addressing comments

* Addressing comments

* Adding early validations and make aggregate consistent across all

* fixing tests

* fixing tests

* Update docs/querying/sql-aggregations.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fixing static check

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-11-29 14:32:49 -08:00
Clint Wylie 64fcb32bcf
add native 'array contains element' filter (#15366)
* add native arrayContainsElement filter to use array column element indexes
2023-11-29 03:33:00 -08:00
Abhishek Agarwal 0a56c87e93
SQL: Plan non-equijoin conditions as cross join followed by filter (#15302)
This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
2023-11-29 13:46:11 +05:30
Clint Wylie 97623b408c
add optional 'castToType' parameter to 'auto' column schema (#15417)
* auto but.. with an expected type
2023-11-28 17:19:23 -08:00
Zoltan Haindrich eb056e23b5
Fix dictionarySize overrides in tests (#15354)
I think this is a problem, as it discards the false return value when putToKeyBuffer can't store the value because of the limit.

Not forwarding the return value at that point may lead to normal continuation even though something was not added to the dictionary.
2023-11-28 18:49:09 +05:30
Zoltan Haindrich ca544e552c
Add option to compare results with relative error tolerance (#15429)
Adds a result comparison mode, EQUALS_RELATIVE_1000_ULPS, which accepts floating point differences of up to 1000 units of least precision.
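A ULP-based tolerance check can be sketched as below. This is a sketch under assumptions: the helper name is hypothetical, and measuring the tolerance relative to the expected value's ULP is one plausible reading, not necessarily how the test framework implements EQUALS_RELATIVE_1000_ULPS.

```java
// Hypothetical helper: accepts `actual` when it lies within `ulps` units of
// least precision of `expected`, measured at the magnitude of `expected`.
final class UlpCompareSketch {
  static boolean equalsWithinUlps(double expected, double actual, int ulps) {
    // Exact matches (including matching NaNs and infinities) pass immediately.
    if (Double.compare(expected, actual) == 0) {
      return true;
    }
    // Any remaining NaN or infinite value cannot be "close" to anything.
    if (Double.isNaN(expected) || Double.isNaN(actual)
        || Double.isInfinite(expected) || Double.isInfinite(actual)) {
      return false;
    }
    double tolerance = ulps * Math.ulp(expected);
    return Math.abs(expected - actual) <= tolerance;
  }
}
```

Because `Math.ulp` scales with the magnitude of its argument, the tolerance is relative rather than absolute: large expected values get a proportionally larger allowance.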
2023-11-28 13:03:16 +05:30
Abhishek Agarwal 3113e7b350
Fix grouping aggregator when one of the dimensions is a simple extraction (#15421)
This PR fixes an issue where the grouping aggregator wrongly assumes that a key dimension is a virtual column and assigns a wrong name to it. This results in a mismatch between the dimensions that the grouping aggregator sees and the dimension names that rows are aggregated on. As a result, the grouping aggregator generates a wrong result.
2023-11-24 13:15:07 +05:30
Clint Wylie a95c22ce70
support non-constant expressions for path arguments for json_value and json_query (#15320)
* support dynamic expressions for path arguments for json_value and json_query
2023-11-17 01:12:05 -08:00
Adarsh Sanjeev a134cc30a6
Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
Rishabh Singh 5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
2023-11-14 12:52:33 +05:30
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Laksh Singla 0cc8839a60
Allow casted literal values in SQL functions accepting literals (Part 2) (#15316) 2023-11-03 21:22:19 +05:30
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.
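
As a rough illustration of what the three system fields could hold for an input entity, the sketch below derives them from a URI. The mapping is an assumption made for this example (bucket taken from the authority of `s3`/`gs` URIs, path without the leading slash for object stores); `system_field_values` is a hypothetical helper, not the `getSystemFieldValue` method itself:

```python
from urllib.parse import urlparse


def system_field_values(uri):
    """Return illustrative values for the three system fields of one input entity."""
    parsed = urlparse(uri)
    is_object_store = parsed.scheme in ("s3", "gs")  # assumption: bucket-style schemes
    return {
        "__file_uri": uri,
        # Only bucket-style URIs carry a bucket; other sources get None here.
        "__file_bucket": parsed.netloc if is_object_store else None,
        # Object keys are shown without the leading slash; local paths keep it.
        "__file_path": parsed.path.lstrip("/") if is_object_store else parsed.path,
    }


print(system_field_values("s3://my-bucket/dir/data.json"))
```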

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Clint Wylie d261587f4a
explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245)
* better documentation for the differences between arrays and mvds
* add outputType to ExpressionPostAggregator to make docs true
* add output coercion if outputType is defined on ExpressionPostAgg
* updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables
2023-11-02 00:31:37 -07:00
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.
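
To see why taking the minimum frees up memory, here is a simplified sketch. It assumes a fixed byte budget split evenly across the queries that can run concurrently; the function name and the even split are assumptions for illustration, not the actual SubqueryGuardrailHelper logic:

```python
def per_query_subquery_bytes(total_bytes, server_threads, scheduler_threads=None):
    """Split a subquery memory budget across the queries that can run at once."""
    if scheduler_threads is None:
        concurrent_queries = server_threads
    else:
        # The scheduler may cap concurrency below the number of server threads.
        concurrent_queries = min(scheduler_threads, server_threads)
    return total_bytes // concurrent_queries


print(per_query_subquery_bytes(1_000_000, server_threads=10))                       # 100000
print(per_query_subquery_bytes(1_000_000, server_threads=10, scheduler_threads=4))  # 250000
```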

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Laksh Singla 2ea7177f15
Allow casted literal values in SQL functions accepting literals (#15282)
Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.
2023-11-01 10:38:48 +05:30
Zoltan Haindrich f4a74710e6
Process pure ordering changes with windowing operators (#15241)
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset
2023-10-29 16:40:49 +05:30
Zoltan Haindrich 6784e9c507
Fix summary row issues in case postaggregations are happening (#15232)
* fix-1/2

* add message v1

* extend test to cover for IOB issue

* move stuff around

* change message

* fix testcase string

* compute postaggs (thank you Clint!)

* enable feature for test

* ignore tests in msq

---------

Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>
2023-10-24 20:33:59 -07:00
Soumyava 06f40a0019
remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237)
* Fixing nested group by query with order by in outer query

* Adding examples
2023-10-24 15:30:13 -07:00
Zoltan Haindrich 2e31cb2901
DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118) 2023-10-23 10:54:28 -04:00
Zoltan Haindrich b95035f183
Fix VirtualColumn related issues in window expressions (#15119)
for some exotic queries like:

  SELECT
  	'_'||dim1,
    MIN(cast(0 as double)) OVER (),
    MIN(cast((cnt||cnt) as bigint)) OVER ()
  FROM foo
the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly
2023-10-23 14:05:59 +05:30
Zoltan Haindrich fbbb9c7730
Allow DESC ordering in window expressions (#15195) 2023-10-20 07:55:28 -04:00
Zoltan Haindrich 9fb0dbfc9f
Fix json inputs for drill windowing tests (#15148)
This PR:

- adds a flag to JsonToParquet to do the fix during conversion
- updates the json files to more correct contents
- some resultset mismatches were fixed by this
- updates parquet to 1.13.1
2023-10-19 14:02:41 +05:30
Clint Wylie 061cfee224
add native filters for "(filter) is true" and "(filter) is false" (#15182)
* add native filters for "(filter) is true" and "(filter) is false"

changes:
* add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE`
* add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false
* fix test incorrectly adjusted to wrong answer in #15058
* add tests for default value mode
2023-10-18 13:07:35 -07:00
Zoltan Haindrich c58b7f40ee
Rename windowing option (#15184) 2023-10-18 10:54:20 +05:30
Laksh Singla dc8d2192c3
Introduce natural comparator for types that don't have a StringComparator (#15145)
Fixes a bug when executing queries with the ordering of arrays
2023-10-16 10:37:32 +05:30
Zoltan Haindrich 6d62c75866
Fix columns with null values in windowing expressions (#15131) 2023-10-13 10:42:45 -04:00
Clint Wylie a0fd9ec55c
fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135) 2023-10-12 01:23:24 -07:00
Clint Wylie d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL compliant behavior
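
For readers unfamiliar with SQL three-valued logic, the following sketch models the standard truth tables with Python's `None` standing in for SQL `NULL` (unknown). It illustrates the semantics only and is not the native filter implementation:

```python
def and3(a, b):
    """SQL AND: false dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """SQL OR: true dominates, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """SQL NOT: unknown stays unknown."""
    return None if a is None else not a


# A row matches a filter only when the filter evaluates to true, so
# NOT (x = 'a') does not match rows where x is null.
print(not3(None))  # None
```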
2023-10-12 00:06:23 -07:00
Zoltan Haindrich ae88f2c0b6
Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086)
* fixes

* check for latest rewrite place

* Revert "check for latest rewrite place"

This reverts commit 5cf1e2c1ca.

* some stuff

(cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f)

* update test output

* updates to test outputs

* some stuff

* move validator

* cleanup

* fix

* change test slightly

* add apidoc cleanup warnings

* cleanup/etc

* instead of telling the story; add a fail with some reason whats the issue

* lead-lag fix

* add test

* remove unnecessary throw

* druidexception-trial

* Revert "druidexception-trial"

This reverts commit 8fa06644bc.

* undo changes to no_grouping; add no_grouping2

* add missing assert on resultcount

* rename method; update

* introduce enum/etc

* make resultmatchmode accessible from TestBuilder#expectedResults

* fix dump results to use log

* fix

* handle null correctly

* disable feature type based things for MSQ

* fix varianssqlaggtest

* use eps in other test

* fix intellij error

* add final

* address review

* update test/string/etc

* write concat in 3 lines :D
2023-10-11 12:34:31 -07:00
Vishesh Garg c6ca990f1f
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095)
EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming, such as in the case of nested queries, resulting in column-not-found errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
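
The rewrite can be pictured with a toy representation in which a call is a `(name, args)` tuple. This is a hypothetical sketch of the idea, not the Calcite-based code in the patch; it relies only on the documented SQL signatures `EARLIEST(expr, [maxBytesPerValue])` and `EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])`:

```python
REWRITES = {"EARLIEST": "EARLIEST_BY", "LATEST": "LATEST_BY"}


def make_time_explicit(call, time_column="__time"):
    """Rewrite EARLIEST/LATEST so the time column becomes an explicit argument."""
    name, args = call
    if name not in REWRITES:
        return call
    # Insert the time column right after the aggregated expression and keep
    # any trailing arguments (such as maxBytesPerValue) in place.
    return (REWRITES[name], [args[0], time_column] + list(args[1:]))


print(make_time_explicit(("LATEST", ["dim1", 1024])))
# ('LATEST_BY', ['dim1', '__time', 1024])
```

Once the time column is an ordinary argument, it is renamed together with every other column reference.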
2023-10-11 19:48:36 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Zoltan Haindrich 23605c1edd
Enable resultset validation of Drill tests (#15096)
- introduces a test_X method for every testcase (995 testcases)
- added a resultset parser which reads the expected resultset based on the result schema
- loaded a few more datasets
- added a testcase to ensure that all files have a corresponding testcase
- renamed DecoupledIgnore to NegativeTest
- categorized the failing 268 tests
2023-10-10 14:40:50 +05:30
Clint Wylie 1fc8fb1b20
add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101)
* add a bunch of tests with array typed columns to CalciteArraysQueryTest
* fix a bug with unnest filter pushdown when filtering on unnested array columns
2023-10-09 06:16:06 -07:00
Laksh Singla 549ef56288
UNION ALLs in MSQ (#14981)
MSQ now supports UNION ALL with UnionDataSource
2023-10-09 18:18:15 +05:30
Zoltan Haindrich b5a87fd89b
Support constant args in window functions (#15071)
Instead of passing the constants around in a new parameter, InputAccessor was introduced to handle the constants transparently. This new class also picked up some copy-paste debris around field accesses and made them a little more readable.
2023-10-08 12:14:25 +05:30
Zoltan Haindrich 7b869fd37a
Change type of AVG aggregates to double (#15089)
The SQL standard is not very restrictive regarding this:

If AVG is specified and DT is exact numeric, then the declared type of the result is an implementation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT.

So using the same type is also OK (the behavior without this patch); however, the avg of 0 and 1 is currently 0 because the integer type is retained.

Postgres, MySQL, Oracle, and Drill seem to increase precision; MSSQL returns 0:
http://sqlfiddle.com/#!9/6f7248/1

I think we should also increase precision, as it's already calculated more precisely.
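
The difference between the two result types can be reproduced with plain arithmetic. This is only an illustration of integer versus double averaging, not Druid code:

```python
def avg_keeping_integer_type(values):
    """Average whose result keeps the integer type of the input."""
    return sum(values) // len(values)


def avg_as_double(values):
    """Average whose result type is double."""
    return sum(values) / len(values)


print(avg_keeping_integer_type([0, 1]))  # 0
print(avg_as_double([0, 1]))             # 0.5
```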
2023-10-07 18:01:09 +05:30
Soumyava 57ab8e13dc
Updating plans when using joins with unnest on the left (#15075)
* Updating plans when using joins with unnest on the left

* Correcting segment map function for hashJoin

* The changes done here are not reflected into MSQ yet so these tests might not run in MSQ

* native tests

* Self joins with unnest data source

* Making this pass

* Addressing comments by adding explanation and new test
2023-10-06 19:23:12 -07:00
Soumyava 1a06ef5a24
Fixing old function used (#15099) 2023-10-05 17:25:00 -07:00
Pranav 06c5527c85
Allow aliasing of Macros and add new alias for complex decode 64 (#15034)
* Add AliasExprMacro to allow aliasing of native expression macros
* Add decode_base64_complex alias for complex_decode_base64
2023-10-05 16:24:36 -07:00
Zoltan Haindrich 36d7b3cc65
Add CalciteSysQueryTest to enable some testing of bindable plans. (#15070) 2023-10-05 11:37:49 -07:00
Clint Wylie b4bc9b6950
fix issue with auto columns with mix of scalar values and empty arrays (#15083) 2023-10-05 10:15:45 +05:30
Laksh Singla b8d03d36b0
Free up the resources when materializing the results as Frames (#15032)
Refactor the code to clean up the result sequences when materializing the results as Frames
2023-10-05 10:14:27 +05:30
Laksh Singla 30cf76db99
Field writers for numerical arrays (#14900)
Row-based frames, and by extension MSQ, now support numeric array types. This means that all queries consuming or producing arrays also work with MSQ. Numeric arrays can also be ingested via MSQ. After this patch, queries like SELECT [1, 2] work with MSQ, since they consume a numeric array, instead of failing with an unsupported column type exception.
2023-10-04 23:16:47 +05:30
Zoltan Haindrich 90e4b25620
Fix lead/lag to be usable without offset (#15057) 2023-10-04 17:38:46 +05:30
Zoltan Haindrich 3342e03ea8
Windowing processing may have run into Exceptions when the whole table was processed (#15064)
Earlier, when the query processed the whole table, planning may have ended with an NPE, as it was not possible to create a scan query from it.
2023-10-04 11:27:11 +05:30
Xavier Léauté adef2069b1
Make unit tests pass with Java 21 (#15014)
This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21
As a result all unit tests now pass with Java 21.

* update maven-shade-plugin to 3.5.0 and follow-up to #15042
  * explain why we need to override configuration when specifying outputFile
  * remove configuration from dependency management in favor of explicit overrides in each module.
* update mockito to 5.5.0 for Java 21 support when running with Java 11+
  * continue using latest mockito 4.x (4.11.0) when running with Java 8  
  * remove need to mock private fields
* exclude incorrectly declared mockito dependency from pac4j-oidc
* remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21
* add JVM options workaround for system-rules junit plugin not supporting Java 18+
* exclude older versions of byte-buddy from assertj-core
* fix for Java 19 changes in floating point string representation
* fix missing InitializedNullHandlingTest
* update easymock to 5.2.0 for Java 21 compatibility
* update animal-sniffer-plugin to 1.23
* update nl.jqno.equalsverifier to 3.15.1
* update exec-maven-plugin to 3.1.0
2023-10-03 22:41:21 -07:00
Soumyava cb050282a0
Intervals are updated properly for Unnest queries (#15020)
Fixes a bug where the unnest queries were not updated with the correct intervals.
2023-10-04 02:52:10 +05:30
Zoltan Haindrich f3d1c8b70e
Enable back testcases in CalciteWindowQueryTest (#15045)
Most of the testcases in CalciteWindowQueryTest were disabled during the Calcite-1.35 upgrade; the removal of DRUID_SUM had some unexpected side effects:

- SqlStdOperatorTable.SUM became the SUM operator
- because of that, SqlToRelConverter started rewriting windowed SUMs into SUM0s
- in my opinion, with respect to Druid this rewrite provides no real advantage, as SUM0 is serviced by SUM here
- I believe that's not 100% correct in cases when it aggregates only nulls, but that doesn't matter in this case

I propose to reintroduce a local DRUID_SUM as an unchanged SUM; later, when CALCITE-6020 is fixed, we can drop it.
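
For context, the two operators differ only when every input is null. The sketch below models the standard SQL semantics with `None` as null; it is not the Druid or Calcite implementation:

```python
def sql_sum(values):
    """SUM: nulls are ignored; the result is null when no non-null input exists."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def sql_sum0(values):
    """SUM0: like SUM, but returns 0 instead of null."""
    return sum(v for v in values if v is not None)


print(sql_sum([None, None]))   # None
print(sql_sum0([None, None]))  # 0
```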
2023-10-03 10:18:44 +05:30
Soumyava 261f54dc04
coalesce on unnest row mismatch fix (#15019)
* coalesce on unnest row mismatch fix

* new example with coalesce over unnest with nested array columns

* New example with change in order which triggers the nvl

* new test plan update for useDefault=true
2023-10-02 17:26:50 -07:00
Pranav f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-10-02 17:09:23 -07:00
Zoltan Haindrich 2785e062d7
Correct quotation in drill query files (#15044) 2023-10-02 08:17:15 -07:00
Pranav 07c28f17ca
Fix missing format strings in calls to DruidException.build (#15056)
* Fix the NPE bug in nonStrictFormat

* using non null format string

* using Assert.assertThrows
2023-09-29 17:00:36 -07:00
Zoltan Haindrich db71e28808
Enable SortProjectTransposeRule (#15002)
contains Enable already passing tests in DecoupledPlanningCalciteQueryTest #14996
enables a transpose rule to support a query plan in which the plan was in the shape:
Sort
  Project
     Aggregate
2023-09-29 10:49:03 +05:30
Zoltan Haindrich 022950a0c5
MV_FILTER_ONLY may run into Exceptions in case duplicate values were processed (#15012) 2023-09-27 19:19:42 +05:30
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalize
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
Soumyava 75af741a96
Revert "SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)" (#15029)
This reverts commit 4f498e6469.
2023-09-25 11:35:44 -07:00
Gian Merlino 0850e615b2
Remove istrue, isfalse vectorized impls. (#14991)
These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.
2023-09-25 11:34:24 +05:30
Soumyava c184b5250f
Unnest now works on MSQ (#14886)
This entails:
    Removing the enableUnnest flag and additional machinery
    Updating the datasource plan and frame processors to support unnest
    Adding support in MSQ for UnnestDataSource and FilteredDataSource
    CalciteArrayTest now has a MSQ test component
    Additional tests for Unnest on MSQ
2023-09-25 09:19:21 +05:30
Zoltan Haindrich e76962f453
Use annotation to mark DecoupleIgnore (#15005) 2023-09-21 12:36:52 +05:30
Laksh Singla ebb794632a
Allow users with STATE permissions to read and write the state APIs for querying with deep storage (#14944)
Currently, only the user who has submitted the async query has permission to interact with the status APIs for that async query. However, often we want an administrator to interact with these resources as well.
Druid traditionally handles such cases with the STATE resource: if the requesting user has the necessary permissions on it, they should also be allowed to interact with the status APIs, irrespective of whether they submitted the query.
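
The resulting access rule can be summarised as a small predicate. The function and parameter names are hypothetical and stand in for the real authorization checks:

```python
def can_access_query_status(requesting_user, submitter, has_state_permissions):
    """Allow the submitter, or any user holding the needed STATE permissions."""
    return requesting_user == submitter or has_state_permissions


print(can_access_query_status("admin", "alice", has_state_permissions=True))  # True
print(can_access_query_status("bob", "alice", has_state_permissions=False))   # False
```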
2023-09-21 06:55:07 +05:30
Pranav 883c2692d2
Adding new function decode_base64_utf8 and expr macro (#14943)
* Adding new function decode_base64_utf8 and expr macro

* using BaseScalarUnivariateMacroFunctionExpr

* Print stack trace in case of debug in ChainedExecutionQueryRunner

* fix static check
2023-09-20 17:06:34 -07:00
Gian Merlino 823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.
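
The difference between the two modes comes down to how nulls compare. Below is a sketch of the standard SQL semantics with `None` as null; `join_matches` and its `include_null` flag are hypothetical stand-ins that mirror the "includeNull" parameter described above, not the join matcher code:

```python
def sql_equals(a, b):
    """SQL '=': unknown (None) when either side is null."""
    if a is None or b is None:
        return None
    return a == b


def not_distinct_from(a, b):
    """IS NOT DISTINCT FROM: never null; two nulls count as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def is_distinct_from(a, b):
    """IS DISTINCT FROM: the negation, which also never returns null."""
    return not not_distinct_from(a, b)


def join_matches(left_key, right_key, include_null):
    """Match keys in 'equals' mode or in 'is not distinct from' mode."""
    if include_null:
        return not_distinct_from(left_key, right_key)
    return sql_equals(left_key, right_key) is True


print(join_matches(None, None, include_null=False))  # False
print(join_matches(None, None, include_null=True))   # True
```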

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
Zoltan Haindrich e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Gian Merlino 4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.
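
Conceptually, the plan looks like the sketch below, which uses in-memory lists in place of datasources. `non_equi_join` is a hypothetical helper for illustration only:

```python
from itertools import product


def non_equi_join(left_rows, right_rows, condition):
    """Evaluate an arbitrary join condition as a cross join followed by a filter."""
    # The cross join considers len(left_rows) * len(right_rows) pairs before
    # filtering, which is why large inputs can run into resource limits.
    return [(l, r) for l, r in product(left_rows, right_rows) if condition(l, r)]


print(non_equi_join([1, 5], [2, 4], lambda l, r: l < r))  # [(1, 2), (1, 4)]
```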

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
Soumyava 279b3818f0
Make Unnest work with nullif operator (#14993)
This is due to the recursive filter creation in the unnest storage adapter not performing correctly when a filter has no children. This PR addresses the issue.
2023-09-15 09:54:14 +05:30
Gian Merlino 3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
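
The contrast with plain comparison can be sketched as follows, with `None` standing in for null. This shows the semantics only, not the native expression code:

```python
def equals_true(x):
    """x == true: null in, null out."""
    return None if x is None else x is True


def is_true(x):
    """x IS TRUE: false for null input."""
    return x is True


def is_not_true(x):
    """x IS NOT TRUE: true for null input."""
    return x is not True


def is_false(x):
    """x IS FALSE: false for null input."""
    return x is False


def is_not_false(x):
    """x IS NOT FALSE: true for null input."""
    return x is not False


print(equals_true(None), is_true(None), is_not_true(None))  # None False True
```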
2023-09-14 09:19:09 -07:00
Soumyava 7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
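
The failure mode is the usual one for lazily evaluated results: an exception raised during deferred evaluation escapes a try block that has already been exited. The sketch below is an analogy using Python generators rather than Druid's Sequence, and all names in it are hypothetical:

```python
def frames(rows):
    """Lazily convert rows to frames; fails on rows that cannot be converted."""
    for row in rows:
        if row is None:
            raise ValueError("cannot convert row to a frame")
        yield [row]


def materialize_deferred(rows):
    try:
        # Nothing runs yet, so this try block can never see the ValueError.
        return frames(rows)
    except ValueError:
        return None  # fallback path, unreachable for conversion errors


def materialize_eagerly(rows):
    try:
        # Generating the frames inside the try block lets the fallback apply.
        return list(frames(rows))
    except ValueError:
        return None


print(materialize_eagerly([1, None]))  # None
```

With `materialize_deferred`, the error surfaces later, when the caller consumes the result.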
2023-09-13 15:41:28 +05:30
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich 5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Clint Wylie 5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha 757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
Zoltan Haindrich 699893bcff
Fix StringLastAggregatorFactory equals/toString (#14907)
* update test

* update test

* format

* test

* fix0

* Revert "fix0"

This reverts commit 44992cb393.

* ok resultset

* add plan

* update test

* before rewind

* test

* fix toString/compare/test

* move test

* add timeColumn to hashCode
2023-09-08 09:20:54 -07:00
Soumyava a8fa979115
Unnest dont push down not (#14942)
* Not pushing down not filters

* New test case

* Updating tests

* Removing a stale comment
2023-09-06 08:57:03 -07:00
Zoltan Haindrich 23308c050d
Remove DruidAggregateCaseToFilterRule (#14940)
The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade
2023-09-06 19:11:58 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
A new monitor, SubqueryCountStatsMonitor, is introduced; it emits metrics about subqueries and their execution. Moreover, the user can now use the auto mode to automatically set the number of bytes available per query for inlining its subquery results.
2023-09-06 05:47:19 +00:00
Soumyava 8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 4291709901.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Kashif Faraz 7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime, setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30