Commit Graph

957 Commits

Author SHA1 Message Date
Zoltan Haindrich 8f5b7522c7
Strict window frame checks (#15746)
introduce checks to ensure that window frame is supported
added check to ensure that no expressions are set as bounds
added logic to detect following/following like cases - described in Window function fails to demarcate if 2 following are used #15739
currently RANGE frames are only supported correctly if both endpoints are unbounded or current row Offset based window range support #15767
added windowingStrictValidation context key to provide a way to override the check
2024-02-02 16:21:53 +05:30
Laksh Singla 7d65caf0c5
Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670) 2024-02-01 10:24:43 +05:30
Zoltan Haindrich f701197224
Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735) 2024-01-31 02:36:58 -05:00
Gian Merlino 38a1e827ab
Fix up value types when creating range filters. (#15778)
Fixes a bug introduced in #15609, where queries involving filters on
TIME_FLOOR could encounter ClassCastException when comparing RangeValue
in CombineAndSimplifyBounds.

Prior to #15609, CombineAndSimplifyBounds would remove, rebuild, and
re-add all numeric range filters as part of consolidating numeric range
filters for the same column under the least restrictive type. #15609
included a change to only rebuild numeric range filters when a consolidation
opportunity actually arises. The bug was introduced because the unconditional
rebuild, as a side effect, masked the fact that in some cases range filters
would be created with string match values and a LONG match value type.

This patch changes the fixup to happen at the time the range filter is
initially created, rather than in CombineAndSimplifyBounds.
2024-01-29 13:30:47 -08:00
Abhishek Agarwal 989a8f7874
Better error message for date_trunc operators (#15759)
IAEs are not bubbled up and show up as a runtime failure to the user which are not helpful. See https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1706185796975109 for one such example. This change will fix that.
2024-01-27 11:22:39 +05:30
Karan Kumar c4990f56d6
Prepare main branch for next 30.0.0 release. (#15707) 2024-01-23 15:55:54 +05:30
Zoltan Haindrich d6a12c4389
Add ability to enable ResultCache in tests (#15465) 2024-01-22 09:02:59 -05:00
Pranav 45b30dc07d
Revert "Change default inSubQueryThreshold (#15336)" (#15722)
A low value of inSubQueryThreshold can cause queries with IN filter to plan as joins more commonly. However, some of these join queries may not get planned as IN filter on data nodes and causes significant perf regression.
2024-01-22 11:34:39 +05:30
Zoltan Haindrich 8a43db9395
Range support in window expressions (support them as groups) (#15365)
* support groups windowing mode; which is a close relative of ranges (but not in the standard)
* all windows with range expressions will be executed wit it groups
* it will be 100% correct in case for both bounds its true that: isCurrentRow() || isUnBounded()
  * this covers OVER ( ORDER BY COL )
* for other cases it will have some chances of getting correct results...
2024-01-17 00:05:21 -06:00
Gian Merlino 500681d0cb
Add ImmutableLookupMap for static lookups. (#15675)
* Add ImmutableLookupMap for static lookups.

This patch adds a new ImmutableLookupMap, which comes with an
ImmutableLookupExtractor. It uses a fastutil open hashmap plus two
lists to store its data in such a way that forward and reverse
lookups can both be done quickly. I also observed footprint to be
somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million
row lookup.

The main advantage, though, is that reverse lookups can be done much
more quickly than MapLookupExtractor (which iterates the entire map
for each call to unapplyAll). This speeds up the recently added
ReverseLookupRule (#15626) during SQL planning with very large lookups.

* Use in one more test.

* Fix benchmark.

* Object2ObjectOpenHashMap

* Fixes, and LookupExtractor interface update to have asMap.

* Remove commented-out code.

* Fix style.

* Fix import order.

* Add fastutil.

* Avoid storing Map entries.
2024-01-13 13:14:01 -08:00
Gian Merlino 866fe1cda6
Fix some naming related to AggregatePullUpLookupRule. (#15677)
It was called "split" rather than "pull up" in some places. This patch
standardizes on "pull up".
2024-01-12 15:41:58 -08:00
Gian Merlino cccf13ea82
Reverse, pull up lookups in the SQL planner. (#15626)
* Reverse, pull up lookups in the SQL planner.

Adds two new rules:

1) ReverseLookupRule, which eliminates calls to LOOKUP by doing
   reverse lookups.

2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above
   GROUP BY, when the lookup is injective.

Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether
these rules fire. Both are enabled by default.

To minimize the chance of performance problems due to many keys mapping to
the same value, ReverseLookupRule refrains from reversing a lookup if there
are more keys than `inSubQueryThreshold`. The rationale for using this setting
is that reversal works by generating an IN, and the `inSubQueryThreshold`
describes the largest IN the user wants the planner to create.

* Add additional line.

* Style.

* Remove commented-out lines.

* Fix tests.

* Add test.

* Fix doc link.

* Fix docs.

* Add one more test.

* Fix tests.

* Logic, test updates.

* - Make FilterDecomposeConcatRule more flexible.

- Make CalciteRulesManager apply reduction rules til fixpoint.

* Additional tests, simplify code.
2024-01-12 00:06:31 -08:00
Zoltan Haindrich e597cc2949
Remove UnaryFunctionOperatorConversion and RoundOperatorConversion (#15566)
* get rid of roun op conv

* cleanup

* use DirectOperatorConversion instead unary

* import order
2024-01-12 10:06:23 +05:30
Gian Merlino 6c18434028
CONCAT flattening, filter decomposition. (#15634)
* CONCAT flattening, filter decomposition.

Flattening: CONCAT(CONCAT(x, y), z) is flattened to CONCAT(x, y, z). This
is especially useful for the || operator, which is a binary operator and
leads to non-flat CONCAT calls.

Filter decomposition: transforms CONCAT(x, '-', y) = 'a-b' into
x = 'a' AND y = 'b'.

* One more test.

* Fix two tests.

* Adjustments from review.

* Fix empty string problem, add tests.
2024-01-11 11:18:50 -08:00
Gian Merlino ee77fa7fb3
Add tests for CASE decomposition. (#15639)
I was looking into adding a rule to do this, and found that it was already
happening as part of Calcite's RexSimplify. So this patch simply adds some
tests to ensure that it continues to happen.
2024-01-10 13:24:24 -08:00
Ankit Kothari 355c2f5da0
Add sql + ingestion compatibility for first/last on numeric values (#15607)
SQL compatibility for numeric last and first column types.
Ingestion UI now provides option for first and last aggregation as well.
2024-01-10 12:59:38 +05:30
Rishabh Singh 71f5307277
Eliminate Periodic Realtime Segment Metadata Queries: Task Now Publish Schema for Seamless Coordinator Updates (#15475)
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments.

This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.
2024-01-10 08:55:56 +05:30
Abhishek Agarwal 468b99e608
Enable query request queuing by default when total laning is turned on. (#15440)
This PR enables the flag by default to queue excess query requests in the jetty queue. Still keeping the flag so that it can be turned off if necessary. But the flag will be removed in the future.
2024-01-09 07:54:26 +05:30
Clint Wylie df5bcd1367
fix bugs with expression virtual column indexes for expression virtual columns which refer to other virtual columns (#15633)
changes:
* ColumnIndexSelector now extends ColumnSelector. The only real implementation of ColumnIndexSelector, ColumnSelectorColumnIndexSelector, already has a ColumnSelector, so this isn't very disruptive
* removed getColumnNames from ColumnSelector since it was not used
* VirtualColumns and VirtualColumn getIndexSupplier method now needs argument of ColumnIndexSelector instead of ColumnSelector, which allows expression virtual columns to correctly recognize other virtual columns, fixing an issue which would incorrectly handle other virtual columns as non-existent columns instead
* fixed a bug with sql planner incorrectly not using expression filter for equality filters on columns with extractionFn and no virtual column registry
2024-01-08 13:10:11 -08:00
Jonathan Wei 5d1e66b8f9
Allow broker to use catalog for datasource schemas for SQL queries (#15469)
* Allow broker to use catalog for datasource schemas

* More PR comments

* PR comments
2024-01-08 13:46:08 -06:00
Gian Merlino 0422d9d507
Fix redundant expansion in SearchOperatorConversion. (#15625)
This logic error causes sarg expansion to happen twice for IN or NOT IN points.
It doesn't affect the final generated native query, because the
redundant expansions gets combined. But it slows down planning, especially
for large NOT IN.
2024-01-05 12:42:12 -08:00
Zoltan Haindrich b9679d0884
Run filter-into-join rule early for subqueries and disable project-filter rule (#15511)
FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs. 

PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302
2024-01-04 15:33:45 +05:30
Gian Merlino 5c3391a084
Follow-ups to SEARCH and IN from #15609. (#15623)
- Rename ExprType to BaseType in CollectComparisons, since ExprType is a thing
  that exists elsewhere.
- Remove unused "notInRexNodes" from SearchOperatorConversion.
2024-01-03 22:38:12 -08:00
Clint Wylie f19ece146f
expression virtual column indexes (#15585)
* ExpressionVirtualColumn + indexes = bff. Expression virtual columns can now use indexes of the underlying columns similar to how expression filters
2024-01-03 21:00:39 -08:00
Gian Merlino 01eec4a55e
New handling for COALESCE, SEARCH, and filter optimization. (#15609)
* New handling for COALESCE, SEARCH, and filter optimization.

COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code we end up un-doing it in our CaseOperatorConversion.
This patch has a different approach:

1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
   the Volcano planner runs, using CaseToCoalesceRule.

2) Add FilterDecomposeCoalesceRule to decompose calls like
   "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
   This helps use indexes when available on x and y.

3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.

4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.

The advantage of this approach is that by un-doing the CASE to COALESCE
conversion earlier, we have flexibility to do more stuff with
COALESCE (like decomposition and pushing into LOOKUP).

SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:

1) Expand NOT points (point "holes" in the range set) from SEARCH as
   `!(a || b)` rather than `!a && !b`, which makes it possible to convert
   them to a "not" of "in" filter later.

2) Generate those nice conversions for NOT points even if the SEARCH
   is not composed of 100% NOT points. Without this change, a SEARCH
   for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like
   "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".

One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:

1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
   convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".

2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
   ORs with lots of children by adjusting the logic to avoid calling
   "indexOf" and "remove" on an ArrayList.

3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
   handling for "selector" and "equals" filters.

* Not so final.

* Fixes.

* Fix test.

* Fix test.
2024-01-03 08:56:22 -08:00
AlbericByte a2e65e6a89
Support to pass dynamic values to timestamp Extract function (#15586)
Fixes #15072

Before this modification , the third parameter (timezone) require to be a Literal, it will throw a error when this parameter is column Identifier.
2023-12-21 11:57:52 +05:30
Clint Wylie e373f62692
fix expression post aggregator array handling when grouping wrapper types leak (#15543)
* fix expression post aggregator array handling when grouping wrapper types leak
* more consistent expression function error messaging
2023-12-15 21:43:27 -08:00
Soumyava 3e15522d6b
Round works correctly on system metadata columns (#15554) 2023-12-13 17:23:14 -08:00
Soumyava 38f3cf9e65
Fixing a case where datatype mismatch was happenning in join (#15541) 2023-12-12 12:50:32 -08:00
Clint Wylie 42f2496b7d
fix bug with nested empty array fields (#15532) 2023-12-09 12:20:21 -08:00
Clint Wylie e7c8f2e208
lift restriction of array_to_mv to only support direct column access (#15528) 2023-12-08 16:27:17 -08:00
Soumyava ca4ecdf7d0
Fixing NPE with virtual expression with unnest (#15513)
* Fixing NPE with virtual expression with unnest

* Fixing a comment
2023-12-08 10:51:56 -08:00
Clint Wylie e64b92eb35
add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521) 2023-12-08 05:28:46 -08:00
Adarsh Sanjeev 2e45eadc08
Add better error messages for using OVERWRITE with INSERT statments (#15517)
* Add better error messages for using OVERWRITE with INSERT statments
2023-12-08 15:33:46 +05:30
Zoltan Haindrich c353ccfdef
Windowed min aggregates null-s as 0 (#15371) 2023-12-08 01:41:16 -08:00
sb89594 5fda8613ad
Feature: Add IPv6 Match Function (#15212) 2023-12-07 23:09:06 -08:00
Clint Wylie c241c6980c
store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505) 2023-12-07 03:31:43 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Zoltan Haindrich a1aa4340d0
Changing the queryFrameWork in Calcite*Tests may have sideeffects (#15428)
changes how its configured a bit to use an annotation instead of methods
2023-12-04 00:38:01 +05:30
Clint Wylie 5ce4aab3b8
update ARRAY_OVERLAP to plan with ArrayContainsElement for ARRAY columns (#15451)
Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in #15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
2023-11-30 10:05:20 +05:30
Pranav 93cd638645
Enabling aggregateMultipleValues in all StringAnyAggregators (#15434)
* Enabling aggregateMultipleValues in all StringAnyAggregators

* Adding more tests

* More validation

* fix warning

* updating asserts in decoupled mode

* fix intellij inspection

* Addressing comments

* Addressing comments

* Adding early validations and make aggregate consistent across all

* fixing tests

* fixing tests

* Update docs/querying/sql-aggregations.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fixing static check

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-11-29 14:32:49 -08:00
Clint Wylie 64fcb32bcf
add native 'array contains element' filter (#15366)
* add native arrayContainsElement filter to use array column element indexes
2023-11-29 03:33:00 -08:00
Abhishek Agarwal 0a56c87e93
SQL: Plan non-equijoin conditions as cross join followed by filter (#15302)
This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
2023-11-29 13:46:11 +05:30
Clint Wylie 97623b408c
add optional 'castToType' parameter to 'auto' column schema (#15417)
* auto but.. with an expected type
2023-11-28 17:19:23 -08:00
Zoltan Haindrich eb056e23b5
Fix dictionarySize overrides in tests (#15354)
I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit

Not forwarding the return value at that point may lead to the normal continuation here regardless something was not added to the dictionary like here
2023-11-28 18:49:09 +05:30
Zoltan Haindrich ca544e552c
Add option to compare results with relative error tolerance (#15429)
Adds a result comparision mode of EQUALS_RELATIVE_1000_ULPS ; which accepts floating point differences up-to 1000 units of least precision
2023-11-28 13:03:16 +05:30
Abhishek Agarwal 3113e7b350
Fix grouping aggregator when one of the dimension is a simple extraction (#15421)
This PR fixes an issue where the grouping aggregator wrongly assumes that a key dimension is a virtual column and assigns a wrong name to it. This results in a mismatch between the dimensions that grouping aggregator sees and the dimension names that rows are aggregated on. And finally, grouping aggregator generates wrong result.
2023-11-24 13:15:07 +05:30
Clint Wylie a95c22ce70
support non-constant expressions for path arguments for json_value and json_query (#15320)
* support dynamic expressions for path arguments for json_value and json_query
2023-11-17 01:12:05 -08:00
Adarsh Sanjeev a134cc30a6
Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
Rishabh Singh 5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
2023-11-14 12:52:33 +05:30
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Laksh Singla 0cc8839a60
Allow casted literal values in SQL functions accepting literals (Part 2) (#15316) 2023-11-03 21:22:19 +05:30
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Clint Wylie d261587f4a
explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245)
* better documentation for the differences between arrays and mvds
* add outputType to ExpressionPostAggregator to make docs true
* add output coercion if outputType is defined on ExpressionPostAgg
* updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables
2023-11-02 00:31:37 -07:00
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Laksh Singla 2ea7177f15
Allow casted literal values in SQL functions accepting literals (#15282)
Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.
2023-11-01 10:38:48 +05:30
Zoltan Haindrich f4a74710e6
Process pure ordering changes with windowing operators (#15241)
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset
2023-10-29 16:40:49 +05:30
Zoltan Haindrich 6784e9c507
Fix summary row issues in case postaggregations are happening (#15232)
* fix-1/2

* add message v1

* extend test to cover for IOB issue

* move stuff around

* change message

* fix testcase string

* compute postaggs (thank you Clint!)

* enable feature for test

* ignore tests in msq

---------

Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>
2023-10-24 20:33:59 -07:00
Soumyava 06f40a0019
remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237)
* Fixing nested group by query with order by in outer query

* Adding examples
2023-10-24 15:30:13 -07:00
Zoltan Haindrich 2e31cb2901
DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118) 2023-10-23 10:54:28 -04:00
Zoltan Haindrich b95035f183
Fix VirtualColumn related issues in window expressions (#15119)
for some exotic queries like:

  SELECT
  	'_'||dim1,
    MIN(cast(0 as double)) OVER (),
    MIN(cast((cnt||cnt) as bigint)) OVER ()
  FROM foo
the compilation have resulted in NPE -s mostly because VirtualColumn -s were not handled properly
2023-10-23 14:05:59 +05:30
Zoltan Haindrich fbbb9c7730
Allow DESC ordering in window expressions (#15195) 2023-10-20 07:55:28 -04:00
Zoltan Haindrich 9fb0dbfc9f
Fix json inputs for drill windowing tests (#15148)
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1
2023-10-19 14:02:41 +05:30
Clint Wylie 061cfee224
add native filters for "(filter) is true" and "(filter) is false" (#15182)
* add native filters for "(filter) is true" and "(filter) is false"

changes:
* add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE`
* add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false
* fix test incorrectly adjusted to wrong answer in #15058
* add tests for default value mode
2023-10-18 13:07:35 -07:00
Zoltan Haindrich c58b7f40ee
Rename windowing option (#15184) 2023-10-18 10:54:20 +05:30
Laksh Singla dc8d2192c3
Introduce natural comparator for types that don't have a StringComparator (#15145)
Fixes a bug when executing queries with the ordering of arrays
2023-10-16 10:37:32 +05:30
Zoltan Haindrich 6d62c75866
Fix columns with null values in windowing expressions (#15131) 2023-10-13 10:42:45 -04:00
Clint Wylie a0fd9ec55c
fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135) 2023-10-12 01:23:24 -07:00
Clint Wylie d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL complaint behavior
2023-10-12 00:06:23 -07:00
Zoltan Haindrich ae88f2c0b6
Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086)
* fixes

* check for latest rewrite place

* Revert "check for latest rewrite place"

This reverts commit 5cf1e2c1ca.

* some stuff

(cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f)

* update test output

* updates to test ouptuts

* some stuff

* move validator

* cleanup

* fix

* change test slightly

* add apidoc cleanup warnings

* cleanup/etc

* instead of telling the story; add a fail with some reason whats the issue

* lead-lag fix

* add test

* remove unnecessary throw

* druidexception-trial

* Revert "druidexception-trial"

This reverts commit 8fa06644bc.

* undo changes to no_grouping; add no_grouping2

* add missing assert on resultcount

* rename method; update

* introduce enum/etc

* make resultmatchmode accessible from TestBuilder#expectedResults

* fix dump results to use log

* fix

* handle null correctly

* disable feature type based things for MSQ

* fix varianssqlaggtest

* use eps in other test

* fix intellij error

* add final

* addrss review

* update test/string/etc

* write concat in 3 lines :D
2023-10-11 12:34:31 -07:00
Vishesh Garg c6ca990f1f
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095)
EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
2023-10-11 19:48:36 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Zoltan Haindrich 23605c1edd
Enable resultset validation of Drill tests (#15096)
- introduces a test_X method for every testcase (995 testcases)
- added a resultset parser which reads the expected resultset based on the result schema
- loaded a few more datasets
- added a testcase to ensure that all files have a corresponding testcase
- renamed DecoupledIgnore to NegativeTest
- categorized the failing 268 tests
2023-10-10 14:40:50 +05:30
Clint Wylie 1fc8fb1b20
add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101)
* add a bunch of tests with array typed columns to CalciteArraysQueryTest
* fix a bug with unnest filter pushdown when filtering on unnested array columns
2023-10-09 06:16:06 -07:00
Laksh Singla 549ef56288
UNION ALLs in MSQ (#14981)
MSQ now supports UNION ALL with UnionDataSource
2023-10-09 18:18:15 +05:30
Zoltan Haindrich b5a87fd89b
Support constant args in window functions (#15071)
Instead of passing the constants around in a new parameter; InputAccessor was introduced to take care of transparently handling the constants - this new class started picking up some copy-paste debris around field accesses; and made them a little bit more readble.
2023-10-08 12:14:25 +05:30
Zoltan Haindrich 7b869fd37a
Change type of AVG aggregates to double (#15089)
The sql standard is not very restrictive regarding this:

If AVG is specified and DT is exact numeric, then the declared type of the result is an implemen-
tation-defined exact numeric type with precision not less than the precision of DT and scale not
less than the scale of DT.

so; using the same type is also ok (without patch);
however the avg of 0 and 1 is 0 right now because of the retention of the integer typ

Postgres,MySql and Oracle and Drill seem to increase precision ; mssql returns 0
http://sqlfiddle.com/#!9/6f7248/1

I think we should also increase precision as its already calculated more precisely
2023-10-07 18:01:09 +05:30
Soumyava 57ab8e13dc
Updating plans when using joins with unnest on the left (#15075)
* Updating plans when using joins with unnest on the left

* Correcting segment map function for hashJoin

* The changes done here are not reflected into MSQ yet so these tests might not run in MSQ

* native tests

* Self joins with unnest data source

* Making this pass

* Addressing comments by adding explanation and new test
2023-10-06 19:23:12 -07:00
Soumyava 1a06ef5a24
Fixing old function used (#15099) 2023-10-05 17:25:00 -07:00
Pranav 06c5527c85
Allow aliasing of Macros and add new alias for complex decode 64 (#15034)
* Add AliasExprMacro to allow aliasing of native expression macros
* Add decode_base64_complex alias for complex_decode_base64
2023-10-05 16:24:36 -07:00
Zoltan Haindrich 36d7b3cc65
Add CalciteSysQueryTest to enable some testing of bindable plans. (#15070) 2023-10-05 11:37:49 -07:00
Clint Wylie b4bc9b6950
fix issue with auto columns with mix of scalar values and empty arrays (#15083) 2023-10-05 10:15:45 +05:30
Laksh Singla b8d03d36b0
Free up the resources when materializing the results as Frames (#15032)
Refactor the code to clean up the result sequences when materializing the results as Frames
2023-10-05 10:14:27 +05:30
Laksh Singla 30cf76db99
Field writers for numerical arrays (#14900)
Row-based frames, and by extension, MSQ now supports numeric array types. This means that all queries consuming or producing arrays would also work with MSQ. Numeric arrays can also be ingested via MSQ. Post this patch, queries like, SELECT [1, 2] would work with MSQ since they consume a numeric array, instead of failing with an unsupported column type exception.
2023-10-04 23:16:47 +05:30
Zoltan Haindrich 90e4b25620
Fix lead/lag to be usable without offset (#15057) 2023-10-04 17:38:46 +05:30
Zoltan Haindrich 3342e03ea8
Windowing processing may have run into Exceptions when the whole table was processed (#15064)
Earlier when the query was processing the whole table; the planning may have ended with a NPE; as it was not possible to create a scanquery from it.
2023-10-04 11:27:11 +05:30
Xavier Léauté adef2069b1
Make unit tests pass with Java 21 (#15014)
This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21
As a result all unit tests now pass with Java 21.

* update maven-shade-plugin to 3.5.0 and follow-up to #15042
  * explain why we need to override configuration when specifying outputFile
  * remove configuration from dependency management in favor of explicit overrides in each module.
* update to mockito to 5.5.0 for Java 21 support when running with Java 11+
  * continue using latest mockito 4.x (4.11.0) when running with Java 8  
  * remove need to mock private fields
* exclude incorrectly declared mockito dependency from pac4j-oidc
* remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21
* add JVM options workaround for system-rules junit plugin not supporting Java 18+
* exclude older versions of byte-buddy from assertj-core
* fix for Java 19 changes in floating point string representation
* fix missing InitializedNullHandlingTest
* update easymock to 5.2.0 for Java 21 compatibility
* update animal-sniffer-plugin to 1.23
* update nl.jqno.equalsverifier to 3.15.1
* update exec-maven-plugin to 3.1.0
2023-10-03 22:41:21 -07:00
Soumyava cb050282a0
Intervals are updated properly for Unnest queries (#15020)
Fixes a bug where the unnest queries were not updated with the correct intervals.
2023-10-04 02:52:10 +05:30
Zoltan Haindrich f3d1c8b70e
Enable back testcases in CalciteWindowQueryTest (#15045)
Most of the testcases were disabled in CalciteWindowQueryTest during the Calcite-1.35 upgrade; there were some changes arising from the fact that the removal of DRUID_SUM had some unexpected sideffects:

SqlStdOperatorTable.SUM became the SUM operator
because of that SqlToRelConverter started rewriting windowed SUM -s into SUM0 -s
my opinion is that w.r.t to Druid this rewrite provides no real advantage - as SUM0 is serviced by SUM here
I believe that's not 100% correct in cases when it aggregates just null-s but that doesnt matter in this case
I propose to introduce back a local DRUID_SUM thing as an unchanged SUM and later when CALCITE-6020 is fixed ; we can drop that.
2023-10-03 10:18:44 +05:30
Soumyava 261f54dc04
coalesce on unnest row mismatch fix (#15019)
* coalesce on unnest row mismatch fix

* new example with coalesce over unnest with nested array columns

* New example with change in order which triggers the nvl

* new test plan update for useDefault=true
2023-10-02 17:26:50 -07:00
Pranav f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-10-02 17:09:23 -07:00
Zoltan Haindrich 2785e062d7
Correct quotation in drill query files (#15044) 2023-10-02 08:17:15 -07:00
Pranav 07c28f17ca
Fix missing format strings in calls to DruidException.build (#15056)
* Fix the NPE bug in nonStrictFormat

* using non null format string

* using Assert.assertThrows
2023-09-29 17:00:36 -07:00
Zoltan Haindrich db71e28808
Enable SortProjectTransposeRule (#15002)
contains Enable already passing tests in DecoupledPlanningCalciteQueryTest #14996
enables a transpose rule to support a query plan in which the plan was in the shape:
Sort
  Project
     Aggregate
2023-09-29 10:49:03 +05:30
Zoltan Haindrich 022950a0c5
MV_FILTER_ONLY may run into Exceptions in case duplicate values were processed (#15012) 2023-09-27 19:19:42 +05:30
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
Soumyava 75af741a96
Revert "SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)" (#15029)
This reverts commit 4f498e6469.
2023-09-25 11:35:44 -07:00
Gian Merlino 0850e615b2
Remove istrue, isfalse vectorized impls. (#14991)
These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.
2023-09-25 11:34:24 +05:30
Soumyava c184b5250f
Unnest now works on MSQ (#14886)
This entails:
    Removing the enableUnnest flag and additional machinery
    Updating the datasource plan and frame processors to support unnest
    Adding support in MSQ for UnnestDataSource and FilteredDataSource
    CalciteArrayTest now has a MSQ test component
    Additional tests for Unnest on MSQ
2023-09-25 09:19:21 +05:30