Commit Graph

879 Commits

Author SHA1 Message Date
Soumyava 38f3cf9e65
Fixing a case where datatype mismatch was happenning in join (#15541) 2023-12-12 12:50:32 -08:00
Clint Wylie 42f2496b7d
fix bug with nested empty array fields (#15532) 2023-12-09 12:20:21 -08:00
Clint Wylie e7c8f2e208
lift restriction of array_to_mv to only support direct column access (#15528) 2023-12-08 16:27:17 -08:00
Soumyava ca4ecdf7d0
Fixing NPE with virtual expression with unnest (#15513)
* Fixing NPE with virtual expression with unnest

* Fixing a comment
2023-12-08 10:51:56 -08:00
Clint Wylie e64b92eb35
add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521) 2023-12-08 05:28:46 -08:00
Adarsh Sanjeev 2e45eadc08
Add better error messages for using OVERWRITE with INSERT statments (#15517)
* Add better error messages for using OVERWRITE with INSERT statments
2023-12-08 15:33:46 +05:30
Zoltan Haindrich c353ccfdef
Windowed min aggregates null-s as 0 (#15371) 2023-12-08 01:41:16 -08:00
sb89594 5fda8613ad
Feature: Add IPv6 Match Function (#15212) 2023-12-07 23:09:06 -08:00
Clint Wylie c241c6980c
store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505) 2023-12-07 03:31:43 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Zoltan Haindrich a1aa4340d0
Changing the queryFrameWork in Calcite*Tests may have sideeffects (#15428)
changes how its configured a bit to use an annotation instead of methods
2023-12-04 00:38:01 +05:30
Clint Wylie 5ce4aab3b8
update ARRAY_OVERLAP to plan with ArrayContainsElement for ARRAY columns (#15451)
Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in #15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
2023-11-30 10:05:20 +05:30
Pranav 93cd638645
Enabling aggregateMultipleValues in all StringAnyAggregators (#15434)
* Enabling aggregateMultipleValues in all StringAnyAggregators

* Adding more tests

* More validation

* fix warning

* updating asserts in decoupled mode

* fix intellij inspection

* Addressing comments

* Addressing comments

* Adding early validations and make aggregate consistent across all

* fixing tests

* fixing tests

* Update docs/querying/sql-aggregations.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fixing static check

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-11-29 14:32:49 -08:00
Clint Wylie 64fcb32bcf
add native 'array contains element' filter (#15366)
* add native arrayContainsElement filter to use array column element indexes
2023-11-29 03:33:00 -08:00
Abhishek Agarwal 0a56c87e93
SQL: Plan non-equijoin conditions as cross join followed by filter (#15302)
This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
2023-11-29 13:46:11 +05:30
Clint Wylie 97623b408c
add optional 'castToType' parameter to 'auto' column schema (#15417)
* auto but.. with an expected type
2023-11-28 17:19:23 -08:00
Zoltan Haindrich eb056e23b5
Fix dictionarySize overrides in tests (#15354)
I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit

Not forwarding the return value at that point may lead to the normal continuation here regardless something was not added to the dictionary like here
2023-11-28 18:49:09 +05:30
Zoltan Haindrich ca544e552c
Add option to compare results with relative error tolerance (#15429)
Adds a result comparision mode of EQUALS_RELATIVE_1000_ULPS ; which accepts floating point differences up-to 1000 units of least precision
2023-11-28 13:03:16 +05:30
Abhishek Agarwal 3113e7b350
Fix grouping aggregator when one of the dimension is a simple extraction (#15421)
This PR fixes an issue where the grouping aggregator wrongly assumes that a key dimension is a virtual column and assigns a wrong name to it. This results in a mismatch between the dimensions that grouping aggregator sees and the dimension names that rows are aggregated on. And finally, grouping aggregator generates wrong result.
2023-11-24 13:15:07 +05:30
Clint Wylie a95c22ce70
support non-constant expressions for path arguments for json_value and json_query (#15320)
* support dynamic expressions for path arguments for json_value and json_query
2023-11-17 01:12:05 -08:00
Adarsh Sanjeev a134cc30a6
Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
Rishabh Singh 5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
2023-11-14 12:52:33 +05:30
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Laksh Singla 0cc8839a60
Allow casted literal values in SQL functions accepting literals (Part 2) (#15316) 2023-11-03 21:22:19 +05:30
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Clint Wylie d261587f4a
explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245)
* better documentation for the differences between arrays and mvds
* add outputType to ExpressionPostAggregator to make docs true
* add output coercion if outputType is defined on ExpressionPostAgg
* updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables
2023-11-02 00:31:37 -07:00
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Laksh Singla 2ea7177f15
Allow casted literal values in SQL functions accepting literals (#15282)
Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.
2023-11-01 10:38:48 +05:30
Zoltan Haindrich f4a74710e6
Process pure ordering changes with windowing operators (#15241)
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset
2023-10-29 16:40:49 +05:30
Zoltan Haindrich 6784e9c507
Fix summary row issues in case postaggregations are happening (#15232)
* fix-1/2

* add message v1

* extend test to cover for IOB issue

* move stuff around

* change message

* fix testcase string

* compute postaggs (thank you Clint!)

* enable feature for test

* ignore tests in msq

---------

Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>
2023-10-24 20:33:59 -07:00
Soumyava 06f40a0019
remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237)
* Fixing nested group by query with order by in outer query

* Adding examples
2023-10-24 15:30:13 -07:00
Zoltan Haindrich 2e31cb2901
DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118) 2023-10-23 10:54:28 -04:00
Zoltan Haindrich b95035f183
Fix VirtualColumn related issues in window expressions (#15119)
for some exotic queries like:

  SELECT
  	'_'||dim1,
    MIN(cast(0 as double)) OVER (),
    MIN(cast((cnt||cnt) as bigint)) OVER ()
  FROM foo
the compilation have resulted in NPE -s mostly because VirtualColumn -s were not handled properly
2023-10-23 14:05:59 +05:30
Zoltan Haindrich fbbb9c7730
Allow DESC ordering in window expressions (#15195) 2023-10-20 07:55:28 -04:00
Zoltan Haindrich 9fb0dbfc9f
Fix json inputs for drill windowing tests (#15148)
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1
2023-10-19 14:02:41 +05:30
Clint Wylie 061cfee224
add native filters for "(filter) is true" and "(filter) is false" (#15182)
* add native filters for "(filter) is true" and "(filter) is false"

changes:
* add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE`
* add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false
* fix test incorrectly adjusted to wrong answer in #15058
* add tests for default value mode
2023-10-18 13:07:35 -07:00
Zoltan Haindrich c58b7f40ee
Rename windowing option (#15184) 2023-10-18 10:54:20 +05:30
Laksh Singla dc8d2192c3
Introduce natural comparator for types that don't have a StringComparator (#15145)
Fixes a bug when executing queries with the ordering of arrays
2023-10-16 10:37:32 +05:30
Zoltan Haindrich 6d62c75866
Fix columns with null values in windowing expressions (#15131) 2023-10-13 10:42:45 -04:00
Clint Wylie a0fd9ec55c
fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135) 2023-10-12 01:23:24 -07:00
Clint Wylie d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL complaint behavior
2023-10-12 00:06:23 -07:00
Zoltan Haindrich ae88f2c0b6
Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086)
* fixes

* check for latest rewrite place

* Revert "check for latest rewrite place"

This reverts commit 5cf1e2c1ca.

* some stuff

(cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f)

* update test output

* updates to test ouptuts

* some stuff

* move validator

* cleanup

* fix

* change test slightly

* add apidoc cleanup warnings

* cleanup/etc

* instead of telling the story; add a fail with some reason whats the issue

* lead-lag fix

* add test

* remove unnecessary throw

* druidexception-trial

* Revert "druidexception-trial"

This reverts commit 8fa06644bc.

* undo changes to no_grouping; add no_grouping2

* add missing assert on resultcount

* rename method; update

* introduce enum/etc

* make resultmatchmode accessible from TestBuilder#expectedResults

* fix dump results to use log

* fix

* handle null correctly

* disable feature type based things for MSQ

* fix varianssqlaggtest

* use eps in other test

* fix intellij error

* add final

* addrss review

* update test/string/etc

* write concat in 3 lines :D
2023-10-11 12:34:31 -07:00
Vishesh Garg c6ca990f1f
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095)
EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
2023-10-11 19:48:36 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Zoltan Haindrich 23605c1edd
Enable resultset validation of Drill tests (#15096)
- introduces a test_X method for every testcase (995 testcases)
- added a resultset parser which reads the expected resultset based on the result schema
- loaded a few more datasets
- added a testcase to ensure that all files have a corresponding testcase
- renamed DecoupledIgnore to NegativeTest
- categorized the failing 268 tests
2023-10-10 14:40:50 +05:30
Clint Wylie 1fc8fb1b20
add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101)
* add a bunch of tests with array typed columns to CalciteArraysQueryTest
* fix a bug with unnest filter pushdown when filtering on unnested array columns
2023-10-09 06:16:06 -07:00
Laksh Singla 549ef56288
UNION ALLs in MSQ (#14981)
MSQ now supports UNION ALL with UnionDataSource
2023-10-09 18:18:15 +05:30
Zoltan Haindrich b5a87fd89b
Support constant args in window functions (#15071)
Instead of passing the constants around in a new parameter; InputAccessor was introduced to take care of transparently handling the constants - this new class started picking up some copy-paste debris around field accesses; and made them a little bit more readble.
2023-10-08 12:14:25 +05:30
Zoltan Haindrich 7b869fd37a
Change type of AVG aggregates to double (#15089)
The sql standard is not very restrictive regarding this:

If AVG is specified and DT is exact numeric, then the declared type of the result is an implemen-
tation-defined exact numeric type with precision not less than the precision of DT and scale not
less than the scale of DT.

so; using the same type is also ok (without patch);
however the avg of 0 and 1 is 0 right now because of the retention of the integer typ

Postgres,MySql and Oracle and Drill seem to increase precision ; mssql returns 0
http://sqlfiddle.com/#!9/6f7248/1

I think we should also increase precision as its already calculated more precisely
2023-10-07 18:01:09 +05:30