925 Commits

Author SHA1 Message Date
Clint Wylie
b4bc9b6950
fix issue with auto columns with mix of scalar values and empty arrays (#15083) 2023-10-05 10:15:45 +05:30
Laksh Singla
b8d03d36b0
Free up the resources when materializing the results as Frames (#15032)
Refactor the code to clean up the result sequences when materializing the results as Frames
2023-10-05 10:14:27 +05:30
Laksh Singla
30cf76db99
Field writers for numerical arrays (#14900)
Row-based frames, and by extension MSQ, now support numeric array types. This means that all queries consuming or producing arrays also work with MSQ, and numeric arrays can be ingested via MSQ. After this patch, queries such as SELECT [1, 2], which consume a numeric array, work with MSQ instead of failing with an unsupported column type exception.
2023-10-04 23:16:47 +05:30
Zoltan Haindrich
90e4b25620
Fix lead/lag to be usable without offset (#15057) 2023-10-04 17:38:46 +05:30
Zoltan Haindrich
3342e03ea8
Windowing processing may have run into Exceptions when the whole table was processed (#15064)
Earlier, when a query processed the whole table, planning could end with an NPE because it was not possible to create a scan query from it.
2023-10-04 11:27:11 +05:30
Xavier Léauté
adef2069b1
Make unit tests pass with Java 21 (#15014)
This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21
As a result all unit tests now pass with Java 21.

* update maven-shade-plugin to 3.5.0 and follow-up to #15042
  * explain why we need to override configuration when specifying outputFile
  * remove configuration from dependency management in favor of explicit overrides in each module.
* update mockito to 5.5.0 for Java 21 support when running with Java 11+
  * continue using latest mockito 4.x (4.11.0) when running with Java 8
  * remove need to mock private fields
* exclude incorrectly declared mockito dependency from pac4j-oidc
* remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21
* add JVM options workaround for system-rules junit plugin not supporting Java 18+
* exclude older versions of byte-buddy from assertj-core
* fix for Java 19 changes in floating point string representation
* fix missing InitializedNullHandlingTest
* update easymock to 5.2.0 for Java 21 compatibility
* update animal-sniffer-plugin to 1.23
* update nl.jqno.equalsverifier to 3.15.1
* update exec-maven-plugin to 3.1.0
2023-10-03 22:41:21 -07:00
Soumyava
cb050282a0
Intervals are updated properly for Unnest queries (#15020)
Fixes a bug where the unnest queries were not updated with the correct intervals.
2023-10-04 02:52:10 +05:30
Zoltan Haindrich
f3d1c8b70e
Enable back testcases in CalciteWindowQueryTest (#15045)
Most of the testcases in CalciteWindowQueryTest were disabled during the Calcite 1.35 upgrade; the removal of DRUID_SUM had some unexpected side effects:

SqlStdOperatorTable.SUM became the SUM operator
because of that, SqlToRelConverter started rewriting windowed SUMs into SUM0s
my opinion is that w.r.t. Druid this rewrite provides no real advantage, as SUM0 is serviced by SUM here
I believe that's not 100% correct in cases when it aggregates just nulls, but that doesn't matter in this case
I propose to reintroduce a local DRUID_SUM as an unchanged SUM; later, when CALCITE-6020 is fixed, we can drop it.
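
The SUM versus SUM0 distinction discussed above only shows up when there is no non-null input. The following is a minimal Python sketch of the general SQL semantics, not Druid's or Calcite's implementation; `sql_sum` and `sql_sum0` are hypothetical names used only for illustration.

```python
from typing import Iterable, Optional


def sql_sum(values: Iterable[Optional[int]]) -> Optional[int]:
    """SUM: nulls are skipped; the result is null when no non-null input exists."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def sql_sum0(values: Iterable[Optional[int]]) -> int:
    """SUM0: like SUM, but yields 0 instead of null when no non-null input exists."""
    result = sql_sum(values)
    return 0 if result is None else result
```

With mixed input both functions agree; they differ only for inputs such as `[None, None]` or `[]`, where `sql_sum` gives `None` and `sql_sum0` gives `0`.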
2023-10-03 10:18:44 +05:30
Soumyava
261f54dc04
coalesce on unnest row mismatch fix (#15019)
* coalesce on unnest row mismatch fix

* new example with coalesce over unnest with nested array columns

* New example with change in order which triggers the nvl

* new test plan update for useDefault=true
2023-10-02 17:26:50 -07:00
Pranav
f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
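
The idea behind an optional replaceMissingValueWith argument can be sketched in Python as a lookup with a fallback. This is an illustration under assumptions, not Druid's implementation: the helper `lookup`, its signature, and the sample table are hypothetical.

```python
from typing import Mapping, Optional


def lookup(
    key: Optional[str],
    table: Mapping[str, str],
    replace_missing_value_with: Optional[str] = None,
) -> Optional[str]:
    """Return the mapped value, or the fallback when the key has no mapping."""
    if key is not None and key in table:
        return table[key]
    # Assumption: the fallback applies whenever no mapping is found.
    return replace_missing_value_with


# Hypothetical lookup table used only for this example.
countries = {"US": "United States"}
```

Leaving the third argument out keeps the earlier behaviour of returning null for a missing key in this sketch.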
2023-10-02 17:09:23 -07:00
Zoltan Haindrich
2785e062d7
Correct quotation in drill query files (#15044) 2023-10-02 08:17:15 -07:00
Pranav
07c28f17ca
Fix missing format strings in calls to DruidException.build (#15056)
* Fix the NPE bug in nonStrictFormat

* using non null format string

* using Assert.assertThrows
2023-09-29 17:00:36 -07:00
Zoltan Haindrich
db71e28808
Enable SortProjectTransposeRule (#15002)
contains Enable already passing tests in DecoupledPlanningCalciteQueryTest #14996
enables a transpose rule to support a query plan in which the plan was in the shape:
Sort
  Project
     Aggregate
2023-09-29 10:49:03 +05:30
Zoltan Haindrich
022950a0c5
MV_FILTER_ONLY may run into Exceptions in case duplicate values were processed (#15012) 2023-09-27 19:19:42 +05:30
Gian Merlino
3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalize
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
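
The rule described above, finalized type only when finalization is requested, can be sketched as follows. This is a simplified Python model and not the Druid aggregator code; the class `SketchAggregator` and the type strings are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchAggregator:
    """Toy model of an aggregator with distinct intermediate and finalized types."""

    intermediate_type: str  # placeholder, e.g. a complex sketch type
    finalized_type: str     # placeholder, e.g. a numeric estimate type
    should_finalize: bool

    def get_result_type(self) -> str:
        # Report the finalized type only when the result is actually finalized;
        # otherwise downstream stages receive the intermediate representation.
        if self.should_finalize:
            return self.finalized_type
        return self.intermediate_type
```

A consumer that merges partial results, as a multi-stage engine would, relies on the intermediate type being reported when `should_finalize` is false.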
2023-09-27 08:51:14 +05:30
Soumyava
75af741a96
Revert "SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)" (#15029)
This reverts commit 4f498e64691ecd22eaa2c940d1d0d57e769ee9e7.
2023-09-25 11:35:44 -07:00
Gian Merlino
0850e615b2
Remove istrue, isfalse vectorized impls. (#14991)
These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.
2023-09-25 11:34:24 +05:30
Soumyava
c184b5250f
Unnest now works on MSQ (#14886)
This entails:
    Removing the enableUnnest flag and additional machinery
    Updating the datasource plan and frame processors to support unnest
    Adding support in MSQ for UnnestDataSource and FilteredDataSource
    CalciteArrayTest now has a MSQ test component
    Additional tests for Unnest on MSQ
2023-09-25 09:19:21 +05:30
Zoltan Haindrich
e76962f453
Use annotation to mark DecoupleIgnore (#15005) 2023-09-21 12:36:52 +05:30
Laksh Singla
ebb794632a
Allow users with STATE permissions to read and write the state APIs for querying with deep storage (#14944)
Currently, only the user who has submitted the async query has permission to interact with the status APIs for that async query. However, often we want an administrator to interact with these resources as well.
Druid traditionally handles this with the STATE resource: if the requesting user has the necessary permissions on it, they should be allowed to interact with the status APIs regardless of whether they submitted the query.
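
The access rule described above reduces to a single disjunction. A minimal Python sketch, assuming a hypothetical helper `can_access_async_query` and a boolean standing in for the real permission check on the STATE resource:

```python
def can_access_async_query(
    requester: str,
    submitter: str,
    has_state_permission: bool,
) -> bool:
    """Allow the original submitter, or any user holding STATE permissions."""
    return requester == submitter or has_state_permission
```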
2023-09-21 06:55:07 +05:30
Pranav
883c2692d2
Adding new function decode_base64_utf8 and expr macro (#14943)
* Adding new function decode_base64_utf8 and expr macro

* using BaseScalarUnivariateMacroFunctionExpr

* Print stack trace in case of debug in ChainedExecutionQueryRunner

* fix static check
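
What a function named decode_base64_utf8 computes can be sketched in Python with the standard library. This is an illustration rather than the Druid implementation, and the null handling shown here is an assumption.

```python
import base64
from typing import Optional


def decode_base64_utf8(encoded: Optional[str]) -> Optional[str]:
    """Decode a Base64 string and interpret the resulting bytes as UTF-8."""
    if encoded is None:
        # Assumption: null input yields null output.
        return None
    return base64.b64decode(encoded).decode("utf-8")
```

For example, `decode_base64_utf8("aGVsbG8=")` returns `"hello"`.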
2023-09-20 17:06:34 -07:00
Gian Merlino
823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
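
The difference between "equals" mode and "is not distinct from" mode in a join matcher comes down to how nulls compare. A minimal Python sketch of the SQL semantics follows, with `None` standing in for SQL NULL; the function names, including `join_key_matches` and its `include_null` flag, are hypothetical and only mirror the description above.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL '=': the result is unknown (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """IS DISTINCT FROM: never null; two nulls are not distinct."""
    if a is None or b is None:
        return not (a is None and b is None)
    return a != b


def not_distinct_from(a: Any, b: Any) -> bool:
    """IS NOT DISTINCT FROM: null-safe equality."""
    return not is_distinct_from(a, b)


def join_key_matches(left: Any, right: Any, include_null: bool) -> bool:
    """Match join keys in 'equals' mode or 'is not distinct from' mode."""
    if include_null:
        return not_distinct_from(left, right)
    # In 'equals' mode an unknown comparison does not produce a match.
    return sql_equals(left, right) is True
```

Under these definitions a null key on both sides matches only when `include_null` is true.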
2023-09-20 10:44:32 -07:00
Zoltan Haindrich
e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Gian Merlino
4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
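
The plan shape described above, a cross join followed by a filter, can be sketched in Python. This is a conceptual illustration and not Druid's planner or execution code; `non_equi_join` is a hypothetical helper.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def non_equi_join(
    left_rows: Iterable[L],
    right_rows: Iterable[R],
    condition: Callable[[L, R], bool],
) -> List[Tuple[L, R]]:
    """Evaluate an arbitrary join condition as cross join + filter."""
    # The cross join enumerates every (left, right) pair, so the work grows
    # with len(left) * len(right); this is why such plans can hit resource limits.
    return [(l, r) for l, r in product(left_rows, right_rows) if condition(l, r)]
```

For example, `non_equi_join([1, 2, 3], [2, 3], lambda l, r: l < r)` keeps only the pairs where the left value is smaller than the right one.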
2023-09-19 10:23:42 -07:00
Soumyava
279b3818f0
Make Unnest work with nullif operator (#14993)
The failure was caused by the recursive filter creation in the unnest storage adapter not behaving correctly when a filter has no children. This PR addresses the issue.
2023-09-15 09:54:14 +05:30
Gian Merlino
3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
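
The contrast with "x == true" can be sketched in Python, with `None` standing in for SQL NULL. These helpers illustrate the SQL semantics only; they are not the Druid native functions, and their names are hypothetical.

```python
from typing import Optional


def equals_true(x: Optional[bool]) -> Optional[bool]:
    """'x == true' under three-valued logic: null in, null out."""
    return None if x is None else x is True


def is_true(x: Optional[bool]) -> bool:
    """IS TRUE: false for null input, never null."""
    return x is True


def is_not_true(x: Optional[bool]) -> bool:
    """IS NOT TRUE: true for null input, never null."""
    return x is not True


def is_false(x: Optional[bool]) -> bool:
    """IS FALSE: false for null input, never null."""
    return x is False


def is_not_false(x: Optional[bool]) -> bool:
    """IS NOT FALSE: true for null input, never null."""
    return x is not False
```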
2023-09-14 09:19:09 -07:00
Soumyava
7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava
bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Laksh Singla
4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Clint Wylie
891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich
5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Clint Wylie
5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha
757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
Zoltan Haindrich
699893bcff
Fix StringLastAggregatorFactory equals/toString (#14907)
* update test

* update test

* format

* test

* fix0

* Revert "fix0"

This reverts commit 44992cb3932158c1253134bc689884abd4650fd3.

* ok resultset

* add plan

* update test

* before rewind

* test

* fix toString/compare/test

* move test

* add timeColumn to hashCode
2023-09-08 09:20:54 -07:00
Soumyava
a8fa979115
Unnest dont push down not (#14942)
* Not pushing down not filters

* New test case

* Updating tests

* Removing a stale comment
2023-09-06 08:57:03 -07:00
Zoltan Haindrich
23308c050d
Remove DruidAggregateCaseToFilterRule (#14940)
The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade
2023-09-06 19:11:58 +05:30
Laksh Singla
6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
Introduces a new monitor, SubqueryCountStatsMonitor, which emits metrics about subqueries and their execution. Users can now also use the auto mode to automatically set the number of bytes available per query for inlining the results of its subqueries.
2023-09-06 05:47:19 +00:00
Soumyava
8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 429170990192883e51812311c49d2e461e6db732.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Kashif Faraz
7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30
Zoltan Haindrich
e806d09309
Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848) 2023-08-22 22:50:19 -07:00
Zoltan Haindrich
b9a33949fd
Fix aggregation filter expression processing in the absence of projection (#14893)
* test

* fix

* add 33 test

* crap

* Revert "crap"

This reverts commit 2751198debdcf3ee0c0ab9f56a8dfa7477308d93.

* cleanup test

* cleanup

* rename test
2023-08-22 10:17:14 -07:00
Zoltan Haindrich
14c1aff150
Fix error messages relating to OVERWRITE keyword (#14870)
OVERWRITE should not be a fully reserved keyword
2023-08-22 16:17:49 +05:30
Clint Wylie
194a9c9abc
set druid.expressions.useStrictBooleans to true by default (#14734) 2023-08-22 00:19:56 -07:00
Clint Wylie
6b14dde50e
deprecate config-magic in favor of json configuration stuff (#14695)
* json config based processing and broker merge configs to deprecate config-magic
2023-08-16 18:23:57 -07:00
Pranav
26d82fd342
fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587) 2023-08-16 17:57:16 -07:00
Rishabh Singh
0dc305f9e4
Upgrade hibernate validator version to fix CVE-2019-10219 (#14757) 2023-08-14 11:50:51 +05:30
Soumyava
afe22907a5
Calcite upgrade 1.35 (#14510)
* Update to Calcite 1.35.0
* Update from.ftl for Calcite 1.35.0.
* Fixed tests in Calcite upgrade by doing the following:
1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules
2. Refactored the CorrelateUnnestRule
3. Updated CorrelateUnnestRel accordingly
4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column
5. Additional test cases for fixes in 2,3,4
6. Update to StringListAggregator to fail a query if separators are not propagated appropriately
* Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects
* Added a literalSqlAggregator as the upgraded Calcite involved changes to the subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing a useless literal dimension with a post agg. Additionally, a test with COUNT DISTINCT and FILTER, which was failing with Calcite 1.21, is added here; it passes with 1.35
* Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal
* Added a wrapper segment ref to use for unnest and filter segment reference
2023-08-11 12:47:16 -07:00
Adarsh Sanjeev
56ab81f381
Add support for different result formats to MSQ SqlStatementResource (#14571)
* Add support for different result format

* Add tests

* Add tests

* Fix checkstyle

* Remove changes to destination

* Removed some unwanted code

* Address review comments

* Rename parameter

* Fix tests
2023-08-07 20:48:59 +05:30
Soumyava
0d73480c8f
Latest aggregator factories should accept time as VectorValueSelecto… (#14753)
Fix the queries that have latest aggregator with an expression as time column
2023-08-04 13:04:25 +05:30
Clint Wylie
94fb41a4df
fix nested field virtual column array column element vector object selector (#14729)
Fixes a case I missed in #14688 when the return type is STRING but it's coming from a top-level array typed column instead of a nested array column while making a vector object selector.

Also while here I noticed that the internal JSON_VALUE functions for array types were named inconsistently with the non-array functions, so I renamed them. These are not documented so it should not be disruptive in any way, since they are only used internally for rewrites while planning to make the correct virtual column.

JSON_VALUE_RETURNING_ARRAY_VARCHAR -> JSON_VALUE_ARRAY_VARCHAR
JSON_VALUE_RETURNING_ARRAY_BIGINT -> JSON_VALUE_ARRAY_BIGINT
JSON_VALUE_RETURNING_ARRAY_DOUBLE -> JSON_VALUE_ARRAY_DOUBLE
The internal non-array functions are JSON_VALUE_VARCHAR, JSON_VALUE_BIGINT, and JSON_VALUE_DOUBLE.
2023-08-02 17:08:24 +05:30