Commit Graph

798 Commits

Author SHA1 Message Date
Soumyava bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich 5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Clint Wylie 5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948)
2023-09-12 10:54:35 +08:00
Suneet Saldanha 757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
Zoltan Haindrich 699893bcff
Fix StringLastAggregatorFactory equals/toString (#14907)
* update test

* update test

* format

* test

* fix0

* Revert "fix0"

This reverts commit 44992cb393.

* ok resultset

* add plan

* update test

* before rewind

* test

* fix toString/compare/test

* move test

* add timeColumn to hashCode
2023-09-08 09:20:54 -07:00
Soumyava a8fa979115
Unnest don't push down not (#14942)
* Not pushing down not filters

* New test case

* Updating tests

* Removing a stale comment
2023-09-06 08:57:03 -07:00
Zoltan Haindrich 23308c050d
Remove DruidAggregateCaseToFilterRule (#14940)
The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade
2023-09-06 19:11:58 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
Introduces a new monitor, SubqueryCountStatsMonitor, which emits metrics for subqueries and their execution. Users can also use the auto mode to automatically set the number of bytes available per query for inlining its subqueries' results.
2023-09-06 05:47:19 +00:00
Soumyava 8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 4291709901.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Kashif Faraz 7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30
Zoltan Haindrich e806d09309
Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848)
2023-08-22 22:50:19 -07:00
Zoltan Haindrich b9a33949fd
Fix aggregation filter expression processing in the absence of projection (#14893)
* test

* fix

* add 33 test

* crap

* Revert "crap"

This reverts commit 2751198deb.

* cleanup test

* cleanup

* rename test
2023-08-22 10:17:14 -07:00
Zoltan Haindrich 14c1aff150
Fix error messages relating to OVERWRITE keyword (#14870)
OVERWRITE should not be a fully reserved keyword
2023-08-22 16:17:49 +05:30
Clint Wylie 194a9c9abc
set druid.expressions.useStrictBooleans to true by default (#14734)
2023-08-22 00:19:56 -07:00
Clint Wylie 6b14dde50e
deprecate config-magic in favor of json configuration stuff (#14695)
* json config based processing and broker merge configs to deprecate config-magic
2023-08-16 18:23:57 -07:00
Pranav 26d82fd342
fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587)
2023-08-16 17:57:16 -07:00
Rishabh Singh 0dc305f9e4
Upgrade hibernate validator version to fix CVE-2019-10219 (#14757)
2023-08-14 11:50:51 +05:30
Soumyava afe22907a5
Calcite upgrade 1.35 (#14510)
* Update to Calcite 1.35.0
* Update from.ftl for Calcite 1.35.0.
* Fixed tests in Calcite upgrade by doing the following:
1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules
2. Refactored the CorrelateUnnestRule
3. Updated CorrelateUnnestRel accordingly
4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column
5. Additional test cases for fixes in 2,3,4
6. Update to StringListAggregator to fail a query if separators are not propagated appropriately
* Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects
* Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing a useless literal dimension with a post agg. Additionally, a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35
* Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal
* Added a wrapper segment ref to use for unnest and filter segment reference
2023-08-11 12:47:16 -07:00
Adarsh Sanjeev 56ab81f381
Add support for different result formats to MSQ SqlStatementResource (#14571)
* Add support for different result format

* Add tests

* Add tests

* Fix checkstyle

* Remove changes to destination

* Removed some unwanted code

* Address review comments

* Rename parameter

* Fix tests
2023-08-07 20:48:59 +05:30
Soumyava 0d73480c8f
Latest aggregator factories should accept time as VectorValueSelecto… (#14753)
Fix the queries that have latest aggregator with an expression as time column
2023-08-04 13:04:25 +05:30
Clint Wylie 94fb41a4df
fix nested field virtual column array column element vector object selector (#14729)
Fixes a case I missed in #14688 when the return type is STRING but it's coming from a top level array typed column instead of a nested array column while making a vector object selector.

Also while here I noticed that the internal JSON_VALUE functions for array types were named inconsistently with the non-array functions, so I renamed them. These are not documented so it should not be disruptive in any way, since they are only used internally for rewrites while planning to make the correct virtual column.

JSON_VALUE_RETURNING_ARRAY_VARCHAR -> JSON_VALUE_ARRAY_VARCHAR
JSON_VALUE_RETURNING_ARRAY_BIGINT -> JSON_VALUE_ARRAY_BIGINT
JSON_VALUE_RETURNING_ARRAY_DOUBLE -> JSON_VALUE_ARRAY_DOUBLE
The internal non-array functions are JSON_VALUE_VARCHAR, JSON_VALUE_BIGINT, and JSON_VALUE_DOUBLE.
2023-08-02 17:08:24 +05:30
Kashif Faraz 10328c0743
Rename metadatacache and serverview metrics (#14716)
2023-08-01 14:18:20 +05:30
Clint Wylie 5f72f4f37d
fixes for nested virtual column array element vector selectors and fixes for variant and nested variant numeric columns
* fix issue with nested virtual column array element vector selectors when input is numeric array but output is non-numeric
* add vector value selector for mixed numeric type variant and nested variant fields, tests
2023-07-28 15:14:29 -07:00
Clint Wylie d406bafdfc
fix issues with equality and range filters matching double values to long typed inputs (#14654)
* fix issues with equality and range filters matching double values to long typed inputs
* adjust to ensure we never homogenize null, [], and [null] into [null] for expressions on real array columns
2023-07-27 16:01:21 -07:00
Adarsh Sanjeev 6a42a24426
Fix a comment in the Calcite UT testExactCountDistinctWithFilter (#14628)
2023-07-26 06:32:26 +00:00
Gian Merlino 2f9619a96f
Use OverlordClient for all Overlord RPCs. (#14581)
* Use OverlordClient for all Overlord RPCs.

Continuing the work from #12696, this patch removes HttpIndexingServiceClient
and the IndexingService flavor of DruidLeaderClient completely. All remaining
usages are migrated to OverlordClient.

Supporting changes include:

1) Add a variety of methods to OverlordClient.

2) Update MetadataTaskStorage to skip the complete-task lookup when
   the caller requests zero completed tasks. This helps performance of
   the "get active tasks" APIs, which don't want to see complete ones.

* Use less forbidden APIs.

* Fixes from CI.

* Add test coverage.

* Two more tests.

* Fix test.

* Updates from CR.

* Remove unthrown exceptions.

* Refactor to improve testability and test coverage.

* Add isNil tests.

* Remove unnecessary "deserialize" methods.
2023-07-24 21:14:27 -07:00
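The complete-task lookup skip described in #14581 can be pictured with a small Python sketch. The names used here (`get_task_statuses`, `fetch_active`, `fetch_completed`, `max_completed`) are hypothetical and are not taken from MetadataTaskStorage; the sketch only shows the short-circuit idea, assuming the completed-task lookup is the expensive call.

```python
from typing import Callable, List


def get_task_statuses(
    fetch_active: Callable[[], List[str]],
    fetch_completed: Callable[[int], List[str]],
    max_completed: int,
) -> List[str]:
    """Return active tasks plus at most max_completed completed tasks."""
    tasks = list(fetch_active())
    # Assumption: the completed-task lookup is the costly part, so it is
    # skipped entirely when the caller asks for zero completed tasks.
    if max_completed > 0:
        tasks.extend(fetch_completed(max_completed))
    return tasks


if __name__ == "__main__":
    lookups: List[int] = []

    def fake_completed(limit: int) -> List[str]:
        # Record each lookup so the skipped call is visible.
        lookups.append(limit)
        return ["done-1", "done-2"][:limit]

    print(get_task_statuses(lambda: ["running-1"], fake_completed, 0))
    print("completed-task lookups performed:", lookups)
```

A caller that only wants active tasks passes `max_completed=0` and never triggers the second lookup.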
Gian Merlino c2e6758580
Simplify bounds/range vs selectors/equality logic in SQL planning. (#14619)
* Simplify bounds/range vs selectors/equality logic in SQL planning.

1) Consolidate duplicate code related to Expressions#buildTimeFloorFilter.

2) Cleaner logic in Expressions#toSimpleLeafFilter: choose bounds vs range
   filter based solely on plannerContext.isUseBoundsAndSelectors, not also
   considering rhs kind. Use parsed rhs in both paths (except for numerics
   in the bound path).

3) Fix ArrayContains, ArrayOverlap to avoid equality filters when there is
   an extractionFn present. Fixes a bug introduced in #14612.

* Avoid sending nonprimitives down the bound path.
2023-07-19 22:40:47 -07:00
Clint Wylie 68fd22169f
remove extractionFn from equality, null, and range filters (#14612)
* remove extractionFn from equality, null, and range filters
changes:
* EqualityFilter, NullFilter, and RangeFilter no longer support extractionFn
* SQL planner will use ExpressionFilter in the small number of cases where an extractionFn would have been used if sqlUseBoundsAndSelectors is set to false instead of equality/null/range filters
* fix bugs and add tests with serde, equals, and cache key for null, equality, and range filters

* test coverage fixes bugs

* adjust

* adjust again

* so persnickety
2023-07-19 10:37:57 -07:00
Clint Wylie 913416c669
add equality, null, and range filter (#14542)
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes 
* fix missing string utf8 index supplier
* expression array comparator stuff
2023-07-18 12:15:22 -07:00
AmatyaAvadhanula 0412f40d36
Prepare master branch for next release, 28.0.0 (#14595)
* Prepare master branch for next release, 28.0.0
2023-07-18 09:22:30 +05:30
Laksh Singla c1c7dff2ad
Using DruidExceptions in MSQ (changes related to the Broker) (#14534)
MSQ engine returns correct error codes for invalid user inputs in the query context. Also, using DruidExceptions for MSQ related errors happening in the Broker with improved error messages.
2023-07-13 19:08:49 +00:00
Abhishek Radhakrishnan f4ee58eaa8
Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560)
* Add aggregatorMergeStrategy property to SegmentMetadataQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 12:37:36 -04:00
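As a rough illustration of the `latest` strategy and the property validation described in #14560, here is a minimal Python sketch. It is not Druid's implementation: the function names, the tuple representation of a segment, and the mapping of the deprecated boolean onto `lenient`/`strict` are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for per-segment analysis:
# (segment interval start as an ISO-8601 string, {aggregator name: aggregator type})
SegmentAggregators = Tuple[str, Dict[str, str]]


def resolve_strategy(
    aggregator_merge_strategy: Optional[str] = None,
    lenient_aggregator_merge: Optional[bool] = None,
) -> str:
    """Pick a merge strategy, rejecting requests that set both properties."""
    if aggregator_merge_strategy is not None and lenient_aggregator_merge is not None:
        raise ValueError(
            "Set either aggregatorMergeStrategy or lenientAggregatorMerge, not both"
        )
    if aggregator_merge_strategy is not None:
        return aggregator_merge_strategy
    # Assumption: the deprecated boolean maps onto the two legacy strategies.
    return "lenient" if lenient_aggregator_merge else "strict"


def merge_latest(segments: List[SegmentAggregators]) -> Dict[str, str]:
    """On conflict, keep the aggregator from the latest segment by time order."""
    merged: Dict[str, str] = {}
    # ISO-8601 strings in the same format sort chronologically, so later
    # segments overwrite earlier ones for any conflicting aggregator name.
    for _, aggregators in sorted(segments, key=lambda segment: segment[0]):
        merged.update(aggregators)
    return merged


if __name__ == "__main__":
    segments = [
        ("2023-02-01T00:00:00Z", {"total": "doubleSum"}),
        ("2023-01-01T00:00:00Z", {"total": "longSum", "rows": "count"}),
    ]
    print(resolve_strategy(aggregator_merge_strategy="latest"))
    print(merge_latest(segments))
```

The sketch deliberately leaves out the `strict` and `lenient` merge logic, since the commit message only spells out the behavior of `latest`.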
imply-cheddar 7650a71d37
Add window query test files from Drill (#14561)
2023-07-12 20:14:39 -07:00
imply-cheddar 65e1b27aa7
Fix a resource leak with Window processing (#14573)
* Fix a resource leak with Window processing

Additionally, in order to find the leak, there were
adjustments to the StupidPool to track leaks a bit better.
It would appear that the pool objects get GC'd during testing
for some reason which was causing some incorrect identification
of leaks from objects that had been returned but were GC'd along
with the pool.

* Suppress unused warning
2023-07-12 17:25:42 -05:00
Laksh Singla 5ce536355e
Fix planning bug while using sort merge frame processor (#14450)
sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.
2023-07-11 09:58:44 +00:00
Gian Merlino 63ee69b4e8
Claim full support for Java 17. (#14384)
* Claim full support for Java 17.

No production code has changed, except the startup scripts.

Changes:

1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.

2) Include the full list of opens and exports on both Java 11 and 17.

3) Document that Java 17 is both supported and preferred.

4) Switch some tests from Java 11 to 17 to get better coverage on the
   preferred version.

* Doc update.

* Update errorprone.

* Update docker_build_containers.sh.

* Update errorprone in licenses.yaml.

* Add some more run-javas.

* Additional run-javas.

* Update errorprone.

* Suppress new errorprone error.

* Add exports and opens in ForkingTaskRunner for Java 11+.

Test, doc changes.

* Additional errorprone updates.

* Update for errorprone.

* Restore old formatting in LdapCredentialsValidator.

* Copy bin/ too.

* Fix Java 15, 17 build line in docker_build_containers.sh.

* Update busybox image.

* One more java command.

* Fix interpolation.

* IT commandline refinements.

* Switch to busybox 1.34.1-glibc.

* POM adjustments, build and test one IT on 17.

* Additional debugging.

* Fix silly thing.

* Adjust command line.

* Add exports and opens one more place.

* Additional harmonization of strong encapsulation parameters.
2023-07-07 12:52:35 -07:00
Gian Merlino dd78e00dc5
Fix ColumnSignature error message and jdk17 test issue. (#14538)
* Fix ColumnSignature error message and jdk17 test issue.

On jdk17, the "problem" part of the error message could change from
NullPointerException to:

  Cannot invoke "String.length()" because "s" is null

Due to the new more-helpful NPEs in Java 17. This broke the expectation
and led to test failures on this case.

This patch fixes the problem by improving the error message so it isn't
a generic NullPointerException.

* Fix format.
2023-07-06 15:10:59 -07:00
Abhishek Radhakrishnan d02bb8bb6e
Set explain attributes after the query is prepared (#14490)
* Add support for DML WITH AS.

* One more UT for with as subquery.

* Add a test with join query

* Use root query prepared node instead of individual SqlNode types.

- Set the explain plan attributes after the query is prepared, when
the query is planned and we have the finalized output names in the root
source rel node.
- Adjust tests; add unit test for negative ordinal case.
- Remove the exception / error handling logic from resolveClusteredBy
function since the validations now happen before it comes to the function

* Update comment.
2023-07-06 14:13:32 -04:00
imply-cheddar 5fc122a144
Add window-focused tests from Drill (#13773)
This commit borrows some test definitions from Drill's test suite
and tries to use them to flesh out the full validation of window
function capabilities.

In order to be able to run these tests, we also add the ability to
run a Scan operation against segments, which also meant an
implementation of RowsAndColumns for frames.
2023-07-06 09:20:32 -07:00
Soumyava 78db7a4414
A query in MSQ would issue wrong error code (#14531)
with a RuntimeException. The RuntimeException is now replaced by a user-facing DruidException of the Invalid category, which lets Calcite avoid throwing an uncategorized exception.
2023-07-06 08:59:35 +05:30
Jonathan Wei f29a9faa94
Better surfacing of invalid pattern errors for SQL REGEXP_EXTRACT function (#14505)
2023-07-05 17:12:54 -05:00
Pranav 2d5b27358e
Logging the fieldName in the coerce exceptions (#14483)
Logging the fieldName in the coerce exceptions
2023-07-03 14:13:27 +05:30
Gian Merlino e10e35aa2c
Add REGEXP_REPLACE function. (#14460)
* Add REGEXP_REPLACE function.

Replaces all instances of a pattern with a replacement string.

* Fixes.

* Improve test coverage.

* Adjust behavior.
2023-06-29 13:47:57 -07:00
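The "replaces all instances" behavior that #14460 describes for REGEXP_REPLACE can be approximated with Python's `re.sub`. This is only an analogy: Druid evaluates the function with its own regex engine, so syntax details and replacement escapes may differ, and the `regexp_replace` helper below is a hypothetical name for the example.

```python
import re


def regexp_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every non-overlapping match of pattern in text."""
    # re.sub with the default count=0 replaces all matches, not just the first.
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    print(regexp_replace("foo boo", "o+", "0"))
```

Every run of `o` in the input is replaced; a string with no match is returned unchanged.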
Gian Merlino a6cabbe10f
SQL: Avoid "intervals" for non-table-based datasources. (#14336)
In these other cases, stick to plain "filter". This simplifies lots of
logic downstream, and doesn't hurt since we don't have intervals-specific
optimizations outside of tables.

Fixes an issue where we couldn't properly filter on a column from an
external datasource if it was named __time.
2023-06-29 09:57:11 +05:30
Gian Merlino 34c55a0bde
SQL: SUBSTRING support for non-literals. (#14480)
* SQL: SUBSTRING support for non-literals.

* Fix AssertionError test.

* Fix header.
2023-06-28 13:43:05 -07:00
Jonathan Wei c36f12f1d8
Support complex variance object inputs for variance SQL agg function (#14463)
* Support complex variance object inputs for variance SQL agg function

* Add test

* Include complexTypeChecker, address PR comments

* Checkstyle, javadoc link
2023-06-28 13:14:19 -05:00
Karan Kumar cb3a9d2b57
Adding Interactive APIs for MSQ engine (#14416)
This PR aims to expose a new API at @Path("/druid/v2/sql/statements/"), which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results asynchronously.
2023-06-28 17:51:58 +05:30
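Since #14416 says the statements endpoint accepts the same payload as "/druid/v2/sql", a request to it might be built as in the Python sketch below. The base URL and the `build_statement_request` helper are hypothetical, and the sketch only constructs the request without sending it. How results are later fetched is not covered here because the commit message does not describe it.

```python
import json
from urllib.request import Request

# Hypothetical address of a Druid Router or Broker; adjust for a real cluster.
BASE_URL = "http://localhost:8888"


def build_statement_request(sql: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST carrying a SQL query as a JSON payload."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        url=base_url + "/druid/v2/sql/statements/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_statement_request("SELECT 1")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would submit it, assuming a reachable cluster and any required authentication.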
Gian Merlino c78d885b80
Cache parsed expressions and binding analysis in more places. (#14124)
* Cache parsed expressions and binding analysis in more places.

Main changes:

1) Cache parsed and analyzed expressions within PlannerContext for a
   single SQL query.

2) Cache parsed expressions together with input binding analysis using
   a new class AnalyzedExpr.

This speeds up SQL planning, because SQL planning involves parsing and
analyzing the same expression strings over and over again.

* Fixes.

* Fix style.

* Fix test.

* Simplify: get rid of AnalyzedExpr, focus on caching.

* Rename parse -> parseExpression.
2023-06-27 13:40:35 -07:00
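The caching idea in #14124 (parse and analyze each expression string once per SQL query, then reuse the result) can be sketched in Python as follows. The classes `PlannerContextSketch` and `AnalyzedExpression` and the toy tokenizer are hypothetical stand-ins, not Druid's PlannerContext or expression parser. In this toy version every identifier counts as a required binding, including function names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


@dataclass(frozen=True)
class AnalyzedExpression:
    """Parsed form plus binding analysis, cached together."""
    tokens: Tuple[str, ...]
    required_bindings: FrozenSet[str]


@dataclass
class PlannerContextSketch:
    """Holds a per-query cache keyed by the expression string."""
    cache: Dict[str, AnalyzedExpression] = field(default_factory=dict)
    parse_count: int = 0

    def parse_expression(self, expression: str) -> AnalyzedExpression:
        cached = self.cache.get(expression)
        if cached is None:
            # Only reached the first time a given string is seen in this query.
            self.parse_count += 1
            tokens = tuple(re.findall(r"[A-Za-z_]\w*|\d+|\S", expression))
            bindings = frozenset(t for t in tokens if IDENTIFIER.fullmatch(t))
            cached = AnalyzedExpression(tokens, bindings)
            self.cache[expression] = cached
        return cached


if __name__ == "__main__":
    context = PlannerContextSketch()
    for _ in range(3):
        analyzed = context.parse_expression("x + y * 2")
    print(sorted(analyzed.required_bindings), context.parse_count)
```

Scoping the cache to one context object mirrors the "single SQL query" lifetime mentioned in the commit message, so entries do not outlive the query that created them.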