druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	806649f8af	SQL: Fix nullable DATE, TIMESTAMP reduction. (#16915 ) Reduction of nullable DATE and TIMESTAMP expressions did not perform a necessary null check, so would in some cases reduce to 1970-01-01 00:00:00 (epoch) rather than NULL.	2024-08-16 22:41:12 -07:00
Clint Wylie	4283b270e3	rework cursor creation (#16533 ) changes: * Added `CursorBuildSpec` which captures all of the 'interesting' stuff that goes into producing a cursor as a replacement for the method arguments of `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * added new interface `CursorHolder` and new interface `CursorHolderFactory` as a replacement for `CursorFactory`, with method `makeCursorHolder`, which takes a `CursorBuildSpec` as an argument and replaces `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * `CursorFactory.makeCursors` previously returned a `Sequence<Cursor>` corresponding to the query granularity buckets, with a separate `Cursor` per bucket. `CursorHolder.asCursor` instead returns a single `Cursor` (equivalent to 'ALL' granularity), and a new `CursorGranularizer` has been added for query engines to iterate over the cursor and divide into granularity buckets. This makes the non-vectorized engine behave the same way as the vectorized query engine (with its `VectorCursorGranularizer`), and simplifies a lot of stuff that has to read segments particularly if it does not care about bucketing the results into granularities. * Deprecated `CursorFactory`, `CursorFactory.canVectorize`, `CursorFactory.makeCursors`, and `CursorFactory.makeVectorCursor` * updated all `StorageAdapter` implementations to implement `makeCursorHolder`, transitioned direct `CursorFactory` implementations to instead implement `CursorMakerFactory`. `StorageAdapter` being a `CursorMakerFactory` is intended to be a transitional thing, ideally will not be released in favor of moving `CursorMakerFactory` to be fetched directly from `Segment`, however this PR was already large enough so this will be done in a follow-up. * updated all query engines to use `makeCursorHolder`, granularity based engines to use `CursorGranularizer`.	2024-08-16 11:34:10 -07:00
Sree Charan Manamala	964cf47bb5	fix NPE (#16897 )	2024-08-15 18:12:22 +08:00
Akshat Jain	3d6cedb25f	Fix IndexOutOfBoundsException for MSQ window function queries with empty RAC (#16865 ) * Fix IndexOutOfBoundsException for MSQ window function queries with empty RAC	2024-08-09 11:39:53 +05:30
zachjsh	cb09b572e6	Fix Druid table schema resolution when table defined in catalog and has schema manager (#16869 ) * SQL syntax error should target USER persona * * revert change to queryHandler and related tests, based on review comments * * add test * Properly handle Druid schema blending with catalog definition and segment metadata * * add javadocs	2024-08-08 21:21:03 -04:00
Zoltan Haindrich	408702e100	Add ability to run MSQ in Quidem tests (#16798 ) * implements some jdbc facade to enable msq usage * adds an !msqPlan command * adds more guice usage to testsystem startup	2024-08-08 06:37:06 +02:00
Gian Merlino	de40d81b29	SQL: Add ProjectableFilterableTable to SegmentsTable. (#16841 ) * SQL: Add ProjectableFilterableTable to SegmentsTable. This allows us to skip serialization of expensive fields such as shard_spec, dimensions, metrics, and last_compaction_state, if those fields are not actually being queried. * Restructure logic to avoid unnecessary toString() as well.	2024-08-06 06:40:21 -07:00
Sree Charan Manamala	ed6b547481	Handle default bounds correctly in WINDOW clause (#16833 ) When a window is defined as WINDOW W AS <DEF> using the syntax (PARTITION BY col1 ORDER BY col2 ROWS x PRECEDING), we need to default the other bound to CURRENT ROW. This was already implemented earlier, but when the window is defined as WINDOW W AS <DEF>, Calcite takes a different route to validate it.	2024-08-06 09:58:44 +02:00
Zoltan Haindrich	26e3c44f4b	Quidem record (#16624 ) * enables launching a fake broker based on test resources (druidtest uri) * can record queries into new test files during usage * instead of repurposing Calcite's Hook, migrates to DruidHook, to which we can add further keys * added a quidem-ut module, which could be the place for tests that interact with modules/etc	2024-08-05 14:58:32 +02:00
Sree Charan Manamala	c7eacd079e	fallback SQL IN filter to expression filter when VirtualColumnRegistry is null (#16836 )	2024-08-05 11:27:51 +05:30
Abhishek Radhakrishnan	31b43753fb	Add `druid.indexing.formats.stringMultiValueHandlingMode` system config (#16822 ) This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_SET for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_SET is the default). Values are case-insensitive. While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs. This also fixes a long-standing issue: compaction now honors the configured cluster-wide property instead of always rewriting the mode as the default SORTED_ARRAY, even if the data was originally ingested with ARRAY or SORTED_SET.	2024-08-03 10:23:44 -07:00
Zoltan Haindrich	c7cde31a89	HAVING clauses may not contain window functions (#16742 ) Rejects having clauses if they contain windowed expressions. Also added a check to produce a more descriptive error if an OVER expression reaches the filter translation layer. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-29 04:11:36 -04:00
Sree Charan Manamala	9b76d13ff8	Check for Aggregation inside a window clause when syntax used as - WINDOW W AS DEF (#16801 )	2024-07-26 11:18:35 +02:00
Clint Wylie	14954c7eb9	serialize legacy as false for scan query for rolling downgrade/upgrade (#16793 ) Fixes rolling downgrades/upgrades after #16659 by hard coding scan query "legacy":false since it is a required property during deserialization.	2024-07-25 14:51:58 +05:30
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Akshat Jain	a0437b6c93	MSQ window functions: Fix partition boundary issues for arrays (#16780 ) * MSQ window functions: Fix partition boundary issues for arrays * Address review comments * Cache type strategies * Trigger Build * Convert typeStrategies from list to array	2024-07-24 18:47:04 +05:30
Sree Charan Manamala	3f4d66c399	Check for Unsupported Aggregation with Distinct when useApproxCountDistinct is enabled (#16770 ) * init * add NativelySupportsDistinct * refactor * javadoc * refactor * fix tests * fix drill tests * comments * Update sql/src/test/java/org/apache/druid/sql/calcite/DrillWindowQueryTest.java --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-24 11:13:22 +08:00
Laksh Singla	11bb40981e	Deduce type from the aggregators when materializing subquery results (#16703 ) For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This is because we assume that the finalization is not known, due to which the row signature cannot determine whether to use the intermediate or the final type, and it puts it as null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.	2024-07-23 11:52:39 +05:30
Akshat Jain	c45d4fdbca	MSQ window functions: Minor cleanup for empty over clause related flows + Exhaustive tests (#16754 ) * MSQ window functions: Revamp logic to create separate window stages when empty over() clause is present * Fix tests * Revert changes of creating separate stages for empty over clause * Address review comments	2024-07-23 11:37:34 +05:30
Akshat Jain	6a2348b78b	Preemptive restriction for queries with approximate count distinct on complex columns of unsupported type (#16682 ) This PR aims to check if the complex column being queried aligns with the supported types in the aggregator and aggregator factories, and throws a user-friendly error message if they don't.	2024-07-22 21:34:06 +05:30
Sree Charan Manamala	149d7c5207	Throw exceptions in SqlValidator when DISTINCT used over WINDOW (#16738 ) * Throw exception if DISTINCT used with window functions aggregate call * Improve error message when unsupported aggregations are used with window functions	2024-07-22 16:29:46 +02:00
Sree Charan Manamala	c9aae9d8e6	Enable WINDOW_LEAF_OPERATOR for native engine to support queries without group by (#16753 )	2024-07-22 12:31:55 +02:00
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Akshat Jain	b53c26f5c5	Fix issues with partitioning boundaries for MSQ window functions (#16729 ) * Fix issues with partitioning boundaries for MSQ window functions * Address review comments * Address review comments * Add test for coverage check failure * Address review comment * Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java * Address review comments * Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory * Address review comment * Fix checkstyle --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 10:05:09 +08:00
Sree Charan Manamala	40ef9fc4ec	Bug fix for array type selector causing array aggregation over window frame fail (#16653 )	2024-07-17 14:09:56 +02:00
Sree Charan Manamala	78a4a09d01	Window Function offset correction for RAC (#16718 ) * When an ArrayList RAC creates a child RAC, the start and end offsets need to have the offset of parent's start offset * Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified * Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds	2024-07-15 12:43:27 +02:00
Rishabh Singh	64104533ac	Enable querying entirely cold datasources (#16676 ) Add ability to query entirely cold datasources.	2024-07-15 15:02:59 +05:30
Vishesh Garg	197c54f673	Auto-Compaction using Multi-Stage Query Engine (#16291 ) Description: Compaction operations issued by the Coordinator currently run using the native query engine. As the majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make Compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled by its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on native engine. Main changes: * `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]` with `native` being the default. * if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which could happen in case a fraction of the tasks were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction. * `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field. * `CompactionTask` now has `CompactionRunner` interface instance with its implementations `NativeCompactionRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension. The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. * `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks. * `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context. * Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask` since otherwise, the workers will be unable to determine the controller task's location for communication (as they haven't been launched via the overlord).	2024-07-12 16:40:20 +05:30
Sree Charan Manamala	760d70312f	Window Drill tests coverage improvement (#16722 ) Window Drill tests coverage improvement	2024-07-11 19:11:36 +05:30
Zoltan Haindrich	a9bd0eea2a	Fix queries filtering for the same condition with both an IN and EQUALS to not return empty results (#16597 ) * temp fix until CALCITE-6435 gets fixed (released & upgraded to) * added a custom rule (FixIncorrectInExpansionTypes) to fix up the types of the affected literals * added a test case which will alert on upgrade	2024-07-09 12:28:21 +05:30
Alberic Liu	c6c2652c89	unified the code format in NestedDataOperatorConversions (#16695 )	2024-07-08 10:06:24 +08:00
Akshat Jain	34c80ee3de	Add MSQ engine support for window function drill tests (#16665 ) * Add MSQ engine support for window function drill tests * Address review comments * Revert formatting changes in TestDataBuilder	2024-06-28 11:14:17 +05:30
Rishabh Singh	b9c7664ac3	Fix empty datasource schema on the Broker when metadata query is disabled (#16645 ) * Fix build * Fix empty datasource schema on the broker * review comment * Remove unused import	2024-06-28 11:06:56 +05:30
Clint Wylie	d4f2636325	fix greatest/least function non-vectorized processing to ignore null argument types (#16649 )	2024-06-26 12:59:42 -07:00
Tom	52c9929019	Column name in parse exceptions (#16529 ) * first pass * more changes * fix tests and formatting * fix kinesis failing tests * fix kafka tests * add dimension name to float parse errors * double and convertToType handling of dimensionName can report parse errors with dimension name * fix checkstyle issue * fix tests * more cases to have better parse exception messages * fix test * fix tests * partially address comments * annotate method parameter with nullable * address comments * fix tests * let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils * fix compilation error and clean up formatting * clean up whitespace * address feedback. undo change, pass down report parse exception for convertToType * fix test	2024-06-25 13:42:52 -07:00
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
Sree Charan Manamala	990fd5f5fb	Make use group iterator for all window frames & support for same bound kinds (#16603 ) Fixes apache/druid#15739	2024-06-24 15:52:41 +02:00
Laksh Singla	00c96432af	Materialize scan results correctly when columns are not present in the segments (#16619 ) Fixes a bug causing maxSubqueryBytes not to work when segments have missing columns.	2024-06-23 23:15:45 +05:30
Abhishek Radhakrishnan	b20c3dbadf	Fix malformed period throwing `ADMIN` persona error (#16626 ) * Turn invalid periods into user-facing exception providing more context. The current exception is targeting the ADMIN persona. Catch that and turn it into a USER persona instead. Also, provide more context in the error message. * Review comment: pass the wrapping expression and stringify. * Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2024-06-20 08:40:28 -07:00
Sree Charan Manamala	7ac0862287	Grouping Engine fix when a limit spec with different order by columns is applied (#16534 )	2024-06-20 11:35:58 +02:00
Laksh Singla	da1e293a57	Deserialize dimensions in group by queries to their respective types when reading from their serialized format (#16511 ) * init * tests, pair groupable * framework change * tests * update benchmarks * comments * add javadoc for the jsonMapper * remove extra deserialization * add special serde for map based result rows * revert unnecessary change --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-14 16:27:47 +08:00
Zoltan Haindrich	ac19b148c2	Upgrade calcite to 1.37.0 (#16504 ) * contains Make a full copy of the parser and apply our modifications to it #16503 * some minor api changes pair/entry * some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest` * `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check * the field it was using doesn't look maintained that much * the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed * some decoupled test cases become broken - will be fixed later * some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different * upgrade avatica to 1.25.0 * `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable Close apache/druid#16503	2024-06-13 08:47:50 +02:00
Zoltan Haindrich	f8645de341	Remove incorrect utf8 conversion of ResultCache keys (#16569 )	2024-06-12 13:12:05 -07:00
Clint Wylie	fee509df2e	fix NestedDataColumnIndexerV4 to not report cardinality (#16507 ) * fix NestedDataColumnIndexerV4 to not report cardinality changes: * fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem * fix some issues with sql benchmarks	2024-06-11 20:58:12 -07:00
zachjsh	3f5f5921e0	Fix sql syntax error user (#16583 ) This fixes an issue where, in some cases, a SQL syntax error encountered when parsing / planning a query results in an error returned to the user with the persona `admin` when it should instead be `user`.	2024-06-11 18:08:35 -04:00
Clint Wylie	3fb6ba22e8	fix expression column capabilities to not report dictionary encoded unless input is string (#16577 )	2024-06-08 13:05:19 -07:00
Gian Merlino	277006446d	Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366 ) * Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. In addition: - Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. - In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize. - Fix a null-matcher bug in StringObjectVectorValueMatcher. * Fix tests. * Fixes. * Fix tests. * Fix test. * Fix test.	2024-06-05 20:03:02 -07:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Gian Merlino	1040a29bc5	Fix capabilities reported by UnnestStorageAdapter. (#16551 ) UnnestStorageAdapter and its cursors did not return capabilities correctly for the output column. This patch fixes two problems: 1) UnnestStorageAdapter returned the capabilities of the unnest virtual column prior to unnesting. It should return the post-unnest capabilities. 2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from the unnest virtual column. This is incorrect, because the dimension selector created by this class never has a dictionary. This is the cause of #16543.	2024-06-05 15:19:42 -07:00
Akshat Jain	6d7d2ffa63	Add interface method for returning canonical lookup name (#16557 ) * Add interface method for returning canonical lookup name * Address review comment * Add test in LookupReferencesManagerTest for coverage check * Add test in LookupSerdeModuleTest for coverage check	2024-06-05 14:33:18 -07:00
Abhishek Radhakrishnan	b9ba286423	Fix task bootstrapping & simplify segment load/drop flows (#16475 ) * Fix task bootstrap locations. * Remove dependency of SegmentCacheManager from SegmentLoadDropHandler. - The load drop handler code talks to the local cache manager via SegmentManager. * Clean up unused imports and stuff. * Test fixes. * Intellij inspections and test bind. * Clean up dependencies some more * Extract test load spec and factory to its own class. * Cleanup test util * Pull SegmentForTesting out to TestSegmentUtils. * Fix up. * Minor changes to infoDir * Replace server announcer mock and verify that. * Add tests. * Update javadocs. * Address review comments. * Separate methods for download and bootstrap load * Clean up return types and exception handling. * No callback for loadSegment(). * Minor cleanup * Pull out the test helpers into its own static class so it can have better state control. * LocalCacheManager stuff * Fix build. * Fix build. * Address some CI warnings. * Minor updates to javadocs and test code. * Address some CodeQL test warnings and checkstyle fix. * Pass a Consumer<DataSegment> instead of boolean & rename variables. * Small updates * Remove one test constructor. * Remove the other constructor that wasn't initializing fully and update usages. * Cleanup withInfoDir() builder and unnecessary test hooks. * Remove mocks and elaborate on comments. * Commentary * Fix a few Intellij inspection warnings. * Suppress corePoolSize intellij-inspect warning. The intellij-inspect tool doesn't seem to correctly inspect lambda usages. See ScheduledExecutors. * Update docs and add more tests. * Use hamcrest for asserting order on expectation. * Shutdown bootstrap exec. * Fix checkstyle	2024-06-04 10:44:46 -07:00
Sree Charan Manamala	6bbf9613f8	Throw soft exception in case of empty signature while building Scan Query (#16502 )	2024-05-29 09:41:54 +02:00
Sree Charan Manamala	27cfe12f4a	Enable reordering of window operators (#16482 ) This commit aims to enable the re-ordering of window operators in order to optimise the sort and partition operators. Example : ``` SELECT m1, m2, SUM(m1) OVER(PARTITION BY m2) as sum1, SUM(m2) OVER() as sum2 from numFoo GROUP BY m1,m2 ``` In order to compute this query, we can order the operators as to first compute the operators corresponding to sum2 and then place the operators corresponding to sum1 which would help us in reducing one sort operator if we order our operators by sum1 and then sum2.	2024-05-29 12:17:12 +05:30
Clint Wylie	4e1de50e30	fix issue with auto column grouping (#16489 ) * fix issue with auto column grouping changes: * fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings * fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping * fix test	2024-05-27 11:18:17 +05:30
zachjsh	b0cc1ee84b	Add ability to turn off Druid Catalog specific validation done on catalog defined tables in Druid (#16465 ) * * add property to enable / disable catalog validation and add tests * * add integration tests for catalog validation disabled * * add integration tests * * remove debugging logs * * fix forbidden api call	2024-05-23 13:19:51 -04:00
Zoltan Haindrich	12f79acc7e	Enable quidem shadowing for decoupled testcases (#16431 ) * Altered `QueryTestBuilder` to be able to switch to a backing quidem test * added a small crc to ensure that the shadow testcase does not deviate from the original one * Packaged all decoupled related things into a single `DecoupledExtension` to reduce copy-paste * `DecoupledTestConfig#quidemReason` must describe why it's being used * `DecoupledTestConfig#separateDefaultModeTest` can be used to make multiple case files based on `NullHandling` state * fixed a cosmetic bug during decoupled join translation * enhanced `!druidPlan` to report the final logical plan in non-decoupled mode as well * add check to ensure that only supported params are present in a druidtest uri * enabled shadow testcases for previously disabled testcases	2024-05-23 07:03:16 +02:00
Gian Merlino	599586bcfc	Add SQL DIV function. (#16464 ) * Add SQL DIV function. This function has been documented for some time, but lacked a binding, so it wasn't usable. * Add a case with two expression inputs.	2024-05-17 11:11:32 -07:00
Gian Merlino	0fb09445a5	Fix ExpressionPredicateIndexSupplier numeric replace-with-default behavior. (#16448 ) * Fix ExpressionPredicateIndexSupplier numeric replace-with-default behavior. In replace-with-default mode, null numeric values from the index should be interpreted as zeroes by expressions. This makes the index supplier more consistent with the behavior of the selectors created by the expression virtual column. * Fix test case.	2024-05-15 15:11:47 +05:30
Akshat Jain	ddfd62d9a9	Disable loading lookups by default in CompactionTask (#16420 ) This PR updates CompactionTask to not load any lookups by default, unless transformSpec is present. If transformSpec is present, we will make the decision based on context values, loading all lookups by default. This is done to ensure backward compatibility since transformSpec can reference lookups. If transformSpec is not present and no context value is passed, we do not load any lookups. This behavior can be overridden by supplying lookupLoadingMode and lookupsToLoad in the task context.	2024-05-15 11:39:23 +05:30
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
Sree Charan Manamala	b8dd7478d0	Custom Calcite Rule to remove redundant references (#16402 ) Custom calcite rule mimicking AggregateProjectMergeRule to extend support to expressions. The current calcite rule return null in such cases. In addition, this removes the redundant references.	2024-05-14 06:38:05 +02:00
Laksh Singla	4bfc186153	Support sorting on complex columns in MSQ (#16322 ) MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially. This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ. The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example: longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC, the comparison will happen like: * longCol1, longCol2 (ASC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending order * complexCol1 (DESC) - Compared via deserialization, cannot be clubbed with any other field * complexCol2 (DESC) - Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same order * stringCol1, longCol3 (DESC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending order * longCol4 (ASC) - Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was different. This way, we only deserialize the field wherever required.	2024-05-13 15:07:05 +05:30
Zoltan Haindrich	1811674753	Enable quidem tests to use different suppliers (#16382 ) * enable quidem uri support for `druidtest:///?ComponentSupplier=Nested` and similar * changes the way `SqlTestFrameworkConfig` is being applied; all options will have their own annotation (it's kind of impossible to detect whether an annotation has a set value or it's the default) * enables hierarchical processing of config annotation (was needed to enable class level supplier annotation) * moves uri processing related string2config stuff into `SqlTestFrameworkConfig`	2024-05-09 09:21:02 +02:00
Akshat Jain	775d654a6c	Load only the required lookups for MSQ tasks (#16358 ) With this PR changes, MSQ tasks (MSQControllerTask and MSQWorkerTask) only load the required lookups during querying and ingestion, based on the value of CTX_LOOKUPS_TO_LOAD key in the query context.	2024-05-09 11:21:54 +05:30
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Gian Merlino	588d442422	Add native filter conversion for SCALAR_IN_ARRAY. (#16312 ) * Add native filter conversion for SCALAR_IN_ARRAY. Main changes: 1) Add an implementation of "toDruidFilter" in ScalarInArrayOperatorConversion. 2) Split up Expressions.literalToDruidExpression into two functions, so the first half (literalToExprEval) can be used by ScalarInArrayOperatorConversion to more efficiently create the list of match values. * Fix type in time arithmetic conversion. * Test updates. * Update test cases to use null instead of '' in default-value mode. * Switch test from msqIncompatible to compatible with a different result. * Update one more test. * Fix test. * Update tests. * Use ExprEvalWrapper to differentiate between empty string and null. * Fix tests some more. * Fix test. * Additional comment. * Style adjustment. * Fix tests. * trueValue -> actualValue. * Use different approach, DruidLiteral instead of ExprEvalWrapper. * Revert changes in ArrayOfDoublesSketchSqlAggregatorTest.	2024-05-03 13:00:33 -07:00
zachjsh	fb7c84fb5d	Catalog clustering keys fixes (#16351 ) * * add another catalog clustering columns unit test * * disallow clusterKeys with descending order * * make more clear that clustering is re-written into ingest node whether a catalog table or not * * when partitionedBy is stored in catalog, user shouldn't need to specify it in order to specify clustering * * fix intellij inspection failure	2024-05-03 14:02:56 -04:00
Zoltan Haindrich	2d0e86cbdc	Use quidem to run tests (#16249 ) * test scoped jdbc driver for druidtest:/// backed DruidAvaticaTestDriver ** DecoupledTestConfig is used inside the URI - this will make it possible to attach to existing things more easily * DruidQuidemTestBase can be used to create module level set of quidem tests * added quidem commands: !convertedPlan, !logicalPlan, !druidPlan, !nativePlan ** for these I've used some values of the Hook which was there in calcite * there are some shortcuts with proxies(they are only used during testing) - we can probably remove those later	2024-05-02 02:12:42 -04:00
Laksh Singla	e695e52d3f	Improve code flow in the First/Last vector aggregators and unify the numeric aggregators with the String implementations (#16230 ) This PR fixes the first and last vector aggregators and improves their readability. Following changes are introduced The folding is broken in the vectorized versions. We consider time before checking the folded object. If the numerical aggregator gets passed any other object type for some other reason (like String), then the aggregator considers it to be folded, even though it shouldn’t be. We should convert these objects to the desired type, and aggregate them properly. The aggregators must properly use generics. This would minimize the ClassCastException issues that can happen with mixed segment types. We are unifying the string first/last aggregators with numeric versions as well. The aggregators must aggregate null values (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java#L55-L56 ). The aggregator should only ignore pairs with time == null, and not value == null Time nullity is ignored when trying to vectorize the data. String versions initialized with DateTimes.MIN that is equal to Long.MIN / 2. This can cause incorrect results in case the user enters a custom time column. NOTE: This is still present because it would require a larger refactor in all of the versions. There is a difference in what users might expect from the results because the code flow is changed (for example, the direction of the for loops, etc), however, this will only change the results, and not the contract set by first/last aggregators, which is that if multiple values have the same timestamp, then any of them can get picked. 
If the column is non-existent, the users might expect a change in the timestamp from DateTime.MAX to Long.MAX, because the code incorrectly used DateTime.MAX to initialize the aggregator, however, in case of a custom timestamp column, this might not be the case. The SQL query might be prohibited from using any Long since it requires a cast to the timestamp function that can fail, but AFAICT native queries don't have such limitations.	2024-04-30 15:13:14 +05:30
Laksh Singla	26d63e7b65	Prevent joining on nested arrays and complex types (#16349 ) #16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work). This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.	2024-04-30 11:36:53 +05:30
Akshat Jain	9d2cae40c3	Add support for selective loading of lookups in the task layer (#16328 ) Changes: - Add `LookupLoadingSpec` to support 3 modes of lookup loading: ALL, NONE, ONLY_REQUIRED - Add method `Task.getLookupLoadingSpec()` - Do not load any lookups for `KillUnusedSegmentsTask`	2024-04-29 07:19:59 +05:30
zachjsh	365cd7e8e7	INSERT/REPLACE can omit clustering when catalog has default (#16260 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * implement and add tests * * address review comments * * address test review comments * * fix checkstyle * * fix dependencies * * all tests passing * * cleanup * * remove unneeded code * * remove unused dependency * * fix checkstyle	2024-04-26 10:19:45 -04:00
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Gian Merlino	68d6e682e8	Fix TimeBoundary planning when filters require virtual columns. (#16337 ) The timeBoundary query does not support virtual columns, so we should avoid it if the query requires virtual columns.	2024-04-25 16:49:40 -07:00
Zoltan Haindrich	9c0bd56f5b	Make QueryComponentSupliers independent from test classes (#16275 )	2024-04-25 02:12:07 -04:00
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Sree Charan Manamala	080476f9ea	WINDOWING - Fix 2 nodes with same digest causing mapping issue (#16301 ) Fixes the mapping issue in window functions where 2 nodes get the same reference.	2024-04-24 16:45:02 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Sree Charan Manamala	960a674442	Corrected Strict NON NULL return type checks (#16279 )	2024-04-18 12:17:13 +02:00
Gian Merlino	ccc1ffb032	Additional short circuiting knowledge in filter bundles. (#16292 ) * Additional short circuiting knowledge in filter bundles. Three updates: 1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed "applyRowCount", and redefined as an upper bound on rows remaining after short-circuiting (rather than number of rows selected so far). This definition works better for OR filters, which pass through the FALSE set rather than the TRUE set to the next subfilter. 2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather than using selectionRowCount for the first subfilter and indexIntersectionSize for each filter thereafter. This improves accuracy when the incoming applyRowCount is smaller than the row count from the first few indexes. 3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather than applyRowCount for subfilters. This allows an OR filter to pass information about short-circuiting to its subfilters. To help write tests for this, the patch also moves the sampled wikiticker data file from sql to processing. * Forbidden APIs. * Forbidden APIs. * Better comments. * Fix inspection. * Adjustments to tests.	2024-04-16 22:42:28 -07:00
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
Sree Charan Manamala	5247059d2f	Allow Double & null values in sql type array through dynamic params (#16274 )	2024-04-15 10:44:42 +02:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Sree Charan Manamala	3340b200db	Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264 ) * Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach. * Corrected the queries where date time values are directly provided * marked 2 cases failing with resultset casting issues	2024-04-12 13:54:48 +02:00
Sree Charan Manamala	f65c166327	Windowed aggregates should update the aggregation value based on final compute (#16244 )	2024-04-12 08:28:33 +02:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannot. * Fix tests.	2024-04-11 15:06:11 -07:00
Soumyava	7759f25095	Moving bitwise_or to use native calcite operator (#16237 )	2024-04-04 12:49:29 -07:00
Soumyava	972937659d	Fixing return type for IPV4 (#15916 ) * Fixing return type for IPV4 * Update ipv4match	2024-04-04 08:49:50 -07:00
Soumyava	4bea865697	Restore context flag for window functions (#16229 )	2024-04-03 13:57:13 +05:30
zachjsh	9b52c909e0	fix complex types returning UNKNOWN as their SQL type inference (#16216 ) * * fix * * fix * * address review comments	2024-04-02 14:36:01 -04:00
Aleksey Plekhanov	a818b8acb6	Fix CalciteQueryTest#testCountStarWithTimeFilterUsingStringLiterals (#16221 ) * Add cases to check handling equals between timestamp and string literal	2024-04-01 13:46:24 -07:00
Soumyava	524842a3bb	Window function on msq (#15470 ) This PR aims to introduce Window functions on MSQ by doing the following: Introduce a Window querykit for handling window queries along with its factory and a processor for window queries If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage In presence of empty OVER() clause lets all operators loose on a single rac In presence of no empty OVER() clause, breaks down each window into individual stages Associated machinery to handle window functions in MSQ Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators Guardrails around materialization Comprehensive UTs	2024-03-28 14:58:34 +05:30
Sree Charan Manamala	f29c8ac368	Allow non literal rhs in MV_FILTER_ONLY and MV_FILTER_NONE (#16113 ) This commit allows the MV_FILTER_ONLY & MV_FILTER_NONE functions to be used with a non-literal argument. Currently `select mv_filter_only('mvd_dim', 'array_dim') from 'table'` returns an `Unhandled Query Planning Failure`. This is being tackled, and the cases where `array_dim` has null & empty values are also considered. Changed classes: * `MultiValueStringOperatorConversions` * `ApplyFunction` * `CalciteMultiValueStringQueryTest`	2024-03-26 12:31:09 +05:30
Zoltan Haindrich	a16092b16a	Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference. (#16063 ) * Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference. * rewrite `LAST_VALUE(x) OVER (ORDER BY y)` to `LAG(x,0) OVER (ORDER BY y)` * not directly to `x` because some queries get unplannable that way * restrict `NTILE` from framing - as it's not supported * add test to ensure that all of the `KNOWN_WINDOW_FNS`'s framing is accounted for * checkstyle/etc * add test * apidoc * add assume to avoid MSQ fail	2024-03-25 11:03:47 -07:00
zachjsh	8370db106c	INSERT/REPLACE dimension target column types are validated against source input expressions (#15962 ) * * address remaining comments from https://github.com/apache/druid/pull/15836 * * address remaining comments from https://github.com/apache/druid/pull/15908 * * add test that exposes relational algebra issue * * simplify test exposing issue * * fix * * add tests for sealed / non-sealed * * update test descriptions * * fix test failure when -Ddruid.generic.useDefaultValueForNull=true * * check type assignment based on native Druid types * * add tests that cover missing jacoco coverage * * add replace tests * * add more tests and comments about column ordering * * simplify tests * * review comments * * remove commented line * * STRING family types should be validated as non-null	2024-03-25 12:34:07 -04:00
Clint Wylie	b0a9c318d6	add new typed in filter (#16039 ) changes: * adds TypedInFilter which preserves matching sets in the native match value type * SQL planner uses new TypedInFilter when druid.generic.useDefaultValueForNull=false (the default)	2024-03-22 12:45:08 -07:00
Clint Wylie	48b8d42698	fix regexp_like, contains_string, icontains_string to return null instead of false for null inputs in sql compatible mode (#15963 )	2024-03-19 22:12:47 -07:00
Gian Merlino	c96b215dd6	SortMerge join support for IS NOT DISTINCT FROM. (#16003 ) * SortMerge join support for IS NOT DISTINCT FROM. The patch adds a "requiredNonNullKeyParts" field to the sortMerge processor, which has the list of key parts that must be nonnull for an equijoin condition to match. Conditions with SQL "=" are present in the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from the list. * Fix test. * Update javadoc.	2024-03-19 12:02:13 -07:00

1094 Commits