druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	588d442422	Add native filter conversion for SCALAR_IN_ARRAY. (#16312 ) * Add native filter conversion for SCALAR_IN_ARRAY. Main changes: 1) Add an implementation of "toDruidFilter" in ScalarInArrayOperatorConversion. 2) Split up Expressions.literalToDruidExpression into two functions, so the first half (literalToExprEval) can be used by ScalarInArrayOperatorConversion to more efficiently create the list of match values. * Fix type in time arithmetic conversion. * Test updates. * Update test cases to use null instead of '' in default-value mode. * Switch test from msqIncompatible to compatible with a different result. * Update one more test. * Fix test. * Update tests. * Use ExprEvalWrapper to differentiate between empty string and null. * Fix tests some more. * Fix test. * Additional comment. * Style adjustment. * Fix tests. * trueValue -> actualValue. * Use different approach, DruidLiteral instead of ExprEvalWrapper. * Revert changes in ArrayOfDoublesSketchSqlAggregatorTest.	2024-05-03 13:00:33 -07:00
zachjsh	fb7c84fb5d	Catalog clustering keys fixes (#16351 ) * * add another catalog clustering columns unit test * * disallow clusterKeys with descending order * * make it clearer that clustering is re-written into ingest node whether a catalog table or not * * when partitionedBy is stored in catalog, user shouldn't need to specify it in order to specify clustering * * fix intellij inspection failure	2024-05-03 14:02:56 -04:00
Zoltan Haindrich	2d0e86cbdc	Use quidem to run tests (#16249 ) * test scoped jdbc driver for druidtest:/// backed DruidAvaticaTestDriver ** DecoupledTestConfig is used inside the URI - this will make it possible to attach to existing things more easily * DruidQuidemTestBase can be used to create module level set of quidem tests * added quidem commands: !convertedPlan, !logicalPlan, !druidPlan, !nativePlan ** for these I've used some values of the Hook which was there in calcite * there are some shortcuts with proxies(they are only used during testing) - we can probably remove those later	2024-05-02 02:12:42 -04:00
Laksh Singla	e695e52d3f	Improve code flow in the First/Last vector aggregators and unify the numeric aggregators with the String implementations (#16230 ) This PR fixes the first and last vector aggregators and improves their readability. Following changes are introduced The folding is broken in the vectorized versions. We consider time before checking the folded object. If the numerical aggregator gets passed any other object type for some other reason (like String), then the aggregator considers it to be folded, even though it shouldn’t be. We should convert these objects to the desired type, and aggregate them properly. The aggregators must properly use generics. This would minimize the ClassCastException issues that can happen with mixed segment types. We are unifying the string first/last aggregators with numeric versions as well. The aggregators must aggregate null values (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java#L55-L56 ). The aggregator should only ignore pairs with time == null, and not value == null Time nullity is ignored when trying to vectorize the data. String versions initialized with DateTimes.MIN that is equal to Long.MIN / 2. This can cause incorrect results in case the user enters a custom time column. NOTE: This is still present because it would require a larger refactor in all of the versions. There is a difference in what users might expect from the results because the code flow is changed (for example, the direction of the for loops, etc), however, this will only change the results, and not the contract set by first/last aggregators, which is that if multiple values have the same timestamp, then any of them can get picked. If the column is non-existent, the users might expect a change in the timestamp from DateTime.MAX to Long.MAX, because the code incorrectly used DateTime.MAX to initialize the aggregator, however, in case of a custom timestamp column, this might not be the case. The SQL query might be prohibited from using any Long since it requires a cast to the timestamp function that can fail, but AFAICT native queries don't have such limitations.	2024-04-30 15:13:14 +05:30
Laksh Singla	26d63e7b65	Prevent joining on nested arrays and complex types (#16349 ) #16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work). This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.	2024-04-30 11:36:53 +05:30
Akshat Jain	9d2cae40c3	Add support for selective loading of lookups in the task layer (#16328 ) Changes: - Add `LookupLoadingSpec` to support 3 modes of lookup loading: ALL, NONE, ONLY_REQUIRED - Add method `Task.getLookupLoadingSpec()` - Do not load any lookups for `KillUnusedSegmentsTask`	2024-04-29 07:19:59 +05:30
zachjsh	365cd7e8e7	INSERT/REPLACE can omit clustering when catalog has default (#16260 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * implement and add tests * * address review comments * * address test review comments * * fix checkstyle * * fix dependencies * * all tests passing * * cleanup * * remove unneeded code * * remove unused dependency * * fix checkstyle	2024-04-26 10:19:45 -04:00
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Gian Merlino	68d6e682e8	Fix TimeBoundary planning when filters require virtual columns. (#16337 ) The timeBoundary query does not support virtual columns, so we should avoid it if the query requires virtual columns.	2024-04-25 16:49:40 -07:00
Zoltan Haindrich	9c0bd56f5b	Make QueryComponentSupliers independent from test classes (#16275 )	2024-04-25 02:12:07 -04:00
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Sree Charan Manamala	080476f9ea	WINDOWING - Fix 2 nodes with same digest causing mapping issue (#16301 ) Fixes the mapping issue in window functions where 2 nodes get the same reference.	2024-04-24 16:45:02 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Sree Charan Manamala	960a674442	Corrected Strict NON NULL return type checks (#16279 )	2024-04-18 12:17:13 +02:00
Gian Merlino	ccc1ffb032	Additional short circuiting knowledge in filter bundles. (#16292 ) * Additional short circuiting knowledge in filter bundles. Three updates: 1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed "applyRowCount", and redefined as an upper bound on rows remaining after short-circuiting (rather than number of rows selected so far). This definition works better for OR filters, which pass through the FALSE set rather than the TRUE set to the next subfilter. 2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather than using selectionRowCount for the first subfilter and indexIntersectionSize for each filter thereafter. This improves accuracy when the incoming applyRowCount is smaller than the row count from the first few indexes. 3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather than applyRowCount for subfilters. This allows an OR filter to pass information about short-circuiting to its subfilters. To help write tests for this, the patch also moves the sampled wikiticker data file from sql to processing. * Forbidden APIs. * Forbidden APIs. * Better comments. * Fix inspection. * Adjustments to tests.	2024-04-16 22:42:28 -07:00
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
Sree Charan Manamala	5247059d2f	Allow Double & null values in sql type array through dynamic params (#16274 )	2024-04-15 10:44:42 +02:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Sree Charan Manamala	3340b200db	Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264 ) * Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach. * Corrected the queries where date time values are directly provided * marked 2 cases failing with resultset casting issues	2024-04-12 13:54:48 +02:00
Sree Charan Manamala	f65c166327	Windowed aggregates should update the aggregation value based on final compute (#16244 )	2024-04-12 08:28:33 +02:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannot. * Fix tests.	2024-04-11 15:06:11 -07:00
Soumyava	7759f25095	Moving bitwise_or to use native calcite operator (#16237 )	2024-04-04 12:49:29 -07:00
Soumyava	972937659d	Fixing return type for IPV4 (#15916 ) * Fixing return type for IPV4 * Update ipv4match	2024-04-04 08:49:50 -07:00
Soumyava	4bea865697	Restore context flag for window functions (#16229 )	2024-04-03 13:57:13 +05:30
zachjsh	9b52c909e0	fix complex types returning UNKNOWN as their SQL type inference (#16216 ) * * fix * * fix * * address review comments	2024-04-02 14:36:01 -04:00
Aleksey Plekhanov	a818b8acb6	Fix CalciteQueryTest#testCountStarWithTimeFilterUsingStringLiterals (#16221 ) * Add cases to check handling equals between timestamp and string literal	2024-04-01 13:46:24 -07:00
Soumyava	524842a3bb	Window function on msq (#15470 ) This PR aims to introduce Window functions on MSQ by doing the following: Introduce a Window querykit for handling window queries along with its factory and a processor for window queries If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage In presence of empty OVER() clause lets all operators loose on a single rac In presence of no empty OVER() clause, breaks down each window into individual stages Associated machinery to handle window functions in MSQ Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators Guardrails around materialization Comprehensive UTs	2024-03-28 14:58:34 +05:30
Sree Charan Manamala	f29c8ac368	Allow non literal rhs in MV_FILTER_ONLY and MV_FILTER_NONE (#16113 ) This commit allows the MV_FILTER_ONLY & MV_FILTER_NONE functions to be used with a non-literal argument. Currently `select mv_filter_only('mvd_dim', 'array_dim') from 'table'` returns an `Unhandled Query Planning Failure`. This commit addresses that, and also handles the cases where `array_dim` has null and empty values. Changed classes: * `MultiValueStringOperatorConversions` * `ApplyFunction` * `CalciteMultiValueStringQueryTest`	2024-03-26 12:31:09 +05:30
Zoltan Haindrich	a16092b16a	Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference. (#16063 ) * Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference. * rewrite `LAST_VALUE(x) OVER (ORDER BY y)` to `LAG(x,0) OVER (ORDER BY y)` * not directly to `x` because some queries get unplannable that way * restrict `NTILE` from framing - as it's not supported * add test to ensure that all of the `KNOWN_WINDOW_FNS`'s framing is accounted for * checkstyle/etc * add test * apidoc * add assume to avoid MSQ fail	2024-03-25 11:03:47 -07:00
zachjsh	8370db106c	INSERT/REPLACE dimension target column types are validated against source input expressions (#15962 ) * * address remaining comments from https://github.com/apache/druid/pull/15836 * * address remaining comments from https://github.com/apache/druid/pull/15908 * * add test that exposes relational algebra issue * * simplify test exposing issue * * fix * * add tests for sealed / non-sealed * * update test descriptions * * fix test failure when -Ddruid.generic.useDefaultValueForNull=true * * check type assignment based on native Druid types * * add tests that cover missing jacoco coverage * * add replace tests * * add more tests and comments about column ordering * * simplify tests * * review comments * * remove commented line * * STRING family types should be validated as non-null	2024-03-25 12:34:07 -04:00
Clint Wylie	b0a9c318d6	add new typed in filter (#16039 ) changes: * adds TypedInFilter which preserves matching sets in the native match value type * SQL planner uses new TypedInFilter when druid.generic.useDefaultValueForNull=false (the default)	2024-03-22 12:45:08 -07:00
Clint Wylie	48b8d42698	fix regexp_like, contains_string, icontains_string to return null instead of false for null inputs in sql compatible mode (#15963 )	2024-03-19 22:12:47 -07:00
Gian Merlino	c96b215dd6	SortMerge join support for IS NOT DISTINCT FROM. (#16003 ) * SortMerge join support for IS NOT DISTINCT FROM. The patch adds a "requiredNonNullKeyParts" field to the sortMerge processor, which has the list of key parts that must be nonnull for an equijoin condition to match. Conditions with SQL "=" are present in the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from the list. * Fix test. * Update javadoc.	2024-03-19 12:02:13 -07:00
Zoltan Haindrich	0a42342cef	Update CalciteTest to use junit5 (#16106 ) Update CalciteTest to use junit5 change the way temp dirs are handled * add openrewrite workflow to safeguard upgrade * replace junitparamrunner with standard junit5 parametered tests * update a few rules to junit5 api * lots of boring changes * cleanup QueryLogHook * cleanup * fix compile error: ARRAYS_DATASOURCE * fix test * remove enclosed * empty +TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest * use @Nested * add rule to remove enclosed; upgrade surefire * remove enclosed * cleanup * add comment about surefire exclude	2024-03-19 04:05:12 -07:00
Clint Wylie	5afd5c41a5	fix ColumnType to RelDataType conversion for nested arrays (#16138 ) * fix ColumnType to RelDataType conversion for nested arrays * fix test	2024-03-18 23:34:08 -07:00
Zoltan Haindrich	d3e22c6e92	fix compile error: ARRAYS_DATASOURCE (#16120 )	2024-03-14 18:15:43 +05:30
Clint Wylie	dd9bc3749a	fix issues with array_contains and array_overlap with null left side arguments (#15974 ) changes: * fix issues with array_contains and array_overlap with null left side arguments * modify singleThreaded stuff to allow optimizing Function similar to how we do for ExprMacro - removed SingleThreadSpecializable in favor of default impl of asSingleThreaded on Expr with clear javadocs that most callers shouldn't be calling it directly and should be using Expr.singleThreaded static method which uses a shuttle and delegates to asSingleThreaded instead * add optimized 'singleThreaded' versions of array_contains and array_overlap * add mv_harmonize_nulls native expression to use with MV_CONTAINS and MV_OVERLAP to allow them to behave consistently with filter rewrites, coercing null and [] into [null] * fix bug with casting rhs argument for native array_contains and array_overlap expressions	2024-03-13 18:16:10 -07:00
Sree Charan Manamala	e9d2caccb6	Handling null operand in JSON_QUERY_ARRAY (#16118 ) * fix return type inference for JSON_QUERY_ARRAY to be nullable	2024-03-13 18:06:27 -07:00
Gian Merlino	256160aba6	MSQ: Validate that strings and string arrays are not mixed. (#15920 ) * MSQ: Validate that strings and string arrays are not mixed. When multi-value strings and string arrays coexist in the same column, it causes problems with "classic MVD" style queries such as: select * from wikipedia -- fails at runtime select count() from wikipedia where flags = 'B' -- fails at planning time select flags, count() from wikipedia group by 1 -- fails at runtime To avoid these problems, this patch adds type verification for INSERT and REPLACE. It is targeted: the only type changes that are blocked are string-to-array and array-to-string. There is also a way to exclude certain columns from the type checks, if the user really knows what they're doing. * Fixes. * Tests and docs and error messages. * More docs. * Adjustments. * Adjust message. * Fix tests. * Fix test in DV mode.	2024-03-13 15:37:27 -07:00
Gian Merlino	910124d4de	MSQ: Plan without implicit sorting. (#16073 ) * MSQ: Plan without implicit sorting. This patch adds an EngineFeature "GROUPBY_IMPLICITLY_SORTS" and sets it true for native, false for MSQ. It's useful for two reasons: 1) In the future we'll likely want MSQ to hash-partition for GROUP BY instead of using a global sort, which would mean MSQ would not implicitly ORDER BY when there is a GROUP BY. 2) When doing REPLACE with MSQ, CLUSTERED BY is transformed to ORDER BY. We should retain that ORDER BY, as it may be a subset of the GROUP BY, and it is important to remember which fields the user wanted to include in range shard specs. * Fix tests. * Fix tests for real. * Fix test.	2024-03-13 08:27:39 -07:00
Clint Wylie	795e342ba8	fix sql results mixed array and scalar values (#16105 ) * fix sql results mixed array and scalar values * simplify	2024-03-12 23:47:35 -07:00
Zoltan Haindrich	8252d72e2a	Pull up literals in InputAccessor (#16033 ) * Pull up literals in InputAccessor * pull up literals in `InputAccessor` * remove the need to pass `constants` of `Window` operator Fixes #15353 * update test * enable relax_nulls	2024-03-12 09:14:31 -07:00
Sree Charan Manamala	ef9637eef1	Handling array with boolean literals (#16093 ) Handling array with boolean literals like ARRAY[true, false] Druid appears to be able to convert an array with boolean expressions like this array[added=deleted, added=delta] into a numeric array of 0 and 1: select array[added=deleted, added=delta] from wikipedia However, select array[true, false] from wikipedia doesn't work. This PR fixes this.	2024-03-12 12:28:16 +05:30
Soumyava	85ee775390	Handling latest_by and earliest_by on numeric columns correctly (#15939 ) * Handling latest_by and earliest_by on numeric columns correctly * Adding test	2024-03-11 13:49:21 -07:00
Zoltan Haindrich	2eb7d7a89b	Calcite tests remove expected exception (#16046 ) * Calcite tests remove expected exception * update testcases using `expectedException` to utilize `assertThrows` instead * remove `BaseCalciteQueryTest#expectedException` * fixes `cannotVectorize` so it no longer stops further processing * `msqIncompatible` no longer toggles a boolean - it's an `Assume` instead Fixes #15423 * cleanup * move msqIncompat * update test * cleanup * remove comment * empty-commit * empty-commit	2024-03-11 13:23:57 +05:30
Zoltan Haindrich	aaa64832fd	Disable DecoupledPlanningCalciteJoinQueryTest until it gets fixed (#16070 ) Recently this test started preventing other tests from executing by triggering a bug somewhere in surefire. This patch disables the test cases in non-SQL-compatible mode.	2024-03-07 12:55:48 -08:00
Laksh Singla	5f588fa45c	Fix bug while materializing scan's result to frames (#15987 ) While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value. Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames. The PR modifies the batching logic to not accumulate the results in an intermediary list	2024-03-07 17:11:44 +05:30
Vishesh Garg	cf9bc507f6	Fix compilation failure due to missing constant MISSING_JOIN_CONVERSION (#16050 ) * Reintroduce variable MISSING_JOIN_CONVERSION * Remove redundant constant MISSING_JOIN_CONVERSION2 * Correct fix to address failing tests	2024-03-06 15:34:39 +08:00
Zoltan Haindrich	65c3b4d31a	Support join in decoupled mode (#15957 ) * plan join(s) in decoupled mode * configure DecoupledPlanningCalciteJoinQueryTest the test has 593 cases; however there are quite a few parameterized from the 107 methods annotated with @Test - 42 is not yet working * replace the isRoot hack in DruidQueryGenerator with a logic that instead looks ahead for the next node; and doesn't let the previous node do the Project - this makes it plan more likely than the existing planner	2024-03-05 19:10:13 -06:00
Zoltan Haindrich	bb882727c0	Fix Windowing/scanAndSort query issues on top of Joins. (#15996 ) allow a hashjoin result to be converted to RowsAndColumns added StorageAdapterRowsAndColumns fix incorrect isConcrete() return values during early phase of planning	2024-03-05 15:05:31 +05:30
Zoltan Haindrich	e469b7ed34	Make setting QUERY_CONTEXT_DEFAULT explicit in tests (#16010 )	2024-03-05 10:54:16 +05:30
Adarsh Sanjeev	93eeb05eaf	Revert explain attributes change to old behaviour. (#16004 ) * Revert explain attributes change * Fix tests * Fix tests * Rename function	2024-03-04 15:56:02 +05:30
Zoltan Haindrich	bf0995f846	Introduce dynamic table append (#15897 )	2024-03-01 04:31:57 -05:00
Laksh Singla	17e4f3ac60	Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559 ) The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together. This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.	2024-02-27 11:39:29 +05:30
Soumyava	51cc729fd1	Enforcing type checking for flatten concat (#15903 )	2024-02-26 21:53:49 -08:00
Abhishek Radhakrishnan	67a6224d91	Fix up incorrect `PARTITIONED BY` error messages (#15961 ) * Fix up typos, inaccuracies and clean up code related to PARTITIONED BY. * Remove wrapper function and update tests to use DruidExceptionMatcher. * Checkstyle and Intellij inspection fixes.	2024-02-26 14:17:53 -05:00
Zoltan Haindrich	06deda9415	ScanAndSort query fails with NPE for simple queries (#15914 ) * some stuff * add dummy fields * draft-fix * rename test * cleanup * add null * cleanup * cleanup * add test * updates * move check to constructor * cleanup * updates/etc * fix some more * add rowSignatureMode * checkstyle/etc * override * missing msqIncompat * fix test * fixes * undo * updates * remove param	2024-02-24 15:33:50 -08:00
zachjsh	8ebf237576	Move INSERT & REPLACE validation to the Calcite validator (#15908 ) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, Refactoring the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the calcite validator. The validation that is implemented as part of this pr, is only that for the source node, and some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest time query in this case.	2024-02-22 14:01:59 -05:00
Zoltan Haindrich	bcce0806d7	Support Union in decoupled mode (#15870 )	2024-02-21 10:54:50 -05:00
Gian Merlino	9c41827dba	Globally disable AUTO_CLOSE_JSON_CONTENT. (#15880 ) * Globally disable AUTO_CLOSE_JSON_CONTENT. This JsonGenerator feature is on by default. It causes problems with code like this: try (JsonGenerator jg = ...) { jg.writeStartArray(); for (x : xs) { jg.writeObject(x); } jg.writeEndArray(); } If a jg.writeObject call fails due to some problem with the data it's reading, the JsonGenerator will write the end array marker automatically when closed as part of the try-with-resources. If the generator is writing to a stream where the reader does not have some other mechanism to realize that an exception was thrown, this leads the reader to believe that the array is complete when it actually isn't. Prior to this patch, we disabled AUTO_CLOSE_JSON_CONTENT for JSON-wrapped SQL result formats in #11685, which fixed an issue where such results could be erroneously interpreted as complete. This patch fixes a similar issue with task reports, and all similar issues that may exist elsewhere, by disabling the feature globally. * Update test.	2024-02-16 08:52:48 -08:00
Clint Wylie	fe2ba8cc28	fix return type inference of parse_long, which can also be null if string is not parseable into a long (#15909 ) * fix return type inference of parse_long, which can also be null if string is not parseable into a long * fix msq test	2024-02-15 08:45:34 -08:00
zachjsh	f9ee2c353b	Extend the PARTITION BY clause to accept string literals for the time partitioning (#15836 ) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, extending the PARTITION BY clause to accept string literals for the time partitioning	2024-02-09 11:45:38 -05:00
Sree Charan Manamala	57e12df352	Sql Single Value Aggregator for scalar queries (#15700 ) Executing single value correlated queries will throw an exception today since single_value function is not available in druid. With these added classes, this provides druid, the capability to plan and run such queries.	2024-02-08 19:20:30 +05:30
Soumyava	f3996b96ff	Fixes for safe_divide with vectorize and datatypes (#15839 ) * Fix for safe_divide with vectorize * More fixes * Update to use expr.eval(null) for both cases when denominator is 0	2024-02-08 14:40:42 +05:30
Adarsh Sanjeev	514b3b4d01	Add export capabilities to MSQ with SQL syntax (#15689 ) * Add test * Parser changes to support export statements * Fix builds * Address comments * Add frame processor * Address review comments * Fix builds * Update syntax * Webconsole workaround * Refactor * Refactor * Change export file path * Update docs * Remove webconsole changes * Fix spelling mistake * Parser changes, add tests * Parser changes, resolve build warnings * Fix failing test * Fix failing test * Fix IT tests * Add tests * Cleanup * Fix unparse * Fix forbidden API * Update docs * Update docs * Address review comments * Address review comments * Fix tests * Address review comments * Fix insert unparse * Add external write resource action * Fix tests * Add resource check to overlord resource * Fix tests * Add IT * Update syntax * Update tests * Update permission * Address review comments * Address review comments * Address review comments * Add tests * Add check for runtime parameter for bucket and path * Add check for runtime parameter for bucket and path * Add tests * Update docs * Fix NPE * Update docs, remove deadcode * Fix formatting	2024-02-07 22:08:50 +05:30
Clint Wylie	23d4fade90	use NullFilter for SQL rewrite of MV_CONTAINS and MV_OVERLAP for null array elements (#15855 ) Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).	2024-02-07 19:40:41 +05:30
Zoltan Haindrich	fdc7cec271	Support Window operators in decoupled planning (#15815 )	2024-02-07 04:09:48 -05:00
Gian Merlino	54b30646f3	Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832 ) If lots of keys map to the same value, reversing a LOOKUP call can slow things down unacceptably. To protect against this, this patch introduces a parameter sqlReverseLookupThreshold representing the maximum size of an IN filter that will be created as part of lookup reversal. If inSubQueryThreshold is set to a smaller value than sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead. This allows users to use that single parameter to control IN sizes if they wish.	2024-02-06 16:32:05 +05:30
Soumyava	b86f31f2c0	Addressing shapeshifting issues with window functions (#15807 ) Addressing shapeshifting issues with window functions	2024-02-06 11:12:20 +05:30
Zoltan Haindrich	392d585ff8	Identify not range filters without negating subexpressions (#15766 ) * Identify not range filters without negating subexpressions Earlier, betweenish (range/bounds) filters were identified through a process of negating the subexpressions, which may not have performed that well (it could have dominated the runtime in some cases). This patch makes that unnecessary, as it is able to create the negated expression directly. * add test;fix for multiple intervals	2024-02-05 19:12:58 -08:00
Zoltan Haindrich	8f5b7522c7	Strict window frame checks (#15746 ) * introduce checks to ensure that window frame is supported * added check to ensure that no expressions are set as bounds * added logic to detect following/following like cases - described in "Window function fails to demarcate if 2 following are used" #15739 * currently RANGE frames are only supported correctly if both endpoints are unbounded or current row; see "Offset based window range support" #15767 * added windowingStrictValidation context key to provide a way to override the check	2024-02-02 16:21:53 +05:30
Laksh Singla	7d65caf0c5	Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670 )	2024-02-01 10:24:43 +05:30
Zoltan Haindrich	f701197224	Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735 )	2024-01-31 02:36:58 -05:00
Gian Merlino	38a1e827ab	Fix up value types when creating range filters. (#15778 ) Fixes a bug introduced in #15609, where queries involving filters on TIME_FLOOR could encounter ClassCastException when comparing RangeValue in CombineAndSimplifyBounds. Prior to #15609, CombineAndSimplifyBounds would remove, rebuild, and re-add all numeric range filters as part of consolidating numeric range filters for the same column under the least restrictive type. #15609 included a change to only rebuild numeric range filters when a consolidation opportunity actually arises. The bug was introduced because the unconditional rebuild, as a side effect, masked the fact that in some cases range filters would be created with string match values and a LONG match value type. This patch changes the fixup to happen at the time the range filter is initially created, rather than in CombineAndSimplifyBounds.	2024-01-29 13:30:47 -08:00
Abhishek Agarwal	989a8f7874	Better error message for date_trunc operators (#15759 ) IAEs are not bubbled up and instead show up to the user as a runtime failure, which is not helpful. See https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1706185796975109 for one such example. This change will fix that.	2024-01-27 11:22:39 +05:30
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Zoltan Haindrich	d6a12c4389	Add ability to enable ResultCache in tests (#15465 )	2024-01-22 09:02:59 -05:00
Pranav	45b30dc07d	Revert "Change default inSubQueryThreshold (#15336 )" (#15722 ) A low value of inSubQueryThreshold can cause queries with IN filter to plan as joins more commonly. However, some of these join queries may not get planned as IN filter on data nodes, which causes a significant perf regression.	2024-01-22 11:34:39 +05:30
Zoltan Haindrich	8a43db9395	Range support in window expressions (support them as groups) (#15365 ) * support groups windowing mode; which is a close relative of ranges (but not in the standard) * all windows with range expressions will be executed with groups * it will be 100% correct when, for both bounds, it is true that: isCurrentRow() \|\| isUnBounded() * this covers OVER ( ORDER BY COL ) * for other cases it will have some chances of getting correct results...	2024-01-17 00:05:21 -06:00
Gian Merlino	500681d0cb	Add ImmutableLookupMap for static lookups. (#15675 ) * Add ImmutableLookupMap for static lookups. This patch adds a new ImmutableLookupMap, which comes with an ImmutableLookupExtractor. It uses a fastutil open hashmap plus two lists to store its data in such a way that forward and reverse lookups can both be done quickly. I also observed footprint to be somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million row lookup. The main advantage, though, is that reverse lookups can be done much more quickly than MapLookupExtractor (which iterates the entire map for each call to unapplyAll). This speeds up the recently added ReverseLookupRule (#15626) during SQL planning with very large lookups. * Use in one more test. * Fix benchmark. * Object2ObjectOpenHashMap * Fixes, and LookupExtractor interface update to have asMap. * Remove commented-out code. * Fix style. * Fix import order. * Add fastutil. * Avoid storing Map entries.	2024-01-13 13:14:01 -08:00
Gian Merlino	866fe1cda6	Fix some naming related to AggregatePullUpLookupRule. (#15677 ) It was called "split" rather than "pull up" in some places. This patch standardizes on "pull up".	2024-01-12 15:41:58 -08:00
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626 ) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Zoltan Haindrich	e597cc2949	Remove UnaryFunctionOperatorConversion and RoundOperatorConversion (#15566 ) * get rid of roun op conv * cleanup * use DirectOperatorConversion instead unary * import order	2024-01-12 10:06:23 +05:30
Gian Merlino	6c18434028	CONCAT flattening, filter decomposition. (#15634 ) * CONCAT flattening, filter decomposition. Flattening: CONCAT(CONCAT(x, y), z) is flattened to CONCAT(x, y, z). This is especially useful for the \|\| operator, which is a binary operator and leads to non-flat CONCAT calls. Filter decomposition: transforms CONCAT(x, '-', y) = 'a-b' into x = 'a' AND y = 'b'. * One more test. * Fix two tests. * Adjustments from review. * Fix empty string problem, add tests.	2024-01-11 11:18:50 -08:00
Gian Merlino	ee77fa7fb3	Add tests for CASE decomposition. (#15639 ) I was looking into adding a rule to do this, and found that it was already happening as part of Calcite's RexSimplify. So this patch simply adds some tests to ensure that it continues to happen.	2024-01-10 13:24:24 -08:00
Ankit Kothari	355c2f5da0	Add sql + ingestion compatibility for first/last on numeric values (#15607 ) SQL compatibility for numeric last and first column types. Ingestion UI now provides option for first and last aggregation as well.	2024-01-10 12:59:38 +05:30
Rishabh Singh	71f5307277	Eliminate Periodic Realtime Segment Metadata Queries: Task Now Publish Schema for Seamless Coordinator Updates (#15475 ) The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments. This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.	2024-01-10 08:55:56 +05:30
Abhishek Agarwal	468b99e608	Enable query request queuing by default when total laning is turned on. (#15440 ) This PR enables the flag by default to queue excess query requests in the jetty queue. Still keeping the flag so that it can be turned off if necessary. But the flag will be removed in the future.	2024-01-09 07:54:26 +05:30
Clint Wylie	df5bcd1367	fix bugs with expression virtual column indexes for expression virtual columns which refer to other virtual columns (#15633 ) changes: * ColumnIndexSelector now extends ColumnSelector. The only real implementation of ColumnIndexSelector, ColumnSelectorColumnIndexSelector, already has a ColumnSelector, so this isn't very disruptive * removed getColumnNames from ColumnSelector since it was not used * VirtualColumns and VirtualColumn getIndexSupplier method now needs argument of ColumnIndexSelector instead of ColumnSelector, which allows expression virtual columns to correctly recognize other virtual columns, fixing an issue which would incorrectly handle other virtual columns as non-existent columns instead * fixed a bug with sql planner incorrectly not using expression filter for equality filters on columns with extractionFn and no virtual column registry	2024-01-08 13:10:11 -08:00
Jonathan Wei	5d1e66b8f9	Allow broker to use catalog for datasource schemas for SQL queries (#15469 ) * Allow broker to use catalog for datasource schemas * More PR comments * PR comments	2024-01-08 13:46:08 -06:00
Gian Merlino	0422d9d507	Fix redundant expansion in SearchOperatorConversion. (#15625 ) This logic error causes sarg expansion to happen twice for IN or NOT IN points. It doesn't affect the final generated native query, because the redundant expansions get combined. But it slows down planning, especially for large NOT IN.	2024-01-05 12:42:12 -08:00
Zoltan Haindrich	b9679d0884	Run filter-into-join rule early for subqueries and disable project-filter rule (#15511 ) FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however, if the query starts highly underdefined (join conditions in the where clauses), that generic query could give a lot of room for the other rules to play around with. The early run is only enabled for when the join uses subqueries for its inputs. The PROJECT_FILTER rule is not that useful and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302	2024-01-04 15:33:45 +05:30
Gian Merlino	5c3391a084	Follow-ups to SEARCH and IN from #15609 . (#15623 ) - Rename ExprType to BaseType in CollectComparisons, since ExprType is a thing that exists elsewhere. - Remove unused "notInRexNodes" from SearchOperatorConversion.	2024-01-03 22:38:12 -08:00
Clint Wylie	f19ece146f	expression virtual column indexes (#15585 ) * ExpressionVirtualColumn + indexes = bff. Expression virtual columns can now use indexes of the underlying columns similar to how expression filters	2024-01-03 21:00:39 -08:00
Gian Merlino	01eec4a55e	New handling for COALESCE, SEARCH, and filter optimization. (#15609 ) * New handling for COALESCE, SEARCH, and filter optimization. COALESCE is converted by Calcite's parser to CASE, which is largely counterproductive for us, because it ends up duplicating expressions. In the current code we end up un-doing it in our CaseOperatorConversion. This patch has a different approach: 1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before the Volcano planner runs, using CaseToCoalesceRule. 2) Add FilterDecomposeCoalesceRule to decompose calls like "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))". This helps use indexes when available on x and y. 3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP. 4) Add a native "coalesce" function so we can convert 3+ arg COALESCE. The advantage of this approach is that by un-doing the CASE to COALESCE conversion earlier, we have flexibility to do more stuff with COALESCE (like decomposition and pushing into LOOKUP). SEARCH is an operator used internally by Calcite to represent matching an argument against some set of ranges. This patch improves our handling of SEARCH in two ways: 1) Expand NOT points (point "holes" in the range set) from SEARCH as `!(a \|\| b)` rather than `!a && !b`, which makes it possible to convert them to a "not" of "in" filter later. 2) Generate those nice conversions for NOT points even if the SEARCH is not composed of 100% NOT points. Without this change, a SEARCH for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')". One of the steps we take when generating Druid queries from Calcite plans is to optimize native filters. This patch improves this step: 1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can convert "(a && x = 'b') \|\| (a && x = 'c')" into "a && x IN ('b', 'c')". 2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on ORs with lots of children by adjusting the logic to avoid calling "indexOf" and "remove" on an ArrayList. 3) Refactor ConvertSelectorsToIns to reduce duplicated code between the handling for "selector" and "equals" filters. * Not so final. * Fixes. * Fix test. * Fix test.	2024-01-03 08:56:22 -08:00
AlbericByte	a2e65e6a89	Support to pass dynamic values to timestamp Extract function (#15586 ) Fixes #15072 Before this modification, the third parameter (timezone) was required to be a literal; an error was thrown when this parameter was a column identifier.	2023-12-21 11:57:52 +05:30
Clint Wylie	e373f62692	fix expression post aggregator array handling when grouping wrapper types leak (#15543 ) * fix expression post aggregator array handling when grouping wrapper types leak * more consistent expression function error messaging	2023-12-15 21:43:27 -08:00
Soumyava	3e15522d6b	Round works correctly on system metadata columns (#15554 )	2023-12-13 17:23:14 -08:00
Soumyava	38f3cf9e65	Fixing a case where datatype mismatch was happening in join (#15541 )	2023-12-12 12:50:32 -08:00
Clint Wylie	42f2496b7d	fix bug with nested empty array fields (#15532 )	2023-12-09 12:20:21 -08:00
Clint Wylie	e7c8f2e208	lift restriction of array_to_mv to only support direct column access (#15528 )	2023-12-08 16:27:17 -08:00
Soumyava	ca4ecdf7d0	Fixing NPE with virtual expression with unnest (#15513 ) * Fixing NPE with virtual expression with unnest * Fixing a comment	2023-12-08 10:51:56 -08:00
Clint Wylie	e64b92eb35	add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521 )	2023-12-08 05:28:46 -08:00
Adarsh Sanjeev	2e45eadc08	Add better error messages for using OVERWRITE with INSERT statements (#15517 ) * Add better error messages for using OVERWRITE with INSERT statements	2023-12-08 15:33:46 +05:30
Zoltan Haindrich	c353ccfdef	Windowed min aggregates null-s as 0 (#15371 )	2023-12-08 01:41:16 -08:00
sb89594	5fda8613ad	Feature: Add IPv6 Match Function (#15212 )	2023-12-07 23:09:06 -08:00
Clint Wylie	c241c6980c	store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505 )	2023-12-07 03:31:43 -08:00
Clint Wylie	557f3f6f57	add array column type support to EXTEND operator (#15458 )	2023-12-06 23:21:35 -08:00
Rishabh Singh	d968bb3f43	Rename config for enabling CentralizedDatasourceSchema feature (#15476 ) * Rename property to druid.centralizedDatasourceSchema.enabled * Update config name in docker-compose	2023-12-05 16:57:25 +05:30
Zoltan Haindrich	a1aa4340d0	Changing the queryFrameWork in Calcite*Tests may have side effects (#15428 ) changes how it's configured a bit to use an annotation instead of methods	2023-12-04 00:38:01 +05:30
Clint Wylie	5ce4aab3b8	update ARRAY_OVERLAP to plan with ArrayContainsElement for ARRAY columns (#15451 ) Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in #15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.	2023-11-30 10:05:20 +05:30
Pranav	93cd638645	Enabling aggregateMultipleValues in all StringAnyAggregators (#15434 ) * Enabling aggregateMultipleValues in all StringAnyAggregators * Adding more tests * More validation * fix warning * updating asserts in decoupled mode * fix intellij inspection * Addressing comments * Addressing comments * Adding early validations and make aggregate consistent across all * fixing tests * fixing tests * Update docs/querying/sql-aggregations.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * fixing static check --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-11-29 14:32:49 -08:00
Clint Wylie	64fcb32bcf	add native 'array contains element' filter (#15366 ) * add native arrayContainsElement filter to use array column element indexes	2023-11-29 03:33:00 -08:00
Abhishek Agarwal	0a56c87e93	SQL: Plan non-equijoin conditions as cross join followed by filter (#15302 ) This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.	2023-11-29 13:46:11 +05:30
Clint Wylie	97623b408c	add optional 'castToType' parameter to 'auto' column schema (#15417 ) * auto but.. with an expected type	2023-11-28 17:19:23 -08:00
Zoltan Haindrich	eb056e23b5	Fix dictionarySize overrides in tests (#15354 ) I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit. Not forwarding the return value at that point may lead to the normal continuation here regardless of whether something was added to the dictionary, like here	2023-11-28 18:49:09 +05:30
Zoltan Haindrich	ca544e552c	Add option to compare results with relative error tolerance (#15429 ) Adds a result comparison mode of EQUALS_RELATIVE_1000_ULPS, which accepts floating point differences up to 1000 units of least precision	2023-11-28 13:03:16 +05:30
Abhishek Agarwal	3113e7b350	Fix grouping aggregator when one of the dimension is a simple extraction (#15421 ) This PR fixes an issue where the grouping aggregator wrongly assumes that a key dimension is a virtual column and assigns a wrong name to it. This results in a mismatch between the dimensions that grouping aggregator sees and the dimension names that rows are aggregated on. And finally, grouping aggregator generates wrong result.	2023-11-24 13:15:07 +05:30
Clint Wylie	a95c22ce70	support non-constant expressions for path arguments for json_value and json_query (#15320 ) * support dynamic expressions for path arguments for json_value and json_query	2023-11-17 01:12:05 -08:00
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
Rishabh Singh	5446494e63	Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355 ) In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information. This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.	2023-11-14 12:52:33 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Laksh Singla	0cc8839a60	Allow casted literal values in SQL functions accepting literals (Part 2) (#15316 )	2023-11-03 21:22:19 +05:30
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Gian Merlino	6b6d73b5d4	Use min of scheduler threads and server threads for subquery guardrails. (#15295 ) * Use min of scheduler threads and server threads for subquery guardrails. This allows more memory to be used for subqueries when the query scheduler is configured to limit queries below the number of server threads. The patch also refactors the code so SubqueryGuardrailHelper is provided by a Guice Provider rather than being created by ClientQuerySegmentWalker, to achieve better separation of concerns. * Exclude provider from coverage.	2023-11-01 22:34:53 -07:00
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
Zoltan Haindrich	f4a74710e6	Process pure ordering changes with windowing operators (#15241 ) - adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset	2023-10-29 16:40:49 +05:30
Zoltan Haindrich	6784e9c507	Fix summary row issues in case postaggregations are happening (#15232 ) * fix-1/2 * add message v1 * extend test to cover for IOB issue * move stuff around * change message * fix testcase string * compute postaggs (thank you Clint!) * enable feature for test * ignore tests in msq --------- Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>	2023-10-24 20:33:59 -07:00
Soumyava	06f40a0019	remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237 ) * Fixing nested group by query with order by in outer query * Adding examples	2023-10-24 15:30:13 -07:00
Zoltan Haindrich	2e31cb2901	DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118 )	2023-10-23 10:54:28 -04:00
Zoltan Haindrich	b95035f183	Fix VirtualColumn related issues in window expressions (#15119 ) for some exotic queries like: SELECT '_'\|\|dim1, MIN(cast(0 as double)) OVER (), MIN(cast((cnt\|\|cnt) as bigint)) OVER () FROM foo the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly	2023-10-23 14:05:59 +05:30
Zoltan Haindrich	fbbb9c7730	Allow DESC ordering in window expressions (#15195 )	2023-10-20 07:55:28 -04:00
Zoltan Haindrich	9fb0dbfc9f	Fix json inputs for drill windowing tests (#15148 ) This PR: * adds a flag to JsonToParquet to do the fix during conversion * updates the json files to more correct contents * some resultset mismatches were fixed by this * updates parquet to 1.13.1	2023-10-19 14:02:41 +05:30
Clint Wylie	061cfee224	add native filters for "(filter) is true" and "(filter) is false" (#15182 ) * add native filters for "(filter) is true" and "(filter) is false" changes: * add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE` * add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false * fix test incorrectly adjusted to wrong answer in #15058 * add tests for default value mode	2023-10-18 13:07:35 -07:00
Zoltan Haindrich	c58b7f40ee	Rename windowing option (#15184 )	2023-10-18 10:54:20 +05:30
Laksh Singla	dc8d2192c3	Introduce natural comparator for types that don't have a StringComparator (#15145 ) Fixes a bug when executing queries with the ordering of arrays	2023-10-16 10:37:32 +05:30
Zoltan Haindrich	6d62c75866	Fix columns with null values in windowing expressions (#15131 )	2023-10-13 10:42:45 -04:00
Clint Wylie	a0fd9ec55c	fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135 )	2023-10-12 01:23:24 -07:00
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL compliant behavior	2023-10-12 00:06:23 -07:00
Zoltan Haindrich	ae88f2c0b6	Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086 ) * fixes * check for latest rewrite place * Revert "check for latest rewrite place" This reverts commit `5cf1e2c1ca`. * some stuff (cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f) * update test output * updates to test ouptuts * some stuff * move validator * cleanup * fix * change test slightly * add apidoc cleanup warnings * cleanup/etc * instead of telling the story; add a fail with some reason whats the issue * lead-lag fix * add test * remove unnecessary throw * druidexception-trial * Revert "druidexception-trial" This reverts commit `8fa06644bc`. * undo changes to no_grouping; add no_grouping2 * add missing assert on resultcount * rename method; update * introduce enum/etc * make resultmatchmode accessible from TestBuilder#expectedResults * fix dump results to use log * fix * handle null correctly * disable feature type based things for MSQ * fix varianssqlaggtest * use eps in other test * fix intellij error * add final * addrss review * update test/string/etc * write concat in 3 lines :D	2023-10-11 12:34:31 -07:00
Vishesh Garg	c6ca990f1f	Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095 ) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming, such as in the case of nested queries, resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.	2023-10-11 19:48:36 +05:30
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Zoltan Haindrich	23605c1edd	Enable resultset validation of Drill tests (#15096 ) - introduces a test_X method for every testcase (995 testcases) - added a resultset parser which reads the expected resultset based on the result schema - loaded a few more datasets - added a testcase to ensure that all files have a corresponding testcase - renamed DecoupledIgnore to NegativeTest - categorized the failing 268 tests	2023-10-10 14:40:50 +05:30
Clint Wylie	1fc8fb1b20	add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101 ) * add a bunch of tests with array typed columns to CalciteArraysQueryTest * fix a bug with unnest filter pushdown when filtering on unnested array columns	2023-10-09 06:16:06 -07:00
Laksh Singla	549ef56288	UNION ALLs in MSQ (#14981 ) MSQ now supports UNION ALL with UnionDataSource	2023-10-09 18:18:15 +05:30
Zoltan Haindrich	b5a87fd89b	Support constant args in window functions (#15071 ) Instead of passing the constants around in a new parameter; InputAccessor was introduced to take care of transparently handling the constants - this new class started picking up some copy-paste debris around field accesses; and made them a little bit more readable.	2023-10-08 12:14:25 +05:30
Zoltan Haindrich	7b869fd37a	Change type of AVG aggregates to double (#15089 ) The SQL standard is not very restrictive regarding this: If AVG is specified and DT is exact numeric, then the declared type of the result is an implementation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT. So using the same type is also OK (without this patch); however, the avg of 0 and 1 is 0 right now because of the retention of the integer type. Postgres, MySQL, Oracle, and Drill seem to increase precision; MSSQL returns 0: http://sqlfiddle.com/#!9/6f7248/1 I think we should also increase precision, as it's already calculated more precisely	2023-10-07 18:01:09 +05:30
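
The integer-typed AVG behavior described in #15089 can be reproduced outside Druid. The following is a minimal standalone Java sketch, not Druid code: the class AvgTypeSketch and both helper methods are hypothetical names chosen for illustration. It assumes only that an average kept in an integer type is computed with integer division, and that the input is non-empty.

```java
public class AvgTypeSketch {
    // Hypothetical helper: average that retains the integer input type.
    // Integer division truncates, so the fractional part is lost.
    static long avgAsLong(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Hypothetical helper: average widened to double, as the commit proposes.
    static double avgAsDouble(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        long[] values = {0, 1};
        System.out.println(avgAsLong(values));   // prints 0
        System.out.println(avgAsDouble(values)); // prints 0.5
    }
}
```

With the inputs 0 and 1 from the commit message, the integer-typed average is 0 while the double-typed average is 0.5, which is the difference the commit is addressing.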

1079 Commits