druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	ba09e7d1de	add	2024-05-21 14:00:02 +00:00
Zoltan Haindrich	08b73d1969	Revert "some stuff" This reverts commit `52598d3bca`.	2024-05-21 12:02:46 +00:00
Zoltan Haindrich	52598d3bca	some stuff	2024-05-21 12:02:44 +00:00
Zoltan Haindrich	e7e119b559	reduce copypaste	2024-05-16 13:33:27 +00:00
Zoltan Haindrich	f4c73e1499	make old defaults to overides	2024-05-16 11:31:28 +00:00
Zoltan Haindrich	59be71c068	add other	2024-05-16 11:23:26 +00:00
Zoltan Haindrich	607ef174c5	indent	2024-05-16 11:18:54 +00:00
Zoltan Haindrich	3658ab24c3	remove f	2024-05-16 11:18:24 +00:00
Zoltan Haindrich	bec1f38a0e	move sqlmodule down	2024-05-16 11:17:05 +00:00
Zoltan Haindrich	688611eab3	undo	2024-05-16 11:11:55 +00:00
Zoltan Haindrich	93892b6524	undo some	2024-05-16 11:11:03 +00:00
Zoltan Haindrich	118eb61939	there - with 1 boot	2024-05-16 10:31:38 +00:00
Zoltan Haindrich	28ea884e19	almost ready?	2024-05-16 10:01:22 +00:00
Zoltan Haindrich	8ee41f58d0	it does work	2024-05-15 15:14:43 +00:00
Zoltan Haindrich	d4b052a579	stuff	2024-05-15 11:57:13 +00:00
Zoltan Haindrich	a16f982699	remove crap	2024-05-14 16:04:19 +00:00
Zoltan Haindrich	b7b73fa7fe	fix context key order	2024-05-14 08:54:53 +00:00
Zoltan Haindrich	9578953678	Merge remote-tracking branch 'apache/master' into quidem-runner-extension-submit	2024-05-14 07:36:48 +00:00
Zoltan Haindrich	3132c12781	remove unnecessary \\	2024-05-14 07:36:07 +00:00
Sree Charan Manamala	b8dd7478d0	Custom Calcite Rule to remove redundant references (#16402 ) Custom calcite rule mimicking AggregateProjectMergeRule to extend support to expressions. The current calcite rule return null in such cases. In addition, this removes the redundant references.	2024-05-14 06:38:05 +02:00
Zoltan Haindrich	e36c46a85a	fix import style fixes clenaup	2024-05-13 15:52:03 +00:00
Laksh Singla	4bfc186153	Support sorting on complex columns in MSQ (#16322 ) MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially. This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ. The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example: longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC, the comparison will happen like: longCol1, longCol2 (ASC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending order complexCol1 (DESC) - Compared via deserialization, cannot be clubbed with any other field complexCol2 (DESC) - Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same order stringCol1, longCol3 (DESC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending order longCol4 (ASC) - Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was different This way, we only deserialize the field wherever required	2024-05-13 15:07:05 +05:30
Zoltan Haindrich	e13d560b6e	Enable quidem shadowing for decoupled testcases * Altered `QueryTestBuilder` to be able to switch to a backing quidem test * added a small crc to ensure that the shadow testcase does not deviate from the original one * Packaged all decoupled related things into a a single `DecoupledExtension` to reduce copy-paste * `DecoupledTestConfig#quidemReason` must describe why its being used * `DecoupledTestConfig#separateDefaultModeTest` can be used to make multiple case files based on `NullHandling` state * fixed a cosmetic bug during decoupled join translation * enhanced `!druidPlan` to report the final logical plan in non-decoupled mode as well * add check to ensure that only supported params are present in a druidtest uri * enabled shadow testcases for previously disabled testcases	2024-05-10 13:38:54 +00:00
Zoltan Haindrich	1811674753	Enable quidem tests to use different suppliers (#16382 ) * enable quidem uri support for `druidtest:///?ComponentSupplier=Nested` and similar * changes the way `SqlTestFrameworkConfig` is being applied; all options will have their own annotation (its kinda impossible to detect that an annotation has a set value or its the default) * enables hierarchical processing of config annotation (was needed to enable class level supplier annotation) * moves uri processing related string2config stuff into `SqlTestFrameworkConfig`	2024-05-09 09:21:02 +02:00
Akshat Jain	775d654a6c	Load only the required lookups for MSQ tasks (#16358 ) With this PR changes, MSQ tasks (MSQControllerTask and MSQWorkerTask) only load the required lookups during querying and ingestion, based on the value of CTX_LOOKUPS_TO_LOAD key in the query context.	2024-05-09 11:21:54 +05:30
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Gian Merlino	588d442422	Add native filter conversion for SCALAR_IN_ARRAY. (#16312 ) * Add native filter conversion for SCALAR_IN_ARRAY. Main changes: 1) Add an implementation of "toDruidFilter" in ScalarInArrayOperatorConversion. 2) Split up Expressions.literalToDruidExpression into two functions, so the first half (literalToExprEval) can be used by ScalarInArrayOperatorConversion to more efficiently create the list of match values. * Fix type in time arithmetic conversion. * Test updates. * Update test cases to use null instead of '' in default-value mode. * Switch test from msqIncompatible to compatible with a different result. * Update one more test. * Fix test. * Update tests. * Use ExprEvalWrapper to differentiate between empty string and null. * Fix tests some more. * Fix test. * Additional comment. * Style adjustment. * Fix tests. * trueValue -> actualValue. * Use different approach, DruidLiteral instead of ExprEvalWrapper. * Revert changes in ArrayOfDoublesSketchSqlAggregatorTest.	2024-05-03 13:00:33 -07:00
zachjsh	fb7c84fb5d	Catalog clustering keys fixes (#16351 ) * * add another catalog clustering columns unit test * * dissallow clusterKeys with descending order * * make more clear that clustering is re-written into ingest node whether a catalog table or not * * when partitionedBy is stored in catalog, user shouldnt need to specify it in order to specify clustering * * fix intellij inspection failure	2024-05-03 14:02:56 -04:00
Zoltan Haindrich	2d0e86cbdc	Use quidem to run tests (#16249 ) * test scoped jdbc driver for druidtest:/// backed DruidAvaticaTestDriver ** DecoupledTestConfig is used inside the URI - this will make it possible to attach to existing things more easily * DruidQuidemTestBase can be used to create module level set of quidem tests * added quidem commands: !convertedPlan, !logicalPlan, !druidPlan, !nativePlan ** for these I've used some values of the Hook which was there in calcite * there are some shortcuts with proxies(they are only used during testing) - we can probably remove those later	2024-05-02 02:12:42 -04:00
Laksh Singla	e695e52d3f	Improve code flow in the First/Last vector aggregators and unify the numeric aggregators with the String implementations (#16230 ) This PR fixes the first and last vector aggregators and improves their readability. Following changes are introduced The folding is broken in the vectorized versions. We consider time before checking the folded object. If the numerical aggregator gets passed any other object type for some other reason (like String), then the aggregator considers it to be folded, even though it shouldn’t be. We should convert these objects to the desired type, and aggregate them properly. The aggregators must properly use generics. This would minimize the ClassCastException issues that can happen with mixed segment types. We are unifying the string first/last aggregators with numeric versions as well. The aggregators must aggregate null values (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java#L55-L56 ). The aggregator should only ignore pairs with time == null, and not value == null Time nullity is ignored when trying to vectorize the data. String versions initialized with DateTimes.MIN that is equal to Long.MIN / 2. This can cause incorrect results in case the user enters a custom time column. NOTE: This is still present because it would require a larger refactor in all of the versions. There is a difference in what users might expect from the results because the code flow is changed (for example, the direction of the for loops, etc), however, this will only change the results, and not the contract set by first/last aggregators, which is that if multiple values have the same timestamp, then any of them can get picked. If the column is non-existent, the users might expect a change in the timestamp from DateTime.MAX to Long.MAX, because the code incorrectly used DateTime.MAX to initialize the aggregator, however, in case of a custom timestamp column, this might not be the case. The SQL query might be prohibited from using any Long since it requires a cast to the timestamp function that can fail, but AFAICT native queries don't have such limitations.	2024-04-30 15:13:14 +05:30
Laksh Singla	26d63e7b65	Prevent joining on nested arrays and complex types (#16349 ) #16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work). This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.	2024-04-30 11:36:53 +05:30
Akshat Jain	9d2cae40c3	Add support for selective loading of lookups in the task layer (#16328 ) Changes: - Add `LookupLoadingSpec` to support 3 modes of lookup loading: ALL, NONE, ONLY_REQUIRED - Add method `Task.getLookupLoadingSpec()` - Do not load any lookups for `KillUnusedSegmentsTask`	2024-04-29 07:19:59 +05:30
zachjsh	365cd7e8e7	INSERT/REPLACE can omit clustering when catalog has default (#16260 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * implement and add tests * * address review comments * * address test review comments * * fix checkstyle * * fix dependencies * * all tests passing * * cleanup * * remove unneeded code * * remove unused dependency * * fix checkstyle	2024-04-26 10:19:45 -04:00
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Gian Merlino	68d6e682e8	Fix TimeBoundary planning when filters require virtual columns. (#16337 ) The timeBoundary query does not support virtual columns, so we should avoid it if the query requires virtual columns.	2024-04-25 16:49:40 -07:00
Zoltan Haindrich	9c0bd56f5b	Make QueryComponentSupliers independent from test classes (#16275 )	2024-04-25 02:12:07 -04:00
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Sree Charan Manamala	080476f9ea	WINDOWING - Fix 2 nodes with same digest causing mapping issue (#16301 ) Fixes the mapping issue in window fucntions where 2 nodes get the same reference.	2024-04-24 16:45:02 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Sree Charan Manamala	960a674442	Corrected Strict NON NULL return type checks (#16279 )	2024-04-18 12:17:13 +02:00
Gian Merlino	ccc1ffb032	Additional short circuiting knowledge in filter bundles. (#16292 ) * Additional short circuiting knowledge in filter bundles. Three updates: 1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed "applyRowCount", and redefined as an upper bound on rows remaining after short-circuiting (rather than number of rows selected so far). This definition works better for OR filters, which pass through the FALSE set rather than the TRUE set to the next subfilter. 2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather than using selectionRowCount for the first subfilter and indexIntersectionSize for each filter thereafter. This improves accuracy when the incoming applyRowCount is smaller than the row count from the first few indexes. 3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather than applyRowCount for subfilters. This allows an OR filter to pass information about short-circuiting to its subfilters. To help write tests for this, the patch also moves the sampled wikiticker data file from sql to processing. * Forbidden APIs. * Forbidden APIs. * Better comments. * Fix inspection. * Adjustments to tests.	2024-04-16 22:42:28 -07:00
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
Sree Charan Manamala	5247059d2f	Allow Double & null values in sql type array through dynamic params (#16274 )	2024-04-15 10:44:42 +02:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Sree Charan Manamala	3340b200db	Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264 ) * Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach. * Corrected the queries where date time values are directly provided * marked 2 cases failing with resultset casting issues	2024-04-12 13:54:48 +02:00
Sree Charan Manamala	f65c166327	Windowed aggregates should update the aggregation value based on final compute (#16244 )	2024-04-12 08:28:33 +02:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt. * Fix tests.	2024-04-11 15:06:11 -07:00
Soumyava	7759f25095	Moving bitwise_or to use native calcite operator (#16237 )	2024-04-04 12:49:29 -07:00

1 2 3 4 5 ...

1005 Commits