druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	ca544e552c	Add option to compare results with relative error tolerance (#15429 ) Adds a result comparison mode, EQUALS_RELATIVE_1000_ULPS, which accepts floating point differences of up to 1000 units of least precision	2023-11-28 13:03:16 +05:30
Abhishek Agarwal	3113e7b350	Fix grouping aggregator when one of the dimensions is a simple extraction (#15421 ) This PR fixes an issue where the grouping aggregator wrongly assumes that a key dimension is a virtual column and assigns a wrong name to it. This results in a mismatch between the dimensions that the grouping aggregator sees and the dimension names that rows are aggregated on. As a result, the grouping aggregator generates a wrong result.	2023-11-24 13:15:07 +05:30
Clint Wylie	a95c22ce70	support non-constant expressions for path arguments for json_value and json_query (#15320 ) * support dynamic expressions for path arguments for json_value and json_query	2023-11-17 01:12:05 -08:00
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
Rishabh Singh	5446494e63	Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355 ) In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information. This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.	2023-11-14 12:52:33 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Laksh Singla	0cc8839a60	Allow casted literal values in SQL functions accepting literals (Part 2) (#15316 )	2023-11-03 21:22:19 +05:30
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Gian Merlino	6b6d73b5d4	Use min of scheduler threads and server threads for subquery guardrails. (#15295 ) * Use min of scheduler threads and server threads for subquery guardrails. This allows more memory to be used for subqueries when the query scheduler is configured to limit queries below the number of server threads. The patch also refactors the code so SubqueryGuardrailHelper is provided by a Guice Provider rather than being created by ClientQuerySegmentWalker, to achieve better separation of concerns. * Exclude provider from coverage.	2023-11-01 22:34:53 -07:00
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
Zoltan Haindrich	f4a74710e6	Process pure ordering changes with windowing operators (#15241 ) - adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset	2023-10-29 16:40:49 +05:30
Zoltan Haindrich	6784e9c507	Fix summary row issues in case postaggregations are happening (#15232 ) * fix-1/2 * add message v1 * extend test to cover for IOB issue * move stuff around * change message * fix testcase string * compute postaggs (thank you Clint!) * enable feature for test * ignore tests in msq --------- Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>	2023-10-24 20:33:59 -07:00
Soumyava	06f40a0019	remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237 ) * Fixing nested group by query with order by in outer query * Adding examples	2023-10-24 15:30:13 -07:00
Zoltan Haindrich	2e31cb2901	DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118 )	2023-10-23 10:54:28 -04:00
Zoltan Haindrich	b95035f183	Fix VirtualColumn related issues in window expressions (#15119 ) For some exotic queries like: SELECT '_'||dim1, MIN(cast(0 as double)) OVER (), MIN(cast((cnt||cnt) as bigint)) OVER () FROM foo the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly	2023-10-23 14:05:59 +05:30
Zoltan Haindrich	fbbb9c7730	Allow DESC ordering in window expressions (#15195 )	2023-10-20 07:55:28 -04:00
Zoltan Haindrich	9fb0dbfc9f	Fix json inputs for drill windowing tests (#15148 ) This PR: adds a flag to JsonToParquet to do the fix during conversion; updates the json files to more correct contents; some resultset mismatches were fixed by this; updates parquet to 1.13.1	2023-10-19 14:02:41 +05:30
Clint Wylie	061cfee224	add native filters for "(filter) is true" and "(filter) is false" (#15182 ) * add native filters for "(filter) is true" and "(filter) is false" changes: * add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE` * add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false * fix test incorrectly adjusted to wrong answer in #15058 * add tests for default value mode	2023-10-18 13:07:35 -07:00
Zoltan Haindrich	c58b7f40ee	Rename windowing option (#15184 )	2023-10-18 10:54:20 +05:30
Laksh Singla	dc8d2192c3	Introduce natural comparator for types that don't have a StringComparator (#15145 ) Fixes a bug when executing queries with the ordering of arrays	2023-10-16 10:37:32 +05:30
Zoltan Haindrich	6d62c75866	Fix columns with null values in windowing expressions (#15131 )	2023-10-13 10:42:45 -04:00
Clint Wylie	a0fd9ec55c	fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135 )	2023-10-12 01:23:24 -07:00
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL compliant behavior	2023-10-12 00:06:23 -07:00
Zoltan Haindrich	ae88f2c0b6	Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086 ) * fixes * check for latest rewrite place * Revert "check for latest rewrite place" This reverts commit `5cf1e2c1ca`. * some stuff (cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f) * update test output * updates to test outputs * some stuff * move validator * cleanup * fix * change test slightly * add apidoc cleanup warnings * cleanup/etc * instead of telling the story; add a fail with some reason whats the issue * lead-lag fix * add test * remove unnecessary throw * druidexception-trial * Revert "druidexception-trial" This reverts commit `8fa06644bc`. * undo changes to no_grouping; add no_grouping2 * add missing assert on resultcount * rename method; update * introduce enum/etc * make resultmatchmode accessible from TestBuilder#expectedResults * fix dump results to use log * fix * handle null correctly * disable feature type based things for MSQ * fix varianssqlaggtest * use eps in other test * fix intellij error * add final * address review * update test/string/etc * write concat in 3 lines :D	2023-10-11 12:34:31 -07:00
Vishesh Garg	c6ca990f1f	Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095 ) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming, such as in the case of nested queries, resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.	2023-10-11 19:48:36 +05:30
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Zoltan Haindrich	23605c1edd	Enable resultset validation of Drill tests (#15096 ) - introduces a test_X method for every testcase (995 testcases) - added a resultset parser which reads the expected resultset based on the result schema - loaded a few more datasets - added a testcase to ensure that all files have a corresponding testcase - renamed DecoupledIgnore to NegativeTest - categorized the failing 268 tests	2023-10-10 14:40:50 +05:30
Clint Wylie	1fc8fb1b20	add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101 ) * add a bunch of tests with array typed columns to CalciteArraysQueryTest * fix a bug with unnest filter pushdown when filtering on unnested array columns	2023-10-09 06:16:06 -07:00
Laksh Singla	549ef56288	UNION ALLs in MSQ (#14981 ) MSQ now supports UNION ALL with UnionDataSource	2023-10-09 18:18:15 +05:30
Zoltan Haindrich	b5a87fd89b	Support constant args in window functions (#15071 ) Instead of passing the constants around in a new parameter, InputAccessor was introduced to take care of transparently handling the constants. This new class started picking up some copy-paste debris around field accesses, and made them a little bit more readable.	2023-10-08 12:14:25 +05:30
Zoltan Haindrich	7b869fd37a	Change type of AVG aggregates to double (#15089 ) The sql standard is not very restrictive regarding this: If AVG is specified and DT is exact numeric, then the declared type of the result is an implementation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT. So using the same type is also ok (without patch); however, the avg of 0 and 1 is 0 right now because of the retention of the integer type. Postgres, MySql, Oracle and Drill seem to increase precision; mssql returns 0: http://sqlfiddle.com/#!9/6f7248/1 I think we should also increase precision as it's already calculated more precisely	2023-10-07 18:01:09 +05:30
Soumyava	57ab8e13dc	Updating plans when using joins with unnest on the left (#15075 ) * Updating plans when using joins with unnest on the left * Correcting segment map function for hashJoin * The changes done here are not reflected into MSQ yet so these tests might not run in MSQ * native tests * Self joins with unnest data source * Making this pass * Addressing comments by adding explanation and new test	2023-10-06 19:23:12 -07:00
Soumyava	1a06ef5a24	Fixing old function used (#15099 )	2023-10-05 17:25:00 -07:00
Pranav	06c5527c85	Allow aliasing of Macros and add new alias for complex decode 64 (#15034 ) * Add AliasExprMacro to allow aliasing of native expression macros * Add decode_base64_complex alias for complex_decode_base64	2023-10-05 16:24:36 -07:00
Zoltan Haindrich	36d7b3cc65	Add CalciteSysQueryTest to enable some testing of bindable plans. (#15070 )	2023-10-05 11:37:49 -07:00
Clint Wylie	b4bc9b6950	fix issue with auto columns with mix of scalar values and empty arrays (#15083 )	2023-10-05 10:15:45 +05:30
Laksh Singla	b8d03d36b0	Free up the resources when materializing the results as Frames (#15032 ) Refactor the code to clean up the result sequences when materializing the results as Frames	2023-10-05 10:14:27 +05:30
Laksh Singla	30cf76db99	Field writers for numerical arrays (#14900 ) Row-based frames, and by extension, MSQ now supports numeric array types. This means that all queries consuming or producing arrays would also work with MSQ. Numeric arrays can also be ingested via MSQ. Post this patch, queries like, SELECT [1, 2] would work with MSQ since they consume a numeric array, instead of failing with an unsupported column type exception.	2023-10-04 23:16:47 +05:30
Zoltan Haindrich	90e4b25620	Fix lead/lag to be usable without offset (#15057 )	2023-10-04 17:38:46 +05:30
Zoltan Haindrich	3342e03ea8	Windowing processing may have run into Exceptions when the whole table was processed (#15064 ) Earlier, when the query was processing the whole table, the planning may have ended with an NPE, as it was not possible to create a scan query from it.	2023-10-04 11:27:11 +05:30
Xavier Léauté	adef2069b1	Make unit tests pass with Java 21 (#15014 ) This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21. As a result all unit tests now pass with Java 21. * update maven-shade-plugin to 3.5.0 and follow-up to #15042 * explain why we need to override configuration when specifying outputFile * remove configuration from dependency management in favor of explicit overrides in each module. * update mockito to 5.5.0 for Java 21 support when running with Java 11+ * continue using latest mockito 4.x (4.11.0) when running with Java 8 * remove need to mock private fields * exclude incorrectly declared mockito dependency from pac4j-oidc * remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21 * add JVM options workaround for system-rules junit plugin not supporting Java 18+ * exclude older versions of byte-buddy from assertj-core * fix for Java 19 changes in floating point string representation * fix missing InitializedNullHandlingTest * update easymock to 5.2.0 for Java 21 compatibility * update animal-sniffer-plugin to 1.23 * update nl.jqno.equalsverifier to 3.15.1 * update exec-maven-plugin to 3.1.0	2023-10-03 22:41:21 -07:00
Soumyava	cb050282a0	Intervals are updated properly for Unnest queries (#15020 ) Fixes a bug where the unnest queries were not updated with the correct intervals.	2023-10-04 02:52:10 +05:30
Zoltan Haindrich	f3d1c8b70e	Enable back testcases in CalciteWindowQueryTest (#15045 ) Most of the testcases were disabled in CalciteWindowQueryTest during the Calcite-1.35 upgrade; there were some changes arising from the fact that the removal of DRUID_SUM had some unexpected side effects: SqlStdOperatorTable.SUM became the SUM operator; because of that, SqlToRelConverter started rewriting windowed SUMs into SUM0s. My opinion is that w.r.t. Druid this rewrite provides no real advantage, as SUM0 is serviced by SUM here. I believe that's not 100% correct in cases when it aggregates just nulls, but that doesn't matter in this case. I propose to introduce back a local DRUID_SUM thing as an unchanged SUM, and later, when CALCITE-6020 is fixed, we can drop that.	2023-10-03 10:18:44 +05:30
Soumyava	261f54dc04	coalesce on unnest row mismatch fix (#15019 ) * coalesce on unnest row mismatch fix * new example with coalesce over unnest with nested array columns * New example with change in order which triggers the nvl * new test plan update for useDefault=true	2023-10-02 17:26:50 -07:00
Pranav	f1edd671fb	Exposing optional replaceMissingValueWith in lookup function and macros (#14956 ) * Exposing optional replaceMissingValueWith in lookup function and macros * args range validation * Updating docs * Addressing comments * Update docs/querying/sql-scalar.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/querying/sql-functions.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Addressing comments --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-10-02 17:09:23 -07:00
Zoltan Haindrich	2785e062d7	Correct quotation in drill query files (#15044 )	2023-10-02 08:17:15 -07:00
Pranav	07c28f17ca	Fix missing format strings in calls to DruidException.build (#15056 ) * Fix the NPE bug in nonStrictFormat * using non null format string * using Assert.assertThrows	2023-09-29 17:00:36 -07:00
Zoltan Haindrich	db71e28808	Enable SortProjectTransposeRule (#15002 ) Contains "Enable already passing tests in DecoupledPlanningCalciteQueryTest" #14996. Enables a transpose rule to support a query plan in which the plan was in the shape: Sort Project Aggregate	2023-09-29 10:49:03 +05:30
Zoltan Haindrich	022950a0c5	MV_FILTER_ONLY may run into Exceptions in case duplicate values were processed (#15012 )	2023-09-27 19:19:42 +05:30

861 Commits