druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	e8773f4d0f	Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996 )	2023-09-20 15:42:52 +05:30
Gian Merlino	4f498e6469	SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978 ) * SQL: Plan non-equijoin conditions as cross join followed by filter. Druid has previously refused to execute joins with non-equality-based conditions. This was well-intentioned: the idea was to push people to write their queries in a different, hopefully more performant way. But as we're moving towards fuller SQL support, it makes more sense to allow these conditions to go through with the best plan we can come up with: a cross join followed by a filter. In some cases this will allow the query to run, and people will be happy with that. In other cases, it will run into resource limits during execution. But we should at least give the query a chance. This patch also updates the documentation to explain how people can tell whether their queries are being planned this way. * cartesian is a word. * Adjust tests. * Update docs/querying/datasource.md Co-authored-by: Benedict Jin <asdf2014@apache.org> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2023-09-19 10:23:42 -07:00
Soumyava	279b3818f0	Make Unnest work with nullif operator (#14993 ) This is due to the recursive filter creation in unnest storage adapter not performing correctly in case of an empty children. This PR addresses the issue	2023-09-15 09:54:14 +05:30
Gian Merlino	3ae5e97801	Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977 ) They are not quite the same as "x == true", "x != true", etc. These functions never return null, even when "x" itself is null.	2023-09-14 09:19:09 -07:00
Soumyava	7bbefd5741	Updating version in from.ftl (#14982 )	2023-09-14 05:11:36 +00:00
Soumyava	bf99d2c7b2	Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924 ) * Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly * Fixing a failed test * Updating numericNilAgg * Moving to use default values in case of nil agg * Adding the same for first agg * Fixing a test * fixing vectorized string agg for last/first with cast if numeric * Updating tests to remove mockito and cover the case of string first/last on non string columns * Updating a test to vectorize * Addressing review comments: Name change to NilVectorAggregator and using static variables now * fixing intellij inspections	2023-09-13 13:15:14 -07:00
Laksh Singla	4c57504960	Fix the uncaught exceptions when materializing results as frames (#14970 ) When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code). This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.	2023-09-13 15:41:28 +05:30
Clint Wylie	891f0a3fe9	longer compatibility window for nested column format v4 (#14955 ) changes: * add back nested column v4 serializers * 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs * add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'	2023-09-12 14:07:53 -07:00
Zoltan Haindrich	5d16d0edf0	Count distinct returned incorrect results without useApproximateCountDistinct (#14748 ) * fix grouping engine handling of summaries when result set is empty	2023-09-12 13:57:54 -07:00
Clint Wylie	5cecf6ce8f	fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948 )	2023-09-12 10:54:35 +08:00
Suneet Saldanha	757603a773	Set task location as k8sPodName for mm-less ingestion (#14959 ) * Set task location as k8sPodName for mm-less ingestion * tests	2023-09-11 19:44:26 -07:00
Zoltan Haindrich	699893bcff	Fix StringLastAggregatorFactory equals/toString (#14907 ) * update test * update test * format * test * fix0 * Revert "fix0" This reverts commit `44992cb393`. * ok resultset * add plan * update test * before rewind * test * fix toString/compare/test * move test * add timeColumn to hashCode	2023-09-08 09:20:54 -07:00
Soumyava	a8fa979115	Unnest dont push down not (#14942 ) * Not pushing down not filters * New test case * Updating tests * Removing a stale comment	2023-09-06 08:57:03 -07:00
Zoltan Haindrich	23308c050d	Remove DruidAggregateCaseToFilterRule (#14940 ) The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade	2023-09-06 19:11:58 +05:30
Laksh Singla	6ee0b06e38	Auto configuration for maxSubqueryBytes (#14808 ) A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.	2023-09-06 05:47:19 +00:00
Soumyava	8088a763a6	Vectorize earliest aggregator for both numeric and string types (#14408 ) * Vectorizing earliest for numeric * Vectorizing earliest string aggregator * checkstyle fix * Removing unnecessary exceptions * Ignoring tests in MSQ as earliest is not supported for numeric there * Fixing benchmarks * Updating tests as MSQ does not support earliest for some cases * Addressing review comments by adding the following: 1. Checking capabilities first before creating selectors 2. Removing mockito in tests for numeric first aggs 3. Removing unnecessary tests * Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string * Adding a flag for multi value dimension selector * Addressing comments * 1 more change * Handling review comments part 1 * Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order * Updating numeric first vector agg * Revert "Updating numeric first vector agg" This reverts commit `4291709901`. * Updating code for correctness issues * fixing an issue with latest agg * Adding more comments and removing an unnecessary check * Addressing null checks for tie selector and only vectorize false for quantile sketches	2023-09-05 08:41:42 -07:00
Kashif Faraz	7f26b80e21	Simplify ServiceMetricEvent.Builder (#14933 ) Changes: - Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent> and thus convert it to a plain builder rather than a builder of builder. - Add methods setCreatedTime , setMetricAndValue to the builder	2023-09-01 11:30:45 +05:30
Zoltan Haindrich	e806d09309	Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848 )	2023-08-22 22:50:19 -07:00
Zoltan Haindrich	b9a33949fd	Fix aggregation filter expression processing in the absense of projection (#14893 ) * test * fix * add 33 test * crap * Revert "crap" This reverts commit `2751198deb`. * cleanup test * celanup * rename test	2023-08-22 10:17:14 -07:00
Zoltan Haindrich	14c1aff150	Fix error messages relating to OVERWRITE keyword (#14870 ) OVERWRITE should not be a fully reserved keyword	2023-08-22 16:17:49 +05:30
Clint Wylie	194a9c9abc	set druid.expressions.useStrictBooleans to true by default (#14734 )	2023-08-22 00:19:56 -07:00
Clint Wylie	6b14dde50e	deprecate config-magic in favor of json configuration stuff (#14695 ) * json config based processing and broker merge configs to deprecate config-magic	2023-08-16 18:23:57 -07:00
Pranav	26d82fd342	fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587 )	2023-08-16 17:57:16 -07:00
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757 )	2023-08-14 11:50:51 +05:30
Soumyava	afe22907a5	Calcite upgrade 1.35 (#14510 ) * Update to Calcite 1.35.0 * Update from.ftl for Calcite 1.35.0. * Fixed tests in Calcite upgrade by doing the following: 1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules 2. Refactored the CorrelateUnnestRule 3. Updated CorrelateUnnestRel accordingly 4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column 5. Additional test cases for fixes in 2,3,4 6. Update to StringListAggregator to fail a query if separators are not propagated appropriately * Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects * Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing an useless literal dimension with a post agg. Additionally a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35 * Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal * Added a wrapper segment ref to use for unnest and filter segment reference	2023-08-11 12:47:16 -07:00
Adarsh Sanjeev	56ab81f381	Add support for different result formats to MSQ SqlStatementResource (#14571 ) * Add support for different result format * Add tests * Add tests * Fix checkstyle * Remove changes to destination * Removed some unwanted code * Address review comments * Rename parameter * Fix tests	2023-08-07 20:48:59 +05:30
Soumyava	0d73480c8f	Latest aggregator factories should accept time as VectorValueSelecto… (#14753 ) Fix the queries that have latest aggregator with an expression as time column	2023-08-04 13:04:25 +05:30
Clint Wylie	94fb41a4df	fix nested field virtual column array column element vector object selector (#14729 ) Fixes a case I missed in #14688 when the return type is STRING but its coming from a top level array typed column instead of a nested array column while making a vector object selector. Also while here I noticed that the internal JSON_VALUE functions for array types were named inconsistently with the non-array functions, so I renamed them. These are not documented so it should not be disruptive in any way, since they are only used internally for rewrites while planning to make the correctly virtual column. JSON_VALUE_RETURNING_ARRAY_VARCHAR -> JSON_VALUE_ARRAY_VARCHAR JSON_VALUE_RETURNING_ARRAY_BIGINT -> JSON_VALUE_ARRAY_BIGINT JSON_VALUE_RETURNING_ARRAY_DOUBLE -> JSON_VALUE_ARRAY_DOUBLE The internal non-array functions are JSON_VALUE_VARCHAR, JSON_VALUE_BIGINT, and JSON_VALUE_DOUBLE.	2023-08-02 17:08:24 +05:30
Kashif Faraz	10328c0743	Rename metadatacache and serverview metrics (#14716 )	2023-08-01 14:18:20 +05:30
Clint Wylie	5f72f4f37d	fixes for nested virtual column array element vector selectors and fixes for variant and nested variant numeric columns * fix issue with nested virtual column array element vector selectors when input is numeric array but output is non-numeric * add vector value selector for mixed numeric type variant and nested variant fields, tests	2023-07-28 15:14:29 -07:00
Clint Wylie	d406bafdfc	fix issues with equality and range filters matching double values to long typed inputs (#14654 ) * fix issues with equality and range filters matching double values to long typed inputs * adjust to ensure we never homogenize null, [], and [null] into [null] for expressions on real array columns	2023-07-27 16:01:21 -07:00
Adarsh Sanjeev	6a42a24426	Fix a comment in the Calcite UT testExactCountDistinctWithFilter (#14628 )	2023-07-26 06:32:26 +00:00
Gian Merlino	2f9619a96f	Use OverlordClient for all Overlord RPCs. (#14581 ) * Use OverlordClient for all Overlord RPCs. Continuing the work from #12696, this patch removes HttpIndexingServiceClient and the IndexingService flavor of DruidLeaderClient completely. All remaining usages are migrated to OverlordClient. Supporting changes include: 1) Add a variety of methods to OverlordClient. 2) Update MetadataTaskStorage to skip the complete-task lookup when the caller requests zero completed tasks. This helps performance of the "get active tasks" APIs, which don't want to see complete ones. * Use less forbidden APIs. * Fixes from CI. * Add test coverage. * Two more tests. * Fix test. * Updates from CR. * Remove unthrown exceptions. * Refactor to improve testability and test coverage. * Add isNil tests. * Remove unnecessary "deserialize" methods.	2023-07-24 21:14:27 -07:00
Gian Merlino	c2e6758580	Simplify bounds/range vs selectors/equality logic in SQL planning. (#14619 ) * Simplify bounds/range vs selectors/equality logic in SQL planning. 1) Consolidate duplicate code related to Expressions#buildTimeFloorFilter. 2) Cleaner logic in Expressions#toSimpleLeafFilter: choose bounds vs range filter based solely on plannerContext.isUseBoundsAndSelectors, not also considering rhs kind. Use parsed rhs in both paths (except for numerics in the bound path). 3) Fix ArrayContains, ArrayOverlap to avoid equality filters when there is an extractionFn present. Fixes a bug introduced in #14612. * Avoid sending nonprimitives down the bound path.	2023-07-19 22:40:47 -07:00
Clint Wylie	68fd22169f	remove extractionFn from equality, null, and range filters (#14612 ) * remove extractionFn from equality, null, and range filters changes: * EqualityFilter, NullFilter, and RangeFilter no longer support extractionFn * SQL planner will use ExpressionFilter in the small number of cases where an extractionFn would have been used if sqlUseBoundsAndSelectors is set to false instead of equality/null/range filters * fix bugs and add tests with serde, equals, and cache key for null, equality, and range filters * test coverage fixes bugs * adjust * adjust again * so persnickety	2023-07-19 10:37:57 -07:00
Clint Wylie	913416c669	add equality, null, and range filter (#14542 ) changes: * new filters that preserve match value typing to better handle filtering different column types * sql planner uses new filters by default in sql compatible null handling mode * remove isFilterable from column capabilities * proper handling of array filtering, add array processor to column processors * javadoc for sql test filter functions * range filter support for arrays, tons more tests, fixes * add dimension selector tests for mixed type roots * support json equality * rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes * add cooler equality index maker, ValueIndexes * fix missing string utf8 index supplier * expression array comparator stuff	2023-07-18 12:15:22 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Laksh Singla	c1c7dff2ad	Using DruidExceptions in MSQ (changes related to the Broker) (#14534 ) MSQ engine returns correct error codes for invalid user inputs in the query context. Also, using DruidExceptions for MSQ related errors happening in the Broker with improved error messages.	2023-07-13 19:08:49 +00:00
Abhishek Radhakrishnan	f4ee58eaa8	Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560 ) * Add aggregatorMergeStrategy property to SegmentMetadaQuery. - Adds a new property aggregatorMergeStrategy to segmentMetadata query. aggregatorMergeStrategy currently supports three types of merge strategies - the legacy strict and lenient strategies, and the new latest strategy. - The latest strategy considers the latest aggregator from the latest segment by time order when there's a conflict when merging aggregators from different segments. - Deprecate lenientAggregatorMerge property; The API validates that both the new and old properties are not set, and returns an exception. - When merging segments as part of segmentMetadata query, the segments have a more elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to the name format that segments usually contain. Previously it was simply "merged". - Adjust unit tests to test the latest strategy, to assert the returned complete SegmentAnalysis object instead of just the aggregators for completeness. * Don't explicitly set strict strategy in tests * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/segmentmetadataquery.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 12:37:36 -04:00
imply-cheddar	7650a71d37	Add window query test files from Drill (#14561 )	2023-07-12 20:14:39 -07:00
imply-cheddar	65e1b27aa7	Fix a resource leak with Window processing (#14573 ) * Fix a resource leak with Window processing Additionally, in order to find the leak, there were adjustments to the StupidPool to track leaks a bit better. It would appear that the pool objects get GC'd during testing for some reason which was causing some incorrect identification of leaks from objects that had been returned but were GC'd along with the pool. * Suppress unused warning	2023-07-12 17:25:42 -05:00
Laksh Singla	5ce536355e	Fix planning bug while using sort merge frame processor (#14450 ) sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.	2023-07-11 09:58:44 +00:00
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old fomatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Gian Merlino	dd78e00dc5	Fix ColumnSignature error message and jdk17 test issue. (#14538 ) * Fix ColumnSignature error message and jdk17 test issue. On jdk17, the "problem" part of the error message could change from NullPointerException to: Cannot invoke "String.length()" because "s" is null Due to the new more-helpful NPEs in Java 17. This broke the expectation and led to test failures on this case. This patch fixes the problem by improving the error message so it isn't a generic NullPointerException. * Fix format.	2023-07-06 15:10:59 -07:00
Abhishek Radhakrishnan	d02bb8bb6e	Set explain attributes after the query is prepared (#14490 ) * Add support for DML WITH AS. * One more UT for with as subquery. * Add a test with join query * Use root query prepared node instead of individual SqlNode types. - Set the explain plan attributes after the query is prepared when the query is planned and we've the finalized output names in the root source rel node. - Adjust tests; add unit test for negative ordinal case. - Remove the exception / error handling logic from resolveClusteredBy function since the validations now happen before it comes to the function * Update comment.	2023-07-06 14:13:32 -04:00
imply-cheddar	5fc122a144	Add window-focused tests from Drill (#13773 ) This commit borrows some test definitions from Drill's test suite and tries to use them to flesh out the full validation of window function capbilities. In order to be able to run these tests, we also add the ability to run a Scan operation against segments, which also meant an implementation of RowsAndColumns for frames.	2023-07-06 09:20:32 -07:00
Soumyava	78db7a4414	A query in MSQ would issue wrong error code (#14531 ) with a RuntimeException. Now the RuntimeException is being replaced by an user facing DruidException of Invalid category which would allow calcite not to throw an uncategorized exception.	2023-07-06 08:59:35 +05:30
Jonathan Wei	f29a9faa94	Better surfacing of invalid pattern errors for SQL REGEXP_EXTRACT function (#14505 )	2023-07-05 17:12:54 -05:00
Pranav	2d5b27358e	Logging the fieldName in the coerce exceptions (#14483 ) Logging the fieldName in the coerce exceptions	2023-07-03 14:13:27 +05:30
Gian Merlino	e10e35aa2c	Add REGEXP_REPLACE function. (#14460 ) * Add REGEXP_REPLACE function. Replaces all instances of a pattern with a replacement string. * Fixes. * Improve test coverage. * Adjust behavior.	2023-06-29 13:47:57 -07:00
Gian Merlino	a6cabbe10f	SQL: Avoid "intervals" for non-table-based datasources. (#14336 ) In these other cases, stick to plain "filter". This simplifies lots of logic downstream, and doesn't hurt since we don't have intervals-specific optimizations outside of tables. Fixes an issue where we couldn't properly filter on a column from an external datasource if it was named __time.	2023-06-29 09:57:11 +05:30
Gian Merlino	34c55a0bde	SQL: SUBSTRING support for non-literals. (#14480 ) * SQL: SUBSTRING support for non-literals. * Fix AssertionError test. * Fix header.	2023-06-28 13:43:05 -07:00
Jonathan Wei	c36f12f1d8	Support complex variance object inputs for variance SQL agg function (#14463 ) * Support complex variance object inputs for variance SQL agg function * Add test * Include complexTypeChecker, address PR comments * Checkstyle, javadoc link	2023-06-28 13:14:19 -05:00
Karan Kumar	cb3a9d2b57	Adding Interactive API's for MSQ engine (#14416 ) This PR aims to expose a new API called "@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.	2023-06-28 17:51:58 +05:30
Gian Merlino	c78d885b80	Cache parsed expressions and binding analysis in more places. (#14124 ) * Cache parsed expressions and binding analysis in more places. Main changes: 1) Cache parsed and analyzed expressions within PlannerContext for a single SQL query. 2) Cache parsed expressions together with input binding analysis using a new class AnalyzeExpr. This speeds up SQL planning, because SQL planning involves parsing analyzing the same expression strings over and over again. * Fixes. * Fix style. * Fix test. * Simplify: get rid of AnalyzedExpr, focus on caching. * Rename parse -> parseExpression.	2023-06-27 13:40:35 -07:00
Clint Wylie	6ba10c8b6c	fix bug with json_value expression array extraction (#14461 )	2023-06-26 21:02:44 -07:00
Abhishek Radhakrishnan	903addf7c2	Make agg and scalar routines test to depend on specific routine names. (#14482 )	2023-06-26 23:03:08 -04:00
Abhishek Radhakrishnan	79bff4bbf7	Improvements to `EXPLAIN PLAN` attributes (#14441 ) * Updates: use the target table directly, sanitized replace time chunks and clustered by cols. * Add DruidSqlParserUtil and tests. * minor refactor * Use SqlUtil.isLiteral * Throw ValidationException if CLUSTERED BY column descending order is specified. - Fails query planning * Some more tests. * fixup existing comment * Update comment * checkstyle fix: remove unused imports * Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme. * minor naming * move deprecated field to the bottom * update docs. * add one more example. * Collapsible query and result * checkstyle fixes * Code cleanup * order by changes * conditionally set attributes only for explain queries. * Cleaner ordinal check. * Add limit test and update javadoc. * Commentary and minor adjustments. * Checkstyle fixes. * One more checkArg. * add unexpected kind to exception.	2023-06-26 23:01:11 -04:00
Laksh Singla	1647d5f4a0	Limit the subquery results by memory usage (#13952 ) Users can now add a guardrail to prevent subquery’s results from exceeding the set number of bytes by setting druid.server.http.maxSubqueryRows in Broker's config or maxSubqueryRows in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.	2023-06-26 18:12:28 +05:30
Gian Merlino	d7c9c2f367	SqlResults: Coerce arrays to lists for VARCHAR. (#14260 ) * SqlResults: Coerce arrays to lists for VARCHAR. Useful for STRING_TO_MV, which returns VARCHAR at the SQL layer and an ExprEval with String[] at the native layer. * Fix style. * Improve test coverage. * Remove unnecessary throws.	2023-06-25 09:35:18 -07:00
Gian Merlino	3d19b748fb	SQL OperatorConversions: Introduce.aggregatorBuilder, allow CAST-as-literal. (#14249 ) * SQL OperatorConversions: Introduce.aggregatorBuilder, allow CAST-as-literal. Four main changes: 1) Provide aggregatorBuilder, a more consistent way of defining the SqlAggFunction we need for all of our SQL aggregators. The mechanism is analogous to the one we already use for SQL functions (OperatorConversions.operatorBuilder). 2) Allow CASTs of constants to be considered as "literalOperands". This fixes an issue where various of our operators are defined with OperandTypes.LITERAL as part of their checkers, which doesn't allow casts. However, in these cases we generally _do_ want to allow casts. The important piece is that the value must be reducible to a constant, not that the SQL text is literally a literal. 3) Update DataSketches SQL aggregators to use the new aggregatorBuilder functionality. The main user-visible effect here is [2]: the aggregators would now accept, for example, "CAST(0.99 AS DOUBLE)" as a literal argument. Other aggregators could be updated in a future patch. 4) Rename "requiredOperands" to "requiredOperandCount", because the old name was confusing. (It rhymes with "literalOperands" but the arguments mean different things.) * Adjust method calls.	2023-06-23 16:25:04 -07:00
Rishabh Singh	155fde33ff	Add metrics to SegmentMetadataCache refresh (#14453 ) New metrics: - `segment/metadatacache/refresh/time`: time taken to refresh segments per datasource - `segment/metadatacache/refresh/count`: number of segments being refreshed per datasource	2023-06-23 16:51:08 +05:30
Rohan Garg	09d6c5a45e	Decouple logical planning and native query generation in SQL planning (#14232 ) Add a new planning strategy that explicitly decouples the DAG from building the native query. With this mode, it is Calcite's job to generate a "logical DAG" which is all of the various DruidProject, DruidFilter, etc. nodes. We then take those nodes and use them to build a native query. The current commit doesn't pass all tests, but it does work for some things and is a decent starting baseline.	2023-06-19 16:00:40 -07:00
imply-cheddar	cfd07a95b7	Errors take 3 (#14004 ) Introduce DruidException, an exception whose goal in life is to be delivered to a user. DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used. This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.	2023-06-19 01:11:13 -07:00
Adarsh Sanjeev	128133fadc	Add column replication_factor column to sys.segments table (#14403 ) Description: Druid allows a configuration of load rules that may cause a used segment to not be loaded on any historical. This status is not tracked in the sys.segments table on the broker, which makes it difficult to determine if the unavailability of a segment is expected and if we should not wait for it to be loaded on a server after ingestion has finished. Changes: - Track replication factor in `SegmentReplicantLookup` during evaluation of load rules - Update API `/druid/coordinator/v1metadata/segments` to return replication factor - Add column `replication_factor` to the sys.segments virtual table and populate it in `MetadataSegmentView` - If this column is 0, the segment is not assigned to any historical and will not be loaded.	2023-06-18 10:02:21 +05:30
Abhishek Radhakrishnan	04fb75719e	Fail query planning if a `CLUSTERED BY` column contains descending order (#14436 ) * Throw ValidationException if CLUSTERED BY column descending order is specified. - Fails query planning * Some more tests. * fixup existing comment * Update comment * checkstyle fix: remove unused imports * Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme. * move deprecated field to the bottom	2023-06-16 18:10:12 -04:00
Clint Wylie	359bd63cc9	allow expression "best effort" type determination to better handle mixed type arrays (#14438 )	2023-06-16 00:02:43 -07:00
Clint Wylie	8454cc619a	auto columns fixes (#14422 ) changes: * auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR) * fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs * fix bug with bound filter when upper is null equivalent but is strict	2023-06-14 08:57:06 -07:00
Abhishek Radhakrishnan	b8495d45a1	Expose Druid functions in `INFORMATION_SCHEMA.ROUTINES` table. (#14378 ) * Add INFORMATION_SCHEMA.ROUTINES to expose Druid operators and functions. * checkstyle * remove IS_DETERMISITIC. * test * cleanup test * remove logs and simplify * fixup unit test * Add docs for INFORMATION_SCHEMA.ROUTINES table. * Update test and add another SQL query. * add stuff to .spelling and checkstyle fix. * Add more tests for custom operators. * checkstyle and comment. * Some naming cleanup. * Add FUNCTION_ID * The different Calcite function syntax enums get translated to FUNCTION * Update docs. * Cleanup markdown table. * fixup test. * fixup intellij inspection * Review comment: nullable column; add a function to determine function syntax. * More tests; add non-function syntax operators. * More unit tests. Also add a separate test for DruidOperatorTable. * actually just validate non-zero count. * switch up the order * checkstyle fixes.	2023-06-13 15:44:04 -04:00
Abhishek Radhakrishnan	326f2c5020	Add more statement attributes to explain plan result. (#14391 ) This PR adds the following to the ATTRIBUTES column in the explain plan output: - partitionedBy - clusteredBy - replaceTimeChunks This PR leverages the work done in #14074, which added a new column ATTRIBUTES to encapsulate all the statement-related attributes.	2023-06-12 19:18:02 +05:30
Abhishek Radhakrishnan	2d258a95ad	Fix `EARLIEST_BY`/`LATEST_BY` signature and include function name in signature. (#14352 ) * Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures. * Single quote function signatures, space between args and remove \n. * fixup UT assertion	2023-06-06 09:41:05 -07:00
zachjsh	04a82da63d	Input source security fixes (#14266 ) It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.	2023-06-01 16:37:19 -07:00
Clint Wylie	4096f51f0b	add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319 ) This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior. This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.	2023-05-24 20:32:51 +05:30
Abhishek Radhakrishnan	338bdb35ea	Return `RESOURCES` in `EXPLAIN PLAN` as an ordered collection (#14323 ) * Make resources an ordered collection so it's deterministic. * test cleanup * fixup docs. * Replace deprecated ObjectNode#put() calls with ObjectNode#set().	2023-05-23 00:55:00 -05:00
Clint Wylie	d92b9fbfac	more resilient segment metadata, dont parallel merge internal segment metadata queries (#14296 )	2023-05-17 04:12:55 -07:00
Paul Rogers	3c0983c8e9	Extend the IT framework to allow tests in extensions (#13877 ) The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.	2023-05-15 20:29:51 +05:30
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
Soumyava	f128b9b666	Updates to filter processing for inner query in Joins (#14237 )	2023-05-11 17:21:41 +05:30
Clint Wylie	a58cebe491	add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236 )	2023-05-11 04:43:28 -07:00
Clint Wylie	8805d8d7db	fix issues with filtering nulls on values coerced to numeric types (#14139 ) * fix issues with filtering nulls on values coerced to numeric types * fix issues with 'auto' type numeric columns in default value mode * optimize variant typed columns without nested data * more tests for 'auto' type column ingestion	2023-05-08 13:19:02 -07:00
Rohan Garg	4d8feeb279	Fix planning in CASE expressions with complex WHEN and ELSE expressions (#14220 )	2023-05-08 11:35:04 +05:30
zachjsh	48cde236c4	Add columnMappings to explain plan output (#14187 ) * Add columnMappings to explain plan output * * fix checkstyle * add tests * * improve test coverage * * temporarily remove unit-test need to run ITs * * depend on build * * temporarily lower unit test threshold * * add back dependency on unit-tests * * add license headers * * fix header order * * review comments * * fix intellij inspection errors * * revert code coverage change	2023-05-04 10:36:28 -07:00
Gian Merlino	42c8c84eb6	TimeBoundary: Use cursor when datasource is not a regular table. (#14151 ) * TimeBoundary: Use cursor when datasource is not a regular table. Fixes a bug where TimeBoundary could return incorrect results with INNER Join or inline data. * Addl Javadocs.	2023-04-26 17:00:13 -07:00
Gian Merlino	89e7948159	MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105 ) * MSQ: Subclass CalciteJoinQueryTest, other supporting changes. The main change is the new tests: we now subclass CalciteJoinQueryTest in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for SortMerge. Two supporting production changes for default-value mode: 1) InputNumberDataSource is marked as concrete, to allow leftFilter to be pushed down to it. 2) In default-value mode, numeric frame field readers can now return nulls. This is necessary when stacking joins on top of joins: nulls must be preserved for semantics that match broadcast joins and native queries. 3) In default-value mode, StringFieldReader.isNull returns true on empty strings in addition to nulls. This is more consistent with the behavior of the selectors, which map empty strings to null as well in that mode. As an effect of change (2), the InsertTimeNull change from #14020 (to replace null timestamps with default timestamps) is reverted. IMO, this is fine, as either behavior is defensible, and the change from #14020 hasn't been released yet. * Adjust tests. * Style fix. * Additional tests.	2023-04-25 12:10:23 -07:00
Gian Merlino	f643abdad9	SQL planning: Consider subqueries in fewer scenarios. (#14123 ) * SQL planning: Consider subqueries in fewer scenarios. Further adjusts logic in DruidRules that was previously adjusted in #13902. The reason for the original change was that the comment "Subquery must be a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries do not need to be groupBy anymore; they can really be any type of query. If I recall correctly, the change was needed for certain window queries to be able to plan on top of Scan queries. However, this impacts performance negatively, because it causes many additional outer-query scenarios to be considered, which is expensive. So, this patch updates the matching logic to consider fewer scenarios. The skipped scenarios are ones where we expect that, for one reason or another, it isn't necessary to consider a subquery. * Remove unnecessary escaping. * Fix test.	2023-04-21 08:32:13 -07:00
Soumyava	8d60edcfcb	Updating segment map function for QueryDataSource to ensure group by … (#14112 ) * Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path * Adding unit tests for the failed case * There you go coverage bot, be happy now	2023-04-20 13:22:29 -07:00
zachjsh	04da0102cb	KillTask should return empty inputSource resources (#14106 ) ### Description This pr fixes a few bugs found with the inputSource security feature. 1. `KillUnusedSegmentsTask` previously had no definition for the `getInputSourceResources`, which caused an unsupportedOperationException to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so returning an empty set for this task type now. 2. Fixed a bug where when the input source type security feature is enabled, all of the input source type specific resources used where authenticated against: `{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}` When they should be instead authenticated against: `{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}` 3. fixed bug where supervisor tasks were not authenticated against the specific input source types used, if input source security feature was enabled.	2023-04-18 15:27:16 -04:00
Clint Wylie	e7d2e8b914	fix bug filtering nested columns with expression filters (#14096 )	2023-04-17 14:21:32 -07:00
Abhishek Radhakrishnan	c98c66558f	Include statement attributes in `EXPLAIN PLAN` output (#14074 ) This commit adds attributes that contain metadata information about the query in the EXPLAIN PLAN output. The attributes currently contain two items: - `statementTyp`: SELECT, INSERT or REPLACE - `targetDataSource`: provides the target datasource name for DML statements It is added to both the legacy and native query plan outputs.	2023-04-17 21:00:25 +05:30
Gian Merlino	a8eb3f2f57	SQL: Fix natural comparator selection for groupBy. (#14075 ) * SQL: Fix natural comparator selection for groupBy. DruidQuery.computeSorting had some unique logic for finding natural comparators for SQL types. It should be using getStringComparatorForRelDataType instead. One good effect here is that the comparator for BOOLEAN is now NUMERIC rather than LEXICOGRAPHIC. The test case illustrates this. * Remove msqCompatible, for now. * Fix test.	2023-04-15 07:14:43 +05:30
Gian Merlino	eeed5ed7e2	MSQ: Use the same result coercion routines as the regular SQL endpoint. (#14046 ) * MSQ: Use the same result coercion routines as the regular SQL endpoint. The main changes are to move NativeQueryMaker.coerce to SqlResults, and to formally make the list of sqlTypeNames from the MSQ results reports use SqlTypeNames. - Change the default to MSQ-compatible rather than MSQ-incompatible. The explicit marker function is now "notMsqCompatible()".	2023-04-15 06:56:23 +05:30
Gian Merlino	0884a22c41	MSQ: Support for querying lookup and inline data directly. (#14048 ) * MSQ: Support for querying lookup and inline data directly. Main changes: 1) Add of LookupInputSpec and DataSourcePlan.forLookup. 2) Add InlineInputSpec, and modify of DataSourcePlan.forInline to use this instead of an ExternalInputSpec with JSON. This allows the inline data to act as the right-hand side of a join, if needed. Supporting changes: 1) Modify JoinDataSource's leftFilter validation to be a little less strict: it's now OK with leftFilter being attached to any concrete leaf (no children) datasource, rather than requiring it be a table. This allows MSQ to create JoinDataSource with InputNumberDataSource as the base. 2) Add SegmentWranglerModule to CliIndexer, CliPeon. This allows them to query lookups and inline data directly. * Updates based on CI. * Additional tests. * Style fix. * Remove unused import.	2023-04-14 14:04:02 -07:00
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
zachjsh	2e87b5a901	Input source security sql layer can handle input source with multiple types (#14050 ) ### Description This change allows for input sources used during MSQ ingestion to be authorized for multiple input source types, instead of just 1. Such an input source that allows for multiple types is the CombiningInputSource. Also fixed bug that caused some input source specific functions to be authorized against the permissions ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ), new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] ` when the inputSource based authorization feature is enabled, when it should instead be authorized against ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] `	2023-04-10 09:48:57 -04:00
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
Gian Merlino	d52bc333aa	Frames: Ensure nulls are read as default values when appropriate. (#14020 ) * Frames: Ensure nulls are read as default values when appropriate. Fixes a bug where LongFieldWriter didn't write a properly transformed zero when writing out a null. This had no meaningful effect in SQL-compatible null handling mode, because the field would get treated as a null anyway. But it does have an effect in default-value mode: it would cause Long.MIN_VALUE to get read out instead of zero. Also adds NullHandling checks to the various frame-based column selectors, allowing reading of nullable frames by servers in default-value mode.	2023-04-10 05:28:46 +05:30
zachjsh	5c0221375c	Allow for Input source security in native task layer (#14003 ) Fixes #13837. ### Description This change allows for input source type security in the native task layer. To enable this feature, the user must set the following property to true: `druid.auth.enableInputSourceSecurity=true` The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource. When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource. `new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}, Action.READ` where `{INPUT_SOURCE_TYPE}` is the type of the input source being used;, http, inline, s3, etc.. Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.	2023-04-06 13:13:09 -04:00
Clint Wylie	b11c0bc249	smarter nested column index utilization (#13977 ) * smarter nested column index utilization changes: * adds skipValueRangeIndexScale and skipValuePredicateIndexScale to ColumnConfig (e.g. DruidProcessingConfig) available as system config via druid.processing.indexes.skipValueRangeIndexScale and druid.processing.indexes.skipValuePredicateIndexScale * NestedColumnIndexSupplier uses skipValueRangeIndexScale and skipValuePredicateIndexScale to multiply by the total number of rows to be processed to determine the threshold at which we should no longer consider using bitmap indexes because it will be too many operations * Default values for skipValueRangeIndexScale and skipValuePredicateIndexScale have been initially set to 0.08, but are separate to allow independent tuning * these are not documented on purpose yet because they are kind of hard to explain, the mainly exist to help conduct larger scale experiments than the jmh benchmarks used to derive the initial set of values * these changes provide a pretty sweet performance boost for filter processing on nested columns	2023-04-06 04:09:24 -07:00
Paul Rogers	030ed911d4	Temporarily revert extended table functions for Druid 26 (#14019 )	2023-04-05 21:09:33 -07:00
Gian Merlino	319f99db05	Always use file sizes when determining batch ingest splits (#13955 ) * Always use file sizes when determining batch ingest splits. Main changes: 1) Update CloudObjectInputSource and its subclasses (S3, GCS, Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they were only used for prefixes, not uris or objects. 2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously, file size was ignored; all files were treated as equal weight when determining splits. A side effect of these changes is that we'll make additional network calls to find the sizes of objects when users specify URIs or objects as opposed to prefixes. IMO, this is worth it because it's the only way to respect the user's split hint and task assignment settings. Secondary changes: 1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get metadata for a single object. This is a simpler call that is also expected to be less expensive. 2) Azure: Fix a bug where getBlobLength did not populate blob reference attributes, and therefore would not actually retrieve the blob length. 3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and TableInputSpecSlicer. 4) MSQ: Adjust WorkerInputs to ensure there is always at least one worker, even if it has a nil slice. * Add msqCompatible to testGroupByWithImpossibleTimeFilter. * Fix tests. * Add additional tests. * Remove unused stuff. * Remove more unused stuff. * Adjust thresholds. * Remove irrelevant test. * Fix comments. * Fix bug. * Updates.	2023-04-05 08:54:01 -07:00

1 2 3 4 5 ...

853 Commits