druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	e76962f453	Use annotation to mark DecoupleIgnore (#15005 )	2023-09-21 12:36:52 +05:30
Laksh Singla	ebb794632a	Allow users with STATE permissions to read and write the state APIs for querying with deep storage (#14944 ) Currently, only the user who submitted the async query has permission to interact with the status APIs for that query. However, we often want an administrator to be able to interact with these resources as well. Druid traditionally handles this with the STATE resource, so a requesting user who has the necessary permissions on it should also be allowed to interact with the status APIs, regardless of whether they submitted the query.	2023-09-21 06:55:07 +05:30
Pranav	883c2692d2	Adding new function decode_base64_utf8 and expr macro (#14943 ) * Adding new function decode_base64_utf8 and expr macro * using BaseScalarUnivariateMacroFunctionExpr * Print stack trace in case of debug in ChainedExecutionQueryRunner * fix static check	2023-09-20 17:06:34 -07:00
Gian Merlino	823f620ede	Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976 ) * Add IS [NOT] DISTINCT FROM to SQL and join matchers. Changes: 1) Add "isdistinctfrom" and "notdistinctfrom" native expressions. 2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions when generating expressions, and is treated the same as equals and not-equals when generating native filters on literals. 3) Update join matchers to have an "includeNull" parameter that determines whether we are operating in "equals" mode or "is not distinct from" mode. * Main changes: - Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom". - Include null in pushed-down filters when using "notdistinctfrom" in a join. Other changes: - Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets, relying less on remembering to get it right to avoid copies. * Remove unused "wrap" method. * Fixes. * Remove methods we do not need. * Fix bug with INPUT_REF.	2023-09-20 10:44:32 -07:00
Zoltan Haindrich	e8773f4d0f	Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996 )	2023-09-20 15:42:52 +05:30
Gian Merlino	4f498e6469	SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978 ) * SQL: Plan non-equijoin conditions as cross join followed by filter. Druid has previously refused to execute joins with non-equality-based conditions. This was well-intentioned: the idea was to push people to write their queries in a different, hopefully more performant way. But as we're moving towards fuller SQL support, it makes more sense to allow these conditions to go through with the best plan we can come up with: a cross join followed by a filter. In some cases this will allow the query to run, and people will be happy with that. In other cases, it will run into resource limits during execution. But we should at least give the query a chance. This patch also updates the documentation to explain how people can tell whether their queries are being planned this way. * cartesian is a word. * Adjust tests. * Update docs/querying/datasource.md Co-authored-by: Benedict Jin <asdf2014@apache.org> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2023-09-19 10:23:42 -07:00
Soumyava	279b3818f0	Make Unnest work with nullif operator (#14993 ) This is due to the recursive filter creation in the unnest storage adapter not behaving correctly when the list of children is empty. This PR addresses the issue.	2023-09-15 09:54:14 +05:30
Gian Merlino	3ae5e97801	Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977 ) They are not quite the same as "x == true", "x != true", etc. These functions never return null, even when "x" itself is null.	2023-09-14 09:19:09 -07:00
Soumyava	7bbefd5741	Updating version in from.ftl (#14982 )	2023-09-14 05:11:36 +00:00
Soumyava	bf99d2c7b2	Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924 ) * Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly * Fixing a failed test * Updating numericNilAgg * Moving to use default values in case of nil agg * Adding the same for first agg * Fixing a test * fixing vectorized string agg for last/first with cast if numeric * Updating tests to remove mockito and cover the case of string first/last on non string columns * Updating a test to vectorize * Addressing review comments: Name change to NilVectorAggregator and using static variables now * fixing intellij inspections	2023-09-13 13:15:14 -07:00
Laksh Singla	4c57504960	Fix the uncaught exceptions when materializing results as frames (#14970 ) When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code). This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.	2023-09-13 15:41:28 +05:30
Clint Wylie	891f0a3fe9	longer compatibility window for nested column format v4 (#14955 ) changes: * add back nested column v4 serializers * 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs * add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'	2023-09-12 14:07:53 -07:00
Zoltan Haindrich	5d16d0edf0	Count distinct returned incorrect results without useApproximateCountDistinct (#14748 ) * fix grouping engine handling of summaries when result set is empty	2023-09-12 13:57:54 -07:00
Clint Wylie	5cecf6ce8f	fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948 )	2023-09-12 10:54:35 +08:00
Suneet Saldanha	757603a773	Set task location as k8sPodName for mm-less ingestion (#14959 ) * Set task location as k8sPodName for mm-less ingestion * tests	2023-09-11 19:44:26 -07:00
Zoltan Haindrich	699893bcff	Fix StringLastAggregatorFactory equals/toString (#14907 ) * update test * update test * format * test * fix0 * Revert "fix0" This reverts commit `44992cb393`. * ok resultset * add plan * update test * before rewind * test * fix toString/compare/test * move test * add timeColumn to hashCode	2023-09-08 09:20:54 -07:00
Soumyava	a8fa979115	Unnest dont push down not (#14942 ) * Not pushing down not filters * New test case * Updating tests * Removing a stale comment	2023-09-06 08:57:03 -07:00
Zoltan Haindrich	23308c050d	Remove DruidAggregateCaseToFilterRule (#14940 ) The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade	2023-09-06 19:11:58 +05:30
Laksh Singla	6ee0b06e38	Auto configuration for maxSubqueryBytes (#14808 ) A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.	2023-09-06 05:47:19 +00:00
Soumyava	8088a763a6	Vectorize earliest aggregator for both numeric and string types (#14408 ) * Vectorizing earliest for numeric * Vectorizing earliest string aggregator * checkstyle fix * Removing unnecessary exceptions * Ignoring tests in MSQ as earliest is not supported for numeric there * Fixing benchmarks * Updating tests as MSQ does not support earliest for some cases * Addressing review comments by adding the following: 1. Checking capabilities first before creating selectors 2. Removing mockito in tests for numeric first aggs 3. Removing unnecessary tests * Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string * Adding a flag for multi value dimension selector * Addressing comments * 1 more change * Handling review comments part 1 * Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order * Updating numeric first vector agg * Revert "Updating numeric first vector agg" This reverts commit `4291709901`. * Updating code for correctness issues * fixing an issue with latest agg * Adding more comments and removing an unnecessary check * Addressing null checks for tie selector and only vectorize false for quantile sketches	2023-09-05 08:41:42 -07:00
Kashif Faraz	7f26b80e21	Simplify ServiceMetricEvent.Builder (#14933 ) Changes: - Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent> and thus convert it to a plain builder rather than a builder of builder. - Add methods setCreatedTime , setMetricAndValue to the builder	2023-09-01 11:30:45 +05:30
Zoltan Haindrich	e806d09309	Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848 )	2023-08-22 22:50:19 -07:00
Zoltan Haindrich	b9a33949fd	Fix aggregation filter expression processing in the absence of projection (#14893 ) * test * fix * add 33 test * crap * Revert "crap" This reverts commit `2751198deb`. * cleanup test * cleanup * rename test	2023-08-22 10:17:14 -07:00
Zoltan Haindrich	14c1aff150	Fix error messages relating to OVERWRITE keyword (#14870 ) OVERWRITE should not be a fully reserved keyword	2023-08-22 16:17:49 +05:30
Clint Wylie	194a9c9abc	set druid.expressions.useStrictBooleans to true by default (#14734 )	2023-08-22 00:19:56 -07:00
Clint Wylie	6b14dde50e	deprecate config-magic in favor of json configuration stuff (#14695 ) * json config based processing and broker merge configs to deprecate config-magic	2023-08-16 18:23:57 -07:00
Pranav	26d82fd342	fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587 )	2023-08-16 17:57:16 -07:00
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757 )	2023-08-14 11:50:51 +05:30
Soumyava	afe22907a5	Calcite upgrade 1.35 (#14510 ) * Update to Calcite 1.35.0 * Update from.ftl for Calcite 1.35.0. * Fixed tests in Calcite upgrade by doing the following: 1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules 2. Refactored the CorrelateUnnestRule 3. Updated CorrelateUnnestRel accordingly 4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column 5. Additional test cases for fixes in 2,3,4 6. Update to StringListAggregator to fail a query if separators are not propagated appropriately * Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects * Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing a useless literal dimension with a post agg. Additionally, a test with COUNT DISTINCT and FILTER, which was failing with Calcite 1.21 and passes with 1.35, is added here * Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal * Added a wrapper segment ref to use for unnest and filter segment reference	2023-08-11 12:47:16 -07:00
Adarsh Sanjeev	56ab81f381	Add support for different result formats to MSQ SqlStatementResource (#14571 ) * Add support for different result format * Add tests * Add tests * Fix checkstyle * Remove changes to destination * Removed some unwanted code * Address review comments * Rename parameter * Fix tests	2023-08-07 20:48:59 +05:30
Soumyava	0d73480c8f	Latest aggregator factories should accept time as VectorValueSelecto… (#14753 ) Fix the queries that have latest aggregator with an expression as time column	2023-08-04 13:04:25 +05:30
Clint Wylie	94fb41a4df	fix nested field virtual column array column element vector object selector (#14729 ) Fixes a case I missed in #14688 when the return type is STRING but it's coming from a top level array typed column instead of a nested array column while making a vector object selector. Also while here I noticed that the internal JSON_VALUE functions for array types were named inconsistently with the non-array functions, so I renamed them. These are not documented so it should not be disruptive in any way, since they are only used internally for rewrites while planning to make the correct virtual column. JSON_VALUE_RETURNING_ARRAY_VARCHAR -> JSON_VALUE_ARRAY_VARCHAR JSON_VALUE_RETURNING_ARRAY_BIGINT -> JSON_VALUE_ARRAY_BIGINT JSON_VALUE_RETURNING_ARRAY_DOUBLE -> JSON_VALUE_ARRAY_DOUBLE The internal non-array functions are JSON_VALUE_VARCHAR, JSON_VALUE_BIGINT, and JSON_VALUE_DOUBLE.	2023-08-02 17:08:24 +05:30
Kashif Faraz	10328c0743	Rename metadatacache and serverview metrics (#14716 )	2023-08-01 14:18:20 +05:30
Clint Wylie	5f72f4f37d	fixes for nested virtual column array element vector selectors and fixes for variant and nested variant numeric columns * fix issue with nested virtual column array element vector selectors when input is numeric array but output is non-numeric * add vector value selector for mixed numeric type variant and nested variant fields, tests	2023-07-28 15:14:29 -07:00
Clint Wylie	d406bafdfc	fix issues with equality and range filters matching double values to long typed inputs (#14654 ) * fix issues with equality and range filters matching double values to long typed inputs * adjust to ensure we never homogenize null, [], and [null] into [null] for expressions on real array columns	2023-07-27 16:01:21 -07:00
Adarsh Sanjeev	6a42a24426	Fix a comment in the Calcite UT testExactCountDistinctWithFilter (#14628 )	2023-07-26 06:32:26 +00:00
Gian Merlino	2f9619a96f	Use OverlordClient for all Overlord RPCs. (#14581 ) * Use OverlordClient for all Overlord RPCs. Continuing the work from #12696, this patch removes HttpIndexingServiceClient and the IndexingService flavor of DruidLeaderClient completely. All remaining usages are migrated to OverlordClient. Supporting changes include: 1) Add a variety of methods to OverlordClient. 2) Update MetadataTaskStorage to skip the complete-task lookup when the caller requests zero completed tasks. This helps performance of the "get active tasks" APIs, which don't want to see complete ones. * Use less forbidden APIs. * Fixes from CI. * Add test coverage. * Two more tests. * Fix test. * Updates from CR. * Remove unthrown exceptions. * Refactor to improve testability and test coverage. * Add isNil tests. * Remove unnecessary "deserialize" methods.	2023-07-24 21:14:27 -07:00
Gian Merlino	c2e6758580	Simplify bounds/range vs selectors/equality logic in SQL planning. (#14619 ) * Simplify bounds/range vs selectors/equality logic in SQL planning. 1) Consolidate duplicate code related to Expressions#buildTimeFloorFilter. 2) Cleaner logic in Expressions#toSimpleLeafFilter: choose bounds vs range filter based solely on plannerContext.isUseBoundsAndSelectors, not also considering rhs kind. Use parsed rhs in both paths (except for numerics in the bound path). 3) Fix ArrayContains, ArrayOverlap to avoid equality filters when there is an extractionFn present. Fixes a bug introduced in #14612. * Avoid sending nonprimitives down the bound path.	2023-07-19 22:40:47 -07:00
Clint Wylie	68fd22169f	remove extractionFn from equality, null, and range filters (#14612 ) * remove extractionFn from equality, null, and range filters changes: * EqualityFilter, NullFilter, and RangeFilter no longer support extractionFn * SQL planner will use ExpressionFilter in the small number of cases where an extractionFn would have been used if sqlUseBoundsAndSelectors is set to false instead of equality/null/range filters * fix bugs and add tests with serde, equals, and cache key for null, equality, and range filters * test coverage fixes bugs * adjust * adjust again * so persnickety	2023-07-19 10:37:57 -07:00
Clint Wylie	913416c669	add equality, null, and range filter (#14542 ) changes: * new filters that preserve match value typing to better handle filtering different column types * sql planner uses new filters by default in sql compatible null handling mode * remove isFilterable from column capabilities * proper handling of array filtering, add array processor to column processors * javadoc for sql test filter functions * range filter support for arrays, tons more tests, fixes * add dimension selector tests for mixed type roots * support json equality * rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes * add cooler equality index maker, ValueIndexes * fix missing string utf8 index supplier * expression array comparator stuff	2023-07-18 12:15:22 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Laksh Singla	c1c7dff2ad	Using DruidExceptions in MSQ (changes related to the Broker) (#14534 ) MSQ engine returns correct error codes for invalid user inputs in the query context. Also, using DruidExceptions for MSQ related errors happening in the Broker with improved error messages.	2023-07-13 19:08:49 +00:00
Abhishek Radhakrishnan	f4ee58eaa8	Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560 ) * Add aggregatorMergeStrategy property to SegmentMetadataQuery. - Adds a new property aggregatorMergeStrategy to segmentMetadata query. aggregatorMergeStrategy currently supports three types of merge strategies - the legacy strict and lenient strategies, and the new latest strategy. - The latest strategy considers the latest aggregator from the latest segment by time order when there's a conflict when merging aggregators from different segments. - Deprecate lenientAggregatorMerge property; the API validates that the new and old properties are not both set, and returns an exception if they are. - When merging segments as part of segmentMetadata query, the segments have a more elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to the name format that segments usually contain. Previously it was simply "merged". - Adjust unit tests to test the latest strategy, to assert the returned complete SegmentAnalysis object instead of just the aggregators for completeness. * Don't explicitly set strict strategy in tests * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/segmentmetadataquery.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 12:37:36 -04:00
imply-cheddar	7650a71d37	Add window query test files from Drill (#14561 )	2023-07-12 20:14:39 -07:00
imply-cheddar	65e1b27aa7	Fix a resource leak with Window processing (#14573 ) * Fix a resource leak with Window processing Additionally, in order to find the leak, there were adjustments to the StupidPool to track leaks a bit better. It would appear that the pool objects get GC'd during testing for some reason which was causing some incorrect identification of leaks from objects that had been returned but were GC'd along with the pool. * Suppress unused warning	2023-07-12 17:25:42 -05:00
Laksh Singla	5ce536355e	Fix planning bug while using sort merge frame processor (#14450 ) sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.	2023-07-11 09:58:44 +00:00
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old formatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Gian Merlino	dd78e00dc5	Fix ColumnSignature error message and jdk17 test issue. (#14538 ) * Fix ColumnSignature error message and jdk17 test issue. On jdk17, the "problem" part of the error message could change from NullPointerException to: Cannot invoke "String.length()" because "s" is null Due to the new more-helpful NPEs in Java 17. This broke the expectation and led to test failures on this case. This patch fixes the problem by improving the error message so it isn't a generic NullPointerException. * Fix format.	2023-07-06 15:10:59 -07:00
Abhishek Radhakrishnan	d02bb8bb6e	Set explain attributes after the query is prepared (#14490 ) * Add support for DML WITH AS. * One more UT for with as subquery. * Add a test with join query * Use root query prepared node instead of individual SqlNode types. - Set the explain plan attributes after the query is prepared when the query is planned and we've the finalized output names in the root source rel node. - Adjust tests; add unit test for negative ordinal case. - Remove the exception / error handling logic from resolveClusteredBy function since the validations now happen before it comes to the function * Update comment.	2023-07-06 14:13:32 -04:00
imply-cheddar	5fc122a144	Add window-focused tests from Drill (#13773 ) This commit borrows some test definitions from Drill's test suite and tries to use them to flesh out the full validation of window function capabilities. In order to be able to run these tests, we also add the ability to run a Scan operation against segments, which also meant an implementation of RowsAndColumns for frames.	2023-07-06 09:20:32 -07:00
Soumyava	78db7a4414	A query in MSQ would issue wrong error code (#14531 ) with a RuntimeException. The RuntimeException is now replaced by a user-facing DruidException of the Invalid category, which allows Calcite to avoid throwing an uncategorized exception.	2023-07-06 08:59:35 +05:30
Jonathan Wei	f29a9faa94	Better surfacing of invalid pattern errors for SQL REGEXP_EXTRACT function (#14505 )	2023-07-05 17:12:54 -05:00
Pranav	2d5b27358e	Logging the fieldName in the coerce exceptions (#14483 ) Logging the fieldName in the coerce exceptions	2023-07-03 14:13:27 +05:30
Gian Merlino	e10e35aa2c	Add REGEXP_REPLACE function. (#14460 ) * Add REGEXP_REPLACE function. Replaces all instances of a pattern with a replacement string. * Fixes. * Improve test coverage. * Adjust behavior.	2023-06-29 13:47:57 -07:00
Gian Merlino	a6cabbe10f	SQL: Avoid "intervals" for non-table-based datasources. (#14336 ) In these other cases, stick to plain "filter". This simplifies lots of logic downstream, and doesn't hurt since we don't have intervals-specific optimizations outside of tables. Fixes an issue where we couldn't properly filter on a column from an external datasource if it was named __time.	2023-06-29 09:57:11 +05:30
Gian Merlino	34c55a0bde	SQL: SUBSTRING support for non-literals. (#14480 ) * SQL: SUBSTRING support for non-literals. * Fix AssertionError test. * Fix header.	2023-06-28 13:43:05 -07:00
Jonathan Wei	c36f12f1d8	Support complex variance object inputs for variance SQL agg function (#14463 ) * Support complex variance object inputs for variance SQL agg function * Add test * Include complexTypeChecker, address PR comments * Checkstyle, javadoc link	2023-06-28 13:14:19 -05:00
Karan Kumar	cb3a9d2b57	Adding Interactive APIs for MSQ engine (#14416 ) This PR aims to expose a new API at the path "/druid/v2/sql/statements/", which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.	2023-06-28 17:51:58 +05:30
Gian Merlino	c78d885b80	Cache parsed expressions and binding analysis in more places. (#14124 ) * Cache parsed expressions and binding analysis in more places. Main changes: 1) Cache parsed and analyzed expressions within PlannerContext for a single SQL query. 2) Cache parsed expressions together with input binding analysis using a new class AnalyzeExpr. This speeds up SQL planning, because SQL planning involves parsing and analyzing the same expression strings over and over again. * Fixes. * Fix style. * Fix test. * Simplify: get rid of AnalyzedExpr, focus on caching. * Rename parse -> parseExpression.	2023-06-27 13:40:35 -07:00
Clint Wylie	6ba10c8b6c	fix bug with json_value expression array extraction (#14461 )	2023-06-26 21:02:44 -07:00
Abhishek Radhakrishnan	903addf7c2	Make agg and scalar routines test to depend on specific routine names. (#14482 )	2023-06-26 23:03:08 -04:00
Abhishek Radhakrishnan	79bff4bbf7	Improvements to `EXPLAIN PLAN` attributes (#14441 ) * Updates: use the target table directly, sanitized replace time chunks and clustered by cols. * Add DruidSqlParserUtil and tests. * minor refactor * Use SqlUtil.isLiteral * Throw ValidationException if CLUSTERED BY column descending order is specified. - Fails query planning * Some more tests. * fixup existing comment * Update comment * checkstyle fix: remove unused imports * Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme. * minor naming * move deprecated field to the bottom * update docs. * add one more example. * Collapsible query and result * checkstyle fixes * Code cleanup * order by changes * conditionally set attributes only for explain queries. * Cleaner ordinal check. * Add limit test and update javadoc. * Commentary and minor adjustments. * Checkstyle fixes. * One more checkArg. * add unexpected kind to exception.	2023-06-26 23:01:11 -04:00
Laksh Singla	1647d5f4a0	Limit the subquery results by memory usage (#13952 ) Users can now add a guardrail to prevent a subquery’s results from exceeding a set number of bytes by setting druid.server.http.maxSubqueryBytes in the Broker's config or maxSubqueryBytes in the query context. This feature is experimental for now and falls back to row-based limiting if it fails to get an accurate size for the results consumed by the query.	2023-06-26 18:12:28 +05:30
Gian Merlino	d7c9c2f367	SqlResults: Coerce arrays to lists for VARCHAR. (#14260 ) * SqlResults: Coerce arrays to lists for VARCHAR. Useful for STRING_TO_MV, which returns VARCHAR at the SQL layer and an ExprEval with String[] at the native layer. * Fix style. * Improve test coverage. * Remove unnecessary throws.	2023-06-25 09:35:18 -07:00
Gian Merlino	3d19b748fb	SQL OperatorConversions: Introduce aggregatorBuilder, allow CAST-as-literal. (#14249 ) * SQL OperatorConversions: Introduce aggregatorBuilder, allow CAST-as-literal. Four main changes: 1) Provide aggregatorBuilder, a more consistent way of defining the SqlAggFunction we need for all of our SQL aggregators. The mechanism is analogous to the one we already use for SQL functions (OperatorConversions.operatorBuilder). 2) Allow CASTs of constants to be considered as "literalOperands". This fixes an issue where various of our operators are defined with OperandTypes.LITERAL as part of their checkers, which doesn't allow casts. However, in these cases we generally _do_ want to allow casts. The important piece is that the value must be reducible to a constant, not that the SQL text is literally a literal. 3) Update DataSketches SQL aggregators to use the new aggregatorBuilder functionality. The main user-visible effect here is [2]: the aggregators would now accept, for example, "CAST(0.99 AS DOUBLE)" as a literal argument. Other aggregators could be updated in a future patch. 4) Rename "requiredOperands" to "requiredOperandCount", because the old name was confusing. (It rhymes with "literalOperands" but the arguments mean different things.) * Adjust method calls.	2023-06-23 16:25:04 -07:00
Rishabh Singh	155fde33ff	Add metrics to SegmentMetadataCache refresh (#14453 ) New metrics: - `segment/metadatacache/refresh/time`: time taken to refresh segments per datasource - `segment/metadatacache/refresh/count`: number of segments being refreshed per datasource	2023-06-23 16:51:08 +05:30
Rohan Garg	09d6c5a45e	Decouple logical planning and native query generation in SQL planning (#14232 ) Add a new planning strategy that explicitly decouples the DAG from building the native query. With this mode, it is Calcite's job to generate a "logical DAG" which is all of the various DruidProject, DruidFilter, etc. nodes. We then take those nodes and use them to build a native query. The current commit doesn't pass all tests, but it does work for some things and is a decent starting baseline.	2023-06-19 16:00:40 -07:00
imply-cheddar	cfd07a95b7	Errors take 3 (#14004 ) Introduce DruidException, an exception whose goal in life is to be delivered to a user. DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used. This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.	2023-06-19 01:11:13 -07:00
Adarsh Sanjeev	128133fadc	Add replication_factor column to sys.segments table (#14403 ) Description: Druid allows a configuration of load rules that may cause a used segment to not be loaded on any historical. This status is not tracked in the sys.segments table on the broker, which makes it difficult to determine if the unavailability of a segment is expected and if we should not wait for it to be loaded on a server after ingestion has finished. Changes: - Track replication factor in `SegmentReplicantLookup` during evaluation of load rules - Update API `/druid/coordinator/v1/metadata/segments` to return replication factor - Add column `replication_factor` to the sys.segments virtual table and populate it in `MetadataSegmentView` - If this column is 0, the segment is not assigned to any historical and will not be loaded.	2023-06-18 10:02:21 +05:30
Abhishek Radhakrishnan	04fb75719e	Fail query planning if a `CLUSTERED BY` column contains descending order (#14436 ) * Throw ValidationException if CLUSTERED BY column descending order is specified. - Fails query planning * Some more tests. * fixup existing comment * Update comment * checkstyle fix: remove unused imports * Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme. * move deprecated field to the bottom	2023-06-16 18:10:12 -04:00
Clint Wylie	359bd63cc9	allow expression "best effort" type determination to better handle mixed type arrays (#14438 )	2023-06-16 00:02:43 -07:00
Clint Wylie	8454cc619a	auto columns fixes (#14422 ) changes: * auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR) * fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs * fix bug with bound filter when upper is null equivalent but is strict	2023-06-14 08:57:06 -07:00
Abhishek Radhakrishnan	b8495d45a1	Expose Druid functions in `INFORMATION_SCHEMA.ROUTINES` table. (#14378 ) * Add INFORMATION_SCHEMA.ROUTINES to expose Druid operators and functions. * checkstyle * remove IS_DETERMINISTIC. * test * cleanup test * remove logs and simplify * fixup unit test * Add docs for INFORMATION_SCHEMA.ROUTINES table. * Update test and add another SQL query. * add stuff to .spelling and checkstyle fix. * Add more tests for custom operators. * checkstyle and comment. * Some naming cleanup. * Add FUNCTION_ID * The different Calcite function syntax enums get translated to FUNCTION * Update docs. * Cleanup markdown table. * fixup test. * fixup intellij inspection * Review comment: nullable column; add a function to determine function syntax. * More tests; add non-function syntax operators. * More unit tests. Also add a separate test for DruidOperatorTable. * actually just validate non-zero count. * switch up the order * checkstyle fixes.	2023-06-13 15:44:04 -04:00
Abhishek Radhakrishnan	326f2c5020	Add more statement attributes to explain plan result. (#14391 ) This PR adds the following to the ATTRIBUTES column in the explain plan output: - partitionedBy - clusteredBy - replaceTimeChunks This PR leverages the work done in #14074, which added a new column ATTRIBUTES to encapsulate all the statement-related attributes.	2023-06-12 19:18:02 +05:30
Abhishek Radhakrishnan	2d258a95ad	Fix `EARLIEST_BY`/`LATEST_BY` signature and include function name in signature. (#14352 ) * Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures. * Single quote function signatures, space between args and remove \n. * fixup UT assertion	2023-06-06 09:41:05 -07:00
zachjsh	04a82da63d	Input source security fixes (#14266 ) It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.	2023-06-01 16:37:19 -07:00
Clint Wylie	4096f51f0b	add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319 ) This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior. This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.	2023-05-24 20:32:51 +05:30
Abhishek Radhakrishnan	338bdb35ea	Return `RESOURCES` in `EXPLAIN PLAN` as an ordered collection (#14323 ) * Make resources an ordered collection so it's deterministic. * test cleanup * fixup docs. * Replace deprecated ObjectNode#put() calls with ObjectNode#set().	2023-05-23 00:55:00 -05:00
Clint Wylie	d92b9fbfac	more resilient segment metadata, dont parallel merge internal segment metadata queries (#14296 )	2023-05-17 04:12:55 -07:00
Paul Rogers	3c0983c8e9	Extend the IT framework to allow tests in extensions (#13877 ) The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.	2023-05-15 20:29:51 +05:30
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
Soumyava	f128b9b666	Updates to filter processing for inner query in Joins (#14237 )	2023-05-11 17:21:41 +05:30
Clint Wylie	a58cebe491	add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236 )	2023-05-11 04:43:28 -07:00
Clint Wylie	8805d8d7db	fix issues with filtering nulls on values coerced to numeric types (#14139 ) * fix issues with filtering nulls on values coerced to numeric types * fix issues with 'auto' type numeric columns in default value mode * optimize variant typed columns without nested data * more tests for 'auto' type column ingestion	2023-05-08 13:19:02 -07:00
Rohan Garg	4d8feeb279	Fix planning in CASE expressions with complex WHEN and ELSE expressions (#14220 )	2023-05-08 11:35:04 +05:30
zachjsh	48cde236c4	Add columnMappings to explain plan output (#14187 ) * Add columnMappings to explain plan output * * fix checkstyle * add tests * * improve test coverage * * temporarily remove unit-test need to run ITs * * depend on build * * temporarily lower unit test threshold * * add back dependency on unit-tests * * add license headers * * fix header order * * review comments * * fix intellij inspection errors * * revert code coverage change	2023-05-04 10:36:28 -07:00
Gian Merlino	42c8c84eb6	TimeBoundary: Use cursor when datasource is not a regular table. (#14151 ) * TimeBoundary: Use cursor when datasource is not a regular table. Fixes a bug where TimeBoundary could return incorrect results with INNER Join or inline data. * Addl Javadocs.	2023-04-26 17:00:13 -07:00
Gian Merlino	89e7948159	MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105 ) * MSQ: Subclass CalciteJoinQueryTest, other supporting changes. The main change is the new tests: we now subclass CalciteJoinQueryTest in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for SortMerge. Three supporting production changes for default-value mode: 1) InputNumberDataSource is marked as concrete, to allow leftFilter to be pushed down to it. 2) In default-value mode, numeric frame field readers can now return nulls. This is necessary when stacking joins on top of joins: nulls must be preserved for semantics that match broadcast joins and native queries. 3) In default-value mode, StringFieldReader.isNull returns true on empty strings in addition to nulls. This is more consistent with the behavior of the selectors, which map empty strings to null as well in that mode. As an effect of change (2), the InsertTimeNull change from #14020 (to replace null timestamps with default timestamps) is reverted. IMO, this is fine, as either behavior is defensible, and the change from #14020 hasn't been released yet. * Adjust tests. * Style fix. * Additional tests.	2023-04-25 12:10:23 -07:00
Gian Merlino	f643abdad9	SQL planning: Consider subqueries in fewer scenarios. (#14123 ) * SQL planning: Consider subqueries in fewer scenarios. Further adjusts logic in DruidRules that was previously adjusted in #13902. The reason for the original change was that the comment "Subquery must be a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries do not need to be groupBy anymore; they can really be any type of query. If I recall correctly, the change was needed for certain window queries to be able to plan on top of Scan queries. However, this impacts performance negatively, because it causes many additional outer-query scenarios to be considered, which is expensive. So, this patch updates the matching logic to consider fewer scenarios. The skipped scenarios are ones where we expect that, for one reason or another, it isn't necessary to consider a subquery. * Remove unnecessary escaping. * Fix test.	2023-04-21 08:32:13 -07:00
Soumyava	8d60edcfcb	Updating segment map function for QueryDataSource to ensure group by … (#14112 ) * Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path * Adding unit tests for the failed case * There you go coverage bot, be happy now	2023-04-20 13:22:29 -07:00
zachjsh	04da0102cb	KillTask should return empty inputSource resources (#14106 ) ### Description This PR fixes a few bugs found with the inputSource security feature. 1. `KillUnusedSegmentsTask` previously had no definition for `getInputSourceResources`, which caused an UnsupportedOperationException to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so it now returns an empty set. 2. Fixed a bug where, when the input source type security feature is enabled, all of the input source type specific resources used were authenticated against: `{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}` when they should instead be authenticated against: `{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}` 3. Fixed a bug where supervisor tasks were not authenticated against the specific input source types used, if the input source security feature was enabled.	2023-04-18 15:27:16 -04:00
Clint Wylie	e7d2e8b914	fix bug filtering nested columns with expression filters (#14096 )	2023-04-17 14:21:32 -07:00
Abhishek Radhakrishnan	c98c66558f	Include statement attributes in `EXPLAIN PLAN` output (#14074 ) This commit adds attributes that contain metadata information about the query in the EXPLAIN PLAN output. The attributes currently contain two items: - `statementType`: SELECT, INSERT or REPLACE - `targetDataSource`: provides the target datasource name for DML statements It is added to both the legacy and native query plan outputs.	2023-04-17 21:00:25 +05:30
Gian Merlino	a8eb3f2f57	SQL: Fix natural comparator selection for groupBy. (#14075 ) * SQL: Fix natural comparator selection for groupBy. DruidQuery.computeSorting had some unique logic for finding natural comparators for SQL types. It should be using getStringComparatorForRelDataType instead. One good effect here is that the comparator for BOOLEAN is now NUMERIC rather than LEXICOGRAPHIC. The test case illustrates this. * Remove msqCompatible, for now. * Fix test.	2023-04-15 07:14:43 +05:30
Gian Merlino	eeed5ed7e2	MSQ: Use the same result coercion routines as the regular SQL endpoint. (#14046 ) * MSQ: Use the same result coercion routines as the regular SQL endpoint. The main changes are to move NativeQueryMaker.coerce to SqlResults, and to formally make the list of sqlTypeNames from the MSQ results reports use SqlTypeNames. - Change the default to MSQ-compatible rather than MSQ-incompatible. The explicit marker function is now "notMsqCompatible()".	2023-04-15 06:56:23 +05:30
Gian Merlino	0884a22c41	MSQ: Support for querying lookup and inline data directly. (#14048 ) * MSQ: Support for querying lookup and inline data directly. Main changes: 1) Add of LookupInputSpec and DataSourcePlan.forLookup. 2) Add InlineInputSpec, and modify of DataSourcePlan.forInline to use this instead of an ExternalInputSpec with JSON. This allows the inline data to act as the right-hand side of a join, if needed. Supporting changes: 1) Modify JoinDataSource's leftFilter validation to be a little less strict: it's now OK with leftFilter being attached to any concrete leaf (no children) datasource, rather than requiring it be a table. This allows MSQ to create JoinDataSource with InputNumberDataSource as the base. 2) Add SegmentWranglerModule to CliIndexer, CliPeon. This allows them to query lookups and inline data directly. * Updates based on CI. * Additional tests. * Style fix. * Remove unused import.	2023-04-14 14:04:02 -07:00
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
zachjsh	2e87b5a901	Input source security sql layer can handle input source with multiple types (#14050 ) ### Description This change allows for input sources used during MSQ ingestion to be authorized for multiple input source types, instead of just 1. Such an input source that allows for multiple types is the CombiningInputSource. Also fixed bug that caused some input source specific functions to be authorized against the permissions ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ), new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] ` when the inputSource based authorization feature is enabled, when it should instead be authorized against ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] `	2023-04-10 09:48:57 -04:00
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
Gian Merlino	d52bc333aa	Frames: Ensure nulls are read as default values when appropriate. (#14020 ) * Frames: Ensure nulls are read as default values when appropriate. Fixes a bug where LongFieldWriter didn't write a properly transformed zero when writing out a null. This had no meaningful effect in SQL-compatible null handling mode, because the field would get treated as a null anyway. But it does have an effect in default-value mode: it would cause Long.MIN_VALUE to get read out instead of zero. Also adds NullHandling checks to the various frame-based column selectors, allowing reading of nullable frames by servers in default-value mode.	2023-04-10 05:28:46 +05:30
zachjsh	5c0221375c	Allow for Input source security in native task layer (#14003 ) Fixes #13837. ### Description This change allows for input source type security in the native task layer. To enable this feature, the user must set the following property to true: `druid.auth.enableInputSourceSecurity=true` The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource. When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource. `new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ)` where `{INPUT_SOURCE_TYPE}` is the type of the input source being used (http, inline, s3, etc.). Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.	2023-04-06 13:13:09 -04:00
Clint Wylie	b11c0bc249	smarter nested column index utilization (#13977 ) * smarter nested column index utilization changes: * adds skipValueRangeIndexScale and skipValuePredicateIndexScale to ColumnConfig (e.g. DruidProcessingConfig) available as system config via druid.processing.indexes.skipValueRangeIndexScale and druid.processing.indexes.skipValuePredicateIndexScale * NestedColumnIndexSupplier uses skipValueRangeIndexScale and skipValuePredicateIndexScale to multiply by the total number of rows to be processed to determine the threshold at which we should no longer consider using bitmap indexes because it will be too many operations * Default values for skipValueRangeIndexScale and skipValuePredicateIndexScale have been initially set to 0.08, but are separate to allow independent tuning * these are not documented on purpose yet because they are kind of hard to explain, they mainly exist to help conduct larger scale experiments than the jmh benchmarks used to derive the initial set of values * these changes provide a pretty sweet performance boost for filter processing on nested columns	2023-04-06 04:09:24 -07:00
Paul Rogers	030ed911d4	Temporarily revert extended table functions for Druid 26 (#14019 )	2023-04-05 21:09:33 -07:00
Gian Merlino	319f99db05	Always use file sizes when determining batch ingest splits (#13955 ) * Always use file sizes when determining batch ingest splits. Main changes: 1) Update CloudObjectInputSource and its subclasses (S3, GCS, Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they were only used for prefixes, not uris or objects. 2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously, file size was ignored; all files were treated as equal weight when determining splits. A side effect of these changes is that we'll make additional network calls to find the sizes of objects when users specify URIs or objects as opposed to prefixes. IMO, this is worth it because it's the only way to respect the user's split hint and task assignment settings. Secondary changes: 1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get metadata for a single object. This is a simpler call that is also expected to be less expensive. 2) Azure: Fix a bug where getBlobLength did not populate blob reference attributes, and therefore would not actually retrieve the blob length. 3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and TableInputSpecSlicer. 4) MSQ: Adjust WorkerInputs to ensure there is always at least one worker, even if it has a nil slice. * Add msqCompatible to testGroupByWithImpossibleTimeFilter. * Fix tests. * Add additional tests. * Remove unused stuff. * Remove more unused stuff. * Adjust thresholds. * Remove irrelevant test. * Fix comments. * Fix bug. * Updates.	2023-04-05 08:54:01 -07:00
Clint Wylie	1c8a184677	add null safety checks for DiscoveryDruidNode services for more resilient http server and task views (#13930 ) * add null safety checks for DiscoveryDruidNode services for more resilient http server and task vi	2023-04-05 02:45:39 -07:00
Clint Wylie	d21babc5b8	remix nested columns (#14014 ) changes: * introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation * introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>. * revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility * fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)	2023-04-04 17:51:59 -07:00
Soumyava	ca94f7146f	Planning correctly for order by queries on time which previously thre… (#13965 ) * Planning correctly for order by queries on time which previously threw a planning error * Updating toDruidQueryForExplaining on a query data source if there is a window on the partial query	2023-04-03 18:30:19 -07:00
Soumyava	1eeecf5fb2	Fixing regression issues on unnest (#13976 ) * select sum(c) on an unnested column now does not return 'Type mismatch' error and works properly * Making sure an inner join query works properly * Having on unnested column with a group by now works correctly * count(*) on an unnested query now works correctly	2023-03-31 09:06:43 +05:30
zachjsh	3bb67721f7	Allow for Input source security in SQL layer (#13989 ) This change introduces the concept of input source type security model, proposed in #13837. With this change, this feature is only available at the SQL layer, but we will expand to native layer in a follow up PR. To enable this feature, the user must set the following property to true: druid.auth.enableInputSourceSecurity=true The default value for this property is false, which will continue the existing functionality of having the usage of all external sources authorized against the hardcoded resource action new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ) When this config is enabled, the users will be required to be authorized for the following resource action new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ) where {INPUT_SOURCE_TYPE} is the type of the input source being used (http, inline, s3, etc.). Documentation has not been added for the feature as it is not complete at the moment, as we still need to enable this for the native layer in a follow up PR.	2023-03-29 22:15:33 -04:00
Karan Kumar	e4c5122a60	Fixing checkstyle (#14000 )	2023-03-29 20:21:21 +05:30
Paul Rogers	76fe26d4ba	Fix typos, add tests for http() function (#13954 )	2023-03-28 14:41:06 -07:00
Clint Wylie	d5b1b5bc8e	nested columns + arrays = array columns! (#13803 ) array columns! changes: * add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns * nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns * neat test refactor stuff * add v4 segment test * add array element indexes * add tests for unnest and array columns * fix unnest column value selector cursor handling of null and empty arrays	2023-03-27 12:42:35 -07:00
Paul Rogers	da42ee5bfa	Added TYPE(native) data type for external tables (#13958 )	2023-03-22 21:43:29 -07:00
Soumyava	2ad133c06e	Unnest changes for moving the filter on right side of correlate to inside the unnest datasource (#13934 ) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-22 18:24:00 -07:00
Clint Wylie	086eb26b74	fix join and unnest planning to ensure that duplicate join prefixes are not used (#13943 ) * fix join and unnest planning to ensure that duplicate join prefixes are not used * wont somebody please think of the children	2023-03-22 12:53:55 -07:00
Clint Wylie	f4392a3155	expression transform improvements and fixes (#13947 ) changes: * fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms * improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away * improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension * added some tests for ExpressionTransform with array inputs * improved ExpressionPostAggregator to use partial type information from decoration * migrate some test uses of InputBindings.forMap to use other methods	2023-03-21 23:26:53 -07:00
Adarsh Sanjeev	143fdcfacf	Change test name so it triggers in CI (#13844 ) As the name of the class did not end or start with "Test", CalciteSelectQueryMSQTest was not triggered in CI. This PR renames the test.	2023-03-20 15:55:52 +05:30
somu-imply	a7ba361666	Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922 ) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-14 16:05:56 -07:00
Suneet Saldanha	44547614ae	Report engine as a dimension for sqlQuery metrics (#13906 ) * Report engine as a dimension for sqlQuery metrics * docs	2023-03-10 11:23:57 -08:00
Gian Merlino	4b1ffbc452	Various changes and fixes to UNNEST. (#13892 ) * Various changes and fixes to UNNEST. Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.	2023-03-10 16:42:08 +05:30
imply-cheddar	6b90a320cf	Add back function signature for compat (#13914 ) * Add back function signature for compat * Suppress IntelliJ Error	2023-03-09 21:06:34 -08:00
Gian Merlino	bf39b4d313	Window planning: use collation traits, improve subquery logic. (#13902 ) * Window planning: use collation traits, improve subquery logic. SQL changes: 1) Attach RelCollation (sorting) trait to any PartialDruidQuery that ends in AGGREGATE or AGGREGATE_PROJECT. This allows planning to take advantage of the fact that Druid sorts by dimensions when doing aggregations. 2) Windowing: inspect RelCollation trait from input, and insert naiveSort if, and only if, necessary. 3) Windowing: add support for Project after Window, when the Project is a simple mapping. Helps eliminate subqueries. 4) DruidRules: update logic for considering subqueries to reflect that subqueries are not required to be GroupBys, and that we have a bunch of new Stages now. With all of this evolution that has happened, the old logic didn't quite make sense. Native changes: 1) Use merge sort (stable) rather than quicksort when sorting RowsAndColumns. Makes it easier to write test cases for plans that involve re-sorting the data. * Changes from review. * Mark the bad test as failing. * Additional update. * Fix failingTest. * Fix tests. * Mark a var final.	2023-03-09 15:48:13 -08:00
Clint Wylie	48ac5ce50b	use native nvl expression for SQL NVL and 2 argument COALESCE (#13897 ) * use custom case operator conversion instead of direct operator conversion, to produce native nvl expression for SQL NVL and 2 argument COALESCE, and add optimization for certain case filters from coalesce and nvl statements	2023-03-09 05:46:17 -08:00
Gian Merlino	90d8f67e3d	Avoid creating new RelDataTypeFactory during SQL planning. (#13904 ) * Avoid creating new RelDataTypeFactory during SQL planning. Reduces unnecessary CPU cycles. * Fix.	2023-03-08 21:55:49 -08:00
Gian Merlino	82f7a56475	Sort-merge join and hash shuffles for MSQ. (#13506 ) * Sort-merge join and hash shuffles for MSQ. The main changes are in the processing, multi-stage-query, and sql modules. processing module: 1) Rename SortColumn to KeyColumn, replace boolean descending with KeyOrder. This makes it nicer to model hash keys, which use KeyOrder.NONE. 2) Add nullability checkers to the FieldReader interface, and an "isPartiallyNullKey" method to FrameComparisonWidget. The join processor uses this to detect null keys. 3) Add WritableFrameChannel.isClosed and OutputChannel.isReadableChannelReady so callers can tell which OutputChannels are ready for reading and which aren't. 4) Specialize FrameProcessors.makeCursor to return FrameCursor, a random-access implementation. The join processor uses this to rewind when it needs to replay a set of rows with a particular key. 5) Add MemoryAllocatorFactory, which is embedded inside FrameWriterFactory instead of a particular MemoryAllocator. This allows FrameWriterFactory to be shared in more scenarios. multi-stage-query module: 1) ShuffleSpec: Add hash-based shuffles. New enum ShuffleKind helps callers figure out what kind of shuffle is happening. The change from SortColumn to KeyColumn allows ClusterBy to be used for both hash-based and sort-based shuffling. 2) WorkerImpl: Add ability to handle hash-based shuffles. Refactor the logic to be more readable by moving the work-order-running code to the inner class RunWorkOrder, and the shuffle-pipeline-building code to the inner class ShufflePipelineBuilder. 3) Add SortMergeJoinFrameProcessor and factory. 4) WorkerMemoryParameters: Adjust logic to reserve space for output frames for hash partitioning. (We need one frame per partition.) sql module: 1) Add sqlJoinAlgorithm context parameter; can be "broadcast" or "sortMerge". With native, it must always be "broadcast", or it's a validation error. MSQ supports both. Default is "broadcast" in both engines. 
2) Validate that MSQs do not use broadcast join with RIGHT or FULL join, as results are not correct for broadcast join with those types. Allow this in native for two reasons: legacy (the docs caution against it, but it's always been allowed), and the fact that it actually does generate correct results in native when the join is processed on the Broker. It is much less likely that MSQ will plan in such a way that generates correct results. 3) Remove subquery penalty in DruidJoinQueryRel when using sort-merge join, because subqueries are always required, so there's no reason to penalize them. 4) Move previously-disabled join reordering and manipulation rules to FANCY_JOIN_RULES, and enable them when using sort-merge join. Helps get to better plans where projections and filters are pushed down. * Work around compiler problem. * Updates from static analysis. * Fix @param tag. * Fix declared exception. * Fix spelling. * Minor adjustments. * wip * Merge fixups * fixes * Fix CalciteSelectQueryMSQTest * Empty keys are sortable. * Address comments from code review. Rename mux -> mix. * Restore inspection config. * Restore original doc. * Reorder imports. * Adjustments * Fix. * Fix imports. * Adjustments from review. * Update header. * Adjust docs.	2023-03-08 14:19:39 -08:00
Adarsh Sanjeev	ef82756176	Add validation for aggregations on __time (#13793 ) * Add validation for aggregations on __time	2023-03-07 17:16:36 -08:00
Clint Wylie	3924f0eff4	use Calcites.getColumnTypeForRelDataType for SQL CAST operator conversion (#13890 ) * use Calcites.getColumnTypeForRelDataType for SQL CAST operator conversion * fix comment * intervals are strings but also longs	2023-03-07 13:12:15 -08:00
Gian Merlino	fcfb7b8ff6	Add warning comments to Granularity.getIterable. (#13888 ) This function is notorious for causing memory exhaustion and excessive CPU usage; so much so that it was valuable to work around it in the SQL planner in #13206. Hopefully, a warning comment will encourage developers to stay away and come up with solutions that do not involve computing all possible buckets.	2023-03-06 22:57:10 -08:00
Clint Wylie	6cf754b0e0	move numeric null value coercion out of expression processing engine (#13809 ) * move numeric null value coercion out of expression processing engine * add ExprEval.valueOrDefault() to allow consumers to automatically coerce to default values * rename Expr.buildVectorized as Expr.asVectorProcessor more consistent naming with Function and ApplyFunction; javadocs for some stuff	2023-02-28 18:10:07 -08:00
Paul Rogers	914eebb4b7	Wire up the catalog resolver (#13788 ) Introduces the catalog resolver interface Wires the resolver up to the planner factory Refactors planner factory	2023-02-22 11:42:32 -08:00
zachjsh	665dee43bf	Revert "Operator conversion deny list (#13766 )" (#13829 ) This reverts commit `38e620aa4c`.	2023-02-21 15:14:49 -08:00
Paul Rogers	85d36be085	Information schema now uses numeric column types (#13777 ) Change to use SQL schemas to allow null numeric columns * Updated docs	2023-02-17 14:39:31 -08:00
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Paul Rogers	333196d207	Code cleanup & message improvements (#13778 ) * Misc cleanup edits Correct spacing Add type parameters Add toString() methods to formats so tests compare correctly IT doc revisions Error message edits Display UT query results when tests fail * Edit * Build fix * Build fixes	2023-02-15 15:22:54 +05:30
Clint Wylie	fa4cab405f	fix bug with sql planner when virtual column capabilities are null (#13797 )	2023-02-13 18:27:23 -08:00
Paul Rogers	842ee554de	Refinements to input-source specific table functions (#13780 ) Refinements to table functions Fixes various bugs Improves the structure of the table function classes Adds unit and integration tests	2023-02-13 16:21:27 -08:00
Clint Wylie	f09f83697d	fix array_agg to work with complex types and bugs with expression aggregator complex array handling (#13781 ) * fix array_agg to work with complex types and bugs with expression aggregator complex array handling * more consistent handling of array expressions, numeric arrays more consistently honor druid.generic.useDefaultValueForNull, fix array_ordinal sql output type	2023-02-12 22:01:39 -08:00
zachjsh	38e620aa4c	Operator conversion deny list (#13766 ) ### Description This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions. An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with http response code `400` is returned with an error body similar to the following: ``` { "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985", "state": "FAILED", "error": { "error": "Plan validation failed", "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)", "errorClass": "org.apache.calcite.tools.ValidationException", "host": null } } ```	2023-02-10 09:59:26 -08:00
AmatyaAvadhanula	dcdae84888	Add server view initialization metrics (#13716 ) * Add server view init metrics * Test coverage * Rename metrics	2023-02-07 20:02:00 +05:30
Clint Wylie	2d3bee8545	various nested column (and other) fixes (#13732 ) changes: * modified druid schema column type computation to special case COMPLEX<json> handling to choose COMPLEX<json> if any column in any segment is COMPLEX<json> * NestedFieldVirtualColumn can now work correctly on any type of column, returning either a column selector if a root path, or nil selector if not * fixed a random bug with NilVectorSelector when using a vector size larger than the default and druid.generic.useDefaultValueForNull=false would have the nulls vector set to all false instead of true * fixed an overly aggressive check in ExprEval.ofType when handling complex types which would try to treat any string as base64 without gracefully falling back if it was not in fact base64 encoded, along with special handling for complex<json> * added ExpressionVectorSelectors.castValueSelectorToObject and ExpressionVectorSelectors.castObjectSelectorToNumeric as convenience methods to cast vector selectors using cast expressions without the trouble of constructing an expression. the polymorphic nature of the non-vectorized engine (and significantly larger overhead of non-vectorized expression processing) made adding similar methods for non-vectorized selectors less attractive and so have not been added at this time * fix inconsistency between nested column indexer and serializer in handling values (coerce non primitive and non arrays of primitives using asString) * ExprEval best effort mode now handles byte[] as string * added test for ExprEval.bestEffortOf, and add missing conversion cases that tests uncovered * more tests more better	2023-02-06 19:48:02 -08:00
imply-cheddar	706b8a0227	Adjust Operators to be Pausable (#13694 ) * Adjust Operators to be Pausable This enables "merge" style operations that combine multiple streams. This change includes a naive implementation of one such merge operator just to provide concrete evidence that the refactoring is effective.	2023-01-23 20:52:06 -08:00
somu-imply	90d445536d	SQL version of unnest native druid function (#13576 ) * adds the SQL component of the native unnest functionality in Druid to unnest SQL queries on a table dimension, virtual column or a constant array and convert them into native Druid queries * unnest in SQL is implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part)	2023-01-23 12:53:31 -08:00
Laksh Singla	a516eb1a41	Port Calcite's tests to run with MSQ (#13625 ) * SQL test framework extensions * Capture planner artifacts: logical plan, etc. * Planner test builder validates the logical plan * Validation for the SQL result schema (we already have validation for the Druid row signature) * Better Guice integration: properties, reuse Guice modules * Avoid need for hand-coded expr, macro tables * Retire some of the test-specific query component creation * Fix query log hook race condition Co-authored-by: Paul Rogers <progers@apache.org>	2023-01-19 08:51:11 -08:00
Paul Rogers	22630b0aab	Much improved table functions (#13627 ) Much improved table functions * Revises properties, definitions in the catalog * Adds a "table function" abstraction to model such functions * Specific functions for HTTP, inline, local and S3. * Extended SQL types in the catalog * Restructure external table definitions to use table functions * EXTEND syntax for Druid's extern table function * Support for array-valued table function parameters * Support for array-valued SQL query parameters * Much new documentation	2023-01-17 08:41:57 -08:00
imply-cheddar	7ff3722cb9	Swap LazySingleton for Singleton (#13673 ) * Swap LazySingleton for Singleton * Initialize WebserverTestUtils properly	2023-01-15 21:38:37 -08:00
Clint Wylie	b5b740bbbb	allow using nested column indexer for schema discovery (#13653 ) * single typed "root" only nested columns now mimic "regular" columns of those types * incremental index can now use nested column indexer instead of string indexer for discovered columns	2023-01-12 18:31:12 -08:00
Adarsh Sanjeev	0a486c3bcf	Update forbidden apis with fixed executor (#13633 ) * Update forbidden apis with fixed executor	2023-01-12 15:34:36 +05:30
imply-cheddar	f1821a7c18	Add Sort Operator for Window Functions (#13619 ) * Addition of NaiveSortMaker and Default implementation Add the NaiveSortMaker which makes a sorter object and a default implementation of the interface. This also allows us to plan multiple different window definitions on the same query.	2023-01-06 00:27:18 -08:00
imply-cheddar	a8ecc48ffe	Validate response headers and fix exception logging (#13609 ) * Validate response headers and fix exception logging A class of QueryException were throwing away their causes making it really hard to determine what's going wrong when something goes wrong in the SQL planner specifically. Fix that and adjust tests to do more validation of response headers as well. We allow 404s and 307s to be returned even without authorization validated, but others get converted to 403	2023-01-05 14:15:15 -08:00
Clint Wylie	fd63e5a514	fix issue with jdbc and query metrics (#13608 ) * fix issue with metrics emitting and jdbc results by getting yielder from query processing thread * more better	2022-12-21 19:32:53 -08:00
imply-cheddar	0efd0879a8	Unify the handling of HTTP between SQL and Native (#13564 ) * Unify the handling of HTTP between SQL and Native The SqlResource and QueryResource have been using independent logic for things like error handling and response context stuff. This became abundantly clear and painful during a change I was making for Window Functions, so I unified them into using the same code for walking the response and serializing it. Things are still not perfectly unified (it would be the absolute best if the SqlResource just took SQL, planned it and then delegated the query run entirely to the QueryResource), but this refactor doesn't take that fully on. The new code leverages async query processing from our jetty container, the different interaction model with the Resource means that a lot of tests had to be adjusted to align with the async query model. The semantics of the tests remain the same with one exception: the SqlResource used to not log requests that failed authorization checks, now it does.	2022-12-19 00:25:33 -08:00
Clint Wylie	9ae7a36ccd	improve nested column storage format for broader compatibility (#13568 ) * bump nested column format version changes: * nested field files are now named by their position in field paths list, rather than directly by the path itself. this fixes issues with valid json properties with commas and newlines breaking the csv file meta.smoosh * update StructuredDataProcessor to deal in NestedPathPart to be consistent with other abstract path handling rather than building JQ syntax strings directly * add v3 format segment and test	2022-12-15 15:39:26 -08:00
imply-cheddar	089d8da561	Support Framing for Window Aggregations (#13514 ) * Support Framing for Window Aggregations This adds support for framing over ROWS for window aggregations. Still not implemented as yet: 1. RANGE frames 2. Multiple different frames in the same query 3. Frames on last/first functions	2022-12-14 18:04:39 -08:00
Rohan Garg	35c983a351	Use template file for adding table functions grammar (#13553 )	2022-12-14 21:52:09 +05:30
somu-imply	7682b0b6b1	Analysis refactor (#13501 ) Refactor DataSource to have a getAnalysis method() This removes various parts of the code where while loops and instanceof checks were being used to walk through the structure of DataSource objects in order to build a DataSourceAnalysis. Instead we just ask the DataSource for its analysis and allow the stack to rebuild whatever structure existed.	2022-12-12 17:35:44 -08:00
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
imply-cheddar	83261f9641	Starting on Window Functions (#13458 ) * Processors for Window Processing This is an initial take on how to use Processors for Window Processing. A Processor is an interface that transforms RowsAndColumns objects. RowsAndColumns objects are essentially combinations of rows and columns. The intention is that these Processors are the start of a set of operators that more closely resemble what DB engineers would be accustomed to seeing. * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Some SQL tests for window functions. Added wikipedia data to the indexes available to the SQL queries and tests validating the windowing functionality as it exists now. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2022-12-06 15:54:05 -08:00
Paul Rogers	b76ff16d00	SQL test framework extensions (#13426 ) SQL test framework extensions * Capture planner artifacts: logical plan, etc. * Planner test builder validates the logical plan * Validation for the SQL result schema (we already have validation for the Druid row signature) * Better Guice integration: properties, reuse Guice modules * Avoid need for hand-coded expr, macro tables * Retire some of the test-specific query component creation * Fix query log hook race condition	2022-12-02 09:11:59 -08:00
Kashif Faraz	656b6cdf62	Add MetricsVerifier to simplify verification of metric values in tests (#13442 )	2022-11-28 19:32:37 +05:30
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401 ) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
Paul Rogers	81d005f267	Druid Catalog basics (#13165 ) Druid catalog basics Catalog object model for tables, columns Druid metadata DB storage (as an extension) REST API to update the catalog (as an extension) Integration tests Model only: no planner integration yet	2022-11-12 15:30:22 -08:00
Clint Wylie	27215d1ff1	fix complex_decode_base64 function, add SQL bindings (#13332 ) * fix complex_decode_base64 function, add SQL bindings * more permissive	2022-11-09 23:40:25 -08:00
Paul Rogers	7e600d2c63	Enhancements to the Calcite test framework (#13283 ) * Enhancements to the Calcite test framework * Standardize "Unauthorized" messages * Additional test framework extension points * Resolved joinable factory dependency issue	2022-11-08 14:28:49 -08:00
Gian Merlino	8f90589ce5	Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247 ) * Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. These aggregation functions are documented as creating sketches. However, they are planned into native aggregators that include finalization logic to convert the sketch to a number of some sort. This creates an inconsistency: the functions sometimes return sketches, and sometimes return numbers, depending on where they lie in the native query plan. This patch changes these SQL aggregators to _never_ finalize, by using the "shouldFinalize" feature of the native aggregators. It already existed for theta sketches. This patch adds the feature for hll and quantiles sketches. As to impact, Druid finalizes aggregators in two cases: - When they appear in the outer level of a query (not a subquery). - When they are used as input to an expression or finalizing-field-access post-aggregator (not any other kind of post-aggregator). With this patch, the functions will no longer be finalized in these cases. The second item is not likely to matter much. The SQL functions all declare return type OTHER, which would be usable as an input to any other function that makes sense and that would be planned into an expression. So, the main effect of this patch is the first item. To provide backwards compatibility with anyone that was depending on the old behavior, the patch adds a "sqlFinalizeOuterSketches" query context parameter that restores the old behavior. Other changes: 1) Move various argument-checking logic from runtime to planning time in DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker. 2) Add various JsonIgnores to the sketches to simplify their JSON representations. 3) Allow chaining of ExpressionPostAggregators and other PostAggregators in the SQL layer. 4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer, now that expressions can operate on complex inputs. 
5) Adjust return type to thetaSketch (instead of OTHER) in ThetaSketchSetBaseOperatorConversion. * Fix benchmark class. * Fix compilation error. * Fix ThetaSketchSqlAggregatorTest. * Hopefully fix ITAutoCompactionTest. * Adjustment to ITAutoCompactionTest.	2022-11-03 09:43:00 -07:00
Paul Rogers	22c140251a	Removed unused planner context parameter (#13249 ) Removed unused planner context parameter	2022-10-27 17:59:26 -07:00
somu-imply	affc522b9f	Refactoring the data source before unnest (#13085 ) * First set of changes for framework * Second set of changes to move segment map function to data source * Minor change to server manager * Removing the createSegmentMapFunction from JoinableFactoryWrapper and moving to JoinDataSource * Checkstyle fixes * Patching Eric's fix for injection * Checkstyle and fixing some CI issues * Fixing code inspections and some failed tests and one injector for test in avatica * Another set of changes for CI...almost there * Equals and hashcode part update * Fixing injector from Eric + refactoring for broadcastJoinHelper * Updating second injector. Might revert later if better way found * Fixing guice issue in JoinableFactory * Addressing review comments part 1 * Temp changes refactoring * Revert "Temp changes refactoring" This reverts commit `9da42a9ef0`. * temp * Temp discussions * Refactoring temp * Refactoring the query rewrite to refer to a datasource * Refactoring getCacheKey by moving it inside data source * Nullable annotation check in injector * Addressing some comments, removing 2 analysis.isJoin() checks and correcting the benchmark files * Minor changes for refactoring * Addressing reviews part 1 * Refactoring part 2 with new test cases for broadcast join * Set for nullables * removing instance of checks * Storing nullables in guice to avoid checking on reruns * Fixing a test case and removing an irrelevant line * Addressing the atomic reference review comments	2022-10-26 15:58:58 -07:00
Gian Merlino	2b0d873c7e	Fix two sources of SQL statement leaks. (#13259 ) * Fix two sources of SQL statement leaks. 1) SqlTaskResource and DruidJdbcResultSet leaked statements 100% of the time, since they call stmt.plan(), which adds statements to SqlLifecycleManager, and they do not explicitly remove them. 2) SqlResource leaked statements if yielder.close() threw an exception. (And also would not emit metrics, since in that case it failed to call stmt.close as well.) * Only closeQuietly is needed.	2022-10-25 09:31:56 -07:00
Paul Rogers	86e6e61e88	Modular Calcite Test Framework (#12965 ) * Refactor Calcite test "framework" for planner tests Refactors the current Calcite tests to make it a bit easier to adjust the set of runtime objects used within a test. * Move data creation out of CalciteTests into TestDataBuilder * Move "framework" creation out of CalciteTests into a QueryFramework * Move injector-dependent functions from CalciteTests into QueryFrameworkUtils * Wrapper around the planner factory, etc. to allow customization. * Bulk of the "framework" created once per class rather than once per test. * Refactor tests to use a test builder * Change all testQuery() methods to use the test builder. Move test execution & verification into a test runner.	2022-10-20 15:45:44 -07:00
Paul Rogers	b34b4353f4	Async reads for JDBC (#13196 ) Async reads for JDBC: Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests. Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections	2022-10-18 11:40:57 -07:00
Gian Merlino	6aca61763e	SQL: Use timestamp_floor when granularity is not safe. (#13206 ) * SQL: Use timestamp_floor when granularity is not safe. PR #12944 added a check at the execution layer to avoid materializing excessive amounts of time-granular buckets. This patch modifies the SQL planner to avoid generating queries that would throw such errors, by switching certain plans to use the timestamp_floor function instead of granularities. This applies both to the Timeseries query type, and the GroupBy timestampResultFieldGranularity feature. The patch also goes one step further: we switch to timestamp_floor not just in the ETERNITY + non-ALL case, but also if the estimated number of time-granular buckets exceeds 100,000. Finally, the patch modifies the timestampResultFieldGranularity field to consistently be a String rather than a Granularity. This ensures that it can be round-trip serialized and deserialized, which is useful when trying to execute the results of "EXPLAIN PLAN FOR" with GroupBy queries that use the timestampResultFieldGranularity feature. * Fix test, address PR comments. * Fix ControllerImpl. * Fix test. * Fix unused import.	2022-10-17 08:22:45 -07:00
Paul Rogers	f4dcc52dac	Redesign QueryContext class (#13071 ) We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available: druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of key to allow. All other keys go through the existing context key security checks. druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks. Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"] You generally set one or the other values. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId sqlStringifyArrays	2022-10-15 11:02:11 +05:30
Clint Wylie	6eff6c9ae4	fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode (#13214 ) * fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode changes: * json_value 'returning' decimal will now plan to native double typed query instead of ending up with default string typing, allowing decimal vector math expressions to work with this type * vector math expressions now zero out 'null' values even in 'default' mode (druid.generic.useDefaultValueForNull=false) to prevent downstream things that do not check the null vector from producing incorrect results * more better * test and why not vectorize * more test, more fix	2022-10-12 16:28:41 -07:00
Abhishek Agarwal	61b34950e7	Fix assertion error in sql planning for latest aggregators (#13151 ) * Fix sql planning bug for latest aggregators * change test name * Fix error messages * fix error message again	2022-09-28 21:01:32 +05:30
Sam Rash	28b9edc2a8	Add BIG_SUM SQL function (#13102 ) This adds a sql function, "BIG_SUM", that uses CompressedBigDecimal to do a sum. Other misc changes: 1. handle NumberFormatExceptions when parsing a string (default to set to 0, configurable in agg factory to be strict and throw on error) 2. format pom file (whitespace) + add dependency 3. scaleUp -> scale and always require scale as a parameter	2022-09-26 18:02:25 -07:00
Paul Rogers	8ce03eb094	Convert the Druid planner to use statement handlers (#12905 ) * Converted Druid planner to use statement handlers Converts the large collection of if-statements for statement types into a set of classes: one per supported statement type. Cleans up a few error messages. * Revisions from review comments * Build fix * Build fix * Resolve merge conflict. * More merges with QueryResponse PR * More parameterized type cleanup Forces a rebuild due to a flaky test	2022-09-19 11:58:45 +05:30
Frank Chen	fd6c05eee8	Avoid ClassCastException when getting values from `QueryContext` (#13022 ) * Use safe conversion methods * Rename method * Add getContextAsBoolean * Update test case * Remove generic from getContextValue * Update catch-handler * Add test * Resolve comments * Replace 'getContextXXX' to 'getQueryContext().getAsXXXX'	2022-09-13 18:00:09 +08:00
Gian Merlino	77925cdcdd	Expressions: fixes for round-trips of floating point literals, Long.MIN_VALUE literals, Shuffle.visitAll. (#13037 ) * SQL: Fix round-trips of floating point literals. When writing RexLiterals into Druid expressions, we now write non-integer numeric literals in such a way that ensures they are parsed as doubles on the other end. * Updates from code review, and some additional stuff inspired by the investigation. - Remove unnecessary formatting code from DruidExpression.doubleLiteral: it handles things just fine with its default behavior. - Fix a problem where expression literals could not represent Long.MIN_VALUE. Now, integer literals start life off as BigIntegerExpr instead of LongExpr, and are converted to LongExpr during flattening. This is necessary because, in order to avoid ambiguity between unary minus and negative literals, our grammar does not actually have true negative literals. Negative numbers must be represented as unary minus next to a positive literal. - Fix a bug introduced in #12230 where shuttle.visitAll(args) delegated to shuttle.visit(arg) instead of arg.visit(shuttle). The latter does a recursive visitation, which is the intended behavior. * Style fixes. * Move regexp to the right place.	2022-09-12 17:06:20 -07:00
Paul Rogers	80b97ac24d	Create a copy of the shared JDBC context (#13049 )	2022-09-12 10:27:56 -07:00
imply-cheddar	5ba0075c0c	Expose HTTP Response headers from SqlResource (#13052 ) * Expose HTTP Response headers from SqlResource This change makes the SqlResource expose HTTP response headers in the same way that the QueryResource exposes them. Fundamentally, the change is to pipe the QueryResponse object all the way through to the Resource so that it can populate response headers. There is also some code cleanup around DI, as there was a superfluous FactoryFactory class muddying things up.	2022-09-12 01:40:06 -07:00
Gian Merlino	e29e7a8434	Add ARRAY_QUANTILE function. (#13061 ) * Add ARRAY_QUANTILE function. Expected usage is like: ARRAY_QUANTILE(ARRAY_AGG(x), 0.9). * Fix test.	2022-09-09 11:29:20 -07:00
Rohan Garg	2f156b3610	Disallow timeseries queries with ETERNITY interval and non-ALL granularity (#12944 )	2022-09-07 16:45:08 +05:30
Rohan Garg	7aa8d7f987	Add query/time metric for SQL queries from router (#12867 ) * Add query/time metric for SQL queries from router * Fix query cancel bug when user has overridden native query-id in a SQL query	2022-09-07 13:54:46 +05:30
Clint Wylie	a3a377e570	more consistent expression error messages (#12995 ) * more consistent expression error messages * review stuff * add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff * fixes * add expression transform name to transformer failure, better parse_json error messaging	2022-09-06 23:21:38 -07:00
Gian Merlino	0460d8a502	Adjust SQL "cannot plan" error message. (#12903 ) Two changes: 1) Restore the text of the SQL query. It was removed in #12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.	2022-08-29 18:33:00 +05:30
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Clint Wylie	16f5ac5bd5	json_value adjustments (#12968 ) * json_value adjustments changes: * native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type * rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument * fix json_value_any to be usable by humans too, coverage * fix bug * checkstyle * checkstyle * review stuff * validate that options to json_value are the supported options rather than ignore them * remove more legacy undocumented functions	2022-08-27 07:15:47 -07:00
Clint Wylie	4bdf9815c1	fix issue with SQL planner and null array constants (#12971 )	2022-08-26 04:44:17 -07:00
Clint Wylie	72aba00e09	add json function support for paths with negative array indexes (#12972 )	2022-08-25 17:11:28 -07:00
Clint Wylie	82ad927087	tighten up array handling, fix bug with array_slice output type inference (#12914 )	2022-08-25 00:48:49 -07:00
Karan Kumar	f7c6316992	Setting useNativeQueryExplain to true (#12936 ) * Setting useNativeQueryExplain to true * Update docs/querying/sql-query-context.md Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com> * Fixing tests * Fixing broken tests Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com>	2022-08-24 17:39:55 +05:30
Clint Wylie	289e43281e	stricter behavior for parse_json, add try_parse_json, remove to_json (#12920 )	2022-08-22 18:41:07 -07:00
Rohan Garg	3c129f6728	Add sql planning time metric (#12923 )	2022-08-22 11:09:44 +05:30
Paul Rogers	eb902375a2	Light refactor of the heavily refactored statement classes (#12909 ) Reflects lessons learned from working with consumers of the new code.	2022-08-19 02:31:06 +05:30
Gian Merlino	d3015d0f8e	DruidQuery: Return a copy from withScanSignatureIfNeeded, as promised. (#12906 ) The method wasn't following its contract, leading to pollution of the overall planner context, when really we just want to create a new context for a specific query.	2022-08-16 13:23:14 -07:00
Gian Merlino	6c5a43106a	SQL: Morph QueryMakerFactory into SqlEngine. (#12897 ) * SQL: Morph QueryMakerFactory into SqlEngine. Groundwork for introducing an indexing-service-task-based SQL engine under the umbrella of #12262. Also includes some other changes related to improving error behavior. Main changes: 1) Elevate the QueryMakerFactory interface (an extension point that allows customization of how queries are made) into SqlEngine. SQL engines can influence planner behavior through EngineFeatures, and can fully control the mechanics of query execution using QueryMakers. 2) Remove the server-wide QueryMakerFactory choice, in favor of the choice being made by the SQL entrypoint. The indexing-service-task-based SQL engine would be associated with its own entrypoint, like /druid/v2/sql/task. Other changes: 1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based on analysis of the planned rels; never try both. In particular, we no longer try BINDABLE when DRUID fails. This simplifies the logic and improves error messages. 2) Adjust error message "Cannot build plan for query" to omit the SQL query text. Useful because the text can be quite long, which makes it easy to miss the text about the problem. 3) Add a feature to block context parameters used internally by the SQL planner from being supplied by end users. 4) Add a feature to enable adding row signature to the context for Scan queries. This is useful in building the task-based engine. 5) Add saffron.properties file that turns off sets and graphviz dumps in "cannot plan" errors. Significantly reduces log spam on the Broker. * Fixes from CI. * Changes from review. * Can vectorize, now that join-to-filter is on by default. * Checkstyle! And variable renames! * Remove throws from test.	2022-08-14 23:31:19 -07:00
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Gian Merlino	836430019a	Add EXTERNAL resource type. (#12896 ) This is used to control access to the EXTERN function, which allows reading external data in SQL. The EXTERN function is not usable in production as of today, but it is used by the task-based SQL engine contemplated in #12262.	2022-08-12 10:57:30 -07:00
Paul Rogers	8ad8582dc8	Refactor DruidSchema & DruidTable (#12835 ) Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog. As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities. This PR is purely a refactoring move: no functionality changed. There is no difference to user functionality or external APIs. Functionality changes will come later as we add the catalog itself. DruidSchema In the present code, DruidSchema does three tasks: Holds the segment metadata cache Interfaces with an external schema manager Acts as a schema to Calcite This PR splits those responsibilities. DruidSchema holds the Calcite schema for the druid namespace, combining information from the segment metadata cache, from the external schema manager and (later) from the catalog. SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema. DruidTable The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports, and uses if-statements to handle behavior that differs between types. Yet, any given DruidTable will handle only one such table type. To more clearly model the actual table types, we split DruidTable into several classes: DruidTable becomes an abstract base class to hold Druid-specific methods. DatasourceTable represents a datasource. ExternalTable represents an external table, such as from EXTERN or (later) from the catalog. InlineTable represents the internal case in which we attach data directly to a table. LookupTable represents Druid’s lookup table mechanism. The new subclasses are more focused: they can be selective about the data they hold and the various predicates since they represent just one table type.
This will be important as the catalog information will differ depending on table type and the new structure makes adding that logic cleaner. DatasourceMetadata Previously, the DruidSchema segment cache would work with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To fix this, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry to hold only the “physical” segment metadata information: it is up to the DruidTable to combine this with the catalog information in a later PR. More Efficient Table Resolution Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables and Calcite does the rest. This means that, to resolve any single datasource, say, foo, we need to cache segment metadata, external schema information, and catalog information for all tables. Just so Calcite can do a map lookup. There is nothing special about AbstractSchema. We can handle table lookups ourselves. The new AbstractTableSchema does this. In fact, all the rest of Calcite wants is to resolve individual tables by name, and, for commands we don’t use, to provide a list of table names. DruidSchema now extends AbstractTableSchema. SegmentMetadataCache resolves individual tables (and provides table names.) DruidSchemaManager DruidSchemaManager provides a way to specify table schemas externally. In this sense, it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: implementations provide a map of tables. This PR provides new optional methods for the table lookup and table names operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead.
Druid code no longer calls the original getTables() method. The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, we should return a Map, not a ConcurrentMap.	2022-08-10 10:24:04 +05:30
Clint Wylie	ee41cc770f	fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule (#12880 ) * fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule * fix style * add comment about using custom sum function	2022-08-09 15:17:45 -07:00

957 Commits