druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	0460d8a502	Adjust SQL "cannot plan" error message. (#12903 ) Two changes: 1) Restore the text of the SQL query. It was removed in #12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.	2022-08-29 18:33:00 +05:30
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Clint Wylie	16f5ac5bd5	json_value adjustments (#12968 ) * json_value adjustments changes: * native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type * rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument * fix json_value_any to be usable by humans too, coverage * fix bug * checkstyle * checkstyle * review stuff * validate that options to json_value are the supported options rather than ignore them * remove more legacy undocumented functions	2022-08-27 07:15:47 -07:00
Clint Wylie	4bdf9815c1	fix issue with SQL planner and null array constants (#12971 )	2022-08-26 04:44:17 -07:00
Clint Wylie	72aba00e09	add json function support for paths with negative array indexes (#12972 )	2022-08-25 17:11:28 -07:00
Clint Wylie	82ad927087	tighten up array handling, fix bug with array_slice output type inference (#12914 )	2022-08-25 00:48:49 -07:00
Karan Kumar	f7c6316992	Setting useNativeQueryExplain to true (#12936 ) * Setting useNativeQueryExplain to true * Update docs/querying/sql-query-context.md Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com> * Fixing tests * Fixing broken tests Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com>	2022-08-24 17:39:55 +05:30
Clint Wylie	289e43281e	stricter behavior for parse_json, add try_parse_json, remove to_json (#12920 )	2022-08-22 18:41:07 -07:00
Rohan Garg	3c129f6728	Add sql planning time metric (#12923 )	2022-08-22 11:09:44 +05:30
Paul Rogers	eb902375a2	Light refactor of the heavily refactored statement classes (#12909 ) Reflects lessons learned from working with consumers of the new code.	2022-08-19 02:31:06 +05:30
Gian Merlino	d3015d0f8e	DruidQuery: Return a copy from withScanSignatureIfNeeded, as promised. (#12906 ) The method wasn't following its contract, leading to pollution of the overall planner context, when really we just want to create a new context for a specific query.	2022-08-16 13:23:14 -07:00
Gian Merlino	6c5a43106a	SQL: Morph QueryMakerFactory into SqlEngine. (#12897 ) * SQL: Morph QueryMakerFactory into SqlEngine. Groundwork for introducing an indexing-service-task-based SQL engine under the umbrella of #12262. Also includes some other changes related to improving error behavior. Main changes: 1) Elevate the QueryMakerFactory interface (an extension point that allows customization of how queries are made) into SqlEngine. SQL engines can influence planner behavior through EngineFeatures, and can fully control the mechanics of query execution using QueryMakers. 2) Remove the server-wide QueryMakerFactory choice, in favor of the choice being made by the SQL entrypoint. The indexing-service-task-based SQL engine would be associated with its own entrypoint, like /druid/v2/sql/task. Other changes: 1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based on analysis of the planned rels; never try both. In particular, we no longer try BINDABLE when DRUID fails. This simplifies the logic and improves error messages. 2) Adjust error message "Cannot build plan for query" to omit the SQL query text. Useful because the text can be quite long, which makes it easy to miss the text about the problem. 3) Add a feature to block context parameters used internally by the SQL planner from being supplied by end users. 4) Add a feature to enable adding row signature to the context for Scan queries. This is useful in building the task-based engine. 5) Add saffron.properties file that turns off sets and graphviz dumps in "cannot plan" errors. Significantly reduces log spam on the Broker. * Fixes from CI. * Changes from review. * Can vectorize, now that join-to-filter is on by default. * Checkstyle! And variable renames! * Remove throws from test.	2022-08-14 23:31:19 -07:00
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Gian Merlino	836430019a	Add EXTERNAL resource type. (#12896 ) This is used to control access to the EXTERN function, which allows reading external data in SQL. The EXTERN function is not usable in production as of today, but it is used by the task-based SQL engine contemplated in #12262.	2022-08-12 10:57:30 -07:00
Paul Rogers	8ad8582dc8	Refactor DruidSchema & DruidTable (#12835 ) Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog. As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities. This PR is purely a refactoring move: no functionality changed. There is no difference to user functionality or external APIs. Functionality changes will come later as we add the catalog itself. DruidSchema In the present code, DruidSchema does three tasks: Holds the segment metadata cache Interfaces with an external schema manager Acts as a schema to Calcite This PR splits those responsibilities. DruidSchema holds the Calcite schema for the druid namespace, combining information fro the segment metadata cache, from the external schema manager and (later) from the catalog. SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema. DruidTable The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports, and uses if-statements to handle behavior that differs between types. Yet, any given DruidTable will handle only one such table type. To more clearly model the actual table types, we split DruidTable into several classes: DruidTable becomes an abstract base class to hold Druid-specific methods. DatasourceTable represents a datasource. ExternalTable represents an external table, such as from EXTERN or (later) from the catalog. InlineTable represents the internal case in which we attach data directly to a table. LookupTable represents Druid’s lookup table mechanism. The new subclasses are more focused: they can be selective about the data they hold and the various predicates since they represent just one table type. This will be important as the catalog information will differ depending on table type and the new structure makes adding that logic cleaner. DatasourceMetadata Previously, the DruidSchema segment cache would work with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To fix this, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry to hold only the “physical” segment metadata information: it is up to the DruidTable to combine this with the catalog information in a later PR. More Efficient Table Resolution Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables and Calcite does the rest. This means that, to resolve any single datasource, say, foo, we need to cache segment metadata, external schema information, and catalog information for all tables. Just so Calcite can do a map lookup. There is nothing special about AbstractSchema. We can handle table lookups ourselves. The new AbstractTableSchema does this. In fact, all the rest of Calcite wants is to resolve individual tables by name, and, for commands we don’t use, to provide a list of table names. DruidSchema now extends AbstractTableSchema. SegmentMetadataCache resolves individual tables (and provides table names.) DruidSchemaManager DruidSchemaManager provides a way to specify table schemas externally. In this sense, it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it implements provide a map of tables. This PR provides new optional methods for the table lookup and table names operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead. Druid code no longer calls the original getTables() method. The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, we should return a Map, not a ConcurrentMap.	2022-08-10 10:24:04 +05:30
Clint Wylie	ee41cc770f	fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule (#12880 ) * fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule * fix style * add comment about using custom sum function	2022-08-09 15:17:45 -07:00
Gian Merlino	01d555e47b	Adjust "in" filter null behavior to match "selector". (#12863 ) * Adjust "in" filter null behavior to match "selector". Now, both of them match numeric nulls if constructed with a "null" value. This is consistent as far as native execution goes, but doesn't match the behavior of SQL = and IN. So, to address that, this patch also updates the docs to clarify that the native filters do match nulls. This patch also updates the SQL docs to describe how Boolean logic is handled in addition to how NULL values are handled. Fixes #12856. * Fix test.	2022-08-08 09:08:36 -07:00
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
Clint Wylie	623b075d12	fix nested column sql operator return type inference (#12851 ) * fix nested column sql operator return type inference * oops, final	2022-08-03 15:39:08 -07:00
Clint Wylie	6981b1cc12	fix bugs with nested column jsonpath parser (#12831 )	2022-08-02 11:38:25 -07:00
Clint Wylie	189e8b9d18	add NumericRangeIndex interface and BoundFilter support (#12830 ) add NumericRangeIndex interface and BoundFilter support changes: * NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers * BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn * NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns * better faster stronger and (ever so slightly) more understandable * more tests, fix bug * fix style	2022-07-29 18:58:49 -07:00
Paul Rogers	d52abe7b38	Today is that day - Single pass through Calcite planner (#12636 ) * Druid planner now makes only one pass through Calcite planner Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.	2022-07-29 18:53:21 -07:00
Paul Rogers	a8b155e9c6	Fixes for the Avatica JDBC driver (#12709 ) * Fixes for the Avatica JDBC driver Correctly implement regular and prepared statements Correctly implement result sets Fix race condition with contexts Clarify when parameters are used Prepare for single-pass through the planner * Addressed review comments * Addressed review comment	2022-07-27 15:22:40 -07:00
Laksh Singla	2e616e633a	Determine type of `__time` column by RowSignature in case of External Datasource (#12770 ) Some queries like `REPLACE INTO ... SELECT TIME_PARSE("__time") AS __time FROM ...` fail at the Calcite layer because any column with name `__time` is considered to be of type `SqlTypeName.TIMESTAMP`. Changes: - Modify `RowSignatures.toRelDataType()` so that the type of `__time` column is determined by the RowSignature's type.	2022-07-26 12:09:40 +05:30
Maytas Monsereenusorn	3bf1e699ff	GREATEST/LEAST function is incorrectly specifying that it cannot return null (#12804 )	2022-07-20 14:41:24 +05:30
Adarsh Sanjeev	f3272a25f9	Add check for sqlOuterLimit to ingest queries (#12799 ) * Add check for sqlOuterLimit to ingest queries * Fix checkstyle * Add comment	2022-07-19 09:02:43 -07:00
Paul Rogers	ee15c238cc	Clone Calcite planner to access validator (#12708 ) Done in preparation for the "single-pass" planner.	2022-07-14 18:10:33 -07:00
Clint Wylie	05b2e967ed	druid nested data column type (#12753 ) * add new druid nested data column type * fixes and such * fixes * adjustments, more tests * self review * oops * fix and test * more better * style	2022-07-14 12:07:23 -07:00
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Gian Merlino	97207cdcc7	Automatic sizing for GroupBy dictionaries. (#12763 ) * Automatic sizing for GroupBy dictionary sizes. Merging and selector dictionary sizes currently both default to 100MB. This is not optimal, because it can lead to OOM on small servers and insufficient resource utilization on larger servers. It also invites end users to try to tune it when queries run out of dictionary space, which can make things worse if the end user sets it to too high. So, this patch: - Adds automatic tuning for selector and merge dictionaries. Selectors use up to 15% of the heap and merge buffers use up to 30% of the heap (aggregate across all queries). - Updates out-of-memory error messages to emphasize enabling disk spilling vs. increasing memory parameters. With the memory parameters automatically sized, it is more likely that an end user will get benefit from enabling disk spilling. - Removes the query context parameters that allow lowering of configured dictionary sizes. These complicate the calculation, and I don't see a reasonable use case for them. * Adjust tests. * Review adjustments. * Additional comment. * Remove unused import.	2022-07-11 08:20:50 -07:00
Gian Merlino	edfbcc8455	Preserve column order in DruidSchema, SegmentMetadataQuery. (#12754 ) * Preserve column order in DruidSchema, SegmentMetadataQuery. Instead of putting columns in alphabetical order. This is helpful because it makes query order better match ingestion order. It also allows tools, like the reindexing flow in the web console, to more easily do follow-on ingestions using a column order that matches the pre-existing column order. We prefer the order from the latest segments. The logic takes all columns from the latest segments in the order they appear, then adds on columns from older segments after those. * Additional test adjustments. * Adjust imports.	2022-07-08 22:04:11 -07:00
Gian Merlino	9c925b4f09	Frame format for data transfer and short-term storage. (#12745 ) * Frame format for data transfer and short-term storage. As we move towards query execution plans that involve more transfer of data between servers, it's important to have a data format that provides for doing this more efficiently than the options available to us today. This patch adds: - Columnar frames, which support fast querying. - Row-based frames, which support fast sorting via memory comparison and fast whole-row copies via memory copying. - Frame files, a container format that can be stored on disk or transferred between servers. The idea is we should use row-based frames when data is expected to be sorted, and columnar frames when data is expected to be queried. The code in this patch is not used in production yet. Therefore, the patch involves minimal changes outside of the org.apache.druid.frame package. The main ones are adjustments to SqlBenchmark to add benchmarks for queries on frames, and the addition of a "forEach" method to Sequence. * Fixes based on tests, static analysis. * Additional fixes. * Skip DS mapping tests on JDK 14+ * Better JDK checking in tests. * Fix imports. * Additional comment. * Adjustments from code review. * Update test case.	2022-07-08 20:42:06 -07:00
Rohan Garg	d732de9948	Allow adding calcite rules from extensions (#12715 ) * Allow adding calcite rules from extensions * fixup! Allow adding calcite rules from extensions * Move Rules to CalciteRulesManager * fixup! Move Rules to CalciteRulesManager	2022-07-06 19:32:35 +05:30
Didip Kerabat	06251c5d2a	Add EIGHT_HOUR into possible list of Granularities. (#12717 ) * Add EIGHT_HOUR into possible list of Granularities. * Add the missing definition. * fix test. * Fix another test. * Stylecheck finally passed. Co-authored-by: Didip Kerabat <didip@apple.com>	2022-07-05 11:05:37 -07:00
Clint Wylie	bbbb6e1c3f	fix DruidSchema issue where datasources with no segments can become stuck in tables list indefinitely (#12727 )	2022-07-01 18:54:01 -07:00
Clint Wylie	48731710fb	precursor changes for nested columns to minimize files changed (#12714 ) * precursor changes for nested columns to minimize files changed * inspection fix * visibility * adjustment * unecessary change	2022-07-01 02:27:19 -07:00
Clint Wylie	d30efb1c1e	fix bug when rewriting sql virtual column registry (#12718 )	2022-07-01 02:24:00 -07:00
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Paul Rogers	ffcb996468	Cleanup changes pulled out of PR #12368 (#12672 ) This commit contains the cleanup needed for the new integration test framework. Changes: - Fix log lines, misspellings, docs, etc. - Allow the use of some of Druid's "JSON config" objects in tests - Fix minor bug in `BaseNodeRoleWatcher`	2022-06-23 23:19:50 +05:30
Kashif Faraz	b6f8d7a1b3	Add query context param `forceExpressionVirtualColumns` to always use "expression"-type virtual columns in query plan (#12583 ) SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE` are planned as specialized virtual columns instead of the default `expression`-type virtual columns. This commit adds a new context parameter to force the `expression`-type virtual columns. Changes - Add query context param `forceExpressionVirtualColumns` - Use context param to determine if specialized virtual columns should be used or not - Moved some tests into `CalciteExplainQueryTest`	2022-06-22 15:33:50 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
Gian Merlino	818974f6e4	ScanQuery: Fix JsonIgnore for isLegacy. (#12674 ) True, false, and null have different meanings: true/false mean "legacy" and "not legacy"; null means use the default set by ScanQueryConfig. So, we need to respect this in the JsonIgnore setup.	2022-06-18 15:55:54 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
TSFenwick	a3603ad6b0	Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613 ) Fixes an issue where sql query request logs do not include the default query context values set via `druid.query.default.context.xyz` runtime properties. # Change summary * Inject `DefaultQueryConfig` into `SqlLifecycleFactory` * Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle` # Description - This change does not affect query execution. This is because the `DefaultQueryConfig` was already being used in `QueryLifecycle`, which is initialized when the SQL is translated to a native query. - This also handles any potential use case where a context parameter should be handled at the SQL stage itself.	2022-06-08 12:52:50 +05:30
Laksh Singla	81c37c6515	Add validation for invalid partitioned by granularities (#12589 ) * Add validation for invalid partitioned by granularities * review comments * improve error message, change location of the method * remove imports * use StringUtils.lowercase Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2022-06-06 22:00:29 +05:30
Adarsh Sanjeev	5a283964ca	Improve SQL validation error messages (#12611 ) Update the SQL validation error message to specify whether the ingest is INSERT or REPLACE for better user experience.	2022-06-06 16:14:28 +05:30
Clint Wylie	dc0fdfec67	fix test comment (#12584 )	2022-05-31 12:39:20 -07:00
Gian Merlino	02ae3e74ff	RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578 ) RowBasedColumnSelectorFactory inherited strange behavior from Rows.objectToStrings for nulls that appear in lists: instead of being left as a null, it is replaced with the string "null". Some callers may need compatibility with this strange behavior, but it should be opt-in. Query-time call sites are changed to opt-out of this behavior, since it is not consistent with query-time expectations. The IncrementalIndex ingestion-time call site retains the old behavior, as this is traditionally when Rows.objectToStrings would be used.	2022-05-31 11:38:56 -07:00
Clint Wylie	b746bf9129	fix virtual column cycle bug, sql virtual column optimize bug (#12576 ) * fix virtual column cycle bug, sql virtual column optimize bug * more test	2022-05-30 23:51:21 -07:00

1 2 3 4 5 ...

624 Commits