druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	289e43281e	stricter behavior for parse_json, add try_parse_json, remove to_json (#12920 )	2022-08-22 18:41:07 -07:00
Rohan Garg	3c129f6728	Add sql planning time metric (#12923 )	2022-08-22 11:09:44 +05:30
Paul Rogers	eb902375a2	Light refactor of the heavily refactored statement classes (#12909 ) Reflects lessons learned from working with consumers of the new code.	2022-08-19 02:31:06 +05:30
Gian Merlino	d3015d0f8e	DruidQuery: Return a copy from withScanSignatureIfNeeded, as promised. (#12906 ) The method wasn't following its contract, leading to pollution of the overall planner context, when really we just want to create a new context for a specific query.	2022-08-16 13:23:14 -07:00
Gian Merlino	6c5a43106a	SQL: Morph QueryMakerFactory into SqlEngine. (#12897 ) * SQL: Morph QueryMakerFactory into SqlEngine. Groundwork for introducing an indexing-service-task-based SQL engine under the umbrella of #12262. Also includes some other changes related to improving error behavior. Main changes: 1) Elevate the QueryMakerFactory interface (an extension point that allows customization of how queries are made) into SqlEngine. SQL engines can influence planner behavior through EngineFeatures, and can fully control the mechanics of query execution using QueryMakers. 2) Remove the server-wide QueryMakerFactory choice, in favor of the choice being made by the SQL entrypoint. The indexing-service-task-based SQL engine would be associated with its own entrypoint, like /druid/v2/sql/task. Other changes: 1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based on analysis of the planned rels; never try both. In particular, we no longer try BINDABLE when DRUID fails. This simplifies the logic and improves error messages. 2) Adjust error message "Cannot build plan for query" to omit the SQL query text. Useful because the text can be quite long, which makes it easy to miss the text about the problem. 3) Add a feature to block context parameters used internally by the SQL planner from being supplied by end users. 4) Add a feature to enable adding row signature to the context for Scan queries. This is useful in building the task-based engine. 5) Add saffron.properties file that turns off sets and graphviz dumps in "cannot plan" errors. Significantly reduces log spam on the Broker. * Fixes from CI. * Changes from review. * Can vectorize, now that join-to-filter is on by default. * Checkstyle! And variable renames! * Remove throws from test.	2022-08-14 23:31:19 -07:00
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Gian Merlino	836430019a	Add EXTERNAL resource type. (#12896 ) This is used to control access to the EXTERN function, which allows reading external data in SQL. The EXTERN function is not usable in production as of today, but it is used by the task-based SQL engine contemplated in #12262.	2022-08-12 10:57:30 -07:00
Paul Rogers	8ad8582dc8	Refactor DruidSchema & DruidTable (#12835 ) Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog. As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities. This PR is purely a refactoring move: no functionality changed. There is no difference to user functionality or external APIs. Functionality changes will come later as we add the catalog itself. DruidSchema In the present code, DruidSchema does three tasks: Holds the segment metadata cache Interfaces with an external schema manager Acts as a schema to Calcite This PR splits those responsibilities. DruidSchema holds the Calcite schema for the druid namespace, combining information from the segment metadata cache, from the external schema manager and (later) from the catalog. SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema. DruidTable The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports, and uses if-statements to handle behavior that differs between types. Yet, any given DruidTable will handle only one such table type. To more clearly model the actual table types, we split DruidTable into several classes: DruidTable becomes an abstract base class to hold Druid-specific methods. DatasourceTable represents a datasource. ExternalTable represents an external table, such as from EXTERN or (later) from the catalog. InlineTable represents the internal case in which we attach data directly to a table. LookupTable represents Druid’s lookup table mechanism. The new subclasses are more focused: they can be selective about the data they hold and the various predicates since they represent just one table type. 
This will be important as the catalog information will differ depending on table type and the new structure makes adding that logic cleaner. DatasourceMetadata Previously, the DruidSchema segment cache would work with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To fix this, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry to hold only the “physical” segment metadata information: it is up to the DruidTable to combine this with the catalog information in a later PR. More Efficient Table Resolution Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables and Calcite does the rest. This means that, to resolve any single datasource, say, foo, we need to cache segment metadata, external schema information, and catalog information for all tables. Just so Calcite can do a map lookup. There is nothing special about AbstractSchema. We can handle table lookups ourselves. The new AbstractTableSchema does this. In fact, all the rest of Calcite wants is to resolve individual tables by name, and, for commands we don’t use, to provide a list of table names. DruidSchema now extends AbstractTableSchema. SegmentMetadataCache resolves individual tables (and provides table names.) DruidSchemaManager DruidSchemaManager provides a way to specify table schemas externally. In this sense, it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it implements provide a map of tables. This PR provides new optional methods for the table lookup and table names operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead. 
Druid code no longer calls the original getTables() method. The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, we should return a Map, not a ConcurrentMap.	2022-08-10 10:24:04 +05:30
Clint Wylie	ee41cc770f	fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule (#12880 ) * fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule * fix style * add comment about using custom sum function	2022-08-09 15:17:45 -07:00
Gian Merlino	01d555e47b	Adjust "in" filter null behavior to match "selector". (#12863 ) * Adjust "in" filter null behavior to match "selector". Now, both of them match numeric nulls if constructed with a "null" value. This is consistent as far as native execution goes, but doesn't match the behavior of SQL = and IN. So, to address that, this patch also updates the docs to clarify that the native filters do match nulls. This patch also updates the SQL docs to describe how Boolean logic is handled in addition to how NULL values are handled. Fixes #12856. * Fix test.	2022-08-08 09:08:36 -07:00
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
Clint Wylie	623b075d12	fix nested column sql operator return type inference (#12851 ) * fix nested column sql operator return type inference * oops, final	2022-08-03 15:39:08 -07:00
Clint Wylie	6981b1cc12	fix bugs with nested column jsonpath parser (#12831 )	2022-08-02 11:38:25 -07:00
Clint Wylie	189e8b9d18	add NumericRangeIndex interface and BoundFilter support (#12830 ) add NumericRangeIndex interface and BoundFilter support changes: * NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers * BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn * NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns * better faster stronger and (ever so slightly) more understandable * more tests, fix bug * fix style	2022-07-29 18:58:49 -07:00
Paul Rogers	d52abe7b38	Today is that day - Single pass through Calcite planner (#12636 ) * Druid planner now makes only one pass through Calcite planner Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.	2022-07-29 18:53:21 -07:00
Paul Rogers	a8b155e9c6	Fixes for the Avatica JDBC driver (#12709 ) * Fixes for the Avatica JDBC driver Correctly implement regular and prepared statements Correctly implement result sets Fix race condition with contexts Clarify when parameters are used Prepare for single-pass through the planner * Addressed review comments * Addressed review comment	2022-07-27 15:22:40 -07:00
Laksh Singla	2e616e633a	Determine type of `__time` column by RowSignature in case of External Datasource (#12770 ) Some queries like `REPLACE INTO ... SELECT TIME_PARSE("__time") AS __time FROM ...` fail at the Calcite layer because any column with name `__time` is considered to be of type `SqlTypeName.TIMESTAMP`. Changes: - Modify `RowSignatures.toRelDataType()` so that the type of `__time` column is determined by the RowSignature's type.	2022-07-26 12:09:40 +05:30
Maytas Monsereenusorn	3bf1e699ff	GREATEST/LEAST function is incorrectly specifying that it cannot return null (#12804 )	2022-07-20 14:41:24 +05:30
Adarsh Sanjeev	f3272a25f9	Add check for sqlOuterLimit to ingest queries (#12799 ) * Add check for sqlOuterLimit to ingest queries * Fix checkstyle * Add comment	2022-07-19 09:02:43 -07:00
Paul Rogers	ee15c238cc	Clone Calcite planner to access validator (#12708 ) Done in preparation for the "single-pass" planner.	2022-07-14 18:10:33 -07:00
Clint Wylie	05b2e967ed	druid nested data column type (#12753 ) * add new druid nested data column type * fixes and such * fixes * adjustments, more tests * self review * oops * fix and test * more better * style	2022-07-14 12:07:23 -07:00
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Gian Merlino	97207cdcc7	Automatic sizing for GroupBy dictionaries. (#12763 ) * Automatic sizing for GroupBy dictionary sizes. Merging and selector dictionary sizes currently both default to 100MB. This is not optimal, because it can lead to OOM on small servers and insufficient resource utilization on larger servers. It also invites end users to try to tune it when queries run out of dictionary space, which can make things worse if the end user sets it too high. So, this patch: - Adds automatic tuning for selector and merge dictionaries. Selectors use up to 15% of the heap and merge buffers use up to 30% of the heap (aggregate across all queries). - Updates out-of-memory error messages to emphasize enabling disk spilling vs. increasing memory parameters. With the memory parameters automatically sized, it is more likely that an end user will get benefit from enabling disk spilling. - Removes the query context parameters that allow lowering of configured dictionary sizes. These complicate the calculation, and I don't see a reasonable use case for them. * Adjust tests. * Review adjustments. * Additional comment. * Remove unused import.	2022-07-11 08:20:50 -07:00
Gian Merlino	edfbcc8455	Preserve column order in DruidSchema, SegmentMetadataQuery. (#12754 ) * Preserve column order in DruidSchema, SegmentMetadataQuery. Instead of putting columns in alphabetical order. This is helpful because it makes query order better match ingestion order. It also allows tools, like the reindexing flow in the web console, to more easily do follow-on ingestions using a column order that matches the pre-existing column order. We prefer the order from the latest segments. The logic takes all columns from the latest segments in the order they appear, then adds on columns from older segments after those. * Additional test adjustments. * Adjust imports.	2022-07-08 22:04:11 -07:00
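The column-ordering rule described in commit edfbcc8455 (columns from the latest segments first, in the order they appear, then columns from older segments) can be sketched as follows. This is a hypothetical Python helper, not Druid code; it assumes the caller has already sorted the segments newest first, and the function name and input shape are illustrative only.

```python
def merge_column_order(segments_latest_first):
    """Merge per-segment column lists into one ordered list.

    segments_latest_first: list of column-name lists, newest segment first
    (an assumption of this sketch; the commit does not describe how
    "latest" is determined).
    """
    ordered = []
    seen = set()
    for columns in segments_latest_first:
        for name in columns:
            # Keep the first position at which a column is seen, so newer
            # segments decide the order and older ones only append.
            if name not in seen:
                seen.add(name)
                ordered.append(name)
    return ordered
```

For example, a newest segment with columns `__time, b, a` and an older one with `__time, a, c` would merge to `__time, b, a, c` rather than the alphabetical `__time, a, b, c`.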
Gian Merlino	9c925b4f09	Frame format for data transfer and short-term storage. (#12745 ) * Frame format for data transfer and short-term storage. As we move towards query execution plans that involve more transfer of data between servers, it's important to have a data format that provides for doing this more efficiently than the options available to us today. This patch adds: - Columnar frames, which support fast querying. - Row-based frames, which support fast sorting via memory comparison and fast whole-row copies via memory copying. - Frame files, a container format that can be stored on disk or transferred between servers. The idea is we should use row-based frames when data is expected to be sorted, and columnar frames when data is expected to be queried. The code in this patch is not used in production yet. Therefore, the patch involves minimal changes outside of the org.apache.druid.frame package. The main ones are adjustments to SqlBenchmark to add benchmarks for queries on frames, and the addition of a "forEach" method to Sequence. * Fixes based on tests, static analysis. * Additional fixes. * Skip DS mapping tests on JDK 14+ * Better JDK checking in tests. * Fix imports. * Additional comment. * Adjustments from code review. * Update test case.	2022-07-08 20:42:06 -07:00
Rohan Garg	d732de9948	Allow adding calcite rules from extensions (#12715 ) * Allow adding calcite rules from extensions * fixup! Allow adding calcite rules from extensions * Move Rules to CalciteRulesManager * fixup! Move Rules to CalciteRulesManager	2022-07-06 19:32:35 +05:30
Didip Kerabat	06251c5d2a	Add EIGHT_HOUR into possible list of Granularities. (#12717 ) * Add EIGHT_HOUR into possible list of Granularities. * Add the missing definition. * fix test. * Fix another test. * Stylecheck finally passed. Co-authored-by: Didip Kerabat <didip@apple.com>	2022-07-05 11:05:37 -07:00
Clint Wylie	bbbb6e1c3f	fix DruidSchema issue where datasources with no segments can become stuck in tables list indefinitely (#12727 )	2022-07-01 18:54:01 -07:00
Clint Wylie	48731710fb	precursor changes for nested columns to minimize files changed (#12714 ) * precursor changes for nested columns to minimize files changed * inspection fix * visibility * adjustment * unnecessary change	2022-07-01 02:27:19 -07:00
Clint Wylie	d30efb1c1e	fix bug when rewriting sql virtual column registry (#12718 )	2022-07-01 02:24:00 -07:00
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Paul Rogers	ffcb996468	Cleanup changes pulled out of PR #12368 (#12672 ) This commit contains the cleanup needed for the new integration test framework. Changes: - Fix log lines, misspellings, docs, etc. - Allow the use of some of Druid's "JSON config" objects in tests - Fix minor bug in `BaseNodeRoleWatcher`	2022-06-23 23:19:50 +05:30
Kashif Faraz	b6f8d7a1b3	Add query context param `forceExpressionVirtualColumns` to always use "expression"-type virtual columns in query plan (#12583 ) SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE` are planned as specialized virtual columns instead of the default `expression`-type virtual columns. This commit adds a new context parameter to force the `expression`-type virtual columns. Changes - Add query context param `forceExpressionVirtualColumns` - Use context param to determine if specialized virtual columns should be used or not - Moved some tests into `CalciteExplainQueryTest`	2022-06-22 15:33:50 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
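Commit 0099940808 says TIME_IN_INTERVAL is equivalent to using the >= and < operators directly, which means a half-open interval check. A minimal sketch of that semantics in Python, assuming the interval is given as separate start and end timestamps (the function name and argument shape are illustrative and not the SQL operator's signature):

```python
from datetime import datetime


def time_in_interval(t: datetime, start: datetime, end: datetime) -> bool:
    """Half-open check: start is included, end is excluded."""
    return start <= t < end
```

With this definition, adjacent intervals such as June and July never both match the same timestamp, because the shared boundary belongs only to the later interval.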
Gian Merlino	818974f6e4	ScanQuery: Fix JsonIgnore for isLegacy. (#12674 ) True, false, and null have different meanings: true/false mean "legacy" and "not legacy"; null means use the default set by ScanQueryConfig. So, we need to respect this in the JsonIgnore setup.	2022-06-18 15:55:54 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
TSFenwick	a3603ad6b0	Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613 ) Fixes an issue where sql query request logs do not include the default query context values set via `druid.query.default.context.xyz` runtime properties. # Change summary * Inject `DefaultQueryConfig` into `SqlLifecycleFactory` * Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle` # Description - This change does not affect query execution. This is because the `DefaultQueryConfig` was already being used in `QueryLifecycle`, which is initialized when the SQL is translated to a native query. - This also handles any potential use case where a context parameter should be handled at the SQL stage itself.	2022-06-08 12:52:50 +05:30
Laksh Singla	81c37c6515	Add validation for invalid partitioned by granularities (#12589 ) * Add validation for invalid partitioned by granularities * review comments * improve error message, change location of the method * remove imports * use StringUtils.lowercase Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2022-06-06 22:00:29 +05:30
Adarsh Sanjeev	5a283964ca	Improve SQL validation error messages (#12611 ) Update the SQL validation error message to specify whether the ingest is INSERT or REPLACE for better user experience.	2022-06-06 16:14:28 +05:30
Clint Wylie	dc0fdfec67	fix test comment (#12584 )	2022-05-31 12:39:20 -07:00
Gian Merlino	02ae3e74ff	RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578 ) RowBasedColumnSelectorFactory inherited strange behavior from Rows.objectToStrings for nulls that appear in lists: instead of being left as a null, it is replaced with the string "null". Some callers may need compatibility with this strange behavior, but it should be opt-in. Query-time call sites are changed to opt-out of this behavior, since it is not consistent with query-time expectations. The IncrementalIndex ingestion-time call site retains the old behavior, as this is traditionally when Rows.objectToStrings would be used.	2022-05-31 11:38:56 -07:00
Clint Wylie	b746bf9129	fix virtual column cycle bug, sql virtual column optimize bug (#12576 ) * fix virtual column cycle bug, sql virtual column optimize bug * more test	2022-05-30 23:51:21 -07:00
Karan Kumar	9f9faeec81	object[] handling for DimensionHandlers for arrays (#12552 ) Description Fixes a bug when running q's like SELECT cntarray, Count() FROM (SELECT dim1, dim2, Array_agg(cnt) AS cntarray FROM (SELECT dim1, dim2, dim3, Count() AS cnt FROM foo GROUP BY 1, 2, 3) GROUP BY 1, 2) GROUP BY 1 This generates an error: org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx] Because it's an array of numbers it looks like it does the convertToList call, which looks like: @Nullable public static ComparableList convertToList(Object obj) { if (obj == null) { return null; } if (obj instanceof List) { return new ComparableList((List) obj); } if (obj instanceof ComparableList) { return (ComparableList) obj; } throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName()); } I.e. it doesn't know about arrays. Added the array handling as part of this PR.	2022-05-25 15:24:18 +05:30
Adarsh Sanjeev	5063eca5b9	Add error message for incorrectly ordered clause in sql (#12558 ) In the case that the clustered by is before the partitioned by for an sql query, the error message is a bit confusing. insert into foo select * from bar clustered by dim1 partitioned by all Error: SQL parse failed Encountered "PARTITIONED" at line 1, column 88. Was expecting one of: <EOF> "," ... "ASC" ... "DESC" ... "NULLS" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "(" ... Less... org.apache.calcite.sql.parser.SqlParseException This is a bit confusing; a check could be added to throw a more user-friendly message stating that the order should be reversed. Add error message for incorrectly ordered clause in sql.	2022-05-23 12:41:18 +05:30
Gian Merlino	69aac6c8dd	Direct UTF-8 access for "in" filters. (#12517 ) * Direct UTF-8 access for "in" filters. Directly related: 1) InDimFilter: Store sorted Strings (in ValuesSet) plus sorted UTF-8 ByteBuffers (in valuesUtf8). Use valuesUtf8 whenever possible. If necessary, the input set is copied into a ValuesSet. Much logic is simplified, because we always know what type the values set will be. I think that there won't even be an efficiency loss in most cases. InDimFilter is most frequently created by deserialization, and this patch updates the JsonCreator constructor to deserialize directly into a ValuesSet. 2) Add Utf8ValueSetIndex, which InDimFilter uses to avoid UTF-8 decodes during index lookups. 3) Add unsigned comparator to ByteBufferUtils and use it in GenericIndexed.BYTE_BUFFER_STRATEGY. This is important because UTF-8 bytes can be compared as bytes if, and only if, the comparison is unsigned. 4) Add specialization to GenericIndexed.singleThreaded().indexOf that avoids needless ByteBuffer allocations. 5) Clarify that objects returned by ColumnIndexSupplier.as are not thread-safe. DictionaryEncodedStringIndexSupplier now calls singleThreaded() on all relevant GenericIndexed objects, saving a ByteBuffer allocation per access. Also: 1) Fix performance regression in LikeFilter: since #12315, it applied the suffix matcher to all values in range even for type MATCH_ALL. 2) Add ObjectStrategy.canCompare() method. This fixes LikeFilterBenchmark, which was broken due to calls to strategy.compare in GenericIndexed.fromIterable. * Add like-filter implementation tests. * Add in-filter implementation tests. * Add tests, fix issues. * Fix style. * Adjustments from review.	2022-05-20 01:51:28 -07:00
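Point 3 of commit 69aac6c8dd states that UTF-8 bytes can be compared as bytes only if the comparison is unsigned. The sketch below illustrates why, using hypothetical Python helpers (none of these names come from Druid). Signed comparison is emulated by reinterpreting each byte value as a two's-complement signed byte, which is how a language with a signed byte type would see it.

```python
def to_signed(b: int) -> int:
    """Reinterpret an unsigned byte value (0..255) as a signed byte (-128..127)."""
    return b - 256 if b > 127 else b


def _by_length(a: bytes, b: bytes) -> int:
    # Tie-break when one sequence is a prefix of the other: shorter sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_unsigned(a: bytes, b: bytes) -> int:
    """Lexicographic comparison treating each byte as 0..255."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return _by_length(a, b)


def compare_signed(a: bytes, b: bytes) -> int:
    """Lexicographic comparison treating each byte as -128..127."""
    for x, y in zip(a, b):
        sx, sy = to_signed(x), to_signed(y)
        if sx != sy:
            return -1 if sx < sy else 1
    return _by_length(a, b)


z = "z".encode("utf-8")        # single byte 0x7A
e_acute = "é".encode("utf-8")  # two bytes 0xC3 0xA9
```

Here `compare_unsigned(z, e_acute)` agrees with code point order (U+007A sorts before U+00E9), while `compare_signed(z, e_acute)` reverses it, because 0xC3 becomes negative when read as a signed byte.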
Gian Merlino	65a1375b67	SQL: Add is_active to sys.segments, update examples and docs. (#11550 ) * SQL: Add is_active to sys.segments, update examples and docs. is_active is short for: (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1 It's important because this represents "all the segments that should be queryable, whether or not they actually are right now". Most of the time, this is the set of segments that people will want to look at. The web console already adds this filter to a lot of its queries, proving its usefulness. This patch also reworks the caveat at the bottom of the sys.segments section, so its information is mixed into the description of each result field. This should make it more likely for people to see the information. * Wording updates. * Adjustments for spellcheck. * Adjust IT.	2022-05-19 14:23:28 -07:00
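The `is_active` shorthand defined in commit 65a1375b67 can be written out as a small predicate. This is a hypothetical Python helper that only mirrors the expression quoted in the commit message over the 0/1 flag columns of sys.segments; it is not how Druid computes the column.

```python
def is_active(is_published: int, is_overshadowed: int, is_realtime: int) -> int:
    """(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1, as a 0/1 flag."""
    active = (is_published == 1 and is_overshadowed == 0) or is_realtime == 1
    return 1 if active else 0
```

A published segment that has been overshadowed is therefore inactive, while any realtime segment is active regardless of the other two flags.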
Adarsh Sanjeev	fcb1c0b7bf	Add cluster by support for replace syntax (#12524 ) * Add cluster by support for replace syntax * Add unit test for with list	2022-05-17 15:15:29 +05:30
Adarsh Sanjeev	0fd4f1e386	Improve error messages from SQL REPLACE syntax (#12523 ) - Add user friendly error messages for missing or incorrect OVERWRITE clause for REPLACE SQL query - Move validation of missing OVERWRITE clause at code level instead of parser for custom error message	2022-05-17 09:55:58 +05:30
Adarsh Sanjeev	39b3487aa9	Add replace statement to sql parser (#12386 ) Relevant Issue: #11929 - Add custom replace statement to Druid SQL parser. - Edit DruidPlanner to convert relevant fields to Query Context. - Refactor common code with INSERT statements to reuse them for REPLACE where possible.	2022-05-13 10:56:40 +05:30
Clint Wylie	9e5a940cf1	remake column indexes and query processing of filters (#12388 ) Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filters consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use of indexing in the current set of filters that Druid provides.	2022-05-11 11:57:08 +05:30
Rohan Garg	75836a5a06	Add feature flag for sql planning of TimeBoundary queries (#12491 ) * Add feature flag for sql planning of TimeBoundary queries * fixup! Add feature flag for sql planning of TimeBoundary queries * Add documentation for enableTimeBoundaryPlanning * fixup! Add documentation for enableTimeBoundaryPlanning	2022-05-10 15:23:42 +05:30
somu-imply	c68388ebcd	Vectorized version of string last aggregator (#12493 ) * Vectorized version of string last aggregator * Updating string last and adding testcases * Updating code and adding testcases for serializable pairs * Addressing review comments	2022-05-09 17:02:38 -07:00
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
Gian Merlino	2e42d04038	SQL: Create millisecond precision timestamp literals. (#12407 ) * SQL: Create millisecond precision timestamp literals. Fixes a bug where implicit casts of strings to timestamps would use seconds precision rather than milliseconds. The new test case testCountStarWithBetweenTimeFilterUsingMillisecondsInStringLiterals exercises this. * Update sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java Co-authored-by: Frank Chen <frankchen@apache.org> * Correct precision handling. - Set default precision to 3 (millis) for things involving timestamps. - Respect precision specified in types when available. * Silence, checkstyle. Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-27 14:17:07 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Adarsh Sanjeev	1306965c9e	Validate select columns for insert statement (#12431 ) Unnamed columns in the select part of insert SQL statements currently create a table with the column name such as "EXPR$3". This PR adds a check for this.	2022-04-27 12:25:49 +05:30
somu-imply	027935dcff	Vectorize numeric latest aggregators (#12439 ) * Vectorizing Latest aggregator Part 1 * Updating benchmark tests * Changing appropriate logic for vectors for null handling * Introducing an abstract class and moving the commonalities there * Adding vectorization for StringLast aggregator (initial version) * Updated bufferized version of numeric aggregators * Adding some javadocs * Making sure this PR vectorizes numeric latest agg only * Adding another benchmarking test * Fixing intellij inspections * Adding tests for double * Adding test cases for long and float * Updating testcases * Checkstyle oops.. * One tiny change in test case * Fixing spotbug and rhs not being used	2022-04-26 11:33:08 -07:00
Rohan Garg	95694b5afa	Convert simple min/max SQL queries on __time to timeBoundary queries (#12472 ) * Support array based results in timeBoundary query * Fix bug with query interval in timeBoundary * Convert min(__time) and max(__time) SQL queries to timeBoundary * Add tests for timeBoundary backed SQL queries * Fix query plans for existing tests * fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary * fixup! Add tests for timeBoundary backed SQL queries * fixup! Fix bug with query interval in timeBoundary	2022-04-25 08:18:58 -07:00
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, e.g. maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades their cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
somu-imply	2db02876cf	Updating an error msg (#12450 ) * Updating an error msg * Added an extra [] so removing it	2022-04-20 07:56:09 -07:00
somu-imply	cd6fba2f6c	Handling planning with alias for time for group by and order by (#12418 ) An outer scan query, that requires ordering on a column, should be considered an invalid query.	2022-04-15 10:29:17 +05:30
Adarsh Sanjeev	b74cb7624d	Make error messages for insert statements consistent with select statements (#12414 ) For a query like INSERT INTO tablename SELECT channel, added as count FROM wikipedia the error message is Encountered "as count". However, the insert statement INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL incorrectly returns INSERT statements must specify PARTITIONED BY clause explictly. This PR corrects this. Add EOF to end of Druid SQL Insert statements Rename SQL Insert statements in the parser to reflect the behaviour change	2022-04-09 12:21:40 +05:30
Paul Rogers	2cc2088720	Method to specify eternity in the scan query builder (#12223 ) * Method to specify eternity in the scan query builder * Fix checkstyle issue * Renamed eterity() to eternityInterval() * Minor fixes	2022-04-04 15:11:32 -07:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcite's InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with a large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
Gian Merlino	cb2b2b696d	Fix error message for groupByEnableMultiValueUnnesting. (#12325 ) * Fix error message for groupByEnableMultiValueUnnesting. It referred to the incorrect context parameter. Also, create a dedicated exception class, to allow easier detection of this specific error. * Fix other test. * More better error messages. * Test getDimensionName method.	2022-03-10 11:37:24 -08:00
Rohan Garg	9f6a930462	Fix join query in case of filter explosion during CNF conversion (#12324 )	2022-03-09 12:43:09 -08:00
Clint Wylie	1c004ea47e	use virtual columns for sql simple aggregators instead of inline expressions (#12251 ) * use virtual columns for sql simple aggregators instead of inline expressions * fixes * always use virtual columns * add more tests	2022-03-03 15:05:28 -08:00
Xavier Léauté	1434197ee1	update airline dependency to 2.x (#12270 ) * upgrade Airline to Airline 2 https://github.com/airlift/airline is no longer maintained, updating to https://github.com/rvesse/airline (Airline 2) to use an actively maintained version, while minimizing breaking changes. Note, this is a backwards incompatible change, and extensions relying on the CliCommandCreator extension point will also need to be updated. * fix dependency checks where jakarta.inject is now resolved first instead of javax.inject, due to Airline 2 using jakarta	2022-02-27 15:19:28 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Jonathan Wei	b1640a72ee	Re-enable segment metadata cache when using external schema (#12264 )	2022-02-22 19:50:29 -06:00
Karan Kumar	5794331eb1	Adding new config for disabling group by on multiValue column (#12253 ) As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key. Added a config groupByEnableMultiValueUnnesting which can be set in the query context. The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior. If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.	2022-02-16 20:53:26 +05:30
Laksh Singla	8fc0e5c95c	Explain plan for custom insert syntax (#12243 ) * Initial commit, explain plan for custom insert syntax working * Cleanup separate SqlInsert handling	2022-02-15 21:48:34 -08:00
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hidden by default. The number of IN filters can grow up to a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a spelling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
somu-imply	033989eb1d	Adding vectorized time_shift (#12254 ) * Adding vectorized time_shift * Vectorize time shift, addressing review comments * Remove an unused import	2022-02-11 14:44:52 -08:00
Laksh Singla	5bd646e10a	Surface a user friendly error when PARTITIONED BY is omitted (#12246 ) #12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.	2022-02-11 11:49:00 +05:30
Clint Wylie	3ee66bb492	allow optimizing sql expressions and virtual columns (#12241 ) * rework sql planner expression and virtual column handling * simplify a bit * add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests * spotbugs * fix bugs with multi-value string array expression handling * javadocs and adjust test * better * fix tests	2022-02-09 14:55:50 -08:00
Laksh Singla	4add2510ed	Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries (#12163 ) This PR aims to add parser changes for supporting PARTITIONED BY and CLUSTERED BY as proposed in the issue #11929.	2022-02-08 16:23:15 +05:30
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Clint Wylie	8fd587b28c	remove duplicate Broker ServerInventoryView, improve HttpServerInventoryView logging (#12209 ) * changes: * remove SystemSchema duplicate ServerInventoryView in broker * suppress duplicate segment added/removed warnings in HttpServerInventoryView when doing a full sync * fixes	2022-02-03 12:57:34 -08:00
Maytas Monsereenusorn	3717693633	Fix java.lang.ClassCastException error when using useApproximateCountDistinct false for aggregation query (#12216 ) * add imply * add test * add unit test * add test	2022-02-03 12:01:13 -08:00
Clint Wylie	f9b406c8f2	add backwards compatibility mode for multi-value string array null value coercion (#12210 )	2022-01-31 22:38:15 -08:00
Abhishek Agarwal	1b8808cce8	Fix SQL queries for inline datasource with null values (#12092 ) Fixes a bug because of which some SQL queries cannot be parsed using druid convention. Specifically, these queries translate to an inline datasource and have some null values. Calcite internally uses NULL as SQL type for these literals and that is not supported by the druid. I am now allowing null column types to be returned while building RowSignature in org.apache.druid.sql.calcite.table.RowSignatures#fromRelDataType. RowSignature already allows null column type for any column. Doing so should also fix bindable queries such as select (1,2). When such queries are run with headers set to true, we get an exception in org.apache.druid.sql.http.ArrayWriter#writeHeader. This is again a similar exception to the one addressed in this PR. Because SQL type for the result column is RECORD and that doesn't have a corresponding columnType.	2022-01-27 18:04:12 +05:30
Karan Kumar	96b3498a40	Grouping on arrays as arrays (#12078 ) * init multiValue column group by * Changing sorting to Lexicographic as default * Adding initial tests * 1.Fixing test cases adding 2.Optimized inmem structs * Linking SQL layer to native layer * Adding multiDimension support to group by column strategy * 1. Removing array coercion in Calcite layer 2. Removing ResultRowDeserializer * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Checkstyle things 2. Removing flag * Minor naming things * CheckStyle Things * Fixing test case * Fixing hashing * 1. Adding the MV function 2. Added few test cases * 1. Adding MV function test cases * Adding Selector strategy function test cases * Fixing ClientQuerySegmentWalkerTest * Adding GroupByQueryRunnerTest test cases * Fixing test cases * Adding few more test cases * Fixing Exception asset statement and intellij inspection * Adding null compatibility tests * Review comments * Fixing few failing tests * Fixing few failing tests * Do no convert to topN Q incase of group by on array * Fixing checkstyle * Fixing differences between jdk's class cast exception message * 1. Fixing ordering if the grouping key is an array * Fixing DefaultLimitSpec * Fixing CalciteArraysQueryTest * Dummy commit for LGTM * changes: * only coerce multi-value string null values when `ExpressionPlan.Trait.NEEDS_APPLIED` is set * correct return type inference for ARRAY_APPEND,ARRAY_PREPEND,ARRAY_SLICE,ARRAY_CONCAT * fix bug with ExprEval.ofType when actual type of object from binding doesn't match its claimed type * Review comments * Fixing test cases * Fixing spot bugs * Fixing strict compile Co-authored-by: Clint Wylie <cwylie@apache.org>	2022-01-25 20:30:56 -08:00
somu-imply	cc8b9c0b6e	Handling OOM error in ExpressionVector setup by reducing number of rows (#12186 ) * Handling OOM error in ExpressionVector setup by reducing number of rows * Removing row size to 10K in sanity tests	2022-01-24 08:37:13 -08:00
Laksh Singla	dc1703d5f9	Change value of `druid.sql.planner.useGroupingSetForExactDistinct` in common.runtime.properties (#12182 ) This PR changes the value of the property `druid.sql.planner.useGroupingSetForExactDistinct` from `false` to `true` in the runtime.properties files, so that newer installations have this property as `true`, while the default still remains as `false`. The flag determines how queries which contain an aggregation over `DISTINCT` like `SELECT COUNT(DISTINCT foo.dim1) FILTER(WHERE foo.cnt = 1), SUM(foo.cnt) FROM druid.foo` get planned by Calcite. With the flag being set to false, it plans it via joins, whereas with it being set to true, the query is set using grouping sets. There is a known issue with Calcite (https://github.com/apache/druid/issues/7953), where an NPE is thrown while planning the above query with joins. There is no such issue while planning the query using grouping sets.	2022-01-24 14:00:03 +05:30
Jihoon Son	cc2ffc6c0f	Fix node discovery to ignore unknown DruidServices (#12157 ) * Fix node discovery to ignore unknown DruidServices * ignore all runtime exceptions * fix test * add custom deserializer * custom serializer * log host for unparseable druidService	2022-01-18 22:08:59 -08:00
Gian Merlino	cf7191d2bc	Validate target dataSource for INSERT. (#12129 )	2022-01-18 09:34:23 -08:00
Maytas Monsereenusorn	bd7fe45da0	Support adding metrics in Auto Compaction (#12125 ) * add impl * add impl * add unit tests * add unit tests * add unit tests * add unit tests * add unit tests * add integration tests * add integration tests * fix LGTM * fix test * remove doc	2022-01-17 20:19:31 -08:00
Clint Wylie	f2ce76966c	add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous (#12145 ) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better	2022-01-12 03:48:53 -08:00
Laksh Singla	fae73800a7	Set plannerContext error when cannot query external datasources and when insert is not supported. (#12136 ) This PR aims to add plannerContext.setPlanningError whenever external table scan rule is invoked, without the queryMaker having the ability to do so.	2022-01-12 15:11:17 +05:30
Rohan Garg	81f0aba6cb	Use ListFilteredVirtualColumn for left/fact table expression in join condition (#12127 ) * Pass VirtualColumnRegistry in PlannerContext for join expression planning * Allow for including VCs from join fact table expression * Optimize MV_FILTER functions to use a VC when in join fact table expression * fixup! Allow for including VCs from join fact table expression * Address review comments	2022-01-11 14:47:13 -08:00
Laksh Singla	7c17341caa	Return empty result when a group by gets optimized to a timeseries query (#12065 ) Related to #11188 The above mentioned PR allowed timeseries queries to return a default result, when queries of type: select count() from table where dim1="_not_present_dim_" were executed. Before the PR, it returned no row, after the PR, it would return a row with value of count() as 0 (as expected by SQL standards of different dbs). In Grouping#applyProject, we can sometimes perform optimization of a groupBy query to a timeseries query if possible (when the keys of the groupBy are constants, as generated by automated tools). For example, in select count() from table where dim1="_present_dim_" group by "dummy_key", the groupBy clause can be removed. However, in the case when the filter doesn't return anything, i.e. select count() from table where dim1="_not_present_dim_" group by "dummy_key", the behavior of general databases would be to return nothing, while druid (due to above change) returns an empty row. This PR aims to fix this divergence of behavior. Example cases: select count() from table where dim1="_not_present_dim_" group by "dummy_key". CURRENT: Returns a row with count() = 0 EXPECTED: Return no row select 'A', dim1 from foo where m1 = 123123 and dim1 = '_not_present_again_' group by dim1 CURRENT: Returns a row with ('A', 'wat') EXPECTED: Return no row To do this, a boolean droppedDimensionsWhileApplyingProject has been added to Grouping which is true whenever we make changes to the original shape with optimization. Hence if a timeseries query has a grouping with this set to true, we set skipEmptyBuckets=true in the query context (i.e. do not return any row).	2022-01-07 21:53:48 +05:30
Jonathan Wei	9b598407c1	Add interface for external schema provider to Druid SQL (#12043 ) * Add interface for external schema provider to Druid SQL * Add annotations	2021-12-22 22:17:57 +05:30
somu-imply	1871a1ab18	ARRAY_AGG and STRING_AGG will throw errors if invoked on a complex datatype (#12089 )	2021-12-21 17:41:04 -08:00
Laksh Singla	16642fb278	Fix incorrect type conversion in DruidLogicalValueRule (#11923 ) DruidLogicalValuesRule while transforming to DruidRel can return incorrect values, if during the creation of the literal it was created from a float value. The BigDecimal representation stores 123.0, and it seems that using RexLiteral's method while conversion returns the inflated value (which is 1230). I am unsure if this is intentional from Calcite's perspective, and the actual change should be done somewhere else. Extract the values of INT/LONG from the RexLiteral in the DruidLogicalValuesRule, via BigDecimal.longValue() method.	2021-12-15 10:44:35 +05:30
Clint Wylie	244c2559e9	fix IncrementalIndex performance regression (#12048 ) changes: * IncrementalIndex is now a ColumnInspector * fixes performance regression from using map of ColumnCapabilities from IncrementalIndex as a RowSignature	2021-12-09 22:04:32 -08:00
Abhishek Agarwal	7abf847eae	Return 400 when SQL query cannot be planned (#12033 ) In this PR, we will now return 400 instead of 500 when SQL query cannot be planned. I also fixed a bug where error messages were not getting sent to the users in case the rules throw UnsupportedSQLQueryException.	2021-12-08 21:49:54 +05:30
Laksh Singla	ca260dfef6	Intern RowSignature in DruidSchema to reduce its memory footprint (#12001 ) DruidSchema consists of a concurrent HashMap of DataSource -> Segment -> AvailableSegmentMetadata. AvailableSegmentMetadata contains RowSignature of the segment, and for each segment, a new object is getting created. RowSignature is an immutable class, and hence it can be interned, and this can lead to huge savings of memory being used in broker, since a lot of the segments of a table would potentially have same RowSignature.	2021-12-08 15:11:13 +05:30
Clint Wylie	45be2be368	fix issues with multi-value string constant expressions (#12025 ) * add specialized constant selector for multi-valued string constants	2021-12-08 00:10:26 -08:00
Abhishek Agarwal	834aae096a	Human-readable and actionable SQL error messages (#11911 ) This PR does two things 1. It adds the capability to surface missing features in SQL to users - The calcite planner will explore through multiple rules to convert a logical SQL query to a druid native query. Some rules change the shape of the query itself, optimize it and some rules are responsible for translating the query into a druid native query. These are DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule etc. These rules will look at SQL and will do the necessary transformation. But if the rule can't transform the query, it returns back the control to the calcite planner without recording why was it not able to transform. E.g. there is a join query with a non-equal join condition. DruidJoinRule will look at the condition, see that it is not supported, and return back the control. The reason can be that a query can be planned in many different ways so if one rule can't parse it, the query may still be parseable by other rules. In this PR, we are intercepting these gaps and passing them back to the user if the query could not be planned at all. 2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.	2021-12-07 09:44:08 +05:30
Laksh Singla	44b2fb71ab	Fix the error case when there are multi top level unions (#12017 ) This is a follow up to the PR #11908. This fixes the bug in top level union all queries when more than 2 SQL subqueries are present.	2021-12-07 01:12:02 +05:30
Jihoon Son	1f052b43c5	Better serverView exec name; remove SingleServerInventoryView (#11770 ) Druid currently has 2 serverViews, regular serverView and filtered serverView. The regular serverView is used to monitor all segment announcements from all data nodes (historicals, tasks, indexers). The filtered serverView is used when you want to watch segment announcements from particular tiers. Since these server views keep track of different sets of druidServers and segments in memory, they should be maintained separately. However, they currently share the same name for their executorService, which can cause confusion and make debugging harder especially in the broker since it is using both serverViews, the filtered view for normal query processing and the regular view to serve the servers table (I'm unsure whether this is intended or whether this is a good behavior). This PR changes it to a more obvious name. This PR also removes SingleServerInventoryView. This view was deprecated a long time ago and has not been documented at least since 0.13 (#6127). I also don't think this can be better in any case than BatchServerInventoryView. Finally, I merged AbstractCuratorServerInventoryView and BatchServerInventoryView as we no longer need AbstractCuratorServerInventoryView after SingleServerInventoryView is removed.	2021-12-04 18:43:05 +05:30
Gian Merlino	e0e05aad99	Enhancements to IndexTaskClient. (#12011 ) * Enhancements to IndexTaskClient. 1) Ability to use handlers other than StringFullResponseHandler. This functionality is not used in production code yet, but is useful because it will allow tasks to communicate with each other in non-string-based formats and in streaming fashion. In the future, we'll be able to use this to make task-to-task communication more efficient. 2) Truncate server errors at 1KB, so long errors do not pollute logs. 3) Change error log level for retryable errors from WARN to INFO. (The final error is still WARN.) 4) Harmonize log and exception messages to have a more consistent format. * Additional tests and improvements.	2021-12-03 09:14:32 -08:00
Clint Wylie	af6541a236	allow `DruidSchema` to fallback to segment metadata 'type' if 'typeSignature' is null (#12016 ) * allow `DruidSchema` to fallback to segment metadata type if typeSignature is null, to avoid producing incorrect SQL schema if broker is upgraded to 0.23 before historicals * mmm, forbidden tests	2021-12-02 17:42:01 -08:00
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Paul Rogers	a66f10eea1	Code cleanup from query profile project (#11822 ) * Code cleanup from query profile project * Fix spelling errors * Fix Javadoc formatting * Abstract out repeated test code * Reuse constants in place of some string literals * Fix up some parameterized types * Reduce warnings reported by Eclipse * Reverted change due to lack of tests	2021-11-30 11:35:38 -08:00
Kashif Faraz	b48f5a576b	Fix: Do not require time condition on InlineDataSource (#11982 ) For queries on logical values, e.g. SELECT 1337, we need not check for a filter on __time column even if requireTimeCondition is true.	2021-11-25 21:10:06 +05:30
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Rohan Garg	2c08055962	Specify time column for first/last aggregators (#11949 ) Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.	2021-11-25 09:44:14 +05:30
Gian Merlino	0354407655	SQL INSERT planner support. (#11959 ) * SQL INSERT planner support. The main changes are: 1) DruidPlanner is able to validate and authorize INSERT queries. They require WRITE permission on the target datasource. 2) QueryMaker is now an interface, and there is a QueryMakerFactory that creates instances of it. There is only one production implementation of each (NativeQueryMaker and NativeQueryMakerFactory), which together behave the same way as the former QueryMaker class. But this opens the door to executing queries in ways other than the Druid query stack, and is used by unit tests (CalciteInsertDmlTest) to test the INSERT planning functionality. 3) Adds an EXTERN table macro that allows referencing external data using InputSource and InputFormat from Druid's batch ingestion API. This is not exposed in production yet, but is used by unit tests. 4) Adds a QueryFeature concept that enables the planner to change its behavior slightly depending on the capabilities of the execution system. 5) Adds an "AuthorizableOperator" concept that enables SqlOperators to require additional permissions. This is used by the EXTERN table macro. Related odds and ends: - Add equals, hashCode, toString methods to InlineInputSource. Aids in the "from external" tests in CalciteInsertDmlTest. - Add JSON-serializability to RowSignature. - Move the SQL string inside PlannerContext so it is "baked into" the planner when the planner is created. Cleans up the code a bit, since in practice, the same query is passed in every time to the same planner anyway. * Fix up calls to CalciteTests.createMockQueryLifecycleFactory. * Fix checkstyle issues. * Adjustments for CI. * Adjust DruidAvaticaHandlerTest for stricter test authorizations.	2021-11-24 12:14:04 -08:00
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
Abhishek Agarwal	b6a0fbc8b6	Break down CalciteQueryTest (#11979 ) * Refactor calciteQueryTest * Move more tests to CalciteJoinQueryTest	2021-11-24 00:15:42 +05:30
Laksh Singla	b5a25f24f2	Improve the DruidRexExecutor w.r.t handling of numeric arrays (#11968 ) DruidRexExecutor while reducing Arrays, specially numeric arrays, doesn't convert the value from ExprResult's type to BigDecimal, which causes makeLiteral to cast the values. Also, if NaN or Infinite values are present in the array, the error is a generic NumberFormatException. For example: SELECT ARRAY[1.11, 2.22] returns [1, 2] SELECT SQRT(-1) throws a generic NumberFormatException instead of IAE This PR introduces change to cast the numeric values to BigDecimal since Calcite's library understands that easily, and doesn't perform casts.	2021-11-23 11:40:59 +05:30
TSFenwick	a4cb1de87a	get rid of class cast exception and add a new testcase for that issue (#11951 )	2021-11-22 08:44:20 -08:00
Gian Merlino	b3502c3e50	DruidViewMacro: Remove unused escalator field. (#11931 ) * DruidViewMacro: Remove unused escalator field. * Remove additional unused fields.	2021-11-19 16:06:29 -08:00
Gian Merlino	36ee0367ff	Scan: Add "orderBy" parameter. (#11930 ) * Scan: Add "orderBy" parameter. This patch adds an API for requesting non-time orderings, although it does not actually add the ability to execute such queries. The changes are done in such a way that no matter how Scan query objects are constructed, they will have a correct "getOrderBy". This will enable us to switch the execution to exclusively use "getOrderBy" later on when it's implemented. Scan queries are serialized such that they only include "order" (time order) if the ordering is time-based, and they only include "orderBy" if the ordering is non-time-based. This maximizes compatibility with the existing API while also providing a clean look for formatted queries. Because this patch does not include execution logic, if someone actually tries to run a query with non-time ordering, then they will get an error like "Cannot execute query with orderBy [quality ASC]". * SQL module fixes. * Add spotbugs-exclude. * Remove unused method.	2021-11-19 08:19:12 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
Gian Merlino	d76e646700	Fix TestServerInventoryView behavioral discrepancy. (#11932 ) Unlike a real one, TestServerInventoryView would call segmentRemoved any time _any_ segment was removed. It should only be called when _all_ segments have been removed.	2021-11-16 18:08:35 -08:00
Clint Wylie	54fead3546	sql skip reduce of complex literal expressions (#11928 )	2021-11-16 15:40:42 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransformStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
TSFenwick	cdd1c2876c	catch throwable because calcite is throwing an error not exception (#11892 ) * catch throwable because calcite is throwing an error not exception * add test case	2021-11-10 17:22:04 -08:00
Jihoon Son	13bec7468a	Fix NPE for SQL queries when a query parameter is missing in the mid (#11900 ) * Fix NPE for SQL queries when a query parameter is missing in the mid * checkstyle * Throw SqlPlanningException instead of IAE	2021-11-10 10:02:26 -08:00
Clint Wylie	a8805ab60d	add missing json type for ListFilteredVirtualColumn (#11887 ) * add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again * fixes * ugly, but maybe this * oops * too many mappers	2021-11-09 17:25:12 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Clint Wylie	7237dc837c	complex typed expressions (#11853 ) * complex typed expressions * add built-in hll collector expressions to get coverage on druid-processing, more types, more better * rampage!!! * more javadoc * adjustments * oops * lol * remove unused dependency * contradiction? * more test	2021-11-08 00:33:06 -08:00
Gian Merlino	8971056763	Properly count segment references in tests. (#11870 )	2021-11-05 12:49:10 -07:00
Clint Wylie	9bd2ccbb9b	SqlAggregationModuleTest now extends CalciteTestBase to ensure consistent string encoding (#11861 )	2021-11-01 15:11:40 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Gian Merlino	d4cace385f	SQL: Allow Scans to be used as outer queries. (#11831 ) * SQL: Allow Scans to be used as outer queries. This has been possible in the native query system for a while, but the capability hasn't yet propagated into the SQL layer. One example of where this is useful is a query like: SELECT * FROM (... LIMIT X) WHERE <filter> Because this expands the kinds of subquery structures the SQL layer will consider, it was also necessary to improve the cost calculations. These changes appear in PartialDruidQuery and DruidOuterQueryRel. The ideas are: - Attach per-column penalties to the output signature of each query, instead of to the initial projection that starts a query. This encourages moving projections into subqueries instead of leaving them on outer queries. - Only attach penalties to projections if there are actually expressions happening. So, now, projections that simply reorder or remove fields are free. - Attach a constant penalty to every outer query. This discourages creating them when they are not needed. The changes are generally beneficial to the test cases we have in CalciteQueryTest. Most plans are unchanged, or are changed in purely cosmetic ways. Two have changed for the better: - testUsingSubqueryWithLimit now returns a constant from the subquery, instead of returning every column. - testJoinOuterGroupByAndSubqueryHasLimit returns a minimal set of columns from the innermost subquery; two unnecessary columns are no longer there. * Fix various DS operator conversions. These were all implemented as direct conversions, which isn't appropriate because they do not actually map onto native functions. These are only usable as post-aggregations. * Test case adjustment.	2021-10-23 17:18:43 -07:00
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
Kashif Faraz	f2d6100124	Require Datasource WRITE authorization for Supervisor and Task access (#11718 ) Follow up PR for #11680 Description Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE authorization even if they are purely informative. Changes Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks" Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history Check Datasource for all Indexing Task APIs	2021-10-08 10:39:48 +05:30
Lucas Capistrant	1930ad1f47	Implement configurable internally generated query context (#11429 ) * Add the ability to add a context to internally generated druid broker queries * fix docs * changes after first CI failure * cleanup after merge with master * change default to empty map and improve unit tests * add doc info and fix checkstyle * refactor DruidSchema#runSegmentMetadataQuery and add a unit test	2021-10-06 09:02:41 -07:00
Maytas Monsereenusorn	8cc58a4368	Add sql query id to response header for failed sql query (#11756 ) * add impl * add impl	2021-09-30 13:43:39 +07:00
Clint Wylie	11017ef00a	support jdbc even if trailing / is missing (#11737 ) * support jdbc even if trailing / is missing * fix tests	2021-09-29 13:59:26 -07:00
Maytas Monsereenusorn	a04b08e45c	Add new config to filter internal Druid-related messages from Query API response (#11711 ) * add impl * add impl * add tests * add unit test * fix checkstyle * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * address comments * address comments * address comments * fix test * fix test * fix test * fix test * fix test * change config name * change config name * change config name * address comments * address comments * address comments * address comments * address comments * address comments * fix compile * fix compile * change config * add more tests * fix IT	2021-09-29 12:55:49 +07:00
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Clint Wylie	392f0ca1b5	refactor sql authorization to get resource type from schema, resource type to be string (#11692 ) * refactor sql authorization to get resource type from schema, refactor resource type from enum to string * information schema auth filtering adjustments * refactor * minor stuff * Update SqlResourceCollectorShuttle.java	2021-09-17 09:53:25 -07:00
Clint Wylie	5e092ccb9b	add MV_FILTER_ONLY, MV_FILTER_NONE, ListFilteredVirtualColumn (#11650 ) * add MV_FILTER_ONLY SQL function, and list filter virtual column * MV_FILTER_NONE and more tests * formatting * o yeah, forgot can do easy thing * style * hmm why was that there * test filtering on virtual column * style * meh * do it right * good bot	2021-09-16 09:31:53 -07:00
Clint Wylie	3044372fc1	improved JDBC logging (#11676 ) * improve jdbc and router query debug logging * log errors too * no stacktrace * trace those stacks	2021-09-16 01:28:16 -07:00
Jihoon Son	0cbd71ebda	Return forbidden when authorization fails for sql query canceling (#11710 ) Switching http response code for authorization failures for sql query canceling to match to sql query posting.	2021-09-15 16:02:19 +05:30
Gian Merlino	7220d0466b	Fix truncation detectability for SQL array, object formats. (#11685 ) The SQL "array" and "object" formats are intended to return invalid JSON (lacking a ] terminator) if an error occurs midstream. This enables callers to detect truncated responses. But JsonGenerators, by default, close JSON arrays even when not explicitly told to. This patch disables automatic array closing, which fixes the problem with truncated response detection. It also adds tests for truncated responses for all result formats.	2021-09-14 15:59:05 -07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Rohan Garg	60efbb51d0	Add test for IS NOT NULL filter on join column in left join (#11636 )	2021-09-06 12:20:41 +05:30
Jihoon Son	82049bbf0a	Cancel API for sqls (#11643 ) * initial work * reduce lock in sqlLifecycle * Integration test for sql canceling * javadoc, cleanup, more tests * log level to debug * fix test * checkstyle * fix flaky test; address comments * rowTransformer * cancelled state * use lock * explode instead of noop * oops * unused import * less aggressive with state * fix calcite charset * don't emit metrics when you are not authorized	2021-09-05 10:57:45 -07:00
Jihoon Son	7e90d00cc0	Configurable maxStreamLength for doubles sketches (#11574 ) * Configurable maxStreamLength for doubles sketches * fix equals/hashcode and it test failure * fix test * fix it test * benchmark * doc * grouping key * fix comment * dependency check * Update docs/development/extensions-core/datasketches-quantiles.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-31 14:56:37 -07:00
Jihoon Son	2a658acad4	Put sleep in an extension (#11632 ) * Put sleep in an extension * dependency	2021-08-25 01:27:45 -07:00
Jihoon Son	78b4be467e	Add sleep function for testing (#11626 ) * Add sleep function for testing * sql function * javadoc	2021-08-24 14:30:31 +07:00
hqx871	38ebaee0fd	VirtualColumnRegistry reuse virtual column should take account of value type (#11546 ) Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>	2021-08-19 01:46:27 -07:00
Jihoon Son	177264c649	resultFormat name in camel case (#11585 ) * resultFormat name in camel case * test for letter case	2021-08-14 18:30:21 +08:00
frank chen	e40be0ae28	Add SQL functions to format numbers into human readable format (#10635 ) * add binary_byte_format/decimal_byte_format/decimal_format * clean code * fix doc * fix review comments * add spelling check rules * remove extra param * improve type handling and null handling * remove extra zeros * fix tests and add space between unit suffix and number as most size-format functions do * fix tests * add examples * change function names according to review comments * fix merge Signed-off-by: frank chen <frank.chen021@outlook.com> * no need to configure NullHandling explicitly for tests Signed-off-by: frank chen <frank.chen021@outlook.com> * fix tests in SQL-Compatible mode Signed-off-by: frank chen <frank.chen021@outlook.com> * Resolve review comments * Update SQL test case to check null handling * Fix intellij inspections * Add more examples * Fix example	2021-08-13 10:27:49 -07:00
Clint Wylie	9af7ba9d2a	STRING_AGG SQL aggregator function (#11241 ) * add string_agg * oops * style and fix test * spelling * fixup * review stuffs	2021-08-10 13:47:09 -07:00
Jihoon Son	e9d964d504	Improve concurrency between DruidSchema and BrokerServerView (#11457 ) * Improve concurrency between DruidSchema and BrokerServerView * unused imports and workaround for error prone faiure * count only known segments * add comments	2021-08-06 14:07:13 -07:00
Jihoon Son	8ba7f6a48c	Fix incorrect result of exact topN on an inner join with limit (#11517 )	2021-07-31 15:55:49 -07:00
Rohan Garg	c98e7c3aa3	Fix left join SQL queries with IS NOT NULL filter (#11434 ) This PR fixes the incorrect results for query : SELECT dim1, l1.k FROM foo LEFT JOIN (select k \|\| '' as k from lookup.lookyloo group by 1) l1 ON foo.dim1 = l1.k WHERE l1.k IS NOT NULL (in CalciteQueryTests) In the current code, the WHERE clause gets removed from the top of the left join and is pushed to the table foo leading to incorrect results. The fix for such a situation is done by : Converting such left joins into inner joins (since logically the mentioned left join query is equivalent to an inner join) using Calcite while maintaining that the druid execution layer can execute such inner joins. Preferring converted inner joins over original left joins in our cost model	2021-07-23 20:57:19 +05:30
Jihoon Son	84c957f541	Add more sql tests for groupby queries (#11454 ) * Add more sql tests for simple groupby queries * unused import * fix tests * javadocs * unused import	2021-07-20 21:05:11 -07:00
Abhishek Agarwal	94c1671eaf	Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466 ) This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager. SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects. SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.	2021-07-21 00:14:19 +05:30
kaijianding	e39ff44481	improve groupBy query granularity translation with 2x query performance improve when issued from sql layer (#11379 ) * improve groupBy query granularity translation when issued from sql layer * fix style * use virtual column to determine timestampResult granularity * dont' apply postaggregators on compute nodes * relocate constants * fix order by correctness issue * fix ut * use more easier understanding code in DefaultLimitSpec * address comment * rollback use virtual column to determine timestampResult granularity * fix style * fix style * address the comment * add more detail document to explain the tradeoff * address the comment * address the comment	2021-07-11 10:22:47 -07:00
Suneet Saldanha	49e8732e4f	Display errors for invalid timezones in TIME_FORMAT (#11423 ) Users sometimes make typos when picking timezones - like `America/Los Angeles` instead of `America/Los_Angeles` instead of defaulting to UTC, this change makes it so that an error is thrown instead notifying the user of their mistake.	2021-07-09 06:07:13 -07:00
Abhishek Agarwal	3481bb0440	Better error message for unsupported double values (#11409 ) A constant expression may evaluate to Double.NEGATIVE_INFINITY/Double.POSITIVE_INFINITY/Double.NAN e.g. log10(0). When using such an expression in native queries, the user will get the corresponding value without any error. In SQL, however, the user will run into NumberFormatException because we convert the double to big-decimal while constructing a literal numeric expression. This probably should be fixed in calcite - see https://issues.apache.org/jira/browse/CALCITE-2067. This PR adds a verbose error message so that users can take corrective action without scratching their heads.	2021-07-08 16:55:17 +05:30
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
Clint Wylie	bfbd7ec432	fix a bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Clint Wylie	50327b8f63	ignore bySegment query context for SQL queries (#11352 ) * ignore bySegment query context for SQL queries * revert unintended change	2021-06-11 13:49:03 -07:00
Clint Wylie	6b272c857f	adjust topn heap algorithm to only use known cardinality path when dictionary is unique (#11186 ) * adjust topn heap algorithm to only use known cardinality path when dictionary is unique * better check and add comment * adjust comment more	2021-06-10 18:32:22 -05:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
Rohan Garg	6c7177714f	Add test for join on __time column (#11289 )	2021-05-26 22:20:39 -07:00
Maria Sitkovets	259207753d	Fix is null selector returning incorrect value for Long data type (#11170 ) * Fix is null selector returning incorrect value for Long data type * Fix style errors * Refactor getObject method to also cache null column values * Make lastInput variable nullable * Refactor unit test * Use new boolean lastInputIsNull instead of Long for lastInput to avoid boxing * Refactor to remove Long for input variable * Make a separate null caching variable * Cleaner null caching implementation	2021-05-19 20:47:02 -07:00
Clint Wylie	aa62073faa	fix sql planner bug with inner offset causing loop (#11259 ) * fix sql planner bug with inner offset causing loop * move check up	2021-05-15 14:26:41 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Clint Wylie	f6662b4893	fix count and average SQL aggregators on constant virtual columns (#11208 ) * fix count and average SQL aggregators on constant virtual columns * style * even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry * oops missed a few * remove unused * this will fix it	2021-05-10 13:41:48 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
Gian Merlino	a1f850d707	Fix vectorized cardinality bug on certain string columns. (#11199 ) * Fix vectorized cardinality bug on certain string columns. Fixes a bug introduced in #11182, related to the fact that in some cases, ColumnProcessors.makeVectorProcessor will call "makeObjectProcessor" instead of "makeSingleValueDimensionProcessor" or "makeMultiValueDimensionProcessor". CardinalityVectorProcessorFactory improperly ignored calls to "makeObjectProcessor". In addition to fixing the bug, I added this detail to the javadocs for VectorColumnProcessorFactory, to prevent others from running into the same thing in the future. They do not currently call out this case. * Improve test coverage. * Additional fixes.	2021-05-07 08:37:10 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
Gian Merlino	bef7cc911f	Vectorize the cardinality aggregator. (#11182 ) * Vectorize the cardinality aggregator. Does not include a byRow implementation, so if byRow is true then the aggregator still goes through the non-vectorized path. Testing strategy: - New tests that exercise both styles of "aggregate" for supported types. - Some existing tests have also become active (note the deleted "cannotVectorize" lines). * Adjust whitespace.	2021-05-03 20:27:02 -07:00
Jihoon Son	8215cc3238	Unit test for DefaultOperandTypeChecker (#11152 ) * Less strict operand type check and implicit casting * fix ci * Clean up unnecessary changes * more cleanup * unused import	2021-04-27 18:47:38 -07:00
Jihoon Son	261c1f271f	Keep traitSet of logicalValues (#11138 )	2021-04-27 18:45:23 -07:00
Gian Merlino	202c78c8f3	Enable rewriting certain inner joins as filters. (#11068 ) * Enable rewriting certain inner joins as filters. The main logic for doing the rewrite is in JoinableFactoryWrapper's segmentMapFn method. The requirements are: - It must be an inner equi-join. - The right-hand columns referenced by the condition must not contain any duplicate values. (If they did, the inner join would not be guaranteed to return at most one row for each left-hand-side row.) - No columns from the right-hand side can be used by anything other than the join condition itself. HashJoinSegmentStorageAdapter is also modified to pass through to the base adapter (even allowing vectorization!) in the case where 100% of join clauses could be rewritten as filters. In support of this goal: - Add Query getRequiredColumns() method to help us figure out whether the right-hand side of a join datasource is being used or not. - Add JoinConditionAnalysis getRequiredColumns() method to help us figure out if the right-hand side of a join is being used by later join clauses acting on the same base. - Add Joinable getNonNullColumnValuesIfAllUnique method to enable retrieving the set of values that will form the "in" filter. - Add LookupExtractor canGetKeySet() and keySet() methods to support LookupJoinable in its efforts to implement the new Joinable method. - Add "enableRewriteJoinToFilter" feature flag to JoinFilterRewriteConfig. The default is disabled. * Test improvements. * Test fixes. * Avoid slow size() call. * Remove invalid test. * Fix style. * Fix mistaken default. * Small fixes. * Fix logic error.	2021-04-14 10:49:27 -07:00
chenyuzhi459	b8423a38df	add round test (#11088 ) * add round test * code style * handle null val for round function * handle null val for round function * support null for round * fix compatiblity * fix test * fix test * code style * optimize format	2021-04-13 11:36:32 -07:00
Jihoon Son	25db8787b3	Fix CAST being ignored when aggregating on strings after cast (#11083 ) * Fix CAST being ignored when aggregating on strings after cast * fix checkstyle and dependency * unused import	2021-04-12 22:21:24 -07:00
Clint Wylie	338886fd5f	vector group by support for string expressions (#11010 ) * vector group by support for string expressions * fix test * comments, javadoc	2021-04-08 19:23:39 -07:00
Jihoon Son	b51ede5b49	Add a planner rule to handle empty tables (#11058 ) * Add a planner rule to handle empty tables * adjust comment * type handling * add tests * unused imports and fix test * fix more tests * fix more test * javadoc	2021-04-07 10:04:47 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
chenyuzhi459	450535073e	fix lookup nullable (#11060 ) * fix lookup nullable * fix lookup unit test * test null case	2021-04-02 21:56:42 -07:00
Lasse Krogh Mammen	782a1d4e6c	Add Calcite Avatica protobuf handler (#10543 )	2021-03-31 12:46:25 -07:00
Jihoon Son	43ea184b74	Add explicit EOF and use assert instead of exception (#11041 )	2021-03-31 09:41:57 -07:00
chenyuzhi459	248af38777	Fix subquery with order by (#11017 ) * fix subquery with order by * fix parameter	2021-03-26 04:43:46 -07:00
Clint Wylie	bacad04aa2	make SqlResource laning test less sensitive to timing (#11032 ) * make laning test less sensitive to timing * style	2021-03-26 03:43:28 -07:00
Jonathan Wei	8296123d89	Add resources used to EXPLAIN PLAN FOR output (#11024 )	2021-03-23 17:21:15 -07:00
Samarth Jain	83fcab1d0f	Improve performance of queries against SYSTEM.SEGMENT table. (#11008 ) Size HashMap and HashSet appropriately. Perf analysis of the queries revealed that over 25% of the query time was spent in resizing HashMap and HashSet collections. Also, prevent the need to examine and authorize all resources when AllowAllAuthorizer is the configured authorizer.	2021-03-17 22:24:02 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Clint Wylie	58294329b7	fix SQL issue for group by queries with time filter that gets optimized to false (#10968 ) * fix SQL issue for group by queries with time filter that gets optimized to false * short circuit always false in CombineAndSimplifyBounds * adjust * javadocs * add preconditions for and/or filters to ensure they have children * add comments, remove preconditions	2021-03-09 19:41:16 -08:00
Jonathan Wei	9c083783c9	Don't fail on invalid views in InformationSchema (#10960 ) * Don't fail on invalid views in InformationSchema * Fix test	2021-03-09 16:19:59 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Abhishek Agarwal	ae620921df	Fix classCastException when inputs to union are join (#10950 ) * Fix union queries * Add tests	2021-03-08 21:20:26 -08:00
Abhishek Agarwal	1a15987432	Supporting filters in the left base table for join datasources (#10697 ) * where filter left first draft * Revert changes in calcite test * Refactor a bit * Fixing the Tests * Changes * Adding tests * Add tests for correlated queries * Add comment * Fix typos	2021-03-04 10:39:21 -08:00
Clint Wylie	f34c6eb3c0	add druid jdbc handler config for minimum number of rows per frame (#10880 ) * add druid jdbc handler config for minimum number of rows per frame * javadocs and docs adjustments * spelling * adjust docs per review with minor tweaks * adjust more	2021-02-23 02:11:04 -08:00

767 Commits