druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	65a1375b67	SQL: Add is_active to sys.segments, update examples and docs. (#11550 ) * SQL: Add is_active to sys.segments, update examples and docs. is_active is short for: (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1 It's important because this represents "all the segments that should be queryable, whether or not they actually are right now". Most of the time, this is the set of segments that people will want to look at. The web console already adds this filter to a lot of its queries, proving its usefulness. This patch also reworks the caveat at the bottom of the sys.segments section, so its information is mixed into the description of each result field. This should make it more likely for people to see the information. * Wording updates. * Adjustments for spellcheck. * Adjust IT.	2022-05-19 14:23:28 -07:00
Adarsh Sanjeev	fcb1c0b7bf	Add cluster by support for replace syntax (#12524 ) * Add cluster by support for replace syntax * Add unit test for with list	2022-05-17 15:15:29 +05:30
Adarsh Sanjeev	0fd4f1e386	Improve error messages from SQL REPLACE syntax (#12523 ) - Add user friendly error messages for missing or incorrect OVERWRITE clause for REPLACE SQL query - Move validation of missing OVERWRITE clause at code level instead of parser for custom error message	2022-05-17 09:55:58 +05:30
Adarsh Sanjeev	39b3487aa9	Add replace statement to sql parser (#12386 ) Relevant Issue: #11929 - Add custom replace statement to Druid SQL parser. - Edit DruidPlanner to convert relevant fields to Query Context. - Refactor common code with INSERT statements to reuse them for REPLACE where possible.	2022-05-13 10:56:40 +05:30
Clint Wylie	9e5a940cf1	remake column indexes and query processing of filters (#12388 ) Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filters consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use of indexing in the current set of filters that Druid provides.	2022-05-11 11:57:08 +05:30
Rohan Garg	75836a5a06	Add feature flag for sql planning of TimeBoundary queries (#12491 ) * Add feature flag for sql planning of TimeBoundary queries * fixup! Add feature flag for sql planning of TimeBoundary queries * Add documentation for enableTimeBoundaryPlanning * fixup! Add documentation for enableTimeBoundaryPlanning	2022-05-10 15:23:42 +05:30
somu-imply	c68388ebcd	Vectorized version of string last aggregator (#12493 ) * Vectorized version of string last aggregator * Updating string last and adding testcases * Updating code and adding testcases for serializable pairs * Addressing review comments	2022-05-09 17:02:38 -07:00
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
Gian Merlino	2e42d04038	SQL: Create millisecond precision timestamp literals. (#12407 ) * SQL: Create millisecond precision timestamp literals. Fixes a bug where implicit casts of strings to timestamps would use seconds precision rather than milliseconds. The new test case testCountStarWithBetweenTimeFilterUsingMillisecondsInStringLiterals exercises this. * Update sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java Co-authored-by: Frank Chen <frankchen@apache.org> * Correct precision handling. - Set default precision to 3 (millis) for things involving timestamps. - Respect precision specified in types when available. * Silence, checkstyle. Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-27 14:17:07 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Adarsh Sanjeev	1306965c9e	Validate select columns for insert statement (#12431 ) Unnamed columns in the select part of insert SQL statements currently create a table with the column name such as "EXPR$3". This PR adds a check for this.	2022-04-27 12:25:49 +05:30
somu-imply	027935dcff	Vectorize numeric latest aggregators (#12439 ) * Vectorizing Latest aggregator Part 1 * Updating benchmark tests * Changing appropriate logic for vectors for null handling * Introducing an abstract class and moving the commonalities there * Adding vectorization for StringLast aggregator (initial version) * Updated bufferized version of numeric aggregators * Adding some javadocs * Making sure this PR vectorizes numeric latest agg only * Adding another benchmarking test * Fixing intellij inspections * Adding tests for double * Adding test cases for long and float * Updating testcases * Checkstyle oops.. * One tiny change in test case * Fixing spotbug and rhs not being used	2022-04-26 11:33:08 -07:00
Rohan Garg	95694b5afa	Convert simple min/max SQL queries on __time to timeBoundary queries (#12472 ) * Support array based results in timeBoundary query * Fix bug with query interval in timeBoundary * Convert min(__time) and max(__time) SQL queries to timeBoundary * Add tests for timeBoundary backed SQL queries * Fix query plans for existing tests * fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary * fixup! Add tests for timeBoundary backed SQL queries * fixup! Fix bug with query interval in timeBoundary	2022-04-25 08:18:58 -07:00
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
somu-imply	2db02876cf	Updating an error msg (#12450 ) * Updating an error msg * Added an extra [] so removing it	2022-04-20 07:56:09 -07:00
somu-imply	cd6fba2f6c	Handling planning with alias for time for group by and order by (#12418 ) An outer scan query, that requires ordering on a column, should be considered an invalid query.	2022-04-15 10:29:17 +05:30
Adarsh Sanjeev	b74cb7624d	Make error messages for insert statements consistent with select statements (#12414 ) For a query like INSERT INTO tablename SELECT channel, added as count FROM wikipedia the error message is Encountered "as count". However, for the insert statement INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL returns INSERT statements must specify PARTITIONED BY clause explictly (incorrectly). This PR corrects this. Add EOF to end of Druid SQL Insert statements Rename SQL Insert statements in the parser to reflect the behaviour change	2022-04-09 12:21:40 +05:30
Paul Rogers	2cc2088720	Method to specify eternity in the scan query builder (#12223 ) * Method to specify eternity in the scan query builder * Fix checkstyle issue * Renamed eterity() to eternityInterval() * Minor fixes	2022-04-04 15:11:32 -07:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
Gian Merlino	cb2b2b696d	Fix error message for groupByEnableMultiValueUnnesting. (#12325 ) * Fix error message for groupByEnableMultiValueUnnesting. It referred to the incorrect context parameter. Also, create a dedicated exception class, to allow easier detection of this specific error. * Fix other test. * More better error messages. * Test getDimensionName method.	2022-03-10 11:37:24 -08:00
Rohan Garg	9f6a930462	Fix join query in case of filter explosion during CNF conversion (#12324 )	2022-03-09 12:43:09 -08:00
Clint Wylie	1c004ea47e	use virtual columns for sql simple aggregators instead of inline expressions (#12251 ) * use virtual columns for sql simple aggregators instead of inline expressions * fixes * always use virtual columns * add more tests	2022-03-03 15:05:28 -08:00
Xavier Léauté	1434197ee1	update airline dependency to 2.x (#12270 ) * upgrade Airline to Airline 2 https://github.com/airlift/airline is no longer maintained, updating to https://github.com/rvesse/airline (Airline 2) to use an actively maintained version, while minimizing breaking changes. Note, this is a backwards incompatible change, and extensions relying on the CliCommandCreator extension point will also need to be updated. * fix dependency checks where jakarta.inject is now resolved first instead of javax.inject, due to Airline 2 using jakarta	2022-02-27 15:19:28 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Jonathan Wei	b1640a72ee	Re-enable segment metadata cache when using external schema (#12264 )	2022-02-22 19:50:29 -06:00
Karan Kumar	5794331eb1	Adding new config for disabling group by on multiValue column (#12253 ) As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key. Added a config groupByEnableMultiValueUnnesting which can be set in the query context. The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior. If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.	2022-02-16 20:53:26 +05:30
Laksh Singla	8fc0e5c95c	Explain plan for custom insert syntax (#12243 ) * Initial commit, explain plan for custom insert syntax working * Cleanup separate SqlInsert handling	2022-02-15 21:48:34 -08:00
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a speling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that, this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
somu-imply	033989eb1d	Adding vectorized time_shift (#12254 ) * Adding vectorized time_shift * Vectorize time shift, addressing review comments * Remove an unused import	2022-02-11 14:44:52 -08:00
Laksh Singla	5bd646e10a	Surface a user friendly error when PARTITIONED BY is omitted (#12246 ) #12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.	2022-02-11 11:49:00 +05:30
Clint Wylie	3ee66bb492	allow optimizing sql expressions and virtual columns (#12241 ) * rework sql planner expression and virtual column handling * simplify a bit * add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests * spotbugs * fix bugs with multi-value string array expression handling * javadocs and adjust test * better * fix tests	2022-02-09 14:55:50 -08:00
Laksh Singla	4add2510ed	Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries (#12163 ) This PR aims to add parser changes for supporting PARTITIONED BY and CLUSTERED BY as proposed in the issue #11929.	2022-02-08 16:23:15 +05:30
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Clint Wylie	8fd587b28c	remove duplicate Broker ServerInventoryView, improve HttpServerInventoryView logging (#12209 ) * changes: * remove SystemSchema duplicate ServerInventoryView in broker * suppress duplicate segment added/removed warnings in HttpServerInventoryView when doing a full sync * fixes	2022-02-03 12:57:34 -08:00
Maytas Monsereenusorn	3717693633	Fix java.lang.ClassCastException error when using useApproximateCountDistinct false for aggregation query (#12216 ) * add imply * add test * add unit test * add test	2022-02-03 12:01:13 -08:00
Clint Wylie	f9b406c8f2	add backwards compatibility mode for multi-value string array null value coercion (#12210 )	2022-01-31 22:38:15 -08:00
Abhishek Agarwal	1b8808cce8	Fix SQL queries for inline datasource with null values (#12092 ) Fixes a bug because of which some SQL queries cannot be parsed using druid convention. Specifically, these queries translate to an inline datasource and have some null values. Calcite internally uses NULL as SQL type for these literals and that is not supported by the druid. I am now allowing null column types to be returned while building RowSignature in org.apache.druid.sql.calcite.table.RowSignatures#fromRelDataType. RowSignature already allows null column type for any column. Doing so should also fix bindable queries such as select (1,2). When such queries are run with headers set to true, we get an exception in org.apache.druid.sql.http.ArrayWriter#writeHeader. This is again a similar exception to the one addressed in this PR. Because SQL type for the result column is RECORD and that doesn't have a corresponding columnType.	2022-01-27 18:04:12 +05:30
Karan Kumar	96b3498a40	Grouping on arrays as arrays (#12078 ) * init multiValue column group by * Changing sorting to Lexicographic as default * Adding initial tests * 1.Fixing test cases adding 2.Optimized inmem structs * Linking SQL layer to native layer * Adding multiDimension support to group by column strategy * 1. Removing array coercion in Calcite layer 2. Removing ResultRowDeserializer * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Checkstyle things 2. Removing flag * Minor naming things * CheckStyle Things * Fixing test case * Fixing hashing * 1. Adding the MV function 2. Added few test cases * 1. Adding MV function test cases * Adding Selector strategy function test cases * Fixing ClientQuerySegmentWalkerTest * Adding GroupByQueryRunnerTest test cases * Fixing test cases * Adding few more test cases * Fixing Exception asset statement and intellij inspection * Adding null compatibility tests * Review comments * Fixing few failing tests * Fixing few failing tests * Do no convert to topN Q incase of group by on array * Fixing checkstyle * Fixing differences between jdk's class cast exception message * 1. Fixing ordering if the grouping key is an array * Fixing DefaultLimitSpec * Fixing CalciteArraysQueryTest * Dummy commit for LGTM * changes: * only coerce multi-value string null values when `ExpressionPlan.Trait.NEEDS_APPLIED` is set * correct return type inference for ARRAY_APPEND,ARRAY_PREPEND,ARRAY_SLICE,ARRAY_CONCAT * fix bug with ExprEval.ofType when actual type of object from binding doesn't match its claimed type * Review comments * Fixing test cases * Fixing spot bugs * Fixing strict compile Co-authored-by: Clint Wylie <cwylie@apache.org>	2022-01-25 20:30:56 -08:00
somu-imply	cc8b9c0b6e	Handling OOM error in ExpressionVector setup by reducing number of rows (#12186 ) * Handling OOM error in ExpressionVector setup by reducing number of rows * Removing row size to 10K in sanity tests	2022-01-24 08:37:13 -08:00
Laksh Singla	dc1703d5f9	Change value of `druid.sql.planner.useGroupingSetForExactDistinct` in common.runtime.properties (#12182 ) This PR changes the value of the property `druid.sql.planner.useGroupingSetForExactDistinct` from `false` to `true` in the runtime.properties files, so that newer installations have this property as `true`, while the default still remains as `false`. The flag determines how queries which contain an aggregation over `DISTINCT` like `SELECT COUNT(DISTINCT foo.dim1) FILTER(WHERE foo.cnt = 1), SUM(foo.cnt) FROM druid.foo` get planned by Calcite. With the flag being set to false, it plans it via joins, whereas with it being set to true, the query is set using grouping sets. There is a known issue with Calcite (https://github.com/apache/druid/issues/7953), where an NPE is thrown while planning the above query with joins. There is no such issue while planning the query using grouping sets.	2022-01-24 14:00:03 +05:30
Jihoon Son	cc2ffc6c0f	Fix node discovery to ignore unknown DruidServices (#12157 ) * Fix node discovery to ignore unknown DruidServices * ignore all runtime exceptions * fix test * add custom deserializer * custom serializer * log host for unparseable druidService	2022-01-18 22:08:59 -08:00
Gian Merlino	cf7191d2bc	Validate target dataSource for INSERT. (#12129 )	2022-01-18 09:34:23 -08:00
Maytas Monsereenusorn	bd7fe45da0	Support adding metrics in Auto Compaction (#12125 ) * add impl * add impl * add unit tests * add unit tests * add unit tests * add unit tests * add unit tests * add integration tests * add integration tests * fix LGTM * fix test * remove doc	2022-01-17 20:19:31 -08:00
Clint Wylie	f2ce76966c	add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous (#12145 ) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better	2022-01-12 03:48:53 -08:00
Laksh Singla	fae73800a7	Set plannerContext error when cannot query external datasources and when insert is not supported. (#12136 ) This PR aims to add plannerContext.setPlanningError whenever external table scan rule is invoked, without the queryMaker having the ability to do so.	2022-01-12 15:11:17 +05:30
Rohan Garg	81f0aba6cb	Use ListFilteredVirtualColumn for left/fact table expression in join condition (#12127 ) * Pass VirtualColumnRegistry in PlannerContext for join expression planning * Allow for including VCs from join fact table expression * Optimize MV_FILTER functions to use a VC when in join fact table expression * fixup! Allow for including VCs from join fact table expression * Address review comments	2022-01-11 14:47:13 -08:00
Laksh Singla	7c17341caa	Return empty result when a group by gets optimized to a timeseries query (#12065 ) Related to #11188 The above mentioned PR allowed timeseries queries to return a default result, when queries of type: select count() from table where dim1="_not_present_dim_" were executed. Before the PR, it returned no row, after the PR, it would return a row with value of count() as 0 (as expected by SQL standards of different dbs). In Grouping#applyProject, we can sometimes perform optimization of a groupBy query to a timeseries query if possible (when the keys of the groupBy are constants, as generated by automated tools). For example, in select count() from table where dim1="_present_dim_" group by "dummy_key", the groupBy clause can be removed. However, in the case when the filter doesn't return anything, i.e. select count() from table where dim1="_not_present_dim_" group by "dummy_key", the behavior of general databases would be to return nothing, while druid (due to above change) returns an empty row. This PR aims to fix this divergence of behavior. Example cases: select count() from table where dim1="_not_present_dim_" group by "dummy_key". CURRENT: Returns a row with count() = 0 EXPECTED: Return no row select 'A', dim1 from foo where m1 = 123123 and dim1 = '_not_present_again_' group by dim1 CURRENT: Returns a row with ('A', 'wat') EXPECTED: Return no row To do this, a boolean droppedDimensionsWhileApplyingProject has been added to Grouping which is true whenever we make changes to the original shape with optimization. Hence if a timeseries query has a grouping with this set to true, we set skipEmptyBuckets=true in the query context (i.e. donot return any row).	2022-01-07 21:53:48 +05:30
Jonathan Wei	9b598407c1	Add interface for external schema provider to Druid SQL (#12043 ) * Add interface for external schema provider to Druid SQL * Add annotations	2021-12-22 22:17:57 +05:30
somu-imply	1871a1ab18	ARRAY_AGG and STRING_AGG will throw errors if invoked on a complex datatype (#12089 )	2021-12-21 17:41:04 -08:00
Laksh Singla	16642fb278	Fix incorrect type conversion in DruidLogicalValueRule (#11923 ) DruidLogicalValuesRule while transforming to DruidRel can return incorrect values, if during the creation of the literal it was created from a float value. The BigDecimal representation stores 123.0, and it seems that using RexLiteral's method while conversion returns the inflated value (which is 1230). I am unsure if this is intentional from Calcite's perspective, and the actual change should be done somewhere else. Extract the values of INT/LONG from the RexLiteral in the DruidLogicalValuesRule, via BigDecimal.longValue() method.	2021-12-15 10:44:35 +05:30
Clint Wylie	244c2559e9	fix IncrementalIndex performance regression (#12048 ) changes: * IncrementalIndex is now a ColumnInspector * fixes performance regression from using map of ColumnCapabilities from IncrementalIndex as a RowSignature	2021-12-09 22:04:32 -08:00
Abhishek Agarwal	7abf847eae	Return 400 when SQL query cannot be planned (#12033 ) In this PR, we will now return 400 instead of 500 when SQL query cannot be planned. I also fixed a bug where error messages were not getting sent to the users in case the rules throw UnsupportSQLQueryException.	2021-12-08 21:49:54 +05:30
Laksh Singla	ca260dfef6	Intern RowSignature in DruidSchema to reduce its memory footprint (#12001 ) DruidSchema consists of a concurrent HashMap of DataSource -> Segment -> AvailableSegmentMetadata. AvailableSegmentMetadata contains RowSignature of the segment, and for each segment, a new object is getting created. RowSignature is an immutable class, and hence it can be interned, and this can lead to huge savings of memory being used in broker, since a lot of the segments of a table would potentially have same RowSignature.	2021-12-08 15:11:13 +05:30
Clint Wylie	45be2be368	fix issues with multi-value string constant expressions (#12025 ) * add specialized constant selector for multi-valued string constants	2021-12-08 00:10:26 -08:00
Abhishek Agarwal	834aae096a	Human-readable and actionable SQL error messages (#11911 ) This PR does two things 1. It adds the capability to surface missing features in SQL to users - The calcite planner will explore through multiple rules to convert a logical SQL query to a druid native query. Some rules change the shape of the query itself, optimize it and some rules are responsible for translating the query into a druid native query. These are DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule etc. These rules will look at SQL and will do the necessary transformation. But if the rule can't transform the query, it returns back the control to the calcite planner without recording why was it not able to transform. E.g. there is a join query with a non-equal join condition. DruidJoinRule will look at the condition, see that it is not supported, and return back the control. The reason can be that a query can be planned in many different ways so if one rule can't parse it, the query may still be parseable by other rules. In this PR, we are intercepting these gaps and passing them back to the user if the query could not be planned at all. 2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.	2021-12-07 09:44:08 +05:30
Laksh Singla	44b2fb71ab	Fix the error case when there are multi top level unions (#12017 ) This is a follow up to the PR #11908. This fixes the bug in top level union all queries when there are more than 2 SQL subqueries are present.	2021-12-07 01:12:02 +05:30
Jihoon Son	1f052b43c5	Better serverView exec name; remove SingleServerInventoryView (#11770 ) Druid currently has 2 serverViews, regular serverView and filtered serverView. The regular serverView is used to monitor all segment announcements from all data nodes (historicals, tasks, indexers). The filtered serverView is used when you want to watch segment announcements from particular tiers. Since these server views keep track of different sets of druidServers and segments in memory, they should be maintained separately. However, they currently share the same name for their executorService, which can cause confusion and make debugging harder especially in the broker since it is using both serverViews, the filtered view for normal query processing and the regular view to serve the servers table (I'm unsure whether this is intended or whether this is a good behavior). This PR changes it to a more obvious name. This PR also removes SingleServerInventoryView. This view was deprecated a long time ago and has not been documented at least since 0.13 (#6127). I also don't think this can be better in any case than BatchServerInventoryView. Finally, I merged AbstractCuratorServerInventoryView and BatchServerInventoryView as we no longer need AbstractCuratorServerInventoryView after SingleServerInventoryView is removed.	2021-12-04 18:43:05 +05:30
Gian Merlino	e0e05aad99	Enhancements to IndexTaskClient. (#12011 ) * Enhancements to IndexTaskClient. 1) Ability to use handlers other than StringFullResponseHandler. This functionality is not used in production code yet, but is useful because it will allow tasks to communicate with each other in non-string-based formats and in streaming fashion. In the future, we'll be able to use this to make task-to-task communication more efficient. 2) Truncate server errors at 1KB, so long errors do not pollute logs. 3) Change error log level for retryable errors from WARN to INFO. (The final error is still WARN.) 4) Harmonize log and exception messages to have a more consistent format. * Additional tests and improvements.	2021-12-03 09:14:32 -08:00
Clint Wylie	af6541a236	allow `DruidSchema` to fallback to segment metadata 'type' if 'typeSignature' is null (#12016 ) * allow `DruidSchema` to fallback to segment metadata type if typeSignature is null, to avoid producing incorrect SQL schema if broker is upgraded to 0.23 before historicals * mmm, forbidden tests	2021-12-02 17:42:01 -08:00
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Paul Rogers	a66f10eea1	Code cleanup from query profile project (#11822 ) * Code cleanup from query profile project * Fix spelling errors * Fix Javadoc formatting * Abstract out repeated test code * Reuse constants in place of some string literals * Fix up some parameterized types * Reduce warnings reported by Eclipse * Reverted change due to lack of tests	2021-11-30 11:35:38 -08:00
Kashif Faraz	b48f5a576b	Fix: Do not require time condition on InlineDataSource (#11982 ) For queries on logical values, e.g. SELECT 1337, we need not check for a filter on __time column even if requireTimeCondition is true.	2021-11-25 21:10:06 +05:30
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Rohan Garg	2c08055962	Specify time column for first/last aggregators (#11949 ) Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.	2021-11-25 09:44:14 +05:30
Gian Merlino	0354407655	SQL INSERT planner support. (#11959 ) * SQL INSERT planner support. The main changes are: 1) DruidPlanner is able to validate and authorize INSERT queries. They require WRITE permission on the target datasource. 2) QueryMaker is now an interface, and there is a QueryMakerFactory that creates instances of it. There is only one production implementation of each (NativeQueryMaker and NativeQueryMakerFactory), which together behave the same way as the former QueryMaker class. But this opens the door to executing queries in ways other than the Druid query stack, and is used by unit tests (CalciteInsertDmlTest) to test the INSERT planning functionality. 3) Adds an EXTERN table macro that allows referencing external data using InputSource and InputFormat from Druid's batch ingestion API. This is not exposed in production yet, but is used by unit tests. 4) Adds a QueryFeature concept that enables the planner to change its behavior slightly depending on the capabilities of the execution system. 5) Adds an "AuthorizableOperator" concept that enables SqlOperators to require additional permissions. This is used by the EXTERN table macro. Related odds and ends: - Add equals, hashCode, toString methods to InlineInputSource. Aids in the "from external" tests in CalciteInsertDmlTest. - Add JSON-serializability to RowSignature. - Move the SQL string inside PlannerContext so it is "baked into" the planner when the planner is created. Cleans up the code a bit, since in practice, the same query is passed in every time to the same planner anyway. * Fix up calls to CalciteTests.createMockQueryLifecycleFactory. * Fix checkstyle issues. * Adjustments for CI. * Adjust DruidAvaticaHandlerTest for stricter test authorizations.	2021-11-24 12:14:04 -08:00
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
Abhishek Agarwal	b6a0fbc8b6	Break down CalciteQueryTest (#11979 ) * Refactor calciteQueryTest * Move more tests to CalciteJoinQueryTest	2021-11-24 00:15:42 +05:30
Laksh Singla	b5a25f24f2	Improve the DruidRexExecutor w.r.t handling of numeric arrays (#11968 ) DruidRexExecutor while reducing Arrays, especially numeric arrays, doesn't convert the value from ExprResult's type to BigDecimal, which causes makeLiteral to cast the values. Also, if NaN or Infinite values are present in the array, the error is a generic NumberFormatException. For example: SELECT ARRAY[1.11, 2.22] returns [1, 2], and SELECT SQRT(-1) throws a generic NumberFormatException instead of IAE. This PR introduces a change to cast the numeric values to BigDecimal since Calcite's library understands that easily, and doesn't perform casts.	2021-11-23 11:40:59 +05:30
TSFenwick	a4cb1de87a	get rid of class cast exception and add a new testcase for that issue (#11951 )	2021-11-22 08:44:20 -08:00
Gian Merlino	b3502c3e50	DruidViewMacro: Remove unused escalator field. (#11931 ) * DruidViewMacro: Remove unused escalator field. * Remove additional unused fields.	2021-11-19 16:06:29 -08:00
Gian Merlino	36ee0367ff	Scan: Add "orderBy" parameter. (#11930 ) * Scan: Add "orderBy" parameter. This patch adds an API for requesting non-time orderings, although it does not actually add the ability to execute such queries. The changes are done in such a way that no matter how Scan query objects are constructed, they will have a correct "getOrderBy". This will enable us to switch the execution to exclusively use "getOrderBy" later on when it's implemented. Scan queries are serialized such that they only include "order" (time order) if the ordering is time-based, and they only include "orderBy" if the ordering is non-time-based. This maximizes compatibility with the existing API while also providing a clean look for formatted queries. Because this patch does not include execution logic, if someone actually tries to run a query with non-time ordering, then they will get an error like "Cannot execute query with orderBy [quality ASC]". * SQL module fixes. * Add spotbugs-exclude. * Remove unused method.	2021-11-19 08:19:12 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
Gian Merlino	d76e646700	Fix TestServerInventoryView behavioral discrepancy. (#11932 ) Unlike a real one, TestServerInventoryView would call segmentRemoved any time _any_ segment was removed. It should only be called when _all_ segments have been removed.	2021-11-16 18:08:35 -08:00
Clint Wylie	54fead3546	sql skip reduce of complex literal expressions (#11928 )	2021-11-16 15:40:42 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransformStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
TSFenwick	cdd1c2876c	catch throwable because calcite is throwing an error not exception (#11892 ) * catch throwable because calcite is throwing an error not exception * add test case	2021-11-10 17:22:04 -08:00
Jihoon Son	13bec7468a	Fix NPE for SQL queries when a query parameter is missing in the middle (#11900 ) * Fix NPE for SQL queries when a query parameter is missing in the middle * checkstyle * Throw SqlPlanningException instead of IAE	2021-11-10 10:02:26 -08:00
Clint Wylie	a8805ab60d	add missing json type for ListFilteredVirtualColumn (#11887 ) * add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again * fixes * ugly, but maybe this * oops * too many mappers	2021-11-09 17:25:12 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Clint Wylie	7237dc837c	complex typed expressions (#11853 ) * complex typed expressions * add built-in hll collector expressions to get coverage on druid-processing, more types, more better * rampage!!! * more javadoc * adjustments * oops * lol * remove unused dependency * contradiction? * more test	2021-11-08 00:33:06 -08:00
Gian Merlino	8971056763	Properly count segment references in tests. (#11870 )	2021-11-05 12:49:10 -07:00
Clint Wylie	9bd2ccbb9b	SqlAggregationModuleTest now extends CalciteTestBase to ensure consistent string encoding (#11861 )	2021-11-01 15:11:40 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Gian Merlino	d4cace385f	SQL: Allow Scans to be used as outer queries. (#11831 ) * SQL: Allow Scans to be used as outer queries. This has been possible in the native query system for a while, but the capability hasn't yet propagated into the SQL layer. One example of where this is useful is a query like: SELECT * FROM (... LIMIT X) WHERE <filter> Because this expands the kinds of subquery structures the SQL layer will consider, it was also necessary to improve the cost calculations. These changes appear in PartialDruidQuery and DruidOuterQueryRel. The ideas are: - Attach per-column penalties to the output signature of each query, instead of to the initial projection that starts a query. This encourages moving projections into subqueries instead of leaving them on outer queries. - Only attach penalties to projections if there are actually expressions happening. So, now, projections that simply reorder or remove fields are free. - Attach a constant penalty to every outer query. This discourages creating them when they are not needed. The changes are generally beneficial to the test cases we have in CalciteQueryTest. Most plans are unchanged, or are changed in purely cosmetic ways. Two have changed for the better: - testUsingSubqueryWithLimit now returns a constant from the subquery, instead of returning every column. - testJoinOuterGroupByAndSubqueryHasLimit returns a minimal set of columns from the innermost subquery; two unnecessary columns are no longer there. * Fix various DS operator conversions. These were all implemented as direct conversions, which isn't appropriate because they do not actually map onto native functions. These are only usable as post-aggregations. * Test case adjustment.	2021-10-23 17:18:43 -07:00
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
Kashif Faraz	f2d6100124	Require Datasource WRITE authorization for Supervisor and Task access (#11718 ) Follow up PR for #11680 Description Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE authorization even if they are purely informative. Changes Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks" Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history Check Datasource for all Indexing Task APIs	2021-10-08 10:39:48 +05:30
Lucas Capistrant	1930ad1f47	Implement configurable internally generated query context (#11429 ) * Add the ability to add a context to internally generated druid broker queries * fix docs * changes after first CI failure * cleanup after merge with master * change default to empty map and improve unit tests * add doc info and fix checkstyle * refactor DruidSchema#runSegmentMetadataQuery and add a unit test	2021-10-06 09:02:41 -07:00
Maytas Monsereenusorn	8cc58a4368	Add sql query id to response header for failed sql query (#11756 ) * add impl * add impl	2021-09-30 13:43:39 +07:00
Clint Wylie	11017ef00a	support jdbc even if trailing / is missing (#11737 ) * support jdbc even if trailing / is missing * fix tests	2021-09-29 13:59:26 -07:00
Maytas Monsereenusorn	a04b08e45c	Add new config to filter internal Druid-related messages from Query API response (#11711 ) * add impl * add impl * add tests * add unit test * fix checkstyle * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * address comments * address comments * address comments * fix test * fix test * fix test * fix test * fix test * change config name * change config name * change config name * address comments * address comments * address comments * address comments * address comments * address comments * fix compile * fix compile * change config * add more tests * fix IT	2021-09-29 12:55:49 +07:00
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Clint Wylie	392f0ca1b5	refactor sql authorization to get resource type from schema, resource type to be string (#11692 ) * refactor sql authorization to get resource type from schema, refactor resource type from enum to string * information schema auth filtering adjustments * refactor * minor stuff * Update SqlResourceCollectorShuttle.java	2021-09-17 09:53:25 -07:00
Clint Wylie	5e092ccb9b	add MV_FILTER_ONLY, MV_FILTER_NONE, ListFilteredVirtualColumn (#11650 ) * add MV_FILTER_ONLY SQL function, and list filter virtual column * MV_FILTER_NONE and more tests * formatting * o yeah, forgot can do easy thing * style * hmm why was that there * test filtering on virtual column * style * meh * do it right * good bot	2021-09-16 09:31:53 -07:00
Clint Wylie	3044372fc1	improved JDBC logging (#11676 ) * improve jdbc and router query debug logging * log errors too * no stacktrace * trace those stacks	2021-09-16 01:28:16 -07:00
Jihoon Son	0cbd71ebda	Return forbidden when authorization fails for sql query canceling (#11710 ) Switching http response code for authorization failures for sql query canceling to match to sql query posting.	2021-09-15 16:02:19 +05:30
Gian Merlino	7220d0466b	Fix truncation detectability for SQL array, object formats. (#11685 ) The SQL "array" and "object" formats are intended to return invalid JSON (lacking a ] terminator) if an error occurs midstream. This enables callers to detect truncated responses. But JsonGenerators, by default, close JSON arrays even when not explicitly told to. This patch disables automatic array closing, which fixes the problem with truncated response detection. It also adds tests for truncated responses for all result formats.	2021-09-14 15:59:05 -07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Rohan Garg	60efbb51d0	Add test for IS NOT NULL filter on join column in left join (#11636 )	2021-09-06 12:20:41 +05:30
Jihoon Son	82049bbf0a	Cancel API for sqls (#11643 ) * initial work * reduce lock in sqlLifecycle * Integration test for sql canceling * javadoc, cleanup, more tests * log level to debug * fix test * checkstyle * fix flaky test; address comments * rowTransformer * cancelled state * use lock * explode instead of noop * oops * unused import * less aggressive with state * fix calcite charset * don't emit metrics when you are not authorized	2021-09-05 10:57:45 -07:00
Jihoon Son	7e90d00cc0	Configurable maxStreamLength for doubles sketches (#11574 ) * Configurable maxStreamLength for doubles sketches * fix equals/hashcode and it test failure * fix test * fix it test * benchmark * doc * grouping key * fix comment * dependency check * Update docs/development/extensions-core/datasketches-quantiles.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-31 14:56:37 -07:00
Jihoon Son	2a658acad4	Put sleep in an extension (#11632 ) * Put sleep in an extension * dependency	2021-08-25 01:27:45 -07:00
Jihoon Son	78b4be467e	Add sleep function for testing (#11626 ) * Add sleep function for testing * sql function * javadoc	2021-08-24 14:30:31 +07:00
hqx871	38ebaee0fd	VirtualColumnRegistry reuse virtual column should take account of value type (#11546 ) Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>	2021-08-19 01:46:27 -07:00
Jihoon Son	177264c649	resultFormat name in camel case (#11585 ) * resultFormat name in camel case * test for letter case	2021-08-14 18:30:21 +08:00
frank chen	e40be0ae28	Add SQL functions to format numbers into human readable format (#10635 ) * add binary_byte_format/decimal_byte_format/decimal_format * clean code * fix doc * fix review comments * add spelling check rules * remove extra param * improve type handling and null handling * remove extra zeros * fix tests and add space between unit suffix and number as most size-format functions do * fix tests * add examples * change function names according to review comments * fix merge Signed-off-by: frank chen <frank.chen021@outlook.com> * no need to configure NullHandling explicitly for tests Signed-off-by: frank chen <frank.chen021@outlook.com> * fix tests in SQL-Compatible mode Signed-off-by: frank chen <frank.chen021@outlook.com> * Resolve review comments * Update SQL test case to check null handling * Fix intellij inspections * Add more examples * Fix example	2021-08-13 10:27:49 -07:00
Clint Wylie	9af7ba9d2a	STRING_AGG SQL aggregator function (#11241 ) * add string_agg * oops * style and fix test * spelling * fixup * review stuffs	2021-08-10 13:47:09 -07:00
Jihoon Son	e9d964d504	Improve concurrency between DruidSchema and BrokerServerView (#11457 ) * Improve concurrency between DruidSchema and BrokerServerView * unused imports and workaround for error prone failure * count only known segments * add comments	2021-08-06 14:07:13 -07:00
Jihoon Son	8ba7f6a48c	Fix incorrect result of exact topN on an inner join with limit (#11517 )	2021-07-31 15:55:49 -07:00
Rohan Garg	c98e7c3aa3	Fix left join SQL queries with IS NOT NULL filter (#11434 ) This PR fixes the incorrect results for query : SELECT dim1, l1.k FROM foo LEFT JOIN (select k \|\| '' as k from lookup.lookyloo group by 1) l1 ON foo.dim1 = l1.k WHERE l1.k IS NOT NULL (in CalciteQueryTests) In the current code, the WHERE clause gets removed from the top of the left join and is pushed to the table foo leading to incorrect results. The fix for such a situation is done by : Converting such left joins into inner joins (since logically the mentioned left join query is equivalent to an inner join) using Calcite while maintaining that the druid execution layer can execute such inner joins. Preferring converted inner joins over original left joins in our cost model	2021-07-23 20:57:19 +05:30
Jihoon Son	84c957f541	Add more sql tests for groupby queries (#11454 ) * Add more sql tests for simple groupby queries * unused import * fix tests * javadocs * unused import	2021-07-20 21:05:11 -07:00
Abhishek Agarwal	94c1671eaf	Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466 ) This PR splits the current SegmentLoader into SegmentLoader and SegmentCacheManager. SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. The default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. This class will be used in SegmentManager to construct Segment objects. SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, will support reserve and release on cache space. [See "Make SegmentLoader extensible and customizable", #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.	2021-07-21 00:14:19 +05:30
kaijianding	e39ff44481	improve groupBy query granularity translation with 2x query performance improvement when issued from sql layer (#11379 ) * improve groupBy query granularity translation when issued from sql layer * fix style * use virtual column to determine timestampResult granularity * don't apply postaggregators on compute nodes * relocate constants * fix order by correctness issue * fix ut * use easier-to-understand code in DefaultLimitSpec * address comment * rollback use virtual column to determine timestampResult granularity * fix style * fix style * address the comment * add more detailed documentation to explain the tradeoff * address the comment * address the comment	2021-07-11 10:22:47 -07:00
Suneet Saldanha	49e8732e4f	Display errors for invalid timezones in TIME_FORMAT (#11423 ) Users sometimes make typos when picking timezones, like `America/Los Angeles` instead of `America/Los_Angeles`. Instead of defaulting to UTC, this change throws an error notifying the user of their mistake.	2021-07-09 06:07:13 -07:00
Abhishek Agarwal	3481bb0440	Better error message for unsupported double values (#11409 ) A constant expression may evaluate to Double.NEGATIVE_INFINITY/Double.POSITIVE_INFINITY/Double.NaN e.g. log10(0). When using such an expression in native queries, the user will get the corresponding value without any error. In SQL, however, the user will run into NumberFormatException because we convert the double to BigDecimal while constructing a literal numeric expression. This probably should be fixed in calcite - see https://issues.apache.org/jira/browse/CALCITE-2067. This PR adds a verbose error message so that users can take corrective action without scratching their heads.	2021-07-08 16:55:17 +05:30
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
Clint Wylie	bfbd7ec432	fix bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Clint Wylie	50327b8f63	ignore bySegment query context for SQL queries (#11352 ) * ignore bySegment query context for SQL queries * revert unintended change	2021-06-11 13:49:03 -07:00
Clint Wylie	6b272c857f	adjust topn heap algorithm to only use known cardinality path when dictionary is unique (#11186 ) * adjust topn heap algorithm to only use known cardinality path when dictionary is unique * better check and add comment * adjust comment more	2021-06-10 18:32:22 -05:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
Rohan Garg	6c7177714f	Add test for join on __time column (#11289 )	2021-05-26 22:20:39 -07:00
Maria Sitkovets	259207753d	Fix is null selector returning incorrect value for Long data type (#11170 ) * Fix is null selector returning incorrect value for Long data type * Fix style errors * Refactor getObject method to also cache null column values * Make lastInput variable nullable * Refactor unit test * Use new boolean lastInputIsNull instead of Long for lastInput to avoid boxing * Refactor to remove Long for input variable * Make a separate null caching variable * Cleaner null caching implementation	2021-05-19 20:47:02 -07:00
Clint Wylie	aa62073faa	fix sql planner bug with inner offset causing loop (#11259 ) * fix sql planner bug with inner offset causing loop * move check up	2021-05-15 14:26:41 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Clint Wylie	f6662b4893	fix count and average SQL aggregators on constant virtual columns (#11208 ) * fix count and average SQL aggregators on constant virtual columns * style * even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry * oops missed a few * remove unused * this will fix it	2021-05-10 13:41:48 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
Gian Merlino	a1f850d707	Fix vectorized cardinality bug on certain string columns. (#11199 ) * Fix vectorized cardinality bug on certain string columns. Fixes a bug introduced in #11182, related to the fact that in some cases, ColumnProcessors.makeVectorProcessor will call "makeObjectProcessor" instead of "makeSingleValueDimensionProcessor" or "makeMultiValueDimensionProcessor". CardinalityVectorProcessorFactory improperly ignored calls to "makeObjectProcessor". In addition to fixing the bug, I added this detail to the javadocs for VectorColumnProcessorFactory, to prevent others from running into the same thing in the future. They do not currently call out this case. * Improve test coverage. * Additional fixes.	2021-05-07 08:37:10 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
Gian Merlino	bef7cc911f	Vectorize the cardinality aggregator. (#11182 ) * Vectorize the cardinality aggregator. Does not include a byRow implementation, so if byRow is true then the aggregator still goes through the non-vectorized path. Testing strategy: - New tests that exercise both styles of "aggregate" for supported types. - Some existing tests have also become active (note the deleted "cannotVectorize" lines). * Adjust whitespace.	2021-05-03 20:27:02 -07:00
Jihoon Son	8215cc3238	Unit test for DefaultOperandTypeChecker (#11152 ) * Less strict operand type check and implicit casting * fix ci * Clean up unnecessary changes * more cleanup * unused import	2021-04-27 18:47:38 -07:00
Jihoon Son	261c1f271f	Keep traitSet of logicalValues (#11138 )	2021-04-27 18:45:23 -07:00
Gian Merlino	202c78c8f3	Enable rewriting certain inner joins as filters. (#11068 ) * Enable rewriting certain inner joins as filters. The main logic for doing the rewrite is in JoinableFactoryWrapper's segmentMapFn method. The requirements are: - It must be an inner equi-join. - The right-hand columns referenced by the condition must not contain any duplicate values. (If they did, the inner join would not be guaranteed to return at most one row for each left-hand-side row.) - No columns from the right-hand side can be used by anything other than the join condition itself. HashJoinSegmentStorageAdapter is also modified to pass through to the base adapter (even allowing vectorization!) in the case where 100% of join clauses could be rewritten as filters. In support of this goal: - Add Query getRequiredColumns() method to help us figure out whether the right-hand side of a join datasource is being used or not. - Add JoinConditionAnalysis getRequiredColumns() method to help us figure out if the right-hand side of a join is being used by later join clauses acting on the same base. - Add Joinable getNonNullColumnValuesIfAllUnique method to enable retrieving the set of values that will form the "in" filter. - Add LookupExtractor canGetKeySet() and keySet() methods to support LookupJoinable in its efforts to implement the new Joinable method. - Add "enableRewriteJoinToFilter" feature flag to JoinFilterRewriteConfig. The default is disabled. * Test improvements. * Test fixes. * Avoid slow size() call. * Remove invalid test. * Fix style. * Fix mistaken default. * Small fixes. * Fix logic error.	2021-04-14 10:49:27 -07:00
chenyuzhi459	b8423a38df	add round test (#11088 ) * add round test * code style * handle null val for round function * handle null val for round function * support null for round * fix compatibility * fix test * fix test * code style * optimize format	2021-04-13 11:36:32 -07:00
Jihoon Son	25db8787b3	Fix CAST being ignored when aggregating on strings after cast (#11083 ) * Fix CAST being ignored when aggregating on strings after cast * fix checkstyle and dependency * unused import	2021-04-12 22:21:24 -07:00
Clint Wylie	338886fd5f	vector group by support for string expressions (#11010 ) * vector group by support for string expressions * fix test * comments, javadoc	2021-04-08 19:23:39 -07:00
Jihoon Son	b51ede5b49	Add a planner rule to handle empty tables (#11058 ) * Add a planner rule to handle empty tables * adjust comment * type handling * add tests * unused imports and fix test * fix more tests * fix more test * javadoc	2021-04-07 10:04:47 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
chenyuzhi459	450535073e	fix lookup nullable (#11060 ) * fix lookup nullable * fix lookup unit test * test null case	2021-04-02 21:56:42 -07:00
Lasse Krogh Mammen	782a1d4e6c	Add Calcite Avatica protobuf handler (#10543 )	2021-03-31 12:46:25 -07:00
Jihoon Son	43ea184b74	Add explicit EOF and use assert instead of exception (#11041 )	2021-03-31 09:41:57 -07:00
chenyuzhi459	248af38777	Fix subquery with order by (#11017 ) * fix subquery with order by * fix parameter	2021-03-26 04:43:46 -07:00
Clint Wylie	bacad04aa2	make SqlResource laning test less sensitive to timing (#11032 ) * make laning test less sensitive to timing * style	2021-03-26 03:43:28 -07:00
Jonathan Wei	8296123d89	Add resources used to EXPLAIN PLAN FOR output (#11024 )	2021-03-23 17:21:15 -07:00
Samarth Jain	83fcab1d0f	Improve performance of queries against SYSTEM.SEGMENT table. (#11008 ) Size HashMap and HashSet appropriately. Perf analysis of the queries revealed that over 25% of the query time was spent in resizing HashMap and HashSet collections. Also, prevent the need to examine and authorize all resources when AllowAllAuthorizer is the configured authorizer.	2021-03-17 22:24:02 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Clint Wylie	58294329b7	fix SQL issue for group by queries with time filter that gets optimized to false (#10968 ) * fix SQL issue for group by queries with time filter that gets optimized to false * short circuit always false in CombineAndSimplifyBounds * adjust * javadocs * add preconditions for and/or filters to ensure they have children * add comments, remove preconditions	2021-03-09 19:41:16 -08:00
Jonathan Wei	9c083783c9	Don't fail on invalid views in InformationSchema (#10960 ) * Don't fail on invalid views in InformationSchema * Fix test	2021-03-09 16:19:59 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Abhishek Agarwal	ae620921df	Fix classCastException when inputs to union are join (#10950 ) * Fix union queries * Add tests	2021-03-08 21:20:26 -08:00
Abhishek Agarwal	1a15987432	Supporting filters in the left base table for join datasources (#10697 ) * where filter left first draft * Revert changes in calcite test * Refactor a bit * Fixing the Tests * Changes * Adding tests * Add tests for correlated queries * Add comment * Fix typos	2021-03-04 10:39:21 -08:00
Clint Wylie	f34c6eb3c0	add druid jdbc handler config for minimum number of rows per frame (#10880 ) * add druid jdbc handler config for minimum number of rows per frame * javadocs and docs adjustments * spelling * adjust docs per review with minor tweaks * adjust more	2021-02-23 02:11:04 -08:00
Clint Wylie	cbbef80c7f	add SQL operators for bitwise expressions (#10823 ) * add SQL operators for bitwise expressions * more test * fix spelling * more tests	2021-02-18 20:56:33 -08:00
Jonathan Wei	84341737d5	Add property for binding view manager type (#10895 ) * Add property for binding view manager type * Checkstyle * Fix constructor * Add @Test	2021-02-18 15:57:45 -08:00
Jonathan Wei	8ad68135c8	Filter unauthorized views in InformationSchema (#10874 ) * Filter unauthorized views in InformationSchema * Use fixed name for view schema * Remove unused string	2021-02-16 17:36:45 -08:00
Maytas Monsereenusorn	6541178c21	Support segmentGranularity for auto-compaction (#10843 ) * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * resolve conflict * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * fix tests * fix more tests * fix checkstyle * add unit tests * fix checkstyle * fix checkstyle * fix checkstyle * add unit tests * add integration tests * fix checkstyle * fix checkstyle * fix failing tests * address comments * address comments * fix tests * fix tests * fix test * fix test * fix test * fix test * fix test * fix test * fix test * fix test	2021-02-12 03:03:20 -08:00
Clint Wylie	fe30f4b414	refactor sql lifecycle, druid planner, views, and view permissions (#10812 ) * before i leaped i should've seen, the view from halfway down * fixes * fixes, more test * rename * fix style * further refactoring * review stuffs * rename * more javadoc and comments	2021-02-05 12:56:55 -08:00
Clint Wylie	cd6af93274	add leftover tests from #10743 (#10766 )	2021-01-22 09:20:48 -08:00
zhangyue19921010	8c6153d511	[Bug Fix] Broker will not wait for its SQL metadata view to fully initialize before starting up, even though set awaitInitializationOnStart true (#10779 ) * enhance the logic of Start up DruidSchema immediately if there are no segments. * add UT to test DruidSchema init Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-22 08:48:21 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Jihoon Son	149306c9db	Tidy up HTTP status codes for query errors (#10746 ) * Tidy up query error codes * fix tests * Restore query exception type in JsonParserIterator * address review comments; add a comment explaining the ugly switch * fix test	2021-01-13 17:20:00 -08:00
Clint Wylie	8c3c9b4060	fix limited queries with subtotals (#10743 ) * i put my thing down, flip it and reverse it * oops	2021-01-13 12:55:24 -08:00
Franklyn Dsouza	045b29fa95	Correctly handle null values in time column results (#10642 ) * handle null case * test this case * test sql resource * fix style	2021-01-04 22:22:46 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
Abhishek Agarwal	796c25532e	Fix post-aggregator computation when used with subtotals (#10653 ) * Fix post-aggregator computation * remove commented code * Fix numeric null handling * Add test when subquery returns null long	2020-12-17 20:10:26 -08:00
Clint Wylie	64f97e7003	fix DruidSchema incorrectly listing tables with no segments (#10660 ) * fix race condition with DruidSchema tables and dataSourcesNeedingRebuild * rework to see if it passes analysis * more better * maybe this * re-arrange and comments	2020-12-11 14:14:00 -08:00
Abhishek Agarwal	26d74b3580	Add grouping_id function (#10518 ) * First draft of grouping_id function * Add more tests and documentation * Add calcite tests * Fix travis failures * bit of a change * Add documentation * Fix typos * typo fix	2020-12-07 11:46:29 -08:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
frank chen	d7d2c804ad	Add zero period support to TIMESTAMPADD (#10550 ) * Allow zero period for TIMESTAMPADD * update test cases * add empty zone test case * add unit test cases for TimestampShiftMacro	2020-11-18 18:26:53 -08:00
Atul Mohan	6ccddedb7a	Improved exception handling in case of query timeouts (#10464 ) * Separate timeout exceptions * Add more tests Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-11-03 09:00:33 -06:00
Himanshu	4de4d4d111	remove ServerDiscoverySelector from DruidLeaderClient (#10537 )	2020-10-28 10:55:11 -07:00
Clint Wylie	d0821de854	support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499 ) * support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions * inspector * changes * more test * clean	2020-10-26 19:55:24 -07:00
Maytas Monsereenusorn	3538abd5d0	Make sure all fields in sys.segments are JSON-serialized (#10481 ) * fix JSON format * Change all columns in sys segments to be JSON * Change all columns in sys segments to be JSON * add tests * fix failing tests * fix failing tests	2020-10-14 13:49:46 -07:00
Clint Wylie	207ef310f2	vectorized group by support for nullable numeric columns (#10441 ) * vectorized group by support for numeric null columns * revert unintended change * adjust * review stuffs	2020-10-05 21:53:53 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Clint Wylie	9ec5c08e2a	fix array types from escaping into wider query engine (#10460 ) * fix array types from escaping into wider query engine * oops * adjust * fix lgtm	2020-10-03 15:30:34 -07:00
Clint Wylie	753bce324b	vectorize constant expressions with optimized selectors (#10440 )	2020-09-29 13:19:06 -07:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Clint Wylie	3d700a5e31	vectorize remaining math expressions (#10429 ) * vectorize remaining math expressions * fixes * remove cannotVectorize() where no longer true * disable vectorized groupby for numeric columns with nulls * fixes	2020-09-26 23:30:14 -07:00
Maytas Monsereenusorn	72f1b55f56	Add last_compaction_state to sys.segments table (#10413 ) * Add is_compacted to sys.segments table * change is_compacted to last_compaction_state * fix tests * fix tests * address comments	2020-09-23 15:29:36 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Clint Wylie	184b202411	add computed Expr output types (#10370 ) * push down ValueType to ExprType conversion, tidy up * determine expr output type for given input types * revert unintended name change * add nullable * tidy up * fixup * more better * fix signatures * naming things is hard * fix inspection * javadoc * make default implementation of Expr.getOutputType that returns null * rename method * more test * add output for contains expr macro, split operation and function auto conversion	2020-09-14 18:18:56 -07:00
Abhishek Agarwal	f5e2645bbb	Support SearchQueryDimFilter in sql via new methods (#10350 ) * Support SearchQueryDimFilter in sql via new methods * Contains is a reserved word * revert unnecessary change * Fix toDruidExpression method * rename methods * java docs * Add native functions * revert change in dockerfile * remove changes from dockerfile * More tests * travis fix * Handle null values better	2020-09-14 09:57:54 -07:00
Joy Kent	e5f0da30ae	Fix stringFirst/stringLast rollup during ingestion (#10332 ) * Add IndexMergerRollupTest This changelist adds a test to merge indexes with StringFirst/StringLast aggregator. * Fix StringFirstAggregateCombiner/StringLastAggregateCombiner The segment-level type for stringFirst/stringLast is SerializablePairLongString, not String. This changelist fixes it. * Fix EarliestLatestAnySqlAggregator to handle COMPLEX type This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX type as an operand. For its return type, we set it to VARCHAR, since COMPLEX column is only generated by stringFirst/stringLast during ingestion rollup. * Return value with smaller timestamp in StringFirstAggregatorFactory.combine function * Add integration tests for stringFirst/stringLast during ingestion * Use one EarliestLatestReturnTypeInference instance Co-authored-by: Joy Kent <joy@automonic.ai>	2020-09-08 17:36:04 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Gian Merlino	5cd7610fb6	SQL support for union datasources. (#10324 ) * SQL support for union datasources. Exposed via the "UNION ALL" operator. This means that there are now two different implementations of UNION ALL: one at the top level of a query that works by concatenating subquery results, and one at the table level that works by creating a UnionDataSource. The SQL documentation is updated to discuss these two use cases and how they behave. Future work could unify these by building support for a native datasource that represents the union of multiple subqueries. (Today, UnionDataSource can only represent the union of tables, not subqueries.) * Fixes. * Error message for sanity check. * Additional test fixes. * Add some error messages.	2020-08-28 07:57:06 -07:00
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00
Gian Merlino	0910d22f48	Add SQL "OFFSET" clause. (#10279 ) * Add SQL "OFFSET" clause. Under the hood, this uses the new offset features from #10233 (Scan) and #10235 (GroupBy). Since Timeseries and TopN queries do not currently have an offset feature, SQL planning will switch from one of those to Scan or GroupBy if users add an OFFSET. Includes a refactoring to harmonize offset and limit planning using an OffsetLimit wrapper class. This is useful because it ensures that the various places that need to deal with offset and limit collapsing all behave the same way, using its "andThen" method. * Fix test and add another test.	2020-08-21 14:11:54 -07:00
Clint Wylie	7620b0c54e	Segment backed broadcast join IndexedTable (#10224 ) * Segment backed broadcast join IndexedTable * fix comments * fix tests * sharing is caring * fix test * i hope this doesnt fix it * filter by schema to maybe fix test * changes * close join stuffs so it does not leak, allow table to directly make selector factory * oops * update comment * review stuffs * better check	2020-08-20 14:12:39 -07:00
Himanshu	12ae84165e	remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717 ) * remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler * remove ByteArrayResponseHolder dependency from JsonParserIterator * add UT to cover lines in InputStreamFullResponseHandler * refactor SystemSchema to reduce branches * further reduce branches * Revert "add UT to cover lines in InputStreamFullResponseHandler" This reverts commit `330aba3dd9`. * UTs for InputStreamFullResponseHandler * remove unused imports	2020-08-14 10:51:18 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Jihoon Son	a61263b4a9	Allow forceLimitPushDown in SQL (#10253 ) * Allow forceLimitPushDown in SQL * fix test * fix test * review comments * fix test	2020-08-13 13:30:41 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMinAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Jihoon Son	63c1746fe4	Fix timeseries query constructor when postAggregator has an expression reading timestamp result column (#10198 ) * Fix timeseries query constructor when postAggregator has an expression reading timestamp result column * fix npe * Fix postAgg referencing timestampResultField and add a test for it * fix test * doc * revert doc	2020-07-27 10:54:44 -07:00

721 Commits