druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	f8b1f2f7f3	fix issue when distinct grouping dimensions are optimized into the same virtual column expression (#9429 ) * fix issue when distinct grouping dimensions are optimized into the same virtual column expression * fix tests * more better * fixes	2020-03-09 17:48:29 -07:00
Jonathan Wei	0136dba95d	Add option to control join filter rewrites (#9472 ) * Add option to control join filter rewrites * Fix inspections	2020-03-09 17:36:07 -07:00
Himanshu	072bbe210f	remove ServerDiscoverySelector from DruidLeaderClient (#9481 )	2020-03-09 12:13:59 -07:00
Gian Merlino	c9faf3e148	Add SQL GROUPING SETS support. (#9122 ) * Add SQL GROUPING SETS support. Built on top of the subtotalsSpec feature in the groupBy query. This also involves two changes to subtotalsSpec: - Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to each grouping set. This is more in line with SQL standard behavior. I think it is okay to make this change, since the old behavior was not documented, so users should hopefully not be depending on it. - Fix a bug where virtual columns were included in the subtotal queries, but they should not have been. Also fixes two bugs in query equality checking: - BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the latter is lazily initialized and might be null in one query but not the other. - GroupByQuery: Include subtotalsSpec in equals and hashCode. * Fix bugs. * Fix tests. * PR updates. * Grouping class hygiene.	2020-02-26 08:52:39 -08:00
Clint Wylie	6d8dd5ec10	string -> expression -> string -> expression (#9367 ) * add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays * oops, macros are expressions too * style * spotbugs * qualified type arrays * review stuffs * simplify grammar * more permissive array parsing * reuse expr joiner * fix it	2020-02-21 15:43:02 -08:00
Srinivas Reddy	05258dca37	Improved the readability and fixed few java warnings (#9163 ) * Improved the readability and fixed few java warnings * Fix the checkstyle Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2020-02-22 07:30:11 +09:00
Clint Wylie	b408a6d774	sql support for dynamic parameters (#6974 ) * sql support for dynamic parameters * fixup * javadocs * fixup from merge * formatting * fixes * fix it * doc fix * remove druid fallback self-join parameterized test * unused imports * ignore test for now * fix imports * fixup * fix merge * merge fixup * fix test that cannot vectorize * fixup and more better * dependency thingo * fix docs * tweaks * fix docs * spelling * unused imports after merge * review stuffs * add comment * add ignore text * review stuffs	2020-02-19 13:09:20 -08:00
Lucas Capistrant	5befd40638	Issue 4909 popped up again. I applied PR 5451 liberally to all new Calcite test classes introduced in PR 9279 to fix (#9324 )	2020-02-16 22:29:43 -08:00
Clint Wylie	b1be88d79c	fix Expressions.toQueryGranularity to be more correct, improve javadocs of Expr.getIdentifierIfIdentifier and Expr.getBindingIfIdentifier (#9363 )	2020-02-16 08:36:40 -08:00
Suneet Saldanha	b1f38131af	Fix timestamp extract fn to match postgreSQL (#9337 ) * Fix timestamp extract fn to match postgres Update the timestamp extract function so that it matches the PostgreSQL docs. Examples from the PostgreSQL docs were added as tests for DECADE, CENTURY and MILLENIUM extraction. There were bugs in CENTURY and MILLENIUM that were spotted because of intelliJ inspections - 'Integer division in floating point context' * Update CalciteQueryTest * remove useless round * mark integer division as an error	2020-02-12 15:39:19 -08:00
Maytas Monsereenusorn	c30579e47b	ANY Aggregator should not skip null values implementation (#9317 ) * ANY Aggregator should not skip null values implementation * add tests * add more tests * Update documentation * add more tests * address review comments * optimize StringAnyBufferAggregator * fix failing tests * address pr comments	2020-02-12 14:01:41 -08:00
Manish Gill	d268ff7297	Use ExecutorService instead of ScheduledExecutorService where necessary (#9325 ) * Use ExecutorService instead of ScheduledExecutorService where necessary - #9286 * Added inspection rule to prohibit ScheduledExecutorService assignment to ExecutorService	2020-02-11 19:05:48 -08:00
Jonathan Wei	b2c00b3a79	Add query context option to disable join filter push down (#9335 )	2020-02-11 15:31:34 -08:00
Jonathan Wei	57765a499b	Allow overriding default JoinableFactory in SpecificSegmentsQuerySegmentWalker (#9330 )	2020-02-06 18:35:26 -08:00
Gian Merlino	475b90c3a6	Remove EasyMock dependency from CalciteTests. (#9310 ) * Remove EasyMock dependency from CalciteTests. Useful because CalciteTests is used by other modules (e.g. druid-benchmarks) and we don't want them to have to pull in EasyMock. * CalciteTests no longer needs curator-x-discovery either.	2020-02-04 22:10:17 -08:00
Aditya	868fdeb384	GREATEST/LEAST post-aggregators in SQL (#8719 ) * implement shell for greatest sql aggregator with hardcoded long values * implement functional long greatest aggregator for direct access columns * implement greatest & least sql aggregators for long & double types using abstract base class * add javadocs, unit tests & handling for floats for greatest/least postaggregations * minor checkstyle fix * improve naming for the test cases * make inner class static * remove blank lines to retest travis build * change trivial text to rerun travis build * implement suggested updates for greatest/least sql aggs & fix checkstyle issues * fix stale comments in greatest/least sql aggs abstract base * Update sql.md * improve sql function definitions for greatest/least sql aggs * add more tests for greatest/least sql aggs * add tests to cover invalid greatest/least sql expressions * rename & reorder greatest least sql tests	2020-02-04 17:08:53 -08:00
Suneet Saldanha	33a97dfaae	Guicify druid sql module (#9279 ) * Guicify druid sql module Break up the SQLModule in to smaller modules and provide a binding that modules can use to register schemas with druid sql. * fix some tests * address code review * tests compile * Working tests * Add all the tests * fix up licenses and dependencies * add calcite dependency to druid-benchmarks * tests pass * rename the schemas	2020-02-04 11:33:48 -08:00
Gian Merlino	b411443d22	SQL join support for lookups. (#9294 ) * SQL join support for lookups. 1) Add LookupSchema to SQL, so lookups show up in the catalog. 2) Add join-related rels and rules to SQL, allowing joins to be planned into native Druid queries. * Add two missing LookupSchema calls in tests. * Fix tests. * Fix typo.	2020-01-31 23:51:16 -08:00
Gian Merlino	204ba9966f	Add LookupJoinableFactory. (#9281 ) * Add LookupJoinableFactory. Enables joins where the right-hand side is a lookup. Includes an integration test. Also, includes changes to LookupExtractorFactoryContainerProvider: 1) Add "getAllLookupNames", which will be needed to eventually connect lookups to Druid's SQL catalog. 2) Convert "get" from nullable to Optional return. 3) Swap out most usages of LookupReferencesManager in favor of the simpler LookupExtractorFactoryContainerProvider interface. * Fixes for tests. * Fix another test. * Java 11 message fix. * Fixups. * Fixup benchmark class.	2020-01-30 14:46:21 -08:00
Suneet Saldanha	303b02eba1	intelliJ inspections cleanup (#9260 ) * intelliJ inspections cleanup - remove redundant escapes - performance warnings - access static member via instance reference - static method declared final - inner class may be static Most of these changes are aesthetic, however, they will allow inspections to be enabled as part of CI checks going forward The valuable changes in this delta are: - using StringBuilder instead of string addition in a loop indexing-hadoop/.../Utils.java processing/.../ByteBufferMinMaxOffsetHeap.java - Use class variables instead of static variables for parameterized test processing/src/.../ScanQueryLimitRowIteratorTest.java * Add intelliJ inspection warnings as errors to druid profile * one more static inner class	2020-01-29 11:50:52 -08:00
Clint Wylie	36c5efe2ab	fix some issues with filters on numeric columns with nulls (#9251 ) * fix issue with long column predicate filters and nulls * dang * uncomment a thing * styles * oops * allcaps * review stuff	2020-01-27 18:01:01 -08:00
Roman Leventov	b9186f8f9f	Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306 ) * Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error * Fix brace * Import order * Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill * Fix tests * Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY * More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters * Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig * More variable and method renames * Rename MetadataSegments to SegmentsMetadata * Javadoc update * Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs * Update Javadoc of VersionedIntervalTimeline.iterateAllObjects() * Reorder imports * Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers * Complete merge * Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests * Remove MetadataSegmentManager * Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments * Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder * Fix inspections * Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest * Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods * Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator * Unused import * Optimize imports * Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata() * Unused import * Update terminology in datasource-view.tsx * Fix label in datasource-view.spec.tsx.snap * Fix lint errors in datasource-view.tsx * Doc improvements * Another attempt to please TSLint * Another attempt to please TSLint * Style fixes * Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge) * Try to fix docs build issue * Javadoc and spelling fixes * Rename SegmentsMetadata to SegmentsMetadataManager, address other comments * Address more comments	2020-01-27 11:24:29 -08:00
Gian Merlino	f0f68570ec	Use DataSourceAnalysis throughout the query stack. (#9239 ) Builds on #9235, using the datasource analysis functionality to replace various ad-hoc approaches. The most interesting changes are in ClientQuerySegmentWalker (brokers), ServerManager (historicals), and SinkQuerySegmentWalker (indexing tasks). Other changes related to improving how we analyze queries: 1) Changes TimelineServerView to return an Optional timeline, which I thought made the analysis changes cleaner to implement. 2) Added QueryToolChest#canPerformSubquery, which is now used by query entry points to determine whether it is safe to pass a subquery dataSource to the query toolchest. Fixes an issue introduced in #5471 where subqueries under non-groupBy-typed queries were silently ignored, since neither the query entry point nor the toolchest did anything special with them. 3) Removes the QueryPlus.withQuerySegmentSpec method, which was mostly being used in error-prone ways (ignoring any potential subqueries, and not verifying that the underlying data source is actually a table). Replaces with a new function, Queries.withSpecificSegments, that includes sanity checks.	2020-01-23 14:07:14 -08:00
Gian Merlino	d886463253	Add join-related DataSource types, and analysis functionality. (#9235 ) * Add join-related DataSource types, and analysis functionality. Builds on #9111 and implements the datasource analysis mentioned in #8728. Still can't handle join datasources, but we're a step closer. Join-related DataSource types: 1) Add "join", "lookup", and "inline" datasources. 2) Add "getChildren" and "withChildren" methods to DataSource, which will be used in the future for query rewriting (e.g. inlining of subqueries). DataSource analysis functionality: 1) Add DataSourceAnalysis class, which breaks down datasources into three components: outer queries, a base datasource (left-most of the highest level left-leaning join tree), and other joined-in leaf datasources (the right-hand branches of the left-leaning join tree). 2) Add "isConcrete", "isGlobal", and "isCacheable" methods to DataSource in order to support analysis. Other notes: 1) Renamed DataSource#getNames to DataSource#getTableNames, which I think is clearer. Also, made it a Set, so implementations don't need to worry about duplicates. 2) The addition of "isCacheable" should work around #8713, since UnionDataSource now returns false for cacheability. * Remove javadoc comment. * Updates reflecting code review. * Add comments. * Add more comments.	2020-01-22 14:54:47 -08:00
Clint Wylie	8011211a0c	first/last aggregators and nulls (#9161 ) * null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs * initially null or not based on config * review stuff, make string first/last consistent with null handling of numeric columns, more tests * docs * handle nil selectors, revert to primitive first/last types so groupby v1 works...	2020-01-20 11:51:54 -08:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Clint Wylie	f0dddaa51a	fix topn aggregation on numeric columns with null values (#9183 ) * fix topn issue with aggregating on numeric columns with null values * adjustments * rename * add more tests * fix comments * more javadocs * computeIfAbsent	2020-01-17 18:12:24 -08:00
Maytas Monsereenusorn	68ed2a2c8f	Fix LATEST / EARLIEST Buffer Aggregator does not work on String column (#9197 ) * fix buff limit bug * add tests * add test * add tests * fix checkstyle	2020-01-16 21:02:37 -08:00
Gian Merlino	bd49ec03bc	Move result-to-array logic from SQL layer into QueryToolChests. (#9130 ) * Move result-to-array logic from SQL layer into QueryToolChests. * Checkstyle adjustment. * Fix typo.	2020-01-16 15:42:10 -08:00
Maytas Monsereenusorn	42359c93dd	Implement ANY aggregator (#9187 ) * Implement ANY aggregator * Add copyright headers * Add unit tests * fix BufferAggregator * Fix bug in BufferAggregator * hook up the SQL command * add check for buffer aggregator * Address comment * address comments * add docs * Address comments * add more tests for numeric columns that have null values when run in sql compatible null mode * fix checkstyle errors * fix failing tests * fix failing tests	2020-01-16 14:40:32 -08:00
Gian Merlino	66657012bf	Replace CaseFilteredAggregatorRule with Calcite equivalent. (#9113 ) AggregateCaseToFilterRule was added to Calcite in https://issues.apache.org/jira/browse/CALCITE-3144, and was originally copied from Druid's CaseFilteredAggregatorRule. So there isn't a good reason to keep using our version.	2020-01-04 19:11:18 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Clint Wylie	8ccce9857a	fix vectorized query engine numeric filter matchers against null values (#9063 ) * fix druid-sql issue with filtering numeric columns by null values * fix vector numeric column matchers to check null vector for null matches	2019-12-20 13:15:48 -08:00
Clint Wylie	84ef8b819e	fix druid-sql issue with filtering numeric columns by null values (#9061 ) * fix druid-sql issue with filtering numeric columns by null values * fix tests * fix tests for reals	2019-12-18 13:30:34 -08:00
Benedict Jin	24be558347	Fix NPE for subquery with limit (#8775 ) * Fix NPE for subquery with limit * Mark it as unplannable by returning null * Migrate testcases from SqlResourceTest to CalciteQueryTest * Throw CannotBuildQueryException * Fix typo * Patch comments	2019-12-17 10:21:12 -08:00
Clint Wylie	bc16ff5e7c	sql auto limit wrapping fix (#9043 ) * sql auto limit wrapping fix * fix tests and style * remove setImportance	2019-12-16 01:38:24 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Clint Wylie	4327892b84	modify multi-value expression transformation behavior to not treat re-use of the same input as a candidate for cartesian mapping (#8957 )	2019-12-09 20:38:15 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	d0a6fe7f12	fix bug with sqlOuterLimit, use sqlOuterLimit in web console (#8919 ) * fix bug with sqlOuterLimit, use sqlOuterLimit instead of wrapping sql query for web console * fixes, refactors, tests * meh * better name * fix comment location * fix copy and paste	2019-12-03 18:36:28 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Jonathan Wei	dc6178d1f2	Upgrade Calcite to 1.21 (#8566 ) * Upgrade Calcite to 1.21 * Checkstyle, test fix' * Exclude calcite yaml deps, update license.yaml * Add method for exception chain handling * Checkstyle * PR comments, Add outer limit context flag * Revert project settings change * Update subquery test comment * Checkstyle fix * Fix test in sql compat mode * Fix test * Fix dependency analysis * Address PR comments * Checkstyle * Adjust testSelectStarFromSelectSingleColumnWithLimitDescending	2019-11-20 21:22:55 -08:00
Clint Wylie	3fcaa1a61b	fix sql compatible null handling config work with runtime.properties (#8876 ) * fix sql compatible null handling config work with runtime.properties * fix npe * fix tests * add friendly error * comment, and friendlier still * fix compile * fix from merges	2019-11-20 03:55:29 -08:00
Gian Merlino	c44452f0c1	Tidy up lifecycle, query, and ingestion logging. (#8889 ) * Tidy up lifecycle, query, and ingestion logging. The goal of this patch is to improve the clarity and usefulness of Druid's logging for cluster operators. For more information, see https://twitter.com/cowtowncoder/status/1195469299814555648. Concretely, this patch does the following: - Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the goal of reducing redundancy and improving clarity by avoiding showing rarely-useful log messages. This includes most "starting" and "stopping" messages, and most messages related to individual columns. - Adds new log4j2 templates that show operators how to enabled DEBUG logging for certain important packages. - Eliminate stack traces for query errors, unless log level is DEBUG or more. This is useful because query errors often indicate user error rather than system error, but dumping stack trace often gave operators the impression that there was a system failure. - Adds task id to Appenderator, AppenderatorDriver thread names. In the default log4j2 configuration, this will put them in log lines as well. It's very useful if a user is using the Indexer, where multiple tasks run in the same JVM. - More consistent terminology when it comes to "sequences" (sets of segments that are handed-off together by Kafka ingestion) and "offsets" (cursors in partitions). These terms had been confused in some log messages due to the fact that Kinesis calls offsets "sequence numbers". - Replaces some ugly toString calls with either the JSONification or something more operator-accessible (like a URL or segment identifier, instead of JSON object representing the same). * Adjustments. * Adjust integration test.	2019-11-19 13:57:58 -08:00
Clint Wylie	cc54b2a9df	support for array expressions in TransformSpec with ExpressionTransform (#8744 ) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops	2019-11-13 11:04:37 -08:00
Gian Merlino	0e8c3f74d0	SQL: EARLIEST, LATEST aggregators. (#8815 ) * SQL: EARLIEST, LATEST aggregators. I chose these names instead of FIRST, LAST because those are already reserved functions in Calcite that mean something different. I think these are also better names anyway. * Finalify. * SQL updates. * Adjust aggregator calls. * Validations, test updates. * Review docs.	2019-11-08 16:29:25 -08:00

327 Commits