druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	e7c6deac76	optimize single input column multi-value expressions (#8047 ) * optimize single input column multi-value expressions * javadocs * merge fixup * vectorization fixup * more fixes * more docs * more links * empty * javadocs are hard * suppress javadoc refs issue * fix it	2019-08-02 13:21:25 -07:00
Chi Cao Minh	4bd3bad8ba	Add IPv4 SQL functions (#8223 ) * Add IPv4 SQL functions New SQL functions for filtering IPv4 addresses: - IPV4_MATCH: Check if IP address belongs to a subnet - IPV4_PARSE: Convert string IP address to integer - IPV4_STRINGIFY: Convert integer IP address to string These are the SQL analogs of the druid expressions with the same name. Filtering is more efficient when operating on IP addresses as integers instead of strings. * Refactor operator conversions into named constants	2019-08-01 21:29:58 -07:00
Surekha	f0ecdfee30	Fix `is_realtime` column behavior in sys.segments table (#8154 ) * Fix is_realtime flag * make variable final * minor changes * Modify is_realtime behavior based on review comment * Fix UT	2019-07-31 22:26:49 -06:00
Gian Merlino	77297f4e6f	GroupBy array-based result rows. (#8196 ) * GroupBy array-based result rows. Fixes #8118; see that proposal for details. Other than the GroupBy changes, the main other "interesting" classes are: - ResultRow: The array-based result type. - BaseQuery: T is no longer required to be Comparable. - QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization and deserialization of result rows (necessary due to their positional nature). - QueryResource: Uses the new decoration functionality. - DirectDruidClient: Also uses the new decoration functionality. - QueryMaker (in Druid SQL): Modifications to read ResultRows. These classes weren't changed, but got some new javadocs: - BySegmentQueryRunner - FinalizeResultsQueryRunner - Query * Adjustments for TC stuff.	2019-07-31 16:15:12 -07:00
Jonathan Wei	640b7afc1c	Add CliIndexer process type and initial task runner implementation (#8107 ) * Add CliIndexer process type and initial task runner implementation * Fix HttpRemoteTaskRunnerTest * Remove batch sanity check on PeonAppenderatorsManager * Fix paralle index tests * PR comments * Adjust Jersey resource logging * Additional cleanup * Fix SystemSchemaTest * Add comment to LocalDataSegmentPusherTest absolute path test * More PR comments * Use Server annotated with RemoteChatHandler * More PR comments * Checkstyle * PR comments * Add task shutdown to stopGracefully * Small cleanup * Compile fix * Address PR comments * Adjust TaskReportFileWriter and fix nits * Remove unnecessary closer * More PR comments * Minor adjustments * PR comments * ThreadingTaskRunner: cancel task run future not shutdownFuture and remove thread from workitem	2019-07-29 17:06:33 -07:00
Chi Cao Minh	ab71a2e1e4	Revert "Fix dependency analyze warnings (#8128 )" (#8189 ) This reverts commit `5dd0d8e873`.	2019-07-29 11:42:16 -07:00
Chi Cao Minh	5dd0d8e873	Fix dependency analyze warnings (#8128 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports. * Fix licenses and dependencies * Fix licenses and dependencies again * Fix integration test dependency * Address review comments * Fix unit test dependencies * Fix integration test dependency * Fix integration test dependency again * Fix integration test dependency third time * Fix integration test dependency fourth time * Fix compile error * Fix assert package	2019-07-26 10:49:03 -07:00
Jihoon Son	db14946207	Add support minor compaction with segment locking (#7547 ) * Segment locking * Allow both timeChunk and segment lock in the same gruop * fix it test * Fix adding same chunk to atomicUpdateGroup * resolving todos * Fix segments to lock * fix segments to lock * fix kill task * resolving todos * resolving todos * fix teamcity * remove unused class * fix single map * resolving todos * fix build * fix SQLMetadataSegmentManager * fix findInputSegments * adding more tests * fixing task lock checks * add SegmentTransactionalOverwriteAction * changing publisher * fixing something * fix for perfect rollup * fix test * adjust package-lock.json * fix test * fix style * adding javadocs * remove unused classes * add more javadocs * unused import * fix test * fix test * Support forceTimeChunk context and force timeChunk lock for parallel index task if intervals are missing * fix travis * fix travis * unused import * spotbug * revert getMaxVersion * address comments * fix tc * add missing error handling * fix backward compatibility * unused import * Fix perf of versionedIntervalTimeline * fix timeline * fix tc * remove remaining todos * add comment for parallel index * fix javadoc and typos * typo * address comments	2019-07-24 17:35:46 -07:00
Sashidhar Thallam	ea4bad7836	Druid SQL EXTRACT time function - adding support for additional Time Units (#8068 ) * 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935 * 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds * Fixing testInformationSchemaTables test * Fixing failing tests in DruidAvaticaHandlerTest * Adding cannotVectorize() call before the test * Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes. * Adding MILLISECOND in test case * Adding support DECADE and MILLENNIUM, updating test case and documentation * Fixing expression eval for DECADE and MILLENIUM	2019-07-19 20:38:32 -07:00
Samarth Jain	ceb3a891bb	Fix druid sql group by queries returning complex aggregation type (#8099 ) * Fix druid sql group by queries returning complex aggregation type * Remove unnecessary check	2019-07-19 13:52:14 -07:00
Clint Wylie	03e55d30eb	add CachingClusteredClient benchmark, refactor some stuff (#8089 ) * add CachingClusteredClient benchmark, refactor some stuff * revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select * adjust javadoc * adjustments * oops * use it * use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments * rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters	2019-07-18 13:16:28 -07:00
Roman Leventov	ceb969903f	Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653 ) * Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource * Style fixes * Unused imports * Fix tests * Fix style * Comments * Comment fix * Remove unresolvable Javadoc references; address comments * Add comments to ImmutableDruidDataSource * Merge with master * Fix bad web-console merge * Fixes in api-reference.md * Rename in DruidCoordinatorRuntimeParams * Fix compilation * Residual changes	2019-07-17 17:18:48 +03:00
Gian Merlino	ffa25b7832	Query vectorization. (#6794 ) * Benchmarks: New SqlBenchmark, add caching & vectorization to some others. - Introduce a new SqlBenchmark geared towards benchmarking a wide variety of SQL queries. Rename the old SqlBenchmark to SqlVsNativeBenchmark. - Add (optional) caching to SegmentGenerator to enable easier benchmarking of larger segments. - Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark. * Query vectorization. This patch includes vectorized timeseries and groupBy engines, as well as some analogs of your favorite Druid classes: - VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.) - VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has methods to create analogs of the column selectors you know and love. - VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset. - VectorAggregator is like BufferAggregator. - VectorValueMatcher is like ValueMatcher. There are some noticeable differences between vectorized and regular execution: - Unlike regular cursors, vector cursors do not understand time granularity. They expect query engines to handle this on their own, which a new VectorCursorGranularizer class helps with. This is to avoid too much batch-splitting and to respect the fact that vector selectors are somewhat more heavyweight than regular selectors. - Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes for filters that might partially support them (like an OR of one filter that supports indexing and another that doesn't). I'm not sure that this behavior is desirable anyway (it is potentially too eager) but, at any rate, it'd be better to harmonize it between the two classes. Potentially they should both do some different thing that is smarter than what either of them is doing right now. - When vector cursors are created by QueryableIndexCursorSequenceBuilder, they use a morphing binary-then-linear search to find their start and end rows, rather than linear search. Limitations in this patch are: - Only timeseries and groupBy have vectorized engines. - GroupBy doesn't handle multi-value dimensions yet. - Vector cursors cannot handle virtual columns or descending order. - Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". - Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered". - Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs). Currently, the testing strategy includes adding vectorization-enabled tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest, GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the filtering tests that extend BaseFilterTest. In all of those classes, there are some test cases that don't support vectorization. They are marked by special function calls like "cannotVectorize" or "skipVectorize" that tell the test harness to either expect an exception or to skip the test case. Testing should be expanded in the future -- a project in and of itself. Related to #3011. * WIP * Adjustments for unused things. * Adjust javadocs. * DimensionDictionarySelector adjustments. * Add "clone" to BatchIteratorAdapter. * ValueMatcher javadocs. * Fix benchmark. * Fixups post-merge. * Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex. * BloomDimFilterSqlTest: Tag two non-vectorizable tests. * Minor adjustments. * Update surefire, bump up Xmx in Travis. * Some more adjustments. * Javadoc adjustments * AggregatorAdapters adjustments. * Additional comments. * Remove switching search. * Only missiles.	2019-07-12 12:54:07 -07:00
Gian Merlino	613f09b45a	SQL: Add TIME_CEIL function. (#8027 ) Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to share more code.	2019-07-04 15:40:03 -07:00
Clint Wylie	e6ba258197	multi-value string expression transformation fix (#8019 ) * multi-value string expression transformation fix * fixes * more docs and test * revert unintended doc change * formatting * change tostring to print binding identifier * review fixup * oops	2019-07-03 23:03:47 -07:00
Clint Wylie	c556d44a19	more sql support for expression array functions (#7974 ) * more sql support for expression array functions * prepend/slice * doc fixes * fix imports * fix tests * add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval * re-arrange * imports :( * add append/prepend test	2019-07-02 21:39:26 -07:00
Clint Wylie	93b738bbfa	expression language array constructor and sql multi-value string filtering support (#7973 ) * expr array constructor and sql multi-value string support * doc fix * checkstyle * change from feedback	2019-07-01 15:14:50 -07:00
Xue Yu	2831944056	support NVL sql function (#7965 ) * sql nvl * add nvl in sql doc	2019-06-30 13:14:30 -07:00
Clint Wylie	151edeec3c	expression virtual column selector fix for expressions which produce array types (#7958 ) * fix bug in multi-value string expression column selector * more test * imports!! * fixes	2019-06-26 16:57:13 -07:00
Fokko Driesprong	82b248cc17	Spotbugs: Enable MS_SHOULD_BE_FINAL (#7946 )	2019-06-23 15:42:18 -07:00
Clint Wylie	494b8ebe56	multi-value string column support for expressions (#7588 ) * array support for expression language for multi-value string columns * fix tests? * fixes * more tests * fixes * cleanup * more better, more test * ignore inspection * license * license fix * inspection * remove dumb import * more better * some comments * add expr rewrite for arrayfn args for more magic, tests * test stuff * more tests * fix test * fix test * castfunc can deal with arrays * needs more empty array * more tests, make cast to long array more forgiving * refactor * simplify ExprMacro Expr implementations with base classes in core * oops * more test * use Shuttle for Parser.flatten, javadoc, cleanup * fixes and more tests * unused import * fixes * javadocs, cleanup, refactors * fix imports * more javadoc * more javadoc * more * more javadocs, nonnullbydefault, minor refactor * markdown fix * adjustments * more doc * move initial filter out * docs * map empty arg lambda, apply function argument validation * check function args at parse time instead of eval time * more immutable * more more immutable * clarify grammar * fix docs * empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..	2019-06-19 13:57:37 -07:00
SandishKumarHN	01881e3a98	Use only com.google.errorprone.annotations.concurrent.GuardedBy, not javax.annotations.concurrent.GuardedBy (#7889 )	2019-06-17 15:58:51 +02:00
Sashidhar Thallam	3bee6adcf7	Use map.putIfAbsent() or map.computeIfAbsent() as appropriate instead of containsKey() + put() (#7764 ) * https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put() * fixing indentation * Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate * fixing checkstyle * Changing the recommendation text * Reverting auto changes made by IDE * Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap * Removing unused import	2019-06-14 17:59:36 +02:00
Surekha	ea752ef562	Optimize overshadowed segments computation (#7595 ) * Move the overshadowed segment computation to SQLMetadataSegmentManager's poll * rename method in MetadataSegmentManager * Fix tests * PR comments * PR comments * PR comments * fix indentation * fix tests * fix test * add test for SegmentWithOvershadowedStatus serde format * PR comments * PR comments * fix test * remove snapshot updates outside poll * PR comments * PR comments * PR comments * removed unused import	2019-06-07 19:15:54 +02:00
Clint Wylie	12a1ecfc2b	allow sql lookup function to take advantage of injective lookups (#7655 )	2019-06-06 14:36:10 -07:00
Xue Yu	d482da6e9b	fix timestamp ceil lower bound bug (#7823 )	2019-06-04 01:16:31 -07:00
Eyal Yurman	69e9b8a464	Enables SQL by default. (#7808 )	2019-05-31 20:53:42 -07:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Gian Merlino	58a571ccda	SQL: Use SegmentId instead of DataSegment as set/map keys. (#7796 ) Recently we've been talking about using SegmentIds as map keys rather than DataSegments, because its sense of equality is more well-defined. This is a refactor that does this in the druid-sql module, which mostly involves DruidSchema and some related classes. It should have no user-visible effects.	2019-05-30 12:58:36 -07:00
Gian Merlino	8649b8ab4c	SQL: Allow select-sort-project query shapes. (#7769 ) * SQL: Allow select-sort-project query shapes. Fixes #7768. Design changes: - In PartialDruidQuery, allow projection after select + sort by removing the SELECT_SORT query stage and instead allowing the SORT and SORT_PROJECT stages to apply either after aggregation or after a plain non-aggregating select. This is different from prior behavior, where SORT and SORT_PROJECT were only considered valid after aggregation stages. This logic change is in the "canAccept" method. - In DruidQuery, represent either kind of sorting with a single "Sorting" class (instead of DefaultLimitSpec). The Sorting class is still convertible into a DefaultLimitSpec, but is also convertible into the sorting parameters accepted by a Scan query. - In DruidQuery, represent post-select and post-sorting projections with a single "Projection" class. This obsoletes the SortProject and SelectProjection classes, and simplifies the DruidQuery by allowing us to move virtual-column and post-aggregator-creation logic into the new Projection class. - Split "DruidQuerySignature" into RowSignature and VirtualColumnRegistry. This effectively means that instead of having mutable and immutable versions of DruidQuerySignature, we instead of RowSignature (always immutable) and VirtualColumnRegistry (always mutable, but sometimes null). This change wasn't required, but IMO it this makes the logic involving them easier to follow, and makes it more clear when the virtual column registry is active and when it's not. Other changes: - ConvertBoundsToSelectors now just accepts a RowSignature, but we use the VirtualColumnRegistry.getFullRowSignature() method to get a signature that includes all columns, and therefore allows us to simplify the logic (no need to special-case virtual columns). - Add `__time` to the Scan column list if the query is ordering by time. * Remove unused import.	2019-05-30 12:56:29 -07:00
Roman Leventov	782863ed0f	Fix some problems reported by PVS-Studio (#7738 ) * Fix some problems reported by PVS-Studio * Address comments	2019-05-29 11:20:45 -07:00
Surekha	1fe0de1c96	Fix currSize attribute of historical server type (#7706 )	2019-05-21 11:55:58 -07:00
Gian Merlino	cbbce955de	SQL: Allow NULLs in place of optional arguments in many functions. (#7709 ) * SQL: Allow NULLs in place of optional arguments in many functions. Also adjust SQL docs to describe how to make time literals using TIME_PARSE (which is now possible in a nicer way). * Be less forbidden.	2019-05-21 11:54:34 -07:00
Gian Merlino	43c54385f6	SQL: Respect default timezone for TIME_PARSE and TIME_SHIFT. (#7704 ) * SQL: Respect default timezone for TIME_PARSE and TIME_SHIFT. They were inadvertently using UTC rather than the default timezone. Also, harmonize how time functions handle their parameters. * Fix tests * Add another TIME_SHIFT test.	2019-05-21 11:40:44 -07:00
Gian Merlino	69b2ea3ddc	SQL: TIME_EXTRACT should have 2 required operands. (#7710 ) * SQL: TIME_EXTRACT should have 2 required operands. Timestamp and time unit are both required. * Add regression test.	2019-05-21 11:32:36 -07:00
Gian Merlino	bcea05e4e8	SQL: Fix exception with OR of impossible filters. (#7707 ) Fixes #7671.	2019-05-21 11:32:09 -07:00
Xue Yu	dd7dace70a	Add TIMESTAMPDIFF sql support (#7695 ) * add timestampdiff sql support * feedback address	2019-05-21 08:05:38 -07:00
Gian Merlino	cb6ec2cab8	SqlOperatorConversion Javadoc fix. (#7713 ) Appears to be a copypasta error; the toDruidFilter method was referred to aggregations, but it's not handling aggregations.	2019-05-20 21:21:21 -07:00
Surekha	d3545f5086	Show all server types in sys.servers table (#7654 ) * update sys.servers table to show all servers * update docs * Fix integration test * modify test query for batch integration test * fix case in test queries * make the server_type lowercase * Apply suggestions from code review Co-Authored-By: Himanshu <g.himanshu@gmail.com> * Fix compilation from git suggestion * fix unit test	2019-05-15 16:54:02 -07:00
Xue Yu	35a1fbefea	upgrade avatica to 1.12.0 (#7644 )	2019-05-12 14:38:06 -07:00
Xue Yu	f7b8b57c3b	simpilfy DruidConvertletTable.java, remove STANDARD_CONVERTLET declare (#7632 )	2019-05-10 14:08:32 -07:00
Jonathan Wei	a013350018	Adjust required permissions for system schema (#7579 ) * Adjust required permissions for system schema * PR comments, fix current_size handling * Checkstyle * Set curr_size instead of current_size * Adjust information schema docs * Fix merge conflict * Update tests	2019-05-02 07:18:02 -07:00
Surekha	15d19f3059	Add is_overshadowed column to sys.segments table (#7425 ) * Add is_overshadowed column to sys.segments table * update docs * Rename class and variables * PR comments * PR comments * remove unused variables in MetadataResource * move constants together * add getFullyOvershadowedSegments method to ImmutableDruidDataSource * Fix compareTo of SegmentWithOvershadowedStatus * PR comment * PR comments * PR comments * PR comments * PR comments * fix issue with already consumed stream * minor refactoring * PR comments	2019-05-01 18:00:57 +02:00
Gian Merlino	c648775b5b	SQL: Remove "useFallback" feature. (#7567 ) This feature allows Calcite's Bindable interpreter to be bolted on top of Druid queries and table scans. I think it should be removed for a few reasons: 1. It is not recommended for production anyway, because it generates unscalable query plans (e.g. it will plan a join into two table scans and then try to do the entire join in memory on the broker). 2. It doesn't work with Druid-specific SQL functions, like TIME_FLOOR, REGEXP_EXTRACT, APPROX_COUNT_DISTINCT, etc. 3. It makes the SQL planning code needlessly complicated. With SQL coming out of experimental status soon, it's a good opportunity to remove this feature.	2019-04-28 18:26:44 -07:00
Xue Yu	2c8a71f883	Support LPAD and RPAD sql function (#7388 ) * lpad and rpad sql function * feedback address * feedback address * add doc and format * update docs	2019-04-22 14:51:32 -07:00
Clint Wylie	be65cca248	refactor druid-bloom-filter aggregators (#7496 ) * now with 100% more buffer * there can be only 1 * simplify * javadoc * clean up unused test method * fix exception message * style * why does style hate javadocs * review stuff * style :(	2019-04-18 11:54:06 -07:00
Kazuhito Takeuchi	7c19c92a81	Add ROUND function in druid-sql. (#7224 ) * Implement round function in druid-sql * Return value according to the type of argument * Fix codes for abnoraml inputs, updated math-expr.md * Fix assert text * Fix error messages and refactor codes * Fix compile error, update sql.md, refactor codes and format tests	2019-04-16 11:15:39 -07:00
Gian Merlino	721191635a	SQL: Include virtual columns used for filtering in ScanQuery. (#7472 ) PR #6902 introduced the ability to use virtual columns for filters, but they were being omitted from "scan" queries, so filters would refer to a null column instead of the intended virtual column.	2019-04-14 15:03:36 -07:00
Surekha	3e5dae9b96	Rename SegmentMetadataHolder to AvailableSegmentMetadata (#7372 )	2019-04-14 10:19:48 -07:00
Justin Borromeo	408e3e1b2a	Remove select execution code from SQL planner (#7416 ) * Removed select execution code from SQL planner * Update doc	2019-04-10 22:32:57 -07:00

1 2 3 4 5 ...

264 Commits