druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	ffa25b7832	Query vectorization. (#6794 ) * Benchmarks: New SqlBenchmark, add caching & vectorization to some others. - Introduce a new SqlBenchmark geared towards benchmarking a wide variety of SQL queries. Rename the old SqlBenchmark to SqlVsNativeBenchmark. - Add (optional) caching to SegmentGenerator to enable easier benchmarking of larger segments. - Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark. * Query vectorization. This patch includes vectorized timeseries and groupBy engines, as well as some analogs of your favorite Druid classes: - VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.) - VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has methods to create analogs of the column selectors you know and love. - VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset. - VectorAggregator is like BufferAggregator. - VectorValueMatcher is like ValueMatcher. There are some noticeable differences between vectorized and regular execution: - Unlike regular cursors, vector cursors do not understand time granularity. They expect query engines to handle this on their own, which a new VectorCursorGranularizer class helps with. This is to avoid too much batch-splitting and to respect the fact that vector selectors are somewhat more heavyweight than regular selectors. - Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes for filters that might partially support them (like an OR of one filter that supports indexing and another that doesn't). I'm not sure that this behavior is desirable anyway (it is potentially too eager) but, at any rate, it'd be better to harmonize it between the two classes. Potentially they should both do some different thing that is smarter than what either of them is doing right now. - When vector cursors are created by QueryableIndexCursorSequenceBuilder, they use a morphing binary-then-linear search to find their start and end rows, rather than linear search. Limitations in this patch are: - Only timeseries and groupBy have vectorized engines. - GroupBy doesn't handle multi-value dimensions yet. - Vector cursors cannot handle virtual columns or descending order. - Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". - Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered". - Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs). Currently, the testing strategy includes adding vectorization-enabled tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest, GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the filtering tests that extend BaseFilterTest. In all of those classes, there are some test cases that don't support vectorization. They are marked by special function calls like "cannotVectorize" or "skipVectorize" that tell the test harness to either expect an exception or to skip the test case. Testing should be expanded in the future -- a project in and of itself. Related to #3011. * WIP * Adjustments for unused things. * Adjust javadocs. * DimensionDictionarySelector adjustments. * Add "clone" to BatchIteratorAdapter. * ValueMatcher javadocs. * Fix benchmark. * Fixups post-merge. * Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex. * BloomDimFilterSqlTest: Tag two non-vectorizable tests. * Minor adjustments. * Update surefire, bump up Xmx in Travis. * Some more adjustments. * Javadoc adjustments * AggregatorAdapters adjustments. * Additional comments. * Remove switching search. * Only missiles.	2019-07-12 12:54:07 -07:00
Clint Wylie	abf9843e2a	fail complex type 'serde' registration when registered type does not match expected type (#7985 ) * make ComplexMetrics.registerSerde type check on register, resolves #7959 * add test * simplify * unused imports :/ * simplify * burned by imports yet again * remove unused constructor * switch to getName * heh oops	2019-07-11 23:03:15 -07:00
Alexander Saydakov	4b176ad265	force native order when wrapping ByteBuffer since Druid can have it set (#8055 ) incorrectly	2019-07-11 17:17:53 -07:00
Chi Cao Minh	0ded0ce414	Add round support for DS-HLL (#8023 ) * Add round support for DS-HLL Since the Cardinality aggregator has a "round" option to round off estimated values generated from the HyperLogLog algorithm, add the same "round" option to the DataSketches HLL Sketch module aggregators to be consistent. * Fix checkstyle errors * Change HllSketchSqlAggregator to do rounding * Fix test for standard-compliant null handling mode	2019-07-05 15:37:58 -07:00
Alexander Saydakov	f38a62e949	theta sketch to string post agg (#7937 )	2019-06-27 15:09:57 -07:00
Xue Yu	b9c6a26c0e	Use ComplexMetrics.registerSerde() across the codebase (#7925 ) * refactor complexmetric registerserde * fix error * feedback address	2019-06-25 11:39:04 +03:00
Alexander Saydakov	4dd446bfdd	sketches-core-0.13.4 (#7666 )	2019-06-06 14:36:52 -07:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Gian Merlino	8649b8ab4c	SQL: Allow select-sort-project query shapes. (#7769 ) * SQL: Allow select-sort-project query shapes. Fixes #7768. Design changes: - In PartialDruidQuery, allow projection after select + sort by removing the SELECT_SORT query stage and instead allowing the SORT and SORT_PROJECT stages to apply either after aggregation or after a plain non-aggregating select. This is different from prior behavior, where SORT and SORT_PROJECT were only considered valid after aggregation stages. This logic change is in the "canAccept" method. - In DruidQuery, represent either kind of sorting with a single "Sorting" class (instead of DefaultLimitSpec). The Sorting class is still convertible into a DefaultLimitSpec, but is also convertible into the sorting parameters accepted by a Scan query. - In DruidQuery, represent post-select and post-sorting projections with a single "Projection" class. This obsoletes the SortProject and SelectProjection classes, and simplifies the DruidQuery by allowing us to move virtual-column and post-aggregator-creation logic into the new Projection class. - Split "DruidQuerySignature" into RowSignature and VirtualColumnRegistry. This effectively means that instead of having mutable and immutable versions of DruidQuerySignature, we instead of RowSignature (always immutable) and VirtualColumnRegistry (always mutable, but sometimes null). This change wasn't required, but IMO it this makes the logic involving them easier to follow, and makes it more clear when the virtual column registry is active and when it's not. Other changes: - ConvertBoundsToSelectors now just accepts a RowSignature, but we use the VirtualColumnRegistry.getFullRowSignature() method to get a signature that includes all columns, and therefore allows us to simplify the logic (no need to special-case virtual columns). - Add `__time` to the Scan column list if the query is ordering by time. * Remove unused import.	2019-05-30 12:56:29 -07:00
Clint Wylie	eef69619d3	add support for multi-value string dimensions for HllSketch build aggregator (#7730 )	2019-05-23 17:07:32 -07:00
Clint Wylie	23e96d15d4	allow quantiles merge aggregator to also accept doubles (#7718 ) * allow quantiles merge aggregator to also accept doubles * consolidate dupe * import	2019-05-23 11:13:41 -07:00
Clint Wylie	ffc2397bcd	fix AggregatorFactory.finalizeComputation implementations to be ok with null inputs (#7731 ) * AggregatorFactory finalizeComputation is nullable with nullable input, make implementations honor this * fixes	2019-05-22 21:13:09 -07:00
Gian Merlino	b6941551ae	Upgrade various build and doc links to https. (#7722 ) * Upgrade various build and doc links to https. Where it wasn't possible to upgrade build-time dependencies to https, I kept http in place but used hardcoded checksums or GPG keys to ensure that artifacts fetched over http are verified properly. * Switch to https://apache.org.	2019-05-21 11:30:14 -07:00
Merlin Lee	5f08b0b474	Add checkstyle for "Prohibit @author tags in Javadoc" (#7682 ) * Add checkstyle for "Prohibit @author tags in Javadoc" * Add "Do not use author tags/information in the code" back to CONTRIBUTING.md	2019-05-20 00:09:51 -07:00
Fokko Driesprong	2aa9613bed	Bump Checkstyle to 8.20 (#7651 ) * Bump Checkstyle to 8.20 Moderate severity vulnerability that affects: com.puppycrawl.tools:checkstyle Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information. Affected versions: < 8.18 * Oops, missed one * Oops, missed a few	2019-05-14 11:53:37 -07:00
Alexander Saydakov	ca1a6649f6	Datasketches quantiles more post-aggs (#7550 ) * rank and CDF post-aggs * added post-aggs to the module * added new post-aggs * moved post-agg IDs * moved post-agg IDs	2019-05-10 11:46:54 -07:00
Alexander Saydakov	59f9ff38c7	fix issue #7607 (#7619 ) * fix issue #7607 * exclude com.google.code.findbugs:annotations	2019-05-09 17:33:29 -07:00
Alexander Saydakov	9d8f934e68	handle empty sketches (#7526 ) * handle empty sketches * return array of NaN in case of empty sketch * noinspection ForLoopReplaceableByForEach in tests * style fixes	2019-04-25 14:28:41 -07:00
es1220	3e25b75c3f	Fix aggregatorFactory meta merge exception (#7504 )	2019-04-24 14:08:46 -07:00
Jonathan Wei	7d9cb6944b	Adjust BufferAggregator.get() impls to return copies (#7464 ) * Adjust BufferAggregator.get() impls to return copies * Update BufferAggregator docs, more agg fixes * Update BufferAggregator get() doc	2019-04-12 19:04:07 -07:00
Gian Merlino	7cd5477658	DoublesSketchComplexMetricSerde: Handle empty strings. (#7429 )	2019-04-08 18:01:31 -07:00
Alexander Saydakov	28b4e8586d	use latest sketches-core-0.13.1 (#7320 ) * use latest sketches-core-0.13.0 * latest release	2019-04-03 17:06:02 -04:00
Clint Wylie	a99f0ff450	prefix no-op aggs with "Noop" (#6960 )	2019-04-02 15:05:07 -07:00
Clint Wylie	d7ba19d477	sql, filters, and virtual columns (#6902 ) * refactor sql planning to re-use expression virtual columns when possible when constructing a DruidQuery, allowing virtual columns to be defined in filter expressions, and making resulting native druid queries more concise. also minor refactor of built-in sql aggregators to maximize code re-use * fix it * fix it in the right place * fixup for base64 stuff * fixup tests * fix merge conflict on import order * fixup * fix imports * fix tests * review comments * refactor * re-arrange * better javadoc * fixup merge * fixup tests * fix accidental changes	2019-03-11 11:37:58 -07:00
Himanshu Pandey	8b803cbc22	Added checkstyle for "Methods starting with Capital Letters" (#7118 ) * Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this. * Un-abbreviate the method names in the calcite tests * Fixed checkstyle errors * Changed asserts position in the code	2019-02-23 20:10:31 -08:00
Jonathan Wei	fafbc4a80e	Set version to 0.15.0-incubating-SNAPSHOT (#7014 )	2019-02-07 14:02:52 -08:00
Jonathan Wei	8bc5eaa908	Set version to 0.14.0-incubating-SNAPSHOT (#7003 )	2019-02-04 19:36:20 -08:00
Clint Wylie	6207b66e20	fix build (#6994 )	2019-02-03 09:38:51 -08:00
Jonathan Wei	953b96d0a4	Add more sketch aggregator support in Druid SQL (#6951 ) * Add more sketch aggregator support in Druid SQL * Add docs * Tweak module serde register * Fix tests * Checkstyle * Test fix * PR comment * PR comment * PR comments	2019-02-02 22:34:53 -08:00
Roman Leventov	f7df5fedcc	Add several missing inspectRuntimeShape() calls (#6893 ) * Add several missing inspectRuntimeShape() calls * Add lgK to runtime shapes	2019-01-31 20:04:26 -08:00
Benedict Jin	72a571fbf7	For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913 ) * * Add few methods about base64 into StringUtils * Use `java.util.Base64` instead of others * Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis * Rename encodeBase64String & decodeBase64String * Update druid-forbidden-apis	2019-01-25 17:32:29 -08:00
Jonathan Wei	68f744ec0a	Fixed buckets histogram aggregator (#6638 ) * Fixed buckets histogram aggregator * PR comments * More PR comments * Checkstyle * TeamCity * More TeamCity * PR comment * PR comment * Fix doc formatting	2019-01-17 14:51:16 -08:00
Alexander Saydakov	161dac1d23	datasketches quantiles module - implemented makeAggregateCombiner (#6882 ) * implemented makeAggregateCombiner * fixed import order	2019-01-17 14:09:55 -08:00
Roman Leventov	887c645675	Find duplicate lines with checkstyle; enable some duplicate inspections in IntelliJ (#6558 ) Not putting this to 0.13 milestone because the found bugs are not critical (one is a harmless DI config duplicate, and another is in a benchmark. Change in `DumpSegment` is just an indentation change.	2018-11-26 16:55:42 +01:00
Roman Leventov	87b96fb1fd	Add checkstyle rules about imports and empty lines between members (#6543 ) * Add checkstyle rules about imports and empty lines between members * Add suppressions * Update Eclipse import order * Add empty line * Fix StatsDEmitter	2018-11-20 12:42:15 +01:00
Roman Leventov	8f3fe9cd02	Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607 ) * Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies * Fix bug * Replace checkstyle regexp with IntelliJ inspection	2018-11-15 13:21:34 -08:00
David Lim	afb239b17a	add missing license headers, in particular to MD files; clean up RAT … (#6563 ) * add missing license headers, in particular to MD files; clean up RAT exclusions * revert inadvertent doc changes * docs * cr changes * fix modified druid-production.svg	2018-11-13 09:38:37 -08:00
Roman Leventov	54351a5c75	Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490 ) * Fix various bugs; Enable more IntelliJ inspections and update error-prone * Fix NPE * Fix inspections * Remove unused imports	2018-11-06 14:38:08 -08:00
Roman Leventov	a2a1a1c2c9	Hide NullDimensionSelector from public (#6480 )	2018-11-02 04:38:21 -07:00
QiuMM	676f5e6d7f	Prohibit some guava collection APIs and use JDK collection APIs directly (#6511 ) * Prohibit some guava collection APIs and use JDK APIs directly * reset files that changed by accident * sort codestyle/druid-forbidden-apis.txt alphabetically	2018-10-29 13:02:43 +01:00
Alexander Saydakov	ec9d1827a0	updated to use the latest sketches-core-0.12.0 (#6381 )	2018-10-23 11:20:19 -07:00
Roman Leventov	84ac18dc1b	Catch some incorrect method parameter or call argument formatting patterns with checkstyle (#6461 ) * Catch some incorrect method parameter or call argument formatting patterns with checkstyle * Fix DiscoveryModule * Inline parameters_and_arguments.txt * Fix a bug in PolyBind * Fix formatting	2018-10-23 07:17:38 -03:00
Clint Wylie	84598fba3b	combine druid-api, druid-common, java-util into druid-core (#6443 ) * combine druid-api, druid-common, java-util * spacing	2018-10-14 20:37:37 -07:00
David Lim	20ab213ba6	change project versions to 0.13.0-incubating-SNAPSHOT (#6453 )	2018-10-11 19:28:01 -07:00
Roman Leventov	3ae563263a	Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957 ) This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR. Some of the changes: - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names. - Generified `ComplexMetricExtractor`	2018-10-02 14:50:22 -03:00
Alexander Saydakov	93345064b5	HllSketch module (#5712 ) * HllSketch module * updated license and imports * updated package name * implemented makeAggregateCombiner() * removed json marks * style fix * added module * removed unnecessary import, side effect of package renaming * use TreadLocalRandom * addressing code review points, mostly formatting and comments * javadoc * natural order with nulls * typo * factored out raw input value extraction * singleton * style fix * style fix * use Collections.singletonList instead of Arrays.asList * suppress warning	2018-09-24 08:41:56 -07:00
Roman Leventov	0c4bd2b57b	Prohibit some Random usage patterns (#6226 ) * Prohibit Random usage patterns * Fix FlattenJSONBenchmarkUtil	2018-09-14 13:35:51 -07:00
Gian Merlino	431d3d8497	Rename io.druid to org.apache.druid. (#6266 ) * Rename io.druid to org.apache.druid. * Fix META-INF files and remove some benchmark results. * MonitorsConfig update for metrics package migration. * Reorder some dimensions in inner queries for some reason. * Fix protobuf tests.	2018-08-30 09:56:26 -07:00
Benedict Jin	3647d4c94a	Make time-related variables more readable (#6158 ) * Make time-related variables more readable * Patch some improvements from the code reviewer * Remove unnecessary boxing of Long type variables	2018-08-21 15:29:40 -07:00
Alexander Saydakov	c47032d566	Implemented makeAggregateCombiner() in ArrayOfDoublesSketchAggregatorFactory (#6093 ) * implemented makeAggregateCombiner() * test for makeAggregateCombiner() * license, style fix	2018-08-13 14:19:11 -07:00

1 2 3

126 Commits