druid

Apache Druid: a high performance real-time analytics database.

Gian Merlino ffa25b7832 Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00

* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

  - Introduce a new SqlBenchmark geared towards benchmarking a wide variety of SQL queries. Rename the old SqlBenchmark to SqlVsNativeBenchmark.
  - Add (optional) caching to SegmentGenerator to enable easier benchmarking of larger segments.
  - Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

  This patch includes vectorized timeseries and groupBy engines, as well as some analogs of your favorite Druid classes:

  - VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
  - VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has methods to create analogs of the column selectors you know and love.
  - VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
  - VectorAggregator is like BufferAggregator.
  - VectorValueMatcher is like ValueMatcher.

  There are some noticeable differences between vectorized and regular execution:

  - Unlike regular cursors, vector cursors do not understand time granularity. They expect query engines to handle this on their own, which a new VectorCursorGranularizer class helps with. This is to avoid too much batch-splitting and to respect the fact that vector selectors are somewhat more heavyweight than regular selectors.
  - Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes for filters that might partially support them (like an OR of one filter that supports indexing and another that doesn't). I'm not sure that this behavior is desirable anyway (it is potentially too eager) but, at any rate, it'd be better to harmonize it between the two classes. Potentially they should both do some different thing that is smarter than what either of them is doing right now.
  - When vector cursors are created by QueryableIndexCursorSequenceBuilder, they use a morphing binary-then-linear search to find their start and end rows, rather than linear search.

  Limitations in this patch are:

  - Only timeseries and groupBy have vectorized engines.
  - GroupBy doesn't handle multi-value dimensions yet.
  - Vector cursors cannot handle virtual columns or descending order.
  - Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
  - Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
  - Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs).

  Currently, the testing strategy includes adding vectorization-enabled tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest, GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the filtering tests that extend BaseFilterTest. In all of those classes, there are some test cases that don't support vectorization. They are marked by special function calls like "cannotVectorize" or "skipVectorize" that tell the test harness to either expect an exception or to skip the test case. Testing should be expanded in the future -- a project in and of itself.

  Related to #3011.

* WIP
* Adjustments for unused things.
* Adjust javadocs.
* DimensionDictionarySelector adjustments.
* Add "clone" to BatchIteratorAdapter.
* ValueMatcher javadocs.
* Fix benchmark.
* Fixups post-merge.
* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.
* BloomDimFilterSqlTest: Tag two non-vectorizable tests.
* Minor adjustments.
* Update surefire, bump up Xmx in Travis.
* Some more adjustments.
* Javadoc adjustments
* AggregatorAdapters adjustments.
* Additional comments.
* Remove switching search.
* Only missiles.

.github	adjust PR template (#8016)	2019-07-03 08:31:31 -07:00
.idea	Web-console: Add action column to segments view (#7954)	2019-06-25 20:14:06 -07:00
benchmarks	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
ci	Move dev-related files and instructions to dev/ directory; add committer's instructions (#7279)	2019-04-17 15:27:14 +02:00
cloud	Bump up snapshot version to 0.16.0 (#7802)	2019-05-30 17:17:33 -07:00
codestyle	Enable Spotbugs: MS_OOI_PKGPROTECT (#8022)	2019-07-08 13:17:56 +05:30
core	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
dev	Add the pull-request template (#7206)	2019-06-27 15:51:25 +03:00
distribution	Fix license check in travis and make it optional (#8049)	2019-07-09 19:35:29 -07:00
docs	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
examples	remove FirehoseV2 and realtime node extensions (#8020)	2019-07-04 15:40:22 -07:00
extendedset	Bump up snapshot version to 0.16.0 (#7802)	2019-05-30 17:17:33 -07:00
extensions-contrib	fail complex type 'serde' registration when registered type does not match expected type (#7985)	2019-07-11 23:03:15 -07:00
extensions-core	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
hll	switch links from druid.io to druid.apache.org (#7914)	2019-06-18 09:06:27 -07:00
indexing-hadoop	add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)	2019-07-10 12:22:24 -07:00
indexing-service	add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)	2019-07-10 12:22:24 -07:00
integration-tests	Add instruction about skipping up-to-date checks when running integration tests (#7843)	2019-07-08 13:44:32 +05:30
licenses	Binary license management system (#7998)	2019-07-08 12:24:51 -07:00
processing	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
publications	[ImgBot] Optimize images (#7873)	2019-06-24 21:27:48 -07:00
server	Add inline firehose (#8056)	2019-07-11 21:43:46 -07:00
services	write value of bitmap as field name (#8066)	2019-07-11 19:29:46 -07:00
sql	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
web-console	added replicated size (#8043)	2019-07-10 08:29:05 -07:00
.dockerignore	Add docker container for druid (#6896)	2019-02-08 12:12:28 +00:00
.gitignore	Fix some problems reported by PVS-Studio (#7738)	2019-05-29 11:20:45 -07:00
.travis.yml	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
CONTRIBUTING.md	CONTRIBUTING: Remove "keep the number of commits small" guidance. (#8004)	2019-07-03 11:53:41 -07:00
DISCLAIMER	add missing license headers, in particular to MD files; clean up RAT … (#6563)	2018-11-13 09:38:37 -08:00
LABELS	Add plain text README.txt, use relative link from README.md to build.md (#7611)	2019-05-09 21:29:26 -07:00
LICENSE	Add missing license pointer for Porter Stemmer (#7941)	2019-06-24 12:21:40 -07:00
NOTICE	Adjust NOTICE files (#7945)	2019-06-25 09:08:54 -07:00
NOTICE.BINARY	remove FirehoseV2 and realtime node extensions (#8020)	2019-07-04 15:40:22 -07:00
README.md	remove IRC badge from readme (#8052)	2019-07-10 08:29:19 -07:00
README.template	switch links from druid.io to druid.apache.org (#7914)	2019-06-18 09:06:27 -07:00
build.sh	Fix license check in travis and make it optional (#8049)	2019-07-09 19:35:29 -07:00
licenses.yaml	Binary license management system (#7998)	2019-07-08 12:24:51 -07:00
pom.xml	Query vectorization. (#6794)	2019-07-12 12:54:07 -07:00
upload.sh	Adding licenses and enable apache-rat-plugin. (#6215)	2018-09-18 08:39:26 -07:00

README.md

Apache Druid (incubating)

Apache Druid (incubating) is a high performance analytics data store for event-driven data.

Disclaimer: Apache Druid is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

License

Apache License, Version 2.0

More Information

More information about Druid can be found on https://druid.apache.org.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs/content in this repository and submit a pull request.

Getting Started

To get started with Druid, see our quickstart.

Reporting Issues

If you find any bugs, please file a GitHub issue.

Community

Community support is available on the druid-user mailing list (druid-user@googlegroups.com), which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

We also have a couple of people hanging out on IRC in #druid-dev on irc.freenode.net.

Building From Source

Please note that JDK 8 is required to build Druid.

For instructions on building Druid from source, see docs/content/development/build.md

Contributing

Please follow the guidelines listed here.