2be7068f6e
Fixes and improvements to SQL metadata caching. Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient.

SQL changes:

- Cache metadata at a per-segment level, in addition to per-dataSource, so we don't need to re-query all segments whenever a single new one appears. This should lower the load placed on the cluster by metadata queries.
- Fix a race condition in DruidSchema that could cause us to miss metadata. It was possible to notice new segments, then issue a query, have that query not actually hit those segments, and not notice the miss. The metadata from those segments would then be ignored.
- Fix the assumption in DruidSchema that all segments are immutable. Now, mutable segments are periodically re-queried.
- Fix inappropriate re-use of SchemaPlus. We now create one for each planning cycle rather than sharing one. SchemaPlus caches table objects, which we want to avoid since it can cause stale metadata; we do the caching in DruidSchema instead, so we don't need the SchemaPlus caching.

Server changes:

- Add a TimelineCallback to TimelineServerView, for callers that want to get updates when the timeline has been modified.
- Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This allows it to accept queries that are segment-descriptor-based rather than interval-based. In particular, it will now support MultipleSpecificSegmentSpec.

Follow-up commits: fix DruidSchema and unused imports; remove an unused import; fix SqlBenchmark.
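The TimelineCallback change described above is an observer-style hook: callers register a listener and are notified when segments enter or leave the timeline. The sketch below illustrates that pattern only; the class and method names (`SegmentTimeline`, `segmentAdded`, `segmentRemoved`) are illustrative assumptions, not Druid's actual TimelineServerView API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal sketch of a timeline that notifies registered callbacks when
// segments are added or removed. Names are hypothetical, for illustration.
public class SegmentTimeline
{
  public interface TimelineCallback
  {
    void segmentAdded(String segmentId);

    void segmentRemoved(String segmentId);
  }

  private final List<String> segments = new ArrayList<>();
  // CopyOnWriteArrayList lets callbacks be registered concurrently with notification.
  private final List<TimelineCallback> callbacks = new CopyOnWriteArrayList<>();

  public void registerCallback(TimelineCallback callback)
  {
    callbacks.add(callback);
  }

  public void addSegment(String segmentId)
  {
    segments.add(segmentId);
    for (TimelineCallback cb : callbacks) {
      cb.segmentAdded(segmentId);
    }
  }

  public void removeSegment(String segmentId)
  {
    segments.remove(segmentId);
    for (TimelineCallback cb : callbacks) {
      cb.segmentRemoved(segmentId);
    }
  }

  public static void main(String[] args)
  {
    SegmentTimeline timeline = new SegmentTimeline();
    List<String> events = new ArrayList<>();
    timeline.registerCallback(new TimelineCallback() {
      @Override public void segmentAdded(String id) { events.add("added:" + id); }
      @Override public void segmentRemoved(String id) { events.add("removed:" + id); }
    });
    timeline.addSegment("seg1");
    timeline.removeSegment("seg1");
    System.out.println(events); // [added:seg1, removed:seg1]
  }
}
```

A consumer such as a metadata cache would register a callback like this once at startup, then re-query only the segments it is told about, rather than polling the whole timeline.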
.idea
api
aws-common
benchmarks
bytebuffer-collections
ci
codestyle
common
distribution
docs
examples
extendedset
extensions-contrib
extensions-core
hll
indexing-hadoop
indexing-service
integration-tests
java-util
processing
publications
server
services
sql
.gitignore
.travis.yml
CONTRIBUTING.md
DruidCorporateCLA.pdf
DruidIndividualCLA.pdf
INTELLIJ_SETUP.md
LICENSE
NOTICE
README.md
druid_intellij_formatting.xml
eclipse.importorder
eclipse_formatting.xml
pom.xml
upload.sh
README.md
Druid
Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments.
Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
Druid can load both streaming and batch data and integrates with Samza, Kafka, Storm, Spark, and Hadoop.
License
More Information
More information about Druid can be found at http://www.druid.io.
Documentation
You can find the documentation for the latest Druid release on the project website.
If you would like to contribute documentation, please do so under /docs/content in this repository and submit a pull request.
Getting Started
You can get started with Druid with our quickstart.
Reporting Issues
If you find any bugs, please file a GitHub issue.
Community
Community support is available on the druid-user mailing list (druid-user@googlegroups.com).
Development discussions occur on the druid-development list (druid-development@googlegroups.com).
We also have a couple of people hanging out on IRC in #druid-dev on irc.freenode.net.
Contributing
Please follow the guidelines listed here.