Apache Druid: a high performance real-time analytics database.
Xavier Léauté e79284da59 new interval based cost function (#2972)
* new interval based cost function

Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression

New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices
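
A minimal sketch of the three properties listed above, not the code from this commit. It assumes time is discretized into equal slices and that the joint cost of two slices is `exp(-LAMBDA * distance)`. The class name, the method name, and the value of `LAMBDA` are hypothetical and chosen only for illustration.

```java
final class IntervalCostSketch {
    // Hypothetical decay rate per time-slice; not a value taken from Druid.
    static final double LAMBDA = 0.1;

    /**
     * Joint cost of two segments given as half-open ranges of slice indices,
     * [aStart, aEnd) and [bStart, bEnd). Every slice contributes the same
     * constant weight, and each pair of slices contributes a term that decays
     * exponentially with the distance between them.
     */
    static double cost(long aStart, long aEnd, long bStart, long bEnd) {
        double total = 0.0;
        for (long i = aStart; i < aEnd; i++) {
            for (long j = bStart; j < bEnd; j++) {
                total += Math.exp(-LAMBDA * Math.abs(i - j));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Additivity: cost(A, B union C) equals cost(A, B) + cost(A, C)
        // when B and C are disjoint, up to floating-point rounding.
        double whole = cost(0, 24, 24, 72);
        double parts = cost(0, 24, 24, 48) + cost(0, 24, 48, 72);
        System.out.println("additive: " + (Math.abs(whole - parts) < 1e-9));

        // Decay: an adjacent segment costs more than a distant one.
        double near = cost(0, 24, 24, 48);
        double far = cost(0, 24, 240, 264);
        System.out.println("near > far: " + (near > far));
    }
}
```

Because the sum runs over pairs of slices, splitting one interval into disjoint pieces only regroups the same terms, which is why additivity holds by construction. A production implementation would more likely use a closed-form expression than a double loop; the loop is used here only to keep the properties visible.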

* comments and formatting

* add more comments to explain the calculation
2016-05-17 09:56:00 -07:00
api Fix parsing fail of segment id with datasource containing underscore (#2797) 2016-05-02 22:37:28 -07:00
aws-common Update version to 0.9.1-SNAPSHOT. 2016-03-17 10:34:20 -07:00
benchmarks new interval based cost function (#2972) 2016-05-17 09:56:00 -07:00
common Fix CombiningSequence.close on single element sequences. (#2969) 2016-05-13 23:12:30 -07:00
distribution Update version to 0.9.1-SNAPSHOT. 2016-03-17 10:34:20 -07:00
docs Using quotes around the cp (#2934) 2016-05-16 15:16:48 -07:00
examples Add jconsole.sh example script for connecting (#2947) 2016-05-16 15:37:15 -07:00
extensions-contrib reconnect to the graphite after transient disconnect (#2952) 2016-05-12 11:32:36 -07:00
extensions-core use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
indexing-hadoop Supervisor for KafkaIndexTask (#2656) 2016-05-04 23:13:13 -07:00
indexing-service Supervisor for KafkaIndexTask (#2656) 2016-05-04 23:13:13 -07:00
integration-tests adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
processing use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
publications Support min/max values for metadata query (#2208) 2016-02-12 09:35:58 +09:00
server new interval based cost function (#2972) 2016-05-17 09:56:00 -07:00
services Supervisor for KafkaIndexTask (#2656) 2016-05-04 23:13:13 -07:00
.gitignore move distribution artifacts to distribution/target 2015-10-30 12:40:05 -05:00
.travis.yml Fail travis builds faster 2016-02-19 07:38:06 -08:00
CONTRIBUTING.md updating how to contribute guide 2015-11-19 23:30:28 -06:00
DruidCorporateCLA.pdf fix CLA email / mailing address 2014-04-17 15:26:28 -07:00
DruidIndividualCLA.pdf fix CLA email / mailing address 2014-04-17 15:26:28 -07:00
LICENSE Clean up README and license 2015-02-18 23:09:28 -08:00
NOTICE more doc fixes 2016-02-17 09:43:47 -08:00
README.md update readme (#2830) 2016-04-13 11:33:31 -07:00
eclipse.importorder Merge pull request #2905 from javasoze/eclipse_formatting 2016-04-29 18:42:03 -07:00
eclipse_formatting.xml Merge pull request #2905 from javasoze/eclipse_formatting 2016-04-29 18:42:03 -07:00
intellij_formatting.jar Update Scala formatting in intellij_formatting.jar, and rename style to "Druid Java and Scala style". 2015-12-03 13:35:44 -08:00
pom.xml new interval based cost function (#2972) 2016-05-17 09:56:00 -07:00
upload.sh update upload.sh to upload mysql-metadata-storage tarball too 2016-03-16 15:29:58 -05:00

README.md


Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments.

Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

Druid can load both streaming and batch data and integrates with Samza, Kafka, Storm, Spark, and Hadoop.

License

Apache License, Version 2.0

More Information

More information about Druid can be found at http://www.druid.io.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs/content in this repository and submit a pull request.

Getting Started

You can get started with Druid with our quickstart.

Reporting Issues

If you find any bugs, please file a GitHub issue.

Community

Community support is available on the druid-user mailing list (druid-user@googlegroups.com).

Development discussions occur on the druid-development list (druid-development@googlegroups.com).

We also have a couple of people hanging out on IRC in #druid-dev on irc.freenode.net.

Contributing

Please follow the guidelines listed in CONTRIBUTING.md.