diff --git a/docs/content/Cassandra-Deep-Storage.md b/docs/content/Cassandra-Deep-Storage.md
new file mode 100644
index 00000000000..15d409e52a1
--- /dev/null
+++ b/docs/content/Cassandra-Deep-Storage.md
@@ -0,0 +1,44 @@
+---
+layout: doc_page
+---
+
+## Introduction
+Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
+`index_storage` and `descriptor_storage`. Under the hood, the Cassandra integration leverages Astyanax. The
+index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
+compressed segments for distribution to historical nodes. Since segments can be large, Chunked Object storage allows the integration to multi-thread
+writes to Cassandra and spreads the data across all the nodes in the cluster. The descriptor storage table is a normal C* table that
+stores the segment metadata.
+
+## Schema
+Below are the create statements for each:
+
+```sql
+CREATE TABLE index_storage(key text,
+                           chunk text,
+                           value blob,
+                           PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
+
+CREATE TABLE descriptor_storage(key varchar,
+                                lastModified timestamp,
+                                descriptor varchar,
+                                PRIMARY KEY (key)) WITH COMPACT STORAGE;
+```
+
+## Getting Started
+First create the schema above. I use a new keyspace called `druid` for this purpose, which can be created using the
+[Cassandra CQL `CREATE KEYSPACE`](http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_keyspace_r.html) command.
+
+Then, add the following to your historical and realtime runtime properties files to enable a Cassandra backend.
+
+```properties
+druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:"]
+druid.storage.type=c*
+druid.storage.host=localhost:9160
+druid.storage.keyspace=druid
+```
+
+Use the `druid-development@googlegroups.com` mailing list if you have questions,
+or feel free to reach out directly: `bone@alumni.brown.edu`.
+
+
diff --git a/docs/content/Deep-Storage.md b/docs/content/Deep-Storage.md
index b4c28098a6f..165b7906a4d 100644
--- a/docs/content/Deep-Storage.md
+++ b/docs/content/Deep-Storage.md
@@ -47,3 +47,11 @@ druid.storage.storageDirectory=
 
 Note that you should generally set `druid.storage.storageDirectory` to something different from `druid.segmentCache.locations` and `druid.segmentCache.infoDir`.
 
 If you are using the Hadoop indexer in local mode, then just give it a local file as your output directory and it will work.
+
+
+## Cassandra
+
+[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also be leveraged for deep storage. This requires some additional Druid configuration as well as setting up the necessary schema within a Cassandra keyspace.
+
+For more information on using Cassandra as deep storage, see [Cassandra Deep Storage](Cassandra-Deep-Storage.html).
+
diff --git a/examples/cassandra/README.md b/examples/cassandra/README.md
index 7a8f5f99195..fcc852f610c 100644
--- a/examples/cassandra/README.md
+++ b/examples/cassandra/README.md
@@ -1,32 +1,5 @@
-## Introduction
-Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
-`index_storage` and `descriptor_storage`. Underneath the hood, the Cassandra integration leverages Astyanax. The
-index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
-compressed segments for distribution to historical nodes.
-Since segments can be large, the Chunked Object storage allows the integration to multi-thread
-the write to Cassandra, and spreads the data across all the nodes in a cluster. The descriptor storage table is a normal C* table that
-stores the segment metadatak.
-
-## Schema
-Below are the create statements for each:
-
-
-
-    CREATE TABLE index_storage ( key text, chunk text, value blob, PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
-
-    CREATE TABLE descriptor_storage ( key varchar, lastModified timestamp, descriptor varchar, PRIMARY KEY (key) ) WITH COMPACT STORAGE;
-
-
-## Getting Started
-First create the schema above. (I use a new keyspace called `druid`)
-
-Then, add the following properties to your properties file to enable a Cassandra
-backend.
-
-    druid.storage.cassandra=true
-    druid.storage.cassandra.host=localhost:9160
-    druid.storage.cassandra.keyspace=druid
-
-Use the `druid-development@googlegroups.com` mailing list if you have questions,
-or feel free to reach out directly: `bone@alumni.brown.edu`.
+## Example Prerequisite
+The code in this example assumes Cassandra has been configured as deep storage for Druid.
+For details on how to accomplish this, see [Cassandra Deep Storage](../../docs/content/Cassandra-Deep-Storage.md).
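The chunked layout behind the `index_storage` table can be sketched in a few lines. The snippet below is a hypothetical in-memory model only, not the Druid or Astyanax implementation: it splits a segment blob into fixed-size chunks keyed by `(key, chunk)`, mirroring the table's `PRIMARY KEY (key, chunk)`, then reassembles them. The `CHUNK_SIZE` value and function names are invented for illustration.

```python
# Hypothetical sketch of Astyanax-style chunked object storage: a segment
# blob is split into fixed-size chunks stored under (key, chunk) pairs,
# mirroring PRIMARY KEY (key, chunk) in the index_storage table.
# A dict stands in for the Cassandra table; names here are illustrative.

CHUNK_SIZE = 3  # tiny for demonstration; real chunk sizes are much larger

def write_chunked(store, key, blob):
    """Split blob into chunks; each chunk could be written by its own thread."""
    for i in range(0, len(blob), CHUNK_SIZE):
        store[(key, str(i // CHUNK_SIZE))] = blob[i:i + CHUNK_SIZE]

def read_chunked(store, key):
    """Reassemble a blob by reading its chunks back in order."""
    chunks = sorted(
        ((int(chunk), value) for (k, chunk), value in store.items() if k == key),
        key=lambda pair: pair[0],
    )
    return b"".join(value for _, value in chunks)

store = {}
write_chunked(store, "segment-2014-01-01", b"compressed-segment-bytes")
assert read_chunked(store, "segment-2014-01-01") == b"compressed-segment-bytes"
```

Because each `(key, chunk)` pair is an independent row, writes can proceed in parallel and the partitioner spreads the chunks across the nodes of the cluster, which is what makes this layout suitable for large segments.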