Updates Cassandra example documentation.

Updates the required properties to enable Cassandra as deep storage.
Also adds more information about how to create the required keystore,
and which properties files need to be modified to support Cassandra.
Also adds syntax highlighting to the provided example code.

Additionally, moves Cassandra Deep Storage documentation out into the
actual documentation hierarchy with appropriate linkage. A stub
README.md is left in the example directory to provide a useful landing
page for existing weblinks to the original documentation.
This commit is contained in:
Amy Troschinetz 2014-08-14 11:23:42 -05:00
parent 4e3f4fbc22
commit 71b8db631c
3 changed files with 55 additions and 30 deletions

View File

@ -0,0 +1,44 @@
---
layout: doc_page
---
## Introduction
Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
`index_storage` and `descriptor_storage`. Underneath the hood, the Cassandra integration leverages Astyanax. The
index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
compressed segments for distribution to historical nodes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread
the write to Cassandra, and spreads the data across all the nodes in a cluster. The descriptor storage table is a normal C* table that
stores the segment metadatak.
## Schema
Below are the create statements for each:
```sql
CREATE TABLE index_storage(key text,
chunk text,
value blob,
PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
CREATE TABLE descriptor_storage(key varchar,
lastModified timestamp,
descriptor varchar,
PRIMARY KEY (key)) WITH COMPACT STORAGE;
```
## Getting Started
First create the schema above. I use a new keyspace called `druid` for this purpose, which can be created using the
[Cassandra CQL `CREATE KEYSPACE`](http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_keyspace_r.html) command.
Then, add the following to your historical and realtime runtime properties files to enable a Cassandra backend.
```properties
druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:<druid version>"]
druid.storage.type=c*
druid.storage.host=localhost:9160
druid.storage.keyspace=druid
```
Use the `druid-development@googlegroups.com` mailing list if you have questions,
or feel free to reach out directly: `bone@alumni.brown.edu`.

View File

@ -47,3 +47,11 @@ druid.storage.storageDirectory=<directory for storing segments>
Note that you should generally set `druid.storage.storageDirectory` to something different from `druid.segmentCache.locations` and `druid.segmentCache.infoDir`.
If you are using the Hadoop indexer in local mode, then just give it a local file as your output directory and it will work.
## Cassandra
[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also be leveraged for deep storage. This requires some additional druid configuration as well as setting up the necessary schema within a Cassandra keystore.
For more information on using Cassandra as deep storage, see [Cassandra Deep Storage](Cassandra-Deep-Storage.html).

View File

@ -1,32 +1,5 @@
## Introduction
Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
`index_storage` and `descriptor_storage`. Underneath the hood, the Cassandra integration leverages Astyanax. The
index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
compressed segments for distribution to historical nodes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread
the write to Cassandra, and spreads the data across all the nodes in a cluster. The descriptor storage table is a normal C* table that
stores the segment metadatak.
## Schema
Below are the create statements for each:
CREATE TABLE index_storage ( key text, chunk text, value blob, PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
CREATE TABLE descriptor_storage ( key varchar, lastModified timestamp, descriptor varchar, PRIMARY KEY (key) ) WITH COMPACT STORAGE;
## Getting Started
First create the schema above. (I use a new keyspace called `druid`)
Then, add the following properties to your properties file to enable a Cassandra
backend.
druid.storage.cassandra=true
druid.storage.cassandra.host=localhost:9160
druid.storage.cassandra.keyspace=druid
Use the `druid-development@googlegroups.com` mailing list if you have questions,
or feel free to reach out directly: `bone@alumni.brown.edu`.
## Example Prerequisite
The code in this example assumes Cassandra has been configured as deep storage for Druid.
For details on how to accomplish this, see [Cassandra Deep Storage](../../docs/content/Cassandra-Deep-Storage.md).