Cassandra Deep Storage
Introduction
Apache Druid (incubating) can use Apache Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables: index_storage and descriptor_storage. Under the hood, the Cassandra integration uses Astyanax. The index storage table is a Chunked Object repository that holds compressed segments for distribution to Historical processes. Because segments can be large, Chunked Object storage lets the integration write to Cassandra with multiple threads and spreads the data across all the nodes in the cluster. The descriptor storage table is a normal C* table that stores the segment metadata.
Schema
Below are the create statements for each:
CREATE TABLE index_storage(key text,
                           chunk text,
                           value blob,
                           PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
CREATE TABLE descriptor_storage(key varchar,
                                lastModified timestamp,
                                descriptor varchar,
                                PRIMARY KEY (key)) WITH COMPACT STORAGE;
Getting Started
First, create the schema above. This example uses a new keyspace named druid, which can be created with the Cassandra CQL CREATE KEYSPACE command.
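As a minimal sketch, the keyspace could be created as shown here. The SimpleStrategy replication class and the replication factor of 1 are assumptions suited to a single-node test setup, not values taken from this page; choose settings that match your own cluster.

```cql
-- Assumed replication settings for a single-node test cluster;
-- adjust the class and replication_factor for a real deployment.
CREATE KEYSPACE druid
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Switch to the new keyspace before running the CREATE TABLE statements.
USE druid;
```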
Then, add the following to your Historical and realtime runtime properties files to enable a Cassandra backend.
druid.extensions.loadList=["druid-cassandra-storage"]
druid.storage.type=c*
druid.storage.host=localhost:9160
druid.storage.keyspace=druid
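Once segments have been handed off, one way to check that they are reaching Cassandra is to inspect the descriptor table from cqlsh. This is only a sketch based on the schema above: the LIMIT value is arbitrary, and the exact form of the key values is not described on this page.

```cql
-- Assumes the keyspace name used in druid.storage.keyspace.
USE druid;

-- List a few stored segment descriptors and when they were last written.
SELECT key, lastModified FROM descriptor_storage LIMIT 10;
```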