2018-12-13 14:47:20 -05:00
---
layout: doc_page
title: "Cassandra Deep Storage"
---
2018-11-13 12:38:37 -05:00
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
2018-12-12 23:42:12 -05:00
# Cassandra Deep Storage
2015-05-05 17:07:32 -04:00
## Introduction
2018-12-12 23:42:12 -05:00
2015-05-05 17:07:32 -04:00
Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
2018-12-12 23:42:12 -05:00
`index_storage` and `descriptor_storage` . Underneath the hood, the Cassandra integration leverages Astyanax. The
2015-05-05 17:07:32 -04:00
index storage table is a [Chunked Object ](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store ) repository. It contains
2019-02-28 21:10:39 -05:00
compressed segments for distribution to Historical processes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread
the write to Cassandra, and spreads the data across all the processes in a cluster. The descriptor storage table is a normal C* table that
2018-12-12 23:42:12 -05:00
stores the segment metadatak.
2015-05-05 17:07:32 -04:00
## Schema
Below are the create statements for each:
```sql
CREATE TABLE index_storage(key text,
chunk text,
value blob,
PRIMARY KEY (key, chunk)) WITH COMPACT STORAGE;
CREATE TABLE descriptor_storage(key varchar,
lastModified timestamp,
descriptor varchar,
PRIMARY KEY (key)) WITH COMPACT STORAGE;
```
## Getting Started
First create the schema above. I use a new keyspace called `druid` for this purpose, which can be created using the
[Cassandra CQL `CREATE KEYSPACE` ](http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_keyspace_r.html ) command.
2019-01-30 22:41:07 -05:00
Then, add the following to your Historical and realtime runtime properties files to enable a Cassandra backend.
2015-05-05 17:07:32 -04:00
```properties
2016-01-20 21:46:26 -05:00
druid.extensions.loadList=["druid-cassandra-storage"]
2015-05-05 17:07:32 -04:00
druid.storage.type=c*
druid.storage.host=localhost:9160
druid.storage.keyspace=druid
```