Once you have a realtime node working, it is time to load your own data to see how Druid performs.
Druid can ingest data in three ways: via Kafka and a realtime node, via the indexing service, and via the Hadoop batch loader. Data is ingested in realtime using a [[Firehose]].
## Create Config Directories ##
Each type of node needs its own config file and directory, so create them as subdirectories under the druid directory.
```bash
mkdir config
mkdir config/realtime
mkdir config/master
mkdir config/compute
mkdir config/broker
```
## Loading Data with Kafka ##
[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/master/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how Druid communicates with Kafka. Using this [[Firehose]] with the right configuration, we can import data into Druid in realtime without writing any code. To load data into a realtime node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [[Realtime]] node.
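For orientation, here is roughly what the firehose section of a realtime.spec looks like when pointed at Kafka. This is a minimal sketch: the `kafka-0.7.2` type string, the consumer properties, and the parser layout are assumptions drawn from the 0.7-era Kafka consumer, so verify them against your Druid build.

```json
{
  "firehose": {
    "type": "kafka-0.7.2",
    "consumerProps": {
      "zk.connect": "localhost:2181",
      "groupid": "druid-example",
      "fetch.size": "1048586",
      "autooffset.reset": "largest",
      "autocommit.enable": "false"
    },
    "feed": "druidtest",
    "parser": {
      "timestampSpec": { "column": "utcdt", "format": "iso" },
      "data": { "format": "json" }
    }
  }
}
```

The `feed` is the Kafka topic the firehose consumes from; it must match the topic you produce messages to.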
### Booting Kafka ###
Instructions for booting ZooKeeper and then a Kafka cluster are available [here](http://kafka.apache.org/07/quickstart.html).
1. Download Apache Kafka 0.7.2 from [http://kafka.apache.org/downloads.html](http://kafka.apache.org/downloads.html), then build and start it (a typical sequence is sketched below).
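A minimal boot sequence, assuming the Kafka 0.7.x source tarball layout (script names and the `druidtest` topic are illustrative; adjust to your download):

```bash
tar xzf kafka-0.7.2-incubating-src.tgz
cd kafka-0.7.2-incubating-src
./sbt update && ./sbt package   # Kafka 0.7.x builds from source via sbt

# start a local ZooKeeper, then a Kafka broker
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# console producer, so you can type in JSON messages by hand
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic druidtest
```

The topic name must match the `feed` in your firehose configuration.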
## Loading Data with the HadoopDruidIndexer ##
Druid's batch loader runs as a Hadoop job, so you will need a working Hadoop cluster. The setup for a single-node, 'standalone' Hadoop cluster is available at [http://hadoop.apache.org/docs/stable/single_node_setup.html](http://hadoop.apache.org/docs/stable/single_node_setup.html).
### Setup MySQL ###
1. If you don't already have it, download MySQL Community Server here: [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/)
2. Install MySQL
3. Create a druid user and database
```bash
mysql -u root
```
```sql
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE DATABASE druid;
```
The [[Master]] node will create the tables it needs based on its configuration.
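Those connection details later go into the Master's runtime.properties. A sketch of the relevant settings, assuming the 0.x-era property names (verify them against your Druid version):

```bash
# metadata storage (MySQL) coordinates consumed by the Master
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.database.user=druid
druid.database.password=diurd
druid.database.segmentTable=prod_segments
```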
### Make sure you have ZooKeeper Running ###
Make sure that you have a ZooKeeper instance running. If you followed the instructions for Kafka, it is probably running already. If you are unsure whether ZooKeeper is running, try:
```bash
ps auxww | grep zoo | grep -v grep
```
If you get a result back, ZooKeeper is most likely running. If you haven't set up Kafka or do not have ZooKeeper running, you can download it and start it up as sketched below.
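A minimal sketch, assuming ZooKeeper 3.4.5 fetched from the Apache archive (adjust the version and mirror as needed):

```bash
# download and unpack ZooKeeper
curl -O https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5

# the sample config is enough for a single local instance
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```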
### Launching a Master Node ###
If you've already set up a realtime node, be aware that although you can run multiple node types on one physical machine, you must assign each a unique port. Having used 8080 for the [[Realtime]] node, we use 8081 for the [[Master]].
1. Set up a configuration file called `config/master/runtime.properties` similar to:
```bash
druid.host=0.0.0.0:8081
druid.port=8081

com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug

druid.service=example
druid.master.startDelay=PT60s
druid.request.logging.dir=/tmp/example/log

# below are dummy values when operating a realtime-only node
druid.processing.formatString=processing_%s
druid.processing.numThreads=1
druid.processing.buffer.sizeBytes=10000000
druid.realtime.specFile=realtime.spec
```
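2. Launch the Master node. A sketch of the launch command; the main class `com.metamx.druid.http.MasterMain` is an assumption based on the metamx-era packaging, so check it against your build:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "lib/*:config/master" \
  com.metamx.druid.http.MasterMain
```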
### Running the Hadoop Job ###
Now it's time to run the Hadoop [[Batch-ingestion]] job, HadoopDruidIndexer, which will fill a historical [[Compute]] node with data. First we'll need to configure the job.
1. Create a config file called batchConfig.json, similar to the sketch below.
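The field names in this sketch follow the metamx-era HadoopDruidIndexer layout and should be treated as assumptions; all paths, dimensions, and intervals are placeholder examples:

```json
{
  "dataSource": "the_data_source",
  "timestampColumn": "ts",
  "timestampFormat": "iso",
  "dataSpec": {
    "format": "json",
    "dimensions": ["dim1", "dim2", "dim3"]
  },
  "granularitySpec": {
    "type": "uniform",
    "gran": "DAY",
    "intervals": ["2013-01-01/2013-01-02"]
  },
  "pathSpec": { "type": "static", "paths": "examples/indexing/data.json" },
  "rollupSpec": {
    "aggs": [{ "type": "count", "name": "event_count" }],
    "rollupGranularity": "minute"
  },
  "workingPath": "/tmp/working_path",
  "segmentOutputPath": "/tmp/segments",
  "partitionsSpec": { "targetPartitionSize": 5000000 },
  "updaterJobSpec": {
    "type": "db",
    "connectURI": "jdbc:mysql://localhost:3306/druid",
    "user": "druid",
    "password": "diurd",
    "segmentTable": "prod_segments"
  }
}
```

The job itself is then launched with the indexer main class (again an assumption based on the metamx-era packaging; `<hadoop_config_path>` is your Hadoop configuration directory):

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "lib/*:<hadoop_config_path>" \
  com.metamx.druid.indexer.HadoopDruidIndexerMain batchConfig.json
```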