---
layout: doc_page
---
Once you have a real-time node working, it is time to load your own data to see how Druid performs.
Druid can ingest data in three ways: via Kafka and a realtime node, via the indexing service, and via the Hadoop batch loader. Data is ingested in real time using a [Firehose](Firehose.html).
## Create Config Directories ##
Each type of node needs its own config file and directory, so create them as subdirectories under the druid directory if they do not already exist.
```bash
mkdir config
mkdir config/realtime
mkdir config/coordinator
mkdir config/historical
mkdir config/broker
```
## Loading Data with Kafka ##
[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.6.0/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how Druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in realtime without writing any code. To load data to a realtime node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
### Booting Kafka ###
Instructions for booting ZooKeeper and then a Kafka cluster are available [here](http://kafka.apache.org/07/quickstart.html).

1. Download Apache Kafka 0.7.2 from [http://kafka.apache.org/downloads.html](http://kafka.apache.org/downloads.html)
```bash
wget http://apache.spinellicreations.com/incubator/kafka/kafka-0.7.2-incubating/kafka-0.7.2-incubating-src.tgz
tar -xvzf kafka-0.7.2-incubating-src.tgz
cd kafka-0.7.2-incubating-src
```
2. Build Kafka
```bash
./sbt update
./sbt package
```
3. Boot Kafka
```bash
cat config/zookeeper.properties   # (optional) inspect the ZooKeeper settings Kafka ships with
bin/zookeeper-server-start.sh config/zookeeper.properties
# in a new console
bin/kafka-server-start.sh config/server.properties
```
4. Launch the console producer (so you can type in JSON Kafka messages in a bit)
```bash
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic druidtest
```
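
If you'd like a quick, optional sanity check that the broker and the `druidtest` topic are reachable, you can pipe a throwaway message into the producer from another console (the message format here simply mirrors the sample records used later):

```bash
# send a single test message to the druidtest topic via stdin; the producer exits on EOF
echo '{"utcdt": "2010-01-01T00:00:00", "wp": 1, "gender": "male", "age": 1}' | \
  bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic druidtest
```
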
### Launching a Realtime Node ###
1. Create a valid configuration file called config/realtime/runtime.properties, similar to this:
```properties
druid.host=localhost
druid.service=example
druid.port=8080

druid.zk.service.host=localhost

druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b

druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd

druid.realtime.specFile=config/realtime/realtime.spec

druid.processing.buffer.sizeBytes=10000000
druid.processing.numThreads=3
```
2. Create a valid realtime configuration file called config/realtime/realtime.spec, similar to this:
```json
[
  {
    "schema": {
      "dataSource": "druidtest",
      "aggregators": [
        {
          "type": "count",
          "name": "impressions"
        },
        {
          "type": "doubleSum",
          "name": "wp",
          "fieldName": "wp"
        }
      ],
      "indexGranularity": "minute",
      "shardSpec": {
        "type": "none"
      }
    },
    "config": {
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m"
    },
    "firehose": {
      "type": "kafka-0.7.2",
      "consumerProps": {
        "zk.connect": "localhost:2181",
        "zk.connectiontimeout.ms": "15000",
        "zk.sessiontimeout.ms": "15000",
        "zk.synctime.ms": "5000",
        "groupid": "topic-pixel-local",
        "fetch.size": "1048586",
        "autooffset.reset": "largest",
        "autocommit.enable": "false"
      },
      "feed": "druidtest",
      "parser": {
        "timestampSpec": {
          "column": "utcdt",
          "format": "iso"
        },
        "data": {
          "format": "json"
        },
        "dimensionExclusions": [
          "wp"
        ]
      }
    },
    "plumber": {
      "type": "realtime",
      "windowPeriod": "PT10m",
      "segmentGranularity": "hour",
      "basePersistDirectory": "/tmp/realtime/basePersist",
      "rejectionPolicy": {
        "type": "messageTime"
      }
    }
  }
]
```
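
A malformed spec is a common cause of realtime-node startup failures, so it can be worth validating the JSON before launching the node. One quick way to do so, assuming Python is installed, is:

```bash
# prints the parsed spec on success, or a parse error with a line number on failure
python -m json.tool < config/realtime/realtime.spec
```
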
3. Launch the realtime node
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.realtime.specFile=config/realtime/realtime.spec \
-classpath lib/*:config/realtime io.druid.cli.Main server realtime
```
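
Once the node is up, you can confirm from another console that it is listening on the configured port (8080 above). This optional check assumes `nc` is available:

```bash
# exits successfully and prints the message if something is accepting connections on 8080
nc -z localhost 8080 && echo "realtime node is listening on 8080"
```
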
4. Paste data into the Kafka console producer
```json
{"utcdt": "2010-01-01T01:01:01", "wp": 1000, "gender": "male", "age": 100}
{"utcdt": "2010-01-01T01:01:02", "wp": 2000, "gender": "female", "age": 50}
{"utcdt": "2010-01-01T01:01:03", "wp": 3000, "gender": "male", "age": 20}
{"utcdt": "2010-01-01T01:01:04", "wp": 4000, "gender": "female", "age": 30}
{"utcdt": "2010-01-01T01:01:05", "wp": 5000, "gender": "male", "age": 40}
```
5. Watch the events as they are ingested by Druid's realtime node
```bash
...
2013-06-17 21:41:55,569 INFO [Global--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2013-06-17T21:41:55.569Z","service":"example","host":"127.0.0.1","metric":"events/processed","value":5,"user2":"druidtest"}]
...
```
6. In a new console, create a file called query.body with the following contents:
```json
{
  "queryType": "groupBy",
  "dataSource": "druidtest",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "longSum", "name": "imps", "fieldName": "impressions" },
    { "type": "doubleSum", "name": "wp", "fieldName": "wp" }
  ],
  "intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
7. Submit the query via curl
```bash
curl -X POST "http://localhost:8080/druid/v2/?pretty" \
-H 'content-type: application/json' -d @query.body
```
8. View the result!
```json
[ {
"timestamp" : "2010-01-01T01:01:00.000Z",
"result" : {
"imps" : 20,
"wp" : 60000.0,
"rows" : 5
}
} ]
```
Now you're ready for [Querying Your Data](Querying-Your-Data.html)!
## Loading Data with the HadoopDruidIndexer ##
Historical data can be loaded via a Hadoop job.
Instructions for setting up a single-node, 'standalone' Hadoop cluster are available at [http://hadoop.apache.org/docs/stable/single_node_setup.html](http://hadoop.apache.org/docs/stable/single_node_setup.html).
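
Before continuing, it is worth confirming that the `hadoop` command is on your PATH; this optional check simply prints the installed version:

```bash
# verifies the Hadoop CLI is installed and reachable
hadoop version
```
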
### Setup MySQL ###
1. If you don't already have it, download MySQL Community Server here: [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/)
2. Install MySQL
3. Create a druid user and database
```bash
mysql -u root
```
```sql
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE database druid;
```
The [Coordinator](Coordinator.html) node will create the tables it needs based on its configuration.
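
To confirm that the user and database were created correctly, you can log in as the new druid user (an optional check):

```bash
# connects as the druid user (password supplied inline) and lists the druid database
mysql -u druid -pdiurd -e "SHOW DATABASES LIKE 'druid';"
```
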
### Make sure you have ZooKeeper Running ###
Make sure that you have a ZooKeeper instance running. If you followed the Kafka instructions above, it is probably already running. If you are unsure whether ZooKeeper is running, try running
```bash
ps auxww | grep zoo | grep -v grep
```
If you get any result back, then ZooKeeper is most likely running. If you haven't set up Kafka or do not have ZooKeeper running, you can download it and start it up with
```bash
curl http://www.motorlogy.com/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz -o zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```
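
Either way, you can verify that ZooKeeper is answering on its default port using one of its four-letter commands (this check assumes `nc` is available):

```bash
# ZooKeeper replies "imok" if it is up and serving requests
echo ruok | nc localhost 2181
```
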
### Launch a Coordinator Node ###
If you've already set up a realtime node, be aware that although you can run multiple node types on one physical computer, you must assign them unique ports. Having used 8080 for the [Realtime](Realtime.html) node, we use 8081 for the [Coordinator](Coordinator.html).
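
If you want to confirm that a port is free before assigning it to a node (8081 here), a simple check from the shell is:

```bash
# lists any process already listening on port 8081; no output means the port is free
lsof -i :8081
```
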
1. Set up a configuration file called config/coordinator/runtime.properties similar to:
```properties
druid.host=localhost
druid.service=coordinator
druid.port=8081

druid.zk.service.host=localhost

druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=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

druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd

druid.coordinator.startDelay=PT60s
```
2. Launch the [Coordinator](Coordinator.html) node
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath lib/*:config/coordinator \
io.druid.cli.Main server coordinator
```
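
Once the coordinator has started, you can check that it created its metadata tables in MySQL. The exact table set may vary by version, but you should at least see a segments table:

```bash
# list the tables the coordinator created in the druid database
mysql -u druid -pdiurd druid -e "SHOW TABLES;"
```
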
### Launch a Historical Node ###
1. Create a configuration file in config/historical/runtime.properties similar to the following (see the note after step 2 about the segment cache directories it references):
```properties
druid.host=localhost
druid.service=historical
druid.port=8082

druid.zk.service.host=localhost

druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=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

druid.server.maxSize=100000000

druid.processing.buffer.sizeBytes=10000000

druid.segmentCache.infoPath=/tmp/druid/segmentInfoCache
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 100000000}]
```
2. Launch the historical node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath lib/*:config/historical \
io.druid.cli.Main server historical
```
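
The runtime.properties above points the segment cache at /tmp/druid. Druid will normally create these directories itself, but creating them up front is a harmless, optional way to rule out permission problems:

```bash
# pre-create the segment cache directories referenced in runtime.properties
mkdir -p /tmp/druid/indexCache /tmp/druid/segmentInfoCache
```
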
### Create a File of Records ###
We can use the same records we have been using, saved in a file called records.json:
```json
{"utcdt": "2010-01-01T01:01:01", "wp": 1000, "gender": "male", "age": 100}
{"utcdt": "2010-01-01T01:01:02", "wp": 2000, "gender": "female", "age": 50}
{"utcdt": "2010-01-01T01:01:03", "wp": 3000, "gender": "male", "age": 20}
{"utcdt": "2010-01-01T01:01:04", "wp": 4000, "gender": "female", "age": 30}
{"utcdt": "2010-01-01T01:01:05", "wp": 5000, "gender": "male", "age": 40}
```
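
The batch config below reads the records from /druid/records.json, so the file needs to be copied to that path first. Assuming your standalone Hadoop cluster is running HDFS, something like the following works (if Hadoop is running in local mode, the path refers to your local filesystem instead):

```bash
# copy the sample records to the path referenced by the batch config's pathSpec
hadoop fs -mkdir /druid
hadoop fs -put records.json /druid/records.json
hadoop fs -ls /druid
```
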
### Run the Hadoop Job ###
Now it's time to run the Hadoop [Batch-ingestion](Batch-ingestion.html) job, HadoopDruidIndexer, which will fill a [Historical](Historical.html) node with data. First we'll need to configure the job.
1. Create a config file called batchConfig.json similar to:
```json
{
  "dataSource": "druidtest",
  "timestampColumn": "utcdt",
  "timestampFormat": "iso",
  "dataSpec": {
    "format": "json",
    "dimensions": [
      "gender",
      "age"
    ]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": [
      "2010-01-01T01/PT1H"
    ],
    "gran": "hour"
  },
  "pathSpec": {
    "type": "static",
    "paths": "/druid/records.json"
  },
  "rollupSpec": {
    "aggs": [
      {
        "type": "count",
        "name": "impressions"
      },
      {
        "type": "doubleSum",
        "name": "wp",
        "fieldName": "wp"
      }
    ],
    "rollupGranularity": "minute"
  },
  "workingPath": "/tmp/working_path",
  "segmentOutputPath": "/tmp/segments",
  "partitionsSpec": {
    "targetPartitionSize": 5000000
  },
  "updaterJobSpec": {
    "type": "db",
    "connectURI": "jdbc:mysql://localhost:3306/druid",
    "user": "druid",
    "password": "diurd",
    "segmentTable": "druid_segments"
  }
}
```
2. Now run the job, with the config pointing at batchConfig.json:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath `echo lib/* | tr ' ' ':'` \
io.druid.cli.Main index hadoop batchConfig.json
```
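
When the job finishes, the new segments should be registered in the metadata store (the segmentTable named in updaterJobSpec above) and then loaded by the historical node. Assuming the default table layout, a quick way to check that registration happened is:

```bash
# each row corresponds to a segment the indexer published for the druidtest dataSource
mysql -u druid -pdiurd druid -e "SELECT id FROM druid_segments;"
```
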
You can now move on to [Querying Your Data](Querying-Your-Data.html)!