---
layout: doc_page
---

# Tutorial: The Druid Cluster
Welcome back! In our first [tutorial](Tutorial%3A-A-First-Look-at-Druid.html), we introduced you to the most basic Druid setup: a single realtime node. We streamed in some data and queried it. Realtime nodes collect very recent data and periodically hand that data off to the rest of the Druid cluster. Some questions about the architecture must naturally come to mind. What does the rest of the Druid cluster look like?

This tutorial will hopefully answer these questions!

In this tutorial, we will set up other types of Druid nodes and external dependencies for a fully functional Druid cluster. The architecture of Druid is very much like the [Megazord](http://www.youtube.com/watch?v=7mQuHh1X4H4) from the popular 90s show Mighty Morphin' Power Rangers. Each Druid node has a specific purpose and the nodes come together to form a fully functional system.

## Downloading Druid

If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first.

You can download the latest version of Druid [here](http://static.druid.io/artifacts/releases/druid-services-0.7.0-rc1-bin.tar.gz) and untar the contents by issuing:

```bash
tar -zxvf druid-services-*-bin.tar.gz
cd druid-services-*
```

You can also [Build From Source](Build-from-source.html).

## External Dependencies

Druid requires three external dependencies: "deep" storage that acts as a backup data repository, a relational database such as MySQL to hold configuration and metadata information, and [Apache Zookeeper](http://zookeeper.apache.org/) for coordination among the different pieces of the cluster.

For deep storage, we will use local disk in this tutorial.

#### Set up Metadata storage

1. If you don't already have it, download MySQL Community Server here: [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/).
2. Install MySQL.
3. Create a druid user and database.

```bash
mysql -u root
```

```sql
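-- Grant access; on MySQL versions that accept IDENTIFIED BY in GRANT, this also creates the 'druid' user with password 'diurd'.
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
-- Create the metadata database with UTF-8 as its default character set.
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;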
```
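
Before starting any Druid nodes, it can be worth confirming that the new account can actually reach the new database. The following is a minimal sketch, assuming the `mysql` command-line client is on your `PATH` and the server runs on localhost; `druid_metadata_ok` is a hypothetical helper name, not part of Druid or MySQL.

```bash
# Hypothetical helper: succeeds only if the 'druid' user can log in with the
# tutorial password ('diurd') and open the 'druid' database.
druid_metadata_ok() {
  mysql -u druid -pdiurd druid -e 'SELECT 1;' >/dev/null 2>&1
}
```

Running `druid_metadata_ok && echo "metadata storage ready"` should print the message only once both the user and the database exist.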

#### Set up Zookeeper

Download Zookeeper from [http://www.apache.org/dyn/closer.cgi/zookeeper/](http://www.apache.org/dyn/closer.cgi/zookeeper/) and install it. For example:

```bash
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```
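
To confirm that Zookeeper is accepting connections, one option is its four-letter `ruok` command, which a healthy server answers with `imok`. The sketch below assumes `nc` (netcat) is installed and that Zookeeper listens on port 2181, the client port set in `zoo_sample.cfg`; `zk_ruok` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if the Zookeeper server at host $1, port $2
# replies "imok" to the "ruok" health probe within about two seconds.
zk_ruok() {
  [ "$(echo ruok | nc -w 2 "$1" "$2" 2>/dev/null)" = "imok" ]
}
```

For instance, `zk_ruok localhost 2181 && echo "Zookeeper is up"`. Running `./bin/zkServer.sh status` from the Zookeeper directory is another way to check.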

## The Data

Similar to the first tutorial, the data we will be loading is based on edits that have occurred on Wikipedia. Every time someone edits a page in Wikipedia, metadata is generated about the editor and edited page. Druid collects each individual event and packages them together in a container known as a [segment](Segments.html). Segments contain data over some span of time. We've prebuilt a segment for this tutorial and will cover making your own segments in other [pages](Tutorial%3A-Loading-Your-Data-Part-1.html). The segment we are going to work with has the following format:

Dimensions (things to filter on):

```json
"page"
"language"
"user"
"unpatrolled"
"newPage"
"robot"
"anonymous"
"namespace"
"continent"
"country"
"region"
"city"
```

Metrics (things to aggregate over):

```json
"count"
"added"
"delta"
"deleted"
```

## The Cluster

Before we get started, let's make sure we have configs in the config directory for our various nodes. Issue the following from the Druid home directory:

```
ls config
```

If you are interested in learning more about Druid configuration files, check out this [link](Configuration.html). Many aspects of Druid are customizable. For the purposes of this tutorial, we are going to use default values for most things.
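
The rest of this tutorial refers to one properties file per node type plus a shared one. As a quick sanity check, the sketch below lists any of those files that are absent; `missing_configs` is a hypothetical helper name, and the file list is simply the set of paths this tutorial mentions.

```bash
# Hypothetical helper: prints one line per expected config file that is
# missing under the given config directory (default: ./config).
missing_configs() {
  local dir=${1:-config} f
  for f in \
    _common/common.runtime.properties \
    coordinator/runtime.properties \
    historical/runtime.properties \
    broker/runtime.properties \
    realtime/runtime.properties
  do
    [ -f "$dir/$f" ] || echo "missing: $dir/$f"
  done
}
```

Run `missing_configs` from the Druid home directory; no output means every expected file is present.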

#### Common Configuration

There are a couple of cluster-wide configuration options we have to define. The common/cluster configuration files should exist under:

```
config/_common
```

In the directory, there should be a `common.runtime.properties` file with the following contents:

```
# Extensions
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-seven","io.druid.extensions:mysql-metadata-storage"]

# Zookeeper
druid.zk.service.host=localhost

# Metadata Storage
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

# Deep storage
druid.storage.type=local
druid.storage.storageDirectory=/tmp/druid/localStorage

# Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000

# Indexing service discovery
druid.selectors.indexing.serviceName=overlord

# Monitoring (disabled for examples)
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Metrics logging (disabled for examples)
druid.emitter=noop
```

In this file we define our external dependencies and cluster-wide configs.
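
When a node fails to start, a common first step is to check which value a given property actually has. Below is a minimal sketch of a lookup helper, assuming `awk` is available. `get_prop` is a hypothetical name, and the helper returns the raw text after the first `=`, so Java properties escapes such as `\:` are printed as written rather than unescaped.

```bash
# Hypothetical helper: prints the value of key $2 in properties file $1.
# Commented-out lines are ignored; if a key appears twice, the last one wins.
get_prop() {
  awk -F= -v k="$2" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0 }
    END     { if (v != "") print v }
  ' "$1"
}
```

For the file above, `get_prop config/_common/common.runtime.properties druid.zk.service.host` should print `localhost`.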

#### Start a Coordinator Node

Coordinator nodes are in charge of load assignment and distribution. Coordinator nodes monitor the status of the cluster and command historical nodes to load and drop segments.
For more information about coordinator nodes, see [here](Coordinator.html).

The coordinator configuration should already exist under:

```
config/coordinator
```

In the directory, there should be a `runtime.properties` file with the following contents:

```
druid.host=localhost
druid.port=8082
druid.service=coordinator

# The coordinator begins assignment operations after the start delay.
# We override the default here to start things up faster for examples.
druid.coordinator.startDelay=PT70s
```

To start the coordinator node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/coordinator io.druid.cli.Main server coordinator
```

#### Start a Historical Node

Historical nodes are the workhorses of a cluster and are in charge of loading historical segments and making them available for queries. Realtime nodes hand off segments to historical nodes.
For more information about Historical nodes, see [here](Historical.html).

The historical configuration should exist under:

```
config/historical
```

In the directory, there should be a `runtime.properties` file with the following contents:

```
druid.host=localhost
druid.port=8081
druid.service=historical

# We can only scan 1 segment in parallel with these configs.
# Our intermediate buffer is also very small so longer topNs will be slow.
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 10000000000}]
druid.server.maxSize=10000000000
```

To start the historical node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/historical io.druid.cli.Main server historical
```

#### Start a Broker Node

Broker nodes are responsible for figuring out which historical and/or realtime nodes hold the data relevant to each query. They also merge partial results from these nodes in a scatter/gather fashion.
For more information about Broker nodes, see [here](Broker.html).

The broker configuration should exist under:

```
config/broker
```

In the directory, there should be a `runtime.properties` file with the following contents:

```
druid.host=localhost
druid.port=8080
druid.service=broker

druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

# Bump these up only for faster nested groupBy
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1
```

To start the broker node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker io.druid.cli.Main server broker
```

#### Start a Realtime Node

Our goal is to ingest some data and hand off that data to the rest of our Druid cluster. To accomplish this goal, we need to make some small configuration changes.

In your favorite editor, open up:

```
examples/wikipedia/wikipedia_realtime.spec
```

We need to change some settings so that hand-off happens sooner.

Let's change:

```
"segmentGranularity": "HOUR",
```

to

```
"segmentGranularity": "FIVE_MINUTE",
```

and

```
"intermediatePersistPeriod": "PT10m",
"windowPeriod": "PT10m",
```

to

```
"intermediatePersistPeriod": "PT3m",
"windowPeriod": "PT1m",
```

With these settings, segments should be handed off roughly every 6 minutes: the 5-minute segment granularity plus the 1-minute window period.
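
As a rough sketch of this timing, assume that hand-off becomes possible once a segment's interval plus the window period has elapsed, and ignore the time needed to persist, merge, and push the segment. Under those assumptions, the hypothetical helper below estimates the earliest hand-off time, in Unix epoch seconds, for the segment containing a given event timestamp.

```bash
SEGMENT_SECONDS=300   # "segmentGranularity": "FIVE_MINUTE"
WINDOW_SECONDS=60     # "windowPeriod": "PT1m"

# Hypothetical helper: $1 is an event timestamp in epoch seconds.
# Prints the end of the 5-minute bucket containing it, plus the window period.
handoff_epoch() {
  echo $(( ($1 / SEGMENT_SECONDS + 1) * SEGMENT_SECONDS + WINDOW_SECONDS ))
}
```

For example, `handoff_epoch "$(date +%s)"` prints the estimate for an event arriving now. Estimates for consecutive segments are 300 seconds apart.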

To start the realtime node that was used in our first tutorial, you simply have to issue:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=examples/wikipedia/wikipedia_realtime.spec -classpath lib/*:config/realtime io.druid.cli.Main server realtime
```

The configurations are located in `config/realtime/runtime.properties` and should contain the following:

```
druid.host=localhost
druid.port=8083
druid.service=realtime

# We can only scan 1 segment in parallel with these configs.
# Our intermediate buffer is also very small so longer topNs will be slow.
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

# Enable realtime monitoring
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor","io.druid.segment.realtime.RealtimeMetricsMonitor"]
```

Once the realtime node starts up, it should begin ingesting data and handing that data off to the rest of the Druid cluster. You can use the web UI located at `coordinator_ip:port` (`localhost:8082` with the configuration above) to view the status of data being loaded. Once data is handed off from the realtime node to the historical nodes, the historical nodes should begin serving segments.
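
With four Java processes running, it is easy to lose track of which ones are still alive. The sketch below probes each node over HTTP. It assumes `curl` is installed and that each node answers GET requests on a `/status` path, which you should verify for your Druid version. The ports are the ones set in the `runtime.properties` files above, and the function names are hypothetical.

```bash
# Node names and ports as configured in this tutorial.
NODES="broker:8080 historical:8081 coordinator:8082 realtime:8083"

# Succeeds if an HTTP GET of /status on local port $1 returns a 2xx response.
node_is_up() {
  curl -sf -o /dev/null --max-time 2 "http://localhost:$1/status"
}

# Prints one "name (port): up|DOWN" line per node.
report_nodes() {
  local entry name port
  for entry in $NODES; do
    name=${entry%%:*}
    port=${entry##*:}
    if node_is_up "$port"; then
      echo "$name ($port): up"
    else
      echo "$name ($port): DOWN"
    fi
  done
}
```

Calling `report_nodes` prints four lines, one per node type.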

At any point during ingestion, we can query for data. The queries should span both realtime and historical nodes. For more information on querying, see this [link](Querying.html).
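
As a concrete illustration, a `timeBoundary` query, which asks for the earliest and latest timestamps in a data source, can be posted to the broker. This sketch assumes `curl` is installed, that the data source is named `wikipedia` as in the example spec file, and that the broker accepts queries at `/druid/v2/` on port 8080. Both function names are hypothetical.

```bash
# Prints the JSON body of a timeBoundary query for data source $1.
time_boundary_query() {
  printf '{"queryType": "timeBoundary", "dataSource": "%s"}\n' "$1"
}

# Posts the query to the broker configured in this tutorial (localhost:8080).
query_broker() {
  curl -s -X POST "http://localhost:8080/druid/v2/?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(time_boundary_query "$1")"
}
```

Once the cluster is running, `query_broker wikipedia` sends the query. Because the broker fans queries out to both node types, the same call should keep working as segments move from the realtime node to the historical node.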

Next Steps
----------
If you are interested in how data flows through the different Druid components, check out the [Druid data flow architecture](Design.html). Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data?
Check out the next [tutorial](Tutorial%3A-Loading-Your-Data-Part-1.html) section for more info!