druid/docs/content/tutorials/cluster.md

---
layout: doc_page
---

# Clustering

Druid is designed to be deployed as a scalable, fault-tolerant cluster.

In this document, we'll set up a simple cluster and discuss how it can be further configured to meet
your needs. This simple cluster will feature scalable, fault-tolerant servers for Historicals and MiddleManagers, and a single
coordination server to host the Coordinator and Overlord processes. In production, we recommend deploying Coordinators and Overlords in a fault-tolerant
configuration as well.

## Select hardware

The Coordinator and Overlord processes can be co-located on a single server that is responsible for handling the metadata and coordination needs of your cluster.
The equivalent of an AWS [m3.xlarge](https://aws.amazon.com/ec2/instance-types/#M3) is sufficient for most clusters. This
hardware offers:

- 4 vCPUs
- 15 GB RAM
- 80 GB SSD storage

Historicals and MiddleManagers can be colocated on a single server to handle the actual data in your cluster. These servers benefit greatly from CPU, RAM,
and SSDs. The equivalent of an AWS [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3) is a
good starting point. This hardware offers:

- 8 vCPUs
- 61 GB RAM
- 160 GB SSD storage

Druid Brokers accept queries and farm them out to the rest of the cluster. They also optionally maintain an
in-memory query cache. These servers benefit greatly from CPU and RAM, and can also be deployed on
the equivalent of an AWS [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3). This hardware
offers:

- 8 vCPUs
- 61 GB RAM
- 160 GB SSD storage

You can consider co-locating any open source UIs or query libraries on the same server that the Broker is running on.

Very large clusters should consider selecting larger servers.

## Select OS

We recommend running your favorite Linux distribution. You will also need:

  * Java 7 or better

Your OS package manager should be able to help for both Java. If your Ubuntu-based OS
does not have a recent enough version of Java, WebUpd8 offers [packages for those
OSes](http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html).

## Download the distribution

First, download and unpack the release archive. It's best to do this on a single machine at first,
since you will be editing the configurations and then copying the modified distribution out to all
of your servers.

```bash
curl -O http://static.druid.io/artifacts/releases/druid-0.9.0-bin.tar.gz
tar -xzf druid-0.9.0-bin.tar.gz
cd druid-0.9.0
```

In this package, you'll find:


* `LICENSE` - the license files.
* `bin/` - scripts related to the [single-machine quickstart](quickstart.md).
* `conf/*` - template configurations for a clustered setup.
* `conf-quickstart/*` - configurations for the [single-machine quickstart](quickstart.md).
* `extensions/*` - all Druid extensions.
* `hadoop-dependencies/*` - Druid Hadoop dependencies.
* `lib/*` - all included software packages for core Druid.
* `quickstart/*` - files related to the [single-machine quickstart](quickstart.md).

We'll be editing the files in `conf/` in order to get things running.

## Configure deep storage

Druid relies on a distributed filesystem or large object (blob) store for data storage. The most
commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if
you already have a Hadoop deployment).

### S3

In `conf/druid/_common/common.runtime.properties`,

- Set `druid.extensions.loadList=["druid-s3-extensions"]`.

- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".

- Uncomment and configure appropriate values in the "For S3" sections of "Deep Storage" and
"Indexing service logs".

After this, you should have made the following changes:

```
druid.extensions.loadList=["druid-s3-extensions"]

#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=...
druid.s3.secretKey=...

#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
```

### HDFS

In `conf/druid/_common/common.runtime.properties`,

- Set `druid.extensions.loadList=["io.druid.extensions:druid-hdfs-storage"]`.

- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".

- Uncomment and configure appropriate values in the "For HDFS" sections of "Deep Storage" and
"Indexing service logs".

After this, you should have made the following changes:

```
druid.extensions.loadList=["druid-hdfs-storage"]

#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
```

Also,

- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,
mapred-site.xml) on the classpath of your Druid nodes. You can do this by copying them into
`conf/druid/_common/`.

## Configure Tranquility Server (optional)

Data streams can be sent to Druid through a simple HTTP API powered by Tranquility
Server. If you will be using this functionality, then at this point you should [configure
Tranquility Server](../ingestion/stream-ingestion.html#server).

## Configure Tranquility Kafka (optional)

Druid can consuming streams from Kafka through Tranquility Kafka. If you will be
using this functionality, then at this point you should
[configure Tranquility Kafka](../ingestion/stream-ingestion.html#kafka).

## Configure for connecting to Hadoop (optional)

If you will be loading data from a Hadoop cluster, then at this point you should configure Druid to be aware
of your cluster:

- Update `druid.indexer.task.hadoopWorkingPath` in `conf/middleManager/runtime.properties` to
a path on HDFS that you'd like to use for temporary files required during the indexing process.
`druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing` is a common choice.

- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,
mapred-site.xml) on the classpath of your Druid nodes. You can do this by copying them into
`conf/druid/_common/core-site.xml`, `conf/druid/_common/hdfs-site.xml`, and so on.

Note that you don't need to use HDFS deep storage in order to load data from Hadoop. For example, if
your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you
are loading data using Hadoop or Elastic MapReduce.

For more info, please see [batch ingestion](../ingestion/batch-ingestion.html).

## Configure addresses for Druid coordination

In this simple cluster, you will deploy a single Druid Coordinator, a
single Druid Overlord, a single ZooKeeper instance, and an embedded Derby metadata store on the same server.

In `conf/druid/_common/common.runtime.properties`, replace
"zk.host.ip" with the IP address of the machine that runs your ZK instance:

- `druid.zk.service.host`

In `conf/_common/common.runtime.properties`, replace
"metadata.store.ip" with the IP address of the machine that you will use as your metadata store:

- `druid.metadata.storage.connector.connectURI`
- `druid.metadata.storage.connector.host`

<div class="note caution">
In production, we recommend running 2 servers, each running a Druid Coordinator
and a Druid Overlord. We also recommend running a ZooKeeper cluster on its own dedicated hardware,
as well as  replicated [metadata
storage](http://druid.io/docs/latest/dependencies/metadata-storage.html) such as MySQL or
PostgreSQL, on its own dedicated hardware.
</div>

## Tune Druid processes that serve queries

Druid Historicals and MiddleManagers can be co-located on the same hardware. Both Druid processes benefit greatly from
being tuned to the hardware they run on. If you are running Tranquility Server or Kafka, you can also colocate Tranquility with these two Druid processes.
If you are using [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3)
EC2 instances, or similar hardware, the configuration in the distribution is a
reasonable starting point.

If you are using different hardware, we recommend adjusting configurations for your specific
hardware. The most commonly adjusted configurations are:

- `-Xmx` and `-Xms`
- `druid.server.http.numThreads`
- `druid.processing.buffer.sizeBytes`
- `druid.processing.numThreads`
- `druid.query.groupBy.maxIntermediateRows`
- `druid.query.groupBy.maxResults`
- `druid.server.maxSize` and `druid.segmentCache.locations` on Historical Nodes
- `druid.worker.capacity` on MiddleManagers

<div class="note info">
Keep -XX:MaxDirectMemory >= numThreads*sizeBytes, otherwise Druid will fail to start up..
</div>

Please see the Druid [configuration documentation](../configuration/index.html) for a full description of all
possible configuration options.

## Tune Druid Brokers

Druid Brokers also benefit greatly from being tuned to the hardware they
run on. If you are using [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3) EC2 instances,
or similar hardware, the configuration in the distribution is a reasonable starting point.

If you are using different hardware, we recommend adjusting configurations for your specific
hardware. The most commonly adjusted configurations are:

- `-Xmx` and `-Xms`
- `druid.server.http.numThreads`
- `druid.cache.sizeInBytes`
- `druid.processing.buffer.sizeBytes`
- `druid.processing.numThreads`
- `druid.query.groupBy.maxIntermediateRows`
- `druid.query.groupBy.maxResults`

<div class="note caution">
Keep -XX:MaxDirectMemory >= numThreads*sizeBytes, otherwise Druid will fail to start up.
</div>

Please see the Druid [configuration documentation](../configuration/index.html) for a full description of all
possible configuration options.

## Open ports (if using a firewall)

If you're using a firewall or some other system that only allows traffic on specific ports, allow
inbound connections on the following:

- 1527 (Derby on your Coordinator; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
- 8081 (Coordinator)
- 8082 (Broker)
- 8083 (Historical)
- 8090 (Overlord)
- 8091, 8100&ndash;8199 (Druid Middle Manager)
- 8200 (Tranquility Server, if used)

<div class="note caution">
In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,
rather than on the Coordinator server.
</div>

## Start Coordinator, Overlord, Zookeeper, and metadata store

Copy the Druid distribution and your edited configurations to your coordination
server. If you have been editing the configurations on your local machine, you can use *rsync* to
copy them:

```bash
rsync -az druid-0.9.0/ COORDINATION_SERVER:druid-0.9.0/
```

Log on to your coordination server and install Zookeeper:

```bash
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar -xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
```

<div class="note caution">
In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
</div>

On your coordination server, *cd* into the distribution and start up the coordination services (you should do this in different windows or pipe the log to a file):

```bash
java `cat conf/druid/coordinator/jvm.config | xargs` -cp conf/druid/_common:conf/druid/coordinator:lib/* io.druid.cli.Main server coordinator
java `cat conf/druid/overlord/jvm.config | xargs` -cp conf/druid/_common:conf/druid/overlord:lib/* io.druid.cli.Main server overlord
```

You should see a log message printed out for each service that starts up. You can view detailed logs
for any service by looking in the `var/log/druid` directory using another terminal.

## Start Historicals and MiddleManagers

Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Historicals and MiddleManagers.

On each one, *cd* into the distribution and run this command to start a Data server:

```bash
java `cat conf/druid/historical/jvm.config | xargs` -cp conf/druid/_common:conf/druid/historical:lib/* io.druid.cli.Main server historical
java `cat conf/druid/middleManager/jvm.config | xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/* io.druid.cli.Main server middleManager
```

You can add more servers with Druid Historicals and MiddleManagers as needed.

<div class="note info">
For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.
This also allows you take advantage of Druid's built-in MiddleManager
autoscaling facility.
</div>

If you are doing push-based stream ingestion with Kafka or over HTTP, you can also start Tranquility Server on the same
hardware that holds MiddleManagers and Historicals. For large scale production, MiddleManagers and Tranquility Server
can still be co-located. If you are running Tranquility (not server) with a stream processor, you can co-locate
Tranquility with the stream processor and not require Tranquility Server.

```bash
curl -O http://static.druid.io/tranquility/releases/tranquility-distribution-0.7.2.tgz
tar -xzf tranquility-distribution-0.7.2.tgz
cd tranquility-distribution-0.7.2.tgz
bin/tranquility <server or kafka> -configFile <path_to_druid_distro>/conf/tranquility/<server or kafka>.json
```

## Start Druid Broker

Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Brokers.

On each one, *cd* into the distribution and run this command to start a Broker (you may want to pipe the output to a log file):

```bash
java `cat conf/druid/broker/jvm.config | xargs` -cp conf/druid/_common:conf/druid/broker:lib/* io.druid.cli.Main server broker
```

You can add more Brokers as needed based on query load.

## Loading data

Congratulations, you now have a Druid cluster! The next step is to learn about recommended ways to load data into
Druid based on your use case. Read more about [loading data](ingestion.html).
new quickstart 2016-01-06 00:27:52 -05:00			`---`
			`layout: doc_page`
			`---`

			`# Clustering`

			`Druid is designed to be deployed as a scalable, fault-tolerant cluster.`

add doc rendering 2016-02-04 14:53:09 -05:00			`In this document, we'll set up a simple cluster and discuss how it can be further configured to meet`
			`your needs. This simple cluster will feature scalable, fault-tolerant servers for Historicals and MiddleManagers, and a single`
			`coordination server to host the Coordinator and Overlord processes. In production, we recommend deploying Coordinators and Overlords in a fault-tolerant`
new quickstart 2016-01-06 00:27:52 -05:00			`configuration as well.`

			`## Select hardware`

add doc rendering 2016-02-04 14:53:09 -05:00			`The Coordinator and Overlord processes can be co-located on a single server that is responsible for handling the metadata and coordination needs of your cluster.`
			`The equivalent of an AWS [m3.xlarge](https://aws.amazon.com/ec2/instance-types/#M3) is sufficient for most clusters. This`
new quickstart 2016-01-06 00:27:52 -05:00			`hardware offers:`

			`- 4 vCPUs`
			`- 15 GB RAM`
			`- 80 GB SSD storage`

add doc rendering 2016-02-04 14:53:09 -05:00			`Historicals and MiddleManagers can be colocated on a single server to handle the actual data in your cluster. These servers benefit greatly from CPU, RAM,`
			`and SSDs. The equivalent of an AWS [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3) is a`
new quickstart 2016-01-06 00:27:52 -05:00			`good starting point. This hardware offers:`

			`- 8 vCPUs`
			`- 61 GB RAM`
			`- 160 GB SSD storage`

add doc rendering 2016-02-04 14:53:09 -05:00			`Druid Brokers accept queries and farm them out to the rest of the cluster. They also optionally maintain an`
			`in-memory query cache. These servers benefit greatly from CPU and RAM, and can also be deployed on`
			`the equivalent of an AWS [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3). This hardware`
new quickstart 2016-01-06 00:27:52 -05:00			`offers:`

			`- 8 vCPUs`
			`- 61 GB RAM`
			`- 160 GB SSD storage`

			`You can consider co-locating any open source UIs or query libraries on the same server that the Broker is running on.`

			`Very large clusters should consider selecting larger servers.`

			`## Select OS`

			`We recommend running your favorite Linux distribution. You will also need:`

			`* Java 7 or better`

add doc rendering 2016-02-04 14:53:09 -05:00			`Your OS package manager should be able to help for both Java. If your Ubuntu-based OS`
			`does not have a recent enough version of Java, WebUpd8 offers [packages for those`
new quickstart 2016-01-06 00:27:52 -05:00			`OSes](http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html).`

			`## Download the distribution`

add doc rendering 2016-02-04 14:53:09 -05:00			`First, download and unpack the release archive. It's best to do this on a single machine at first,`
			`since you will be editing the configurations and then copying the modified distribution out to all`
new quickstart 2016-01-06 00:27:52 -05:00			`of your servers.`

			```bash
			`curl -O http://static.druid.io/artifacts/releases/druid-0.9.0-bin.tar.gz`
			`tar -xzf druid-0.9.0-bin.tar.gz`
			`cd druid-0.9.0`
			```

			`In this package, you'll find:`


			* `LICENSE` - the license files.
			* `bin/` - scripts related to the [single-machine quickstart](quickstart.md).
			* `conf/*` - template configurations for a clustered setup.
			* `conf-quickstart/*` - configurations for the [single-machine quickstart](quickstart.md).
			* `extensions/*` - all Druid extensions.
			* `hadoop-dependencies/*` - Druid Hadoop dependencies.
			* `lib/*` - all included software packages for core Druid.
			* `quickstart/*` - files related to the [single-machine quickstart](quickstart.md).

			We'll be editing the files in `conf/` in order to get things running.

			`## Configure deep storage`

add doc rendering 2016-02-04 14:53:09 -05:00			`Druid relies on a distributed filesystem or large object (blob) store for data storage. The most`
			`commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if`
new quickstart 2016-01-06 00:27:52 -05:00			`you already have a Hadoop deployment).`

			`### S3`

			In `conf/druid/_common/common.runtime.properties`,

			- Set `druid.extensions.loadList=["druid-s3-extensions"]`.

			`- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".`

			`- Uncomment and configure appropriate values in the "For S3" sections of "Deep Storage" and`
			`"Indexing service logs".`

			`After this, you should have made the following changes:`

			```
			`druid.extensions.loadList=["druid-s3-extensions"]`

			`#druid.storage.type=local`
			`#druid.storage.storageDirectory=var/druid/segments`

			`druid.storage.type=s3`
			`druid.storage.bucket=your-bucket`
			`druid.storage.baseKey=druid/segments`
			`druid.s3.accessKey=...`
			`druid.s3.secretKey=...`

			`#druid.indexer.logs.type=file`
			`#druid.indexer.logs.directory=var/druid/indexing-logs`

			`druid.indexer.logs.type=s3`
			`druid.indexer.logs.s3Bucket=your-bucket`
			`druid.indexer.logs.s3Prefix=druid/indexing-logs`
			```

			`### HDFS`

			In `conf/druid/_common/common.runtime.properties`,

			- Set `druid.extensions.loadList=["io.druid.extensions:druid-hdfs-storage"]`.

			`- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".`

			`- Uncomment and configure appropriate values in the "For HDFS" sections of "Deep Storage" and`
			`"Indexing service logs".`

			`After this, you should have made the following changes:`

			```
			`druid.extensions.loadList=["druid-hdfs-storage"]`

			`#druid.storage.type=local`
			`#druid.storage.storageDirectory=var/druid/segments`

			`druid.storage.type=hdfs`
			`druid.storage.storageDirectory=/druid/segments`

			`#druid.indexer.logs.type=file`
			`#druid.indexer.logs.directory=var/druid/indexing-logs`

			`druid.indexer.logs.type=hdfs`
			`druid.indexer.logs.directory=/druid/indexing-logs`
			```

			`Also,`

add doc rendering 2016-02-04 14:53:09 -05:00			`- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,`
			`mapred-site.xml) on the classpath of your Druid nodes. You can do this by copying them into`
new quickstart 2016-01-06 00:27:52 -05:00			`conf/druid/_common/`.

			`## Configure Tranquility Server (optional)`

add doc rendering 2016-02-04 14:53:09 -05:00			`Data streams can be sent to Druid through a simple HTTP API powered by Tranquility`
new quickstart 2016-01-06 00:27:52 -05:00			`Server. If you will be using this functionality, then at this point you should [configure`
			`Tranquility Server](../ingestion/stream-ingestion.html#server).`

			`## Configure Tranquility Kafka (optional)`

			`Druid can consuming streams from Kafka through Tranquility Kafka. If you will be`
			`using this functionality, then at this point you should`
			`[configure Tranquility Kafka](../ingestion/stream-ingestion.html#kafka).`

			`## Configure for connecting to Hadoop (optional)`

add doc rendering 2016-02-04 14:53:09 -05:00			`If you will be loading data from a Hadoop cluster, then at this point you should configure Druid to be aware`
new quickstart 2016-01-06 00:27:52 -05:00			`of your cluster:`

add doc rendering 2016-02-04 14:53:09 -05:00			- Update `druid.indexer.task.hadoopWorkingPath` in `conf/middleManager/runtime.properties` to
			`a path on HDFS that you'd like to use for temporary files required during the indexing process.`
new quickstart 2016-01-06 00:27:52 -05:00			`druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing` is a common choice.

add doc rendering 2016-02-04 14:53:09 -05:00			`- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,`
			`mapred-site.xml) on the classpath of your Druid nodes. You can do this by copying them into`
new quickstart 2016-01-06 00:27:52 -05:00			`conf/druid/_common/core-site.xml`, `conf/druid/_common/hdfs-site.xml`, and so on.

add doc rendering 2016-02-04 14:53:09 -05:00			`Note that you don't need to use HDFS deep storage in order to load data from Hadoop. For example, if`
			`your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you`
new quickstart 2016-01-06 00:27:52 -05:00			`are loading data using Hadoop or Elastic MapReduce.`

			`For more info, please see [batch ingestion](../ingestion/batch-ingestion.html).`

			`## Configure addresses for Druid coordination`

add doc rendering 2016-02-04 14:53:09 -05:00			`In this simple cluster, you will deploy a single Druid Coordinator, a`
new quickstart 2016-01-06 00:27:52 -05:00			`single Druid Overlord, a single ZooKeeper instance, and an embedded Derby metadata store on the same server.`

add doc rendering 2016-02-04 14:53:09 -05:00			In `conf/druid/_common/common.runtime.properties`, replace
new quickstart 2016-01-06 00:27:52 -05:00			`"zk.host.ip" with the IP address of the machine that runs your ZK instance:`

			- `druid.zk.service.host`

add doc rendering 2016-02-04 14:53:09 -05:00			In `conf/_common/common.runtime.properties`, replace
new quickstart 2016-01-06 00:27:52 -05:00			`"metadata.store.ip" with the IP address of the machine that you will use as your metadata store:`

			- `druid.metadata.storage.connector.connectURI`
			- `druid.metadata.storage.connector.host`

add doc rendering 2016-02-04 14:53:09 -05:00			`<div class="note caution">`
			`In production, we recommend running 2 servers, each running a Druid Coordinator`
			`and a Druid Overlord. We also recommend running a ZooKeeper cluster on its own dedicated hardware,`
			`as well as replicated [metadata`
			`storage](http://druid.io/docs/latest/dependencies/metadata-storage.html) such as MySQL or`
new quickstart 2016-01-06 00:27:52 -05:00			`PostgreSQL, on its own dedicated hardware.`
add doc rendering 2016-02-04 14:53:09 -05:00			`</div>`
new quickstart 2016-01-06 00:27:52 -05:00
			`## Tune Druid processes that serve queries`

add doc rendering 2016-02-04 14:53:09 -05:00			`Druid Historicals and MiddleManagers can be co-located on the same hardware. Both Druid processes benefit greatly from`
			`being tuned to the hardware they run on. If you are running Tranquility Server or Kafka, you can also colocate Tranquility with these two Druid processes.`
			`If you are using [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3)`
			`EC2 instances, or similar hardware, the configuration in the distribution is a`
new quickstart 2016-01-06 00:27:52 -05:00			`reasonable starting point.`

add doc rendering 2016-02-04 14:53:09 -05:00			`If you are using different hardware, we recommend adjusting configurations for your specific`
new quickstart 2016-01-06 00:27:52 -05:00			`hardware. The most commonly adjusted configurations are:`

			- `-Xmx` and `-Xms`
			- `druid.server.http.numThreads`
			- `druid.processing.buffer.sizeBytes`
			- `druid.processing.numThreads`
			- `druid.query.groupBy.maxIntermediateRows`
			- `druid.query.groupBy.maxResults`
			- `druid.server.maxSize` and `druid.segmentCache.locations` on Historical Nodes
			- `druid.worker.capacity` on MiddleManagers

add doc rendering 2016-02-04 14:53:09 -05:00			`<div class="note info">`
new quickstart 2016-01-06 00:27:52 -05:00			`Keep -XX:MaxDirectMemory >= numThreads*sizeBytes, otherwise Druid will fail to start up..`
add doc rendering 2016-02-04 14:53:09 -05:00			`</div>`
new quickstart 2016-01-06 00:27:52 -05:00
add doc rendering 2016-02-04 14:53:09 -05:00			`Please see the Druid [configuration documentation](../configuration/index.html) for a full description of all`
new quickstart 2016-01-06 00:27:52 -05:00			`possible configuration options.`

			`## Tune Druid Brokers`

add doc rendering 2016-02-04 14:53:09 -05:00			`Druid Brokers also benefit greatly from being tuned to the hardware they`
			`run on. If you are using [r3.2xlarge](https://aws.amazon.com/ec2/instance-types/#r3) EC2 instances,`
new quickstart 2016-01-06 00:27:52 -05:00			`or similar hardware, the configuration in the distribution is a reasonable starting point.`

add doc rendering 2016-02-04 14:53:09 -05:00			`If you are using different hardware, we recommend adjusting configurations for your specific`
new quickstart 2016-01-06 00:27:52 -05:00			`hardware. The most commonly adjusted configurations are:`

			- `-Xmx` and `-Xms`
			- `druid.server.http.numThreads`
			- `druid.cache.sizeInBytes`
			- `druid.processing.buffer.sizeBytes`
			- `druid.processing.numThreads`
			- `druid.query.groupBy.maxIntermediateRows`
			- `druid.query.groupBy.maxResults`

add doc rendering 2016-02-04 14:53:09 -05:00			`<div class="note caution">`
			`Keep -XX:MaxDirectMemory >= numThreads*sizeBytes, otherwise Druid will fail to start up.`
			`</div>`
new quickstart 2016-01-06 00:27:52 -05:00
add doc rendering 2016-02-04 14:53:09 -05:00			`Please see the Druid [configuration documentation](../configuration/index.html) for a full description of all`
new quickstart 2016-01-06 00:27:52 -05:00			`possible configuration options.`

Docs on default ports. 2016-03-14 14:25:21 -04:00			`## Open ports (if using a firewall)`

			`If you're using a firewall or some other system that only allows traffic on specific ports, allow`
			`inbound connections on the following:`

			`- 1527 (Derby on your Coordinator; not needed if you are using a separate metadata store like MySQL or PostgreSQL)`
			`- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)`
			`- 8081 (Coordinator)`
			`- 8082 (Broker)`
			`- 8083 (Historical)`
			`- 8090 (Overlord)`
			`- 8091, 8100–8199 (Druid Middle Manager)`
			`- 8200 (Tranquility Server, if used)`

			`<div class="note caution">`
			`In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,`
			`rather than on the Coordinator server.`
			`</div>`

new quickstart 2016-01-06 00:27:52 -05:00			`## Start Coordinator, Overlord, Zookeeper, and metadata store`

add doc rendering 2016-02-04 14:53:09 -05:00			`Copy the Druid distribution and your edited configurations to your coordination`
			`server. If you have been editing the configurations on your local machine, you can use rsync to`
new quickstart 2016-01-06 00:27:52 -05:00			`copy them:`

			```bash
			`rsync -az druid-0.9.0/ COORDINATION_SERVER:druid-0.9.0/`
			```

			`Log on to your coordination server and install Zookeeper:`
add doc rendering 2016-02-04 14:53:09 -05:00
new quickstart 2016-01-06 00:27:52 -05:00			```bash
			`curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz`
			`tar -xzf zookeeper-3.4.6.tar.gz`
			`cd zookeeper-3.4.6`
			`cp conf/zoo_sample.cfg conf/zoo.cfg`
			`./bin/zkServer.sh start`
add doc rendering 2016-02-04 14:53:09 -05:00			```
new quickstart 2016-01-06 00:27:52 -05:00
add doc rendering 2016-02-04 14:53:09 -05:00			`<div class="note caution">`
new quickstart 2016-01-06 00:27:52 -05:00			`In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.`
add doc rendering 2016-02-04 14:53:09 -05:00			`</div>`
new quickstart 2016-01-06 00:27:52 -05:00
			`On your coordination server, cd into the distribution and start up the coordination services (you should do this in different windows or pipe the log to a file):`

			```bash
			java `cat conf/druid/coordinator/jvm.config \| xargs` -cp conf/druid/_common:conf/druid/coordinator:lib/* io.druid.cli.Main server coordinator
			java `cat conf/druid/overlord/jvm.config \| xargs` -cp conf/druid/_common:conf/druid/overlord:lib/* io.druid.cli.Main server overlord
			```

add doc rendering 2016-02-04 14:53:09 -05:00			`You should see a log message printed out for each service that starts up. You can view detailed logs`
new quickstart 2016-01-06 00:27:52 -05:00			for any service by looking in the `var/log/druid` directory using another terminal.

			`## Start Historicals and MiddleManagers`

			`Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Historicals and MiddleManagers.`

			`On each one, cd into the distribution and run this command to start a Data server:`

			```bash
			java `cat conf/druid/historical/jvm.config \| xargs` -cp conf/druid/_common:conf/druid/historical:lib/* io.druid.cli.Main server historical
			java `cat conf/druid/middleManager/jvm.config \| xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/* io.druid.cli.Main server middleManager
			```

			`You can add more servers with Druid Historicals and MiddleManagers as needed.`

add doc rendering 2016-02-04 14:53:09 -05:00			`<div class="note info">`
			`For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.`
			`This also allows you take advantage of Druid's built-in MiddleManager`
new quickstart 2016-01-06 00:27:52 -05:00			`autoscaling facility.`
add doc rendering 2016-02-04 14:53:09 -05:00			`</div>`
new quickstart 2016-01-06 00:27:52 -05:00
Docs on default ports. 2016-03-14 14:25:21 -04:00			`If you are doing push-based stream ingestion with Kafka or over HTTP, you can also start Tranquility Server on the same`
			`hardware that holds MiddleManagers and Historicals. For large scale production, MiddleManagers and Tranquility Server`
add doc rendering 2016-02-04 14:53:09 -05:00			`can still be co-located. If you are running Tranquility (not server) with a stream processor, you can co-locate`
Docs on default ports. 2016-03-14 14:25:21 -04:00			`Tranquility with the stream processor and not require Tranquility Server.`
new quickstart 2016-01-06 00:27:52 -05:00
			```bash
			`curl -O http://static.druid.io/tranquility/releases/tranquility-distribution-0.7.2.tgz`
			`tar -xzf tranquility-distribution-0.7.2.tgz`
			`cd tranquility-distribution-0.7.2.tgz`
			`bin/tranquility <server or kafka> -configFile <path_to_druid_distro>/conf/tranquility/<server or kafka>.json`
			```

			`## Start Druid Broker`

add doc rendering 2016-02-04 14:53:09 -05:00			`Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Brokers.`
new quickstart 2016-01-06 00:27:52 -05:00
add doc rendering 2016-02-04 14:53:09 -05:00			`On each one, cd into the distribution and run this command to start a Broker (you may want to pipe the output to a log file):`
new quickstart 2016-01-06 00:27:52 -05:00
			```bash
			java `cat conf/druid/broker/jvm.config \| xargs` -cp conf/druid/_common:conf/druid/broker:lib/* io.druid.cli.Main server broker
			```

			`You can add more Brokers as needed based on query load.`

			`## Loading data`

add doc rendering 2016-02-04 14:53:09 -05:00			`Congratulations, you now have a Druid cluster! The next step is to learn about recommended ways to load data into`
new quickstart 2016-01-06 00:27:52 -05:00			`Druid based on your use case. Read more about [loading data](ingestion.html).`