If you plan to load data from a Hadoop cluster, configure Druid to be aware of your cluster
at this point:
- Update `druid.indexer.task.hadoopWorkingPath` in `conf/druid/cluster/data/middleManager/runtime.properties` to
a path on HDFS that you'd like to use for temporary files required during the indexing process.
`druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing` is a common choice.
- Place your Hadoop configuration XMLs (`core-site.xml`, `hdfs-site.xml`, `yarn-site.xml`,
`mapred-site.xml`) on the classpath of your Druid processes. You can do this by copying them into
`conf/druid/cluster/_common/core-site.xml`, `conf/druid/cluster/_common/hdfs-site.xml`, and so on.

Note that you don't need to use HDFS deep storage in order to load data from Hadoop. For example, if
your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you
are loading data using Hadoop or Elastic MapReduce.

For more information, see the [Hadoop-based ingestion](../ingestion/hadoop.md) page.
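The two Hadoop steps above can be scripted. The Python sketch below is one way to do it, assuming your Hadoop client configs live together in a single directory. The helper names `stage_hadoop_configs` and `set_property` are hypothetical, and `set_property` only understands simple `key=value` lines, not the full Java properties syntax.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The Hadoop client configuration files listed above.
HADOOP_XMLS = ["core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml"]


def stage_hadoop_configs(hadoop_conf_dir: str, druid_common_dir: str) -> list[Path]:
    """Copy the Hadoop XMLs into Druid's _common config directory (hypothetical helper)."""
    src = Path(hadoop_conf_dir)
    dst = Path(druid_common_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in HADOOP_XMLS:
        target = dst / name
        # copy2 preserves timestamps; it raises FileNotFoundError if an XML is missing.
        shutil.copy2(src / name, target)
        copied.append(target)
    return copied


def set_property(properties_file: str, key: str, value: str) -> None:
    """Set key=value in a simple .properties file, replacing or appending the entry.

    Only plain `key=value` lines are matched; comments and other lines are kept as-is.
    """
    path = Path(properties_file)
    lines = path.read_text().splitlines() if path.exists() else []
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        lines.append(entry)
    path.write_text("\n".join(lines) + "\n")
```

For example, `stage_hadoop_configs("/etc/hadoop/conf", "conf/druid/cluster/_common")` copies the XMLs (`/etc/hadoop/conf` is a common location, but it is an assumption here), and calling `set_property` on the MiddleManager's `runtime.properties` with `druid.indexer.task.hadoopWorkingPath` and `/tmp/druid-indexing` sets the working path.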
## Configure ZooKeeper connection
In a production cluster, we recommend using a dedicated ZK cluster in a quorum, deployed separately from the Druid servers.
In `conf/druid/cluster/_common/common.runtime.properties`, set
`druid.zk.service.host` to a [connection string](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html)
containing a comma-separated list of `host:port` pairs, each corresponding to a ZooKeeper server in your ZK quorum
(for example, `host:port` for a single server, or `host1:port1,host2:port2,host3:port3` for a three-server quorum).
You can also choose to run ZK on the Master servers instead of having a dedicated ZK cluster. If you do, we recommend deploying 3 Master servers so that you have a ZK quorum.
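As a small illustration of the connection string format and of why three Master servers are recommended, the Python sketch below joins `(host, port)` pairs into a value for `druid.zk.service.host` and computes how many ZooKeeper servers can fail while a majority quorum survives. The host names are hypothetical placeholders.

```python
from __future__ import annotations


def zk_connection_string(servers: list[tuple[str, int]]) -> str:
    """Join (host, port) pairs into a comma-separated ZooKeeper connection string."""
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    return ",".join(f"{host}:{port}" for host, port in servers)


def zk_tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a strict majority (quorum) is still up."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    # Hypothetical host names for a three-server quorum.
    hosts = [("zk1.example.com", 2181), ("zk2.example.com", 2181), ("zk3.example.com", 2181)]
    print("druid.zk.service.host=" + zk_connection_string(hosts))
    for n in (1, 2, 3, 5):
        print(f"{n} servers tolerate {zk_tolerated_failures(n)} failure(s)")
```

With three servers, one can fail and the quorum survives. With two, none can, so a two-server ensemble is no more fault tolerant than a single server.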
## Configuration Tuning
### Migrating from a Single-Server Deployment
#### Master
If you are using an example configuration from the [single-server deployment examples](../operations/single-server.md), note that these examples run the Coordinator and Overlord as a single combined process.
The example configs under `conf/druid/cluster/master/coordinator-overlord` also combine the Coordinator and Overlord processes.
You can copy your existing `coordinator-overlord` configs from the single-server deployment to `conf/druid/cluster/master/coordinator-overlord`.
#### Data
Suppose we are migrating from a single-server deployment that had 32 CPUs and 256 GB of RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
In the clustered deployment, we can choose a split factor (2 in this example) and deploy 2 Data servers with 16 CPUs and 128 GB of RAM each. Adjust the following settings:
- `druid.processing.numThreads`: Set to `(num_cores - 1)` based on the new hardware
- `druid.processing.numMergeBuffers`: Divide the old value from the single-server deployment by the split factor
- `druid.processing.buffer.sizeBytes`: Keep this unchanged
- `druid.worker.capacity`: Divide the old value from the single-server deployment by the split factor
- `druid.indexer.fork.property.druid.processing.numMergeBuffers`: Keep this unchanged
- `druid.indexer.fork.property.druid.processing.buffer.sizeBytes`: Keep this unchanged
- `druid.indexer.fork.property.druid.processing.numThreads`: Keep this unchanged
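The scaling rules above amount to simple arithmetic. The Python sketch below applies them to a set of old single-server values. The dataclass and its field names are hypothetical, the example inputs are invented for illustration and are not recommended values, and rounding down when dividing by the split factor is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataServerSettings:
    """Settings named in the list above (field names are hypothetical)."""
    processing_num_threads: int        # druid.processing.numThreads
    processing_num_merge_buffers: int  # druid.processing.numMergeBuffers
    processing_buffer_size_bytes: int  # druid.processing.buffer.sizeBytes
    worker_capacity: int               # druid.worker.capacity
    fork_num_merge_buffers: int        # druid.indexer.fork.property.druid.processing.numMergeBuffers
    fork_buffer_size_bytes: int        # druid.indexer.fork.property.druid.processing.buffer.sizeBytes
    fork_num_threads: int              # druid.indexer.fork.property.druid.processing.numThreads


def scale_for_cluster(old: DataServerSettings, split_factor: int, new_num_cores: int) -> DataServerSettings:
    """Apply the scaling rules above; integer division rounds down (an assumption)."""
    if split_factor < 1 or new_num_cores < 2:
        raise ValueError("split_factor must be >= 1 and new_num_cores >= 2")
    return replace(
        old,
        processing_num_threads=new_num_cores - 1,
        processing_num_merge_buffers=old.processing_num_merge_buffers // split_factor,
        worker_capacity=old.worker_capacity // split_factor,
        # buffer.sizeBytes and all fork.property.* values are intentionally unchanged.
    )


if __name__ == "__main__":
    # Hypothetical single-server values on a 32-core machine (invented for illustration).
    old = DataServerSettings(
        processing_num_threads=31,
        processing_num_merge_buffers=8,
        processing_buffer_size_bytes=500_000_000,
        worker_capacity=8,
        fork_num_merge_buffers=2,
        fork_buffer_size_bytes=100_000_000,
        fork_num_threads=1,
    )
    print(scale_for_cluster(old, split_factor=2, new_num_cores=16))
```

For these hypothetical inputs, a split factor of 2, and 16-core Data servers, `numThreads` becomes 15, `numMergeBuffers` becomes 4, and `druid.worker.capacity` becomes 4, while the buffer size and every `fork.property` value carry over unchanged.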
#### Query
You can copy your existing Broker and Router configs to the directories under `conf/druid/cluster/query` without modification, as long as the new hardware is sized accordingly.
### Fresh deployment
If you are using the example cluster described above:
- 1 Master server (m5.2xlarge)
- 2 Data servers (i3.4xlarge)
- 1 Query server (m5.2xlarge)

The configurations under `conf/druid/cluster` have already been sized for this hardware and you do not need to make further modifications for general use cases.
If you have chosen different hardware, the [basic cluster tuning guide](../operations/basic-cluster-tuning.md) can help you size your configurations.
## Open ports (if using a firewall)
If you're using a firewall or some other system that only allows traffic on specific ports, allow
inbound connections on the following:
### Master Server
- 1527 (Derby metadata store; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
- 8081 (Coordinator)
- 8090 (Overlord)
### Data Server
- 8083 (Historical)
- 8091, 8100–8199 (MiddleManager; you may need ports above 8199 if you have a very high `druid.worker.capacity`)
### Query Server
- 8082 (Broker)
- 8088 (Router, if used)
> In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,
> rather than on the Master server.
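Once the Druid processes are running, a quick TCP probe from another machine can confirm that the firewall lets traffic through. This Python sketch assumes the port assignments listed above, and the `check_server` helper is hypothetical. A failed probe can mean either that a firewall is blocking the port or that nothing is listening on it yet, so check the process logs before changing firewall rules.

```python
from __future__ import annotations

import socket

# Inbound ports from the list above, grouped by server type.
# 1527 and 2181 only apply if Derby/ZooKeeper run on the Master; 8088 only if the Router is used.
# MiddleManager task ports (8100-8199) are omitted: they only listen while tasks are running.
DRUID_PORTS = {
    "master": [1527, 2181, 8081, 8090],
    "data": [8083, 8091],
    "query": [8082, 8088],
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable host
        return False


def check_server(host: str, role: str) -> dict[int, bool]:
    """Probe every port listed for the given server role."""
    return {port: is_port_open(host, port) for port in DRUID_PORTS[role]}
```

For example, `check_server("query1.example.com", "query")` (a hypothetical host name) returns a dictionary mapping 8082 and 8088 to `True` or `False`.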
## Start Master Server
Copy the Druid distribution and your edited configurations to your Master server.
If you have been editing the configurations on your local machine, you can use *rsync* to copy them:
### No ZooKeeper on Master
From the distribution root, run the following command to start the Master server:
### With ZooKeeper on Master
If you plan to run ZK on Master servers, first update `conf/zoo.cfg` to reflect how you plan to run ZK. Then, you
can start the Master server processes together with ZK using:
> In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
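Running ZooKeeper on three Master servers means configuring it in replicated mode. The Python sketch below renders a minimal `zoo.cfg` for that case using standard ZooKeeper settings. The timing values are the ones from ZooKeeper's sample configuration, the host names and `dataDir` you pass in are your own, and `render_zoo_cfg` is a hypothetical helper. This is not the content of Druid's bundled `conf/zoo.cfg`, so merge it with that file rather than replacing it.

```python
from __future__ import annotations


def render_zoo_cfg(
    hosts: list[str],
    data_dir: str,
    client_port: int = 2181,
    peer_port: int = 2888,
    election_port: int = 3888,
) -> str:
    """Render a replicated-mode zoo.cfg; server ids are assigned 1..N in list order."""
    lines = [
        "tickTime=2000",  # base time unit in milliseconds
        "initLimit=10",   # ticks a follower may take to connect and sync to the leader
        "syncLimit=5",    # ticks a follower may lag behind the leader before being dropped
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"
```

In replicated mode, each ZooKeeper server also needs a `myid` file in its `dataDir` containing its own id (1, 2, or 3 for three hosts), and the peer and election ports (2888 and 3888 in this sketch) must be reachable between the Master servers, in addition to the client port 2181 listed in the firewall section.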
## Start Data Server
Copy the Druid distribution and your edited configurations to your Data servers.
From the distribution root, run the following command to start the Data server:
You can add more Data servers as needed.
> For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.
> This also allows you to take advantage of Druid's built-in MiddleManager autoscaling facility.
## Start Query Server
Copy the Druid distribution and your edited configurations to your Query servers.
From the distribution root, run the following command to start the Query server:
You can add more Query servers as needed based on query load. If you increase the number of Query servers, be sure to adjust the connection pools on your Historicals and Tasks as described in the [basic cluster tuning guide](../operations/basic-cluster-tuning.md).
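Each Broker opens up to `druid.broker.http.numConnections` connections to every Historical, so adding Query servers raises the number of concurrent requests each Historical must serve. The Python sketch below assumes the tuning guide's rule of thumb that a Historical's `druid.server.http.numThreads` should be slightly higher than the sum of `numConnections` across all Brokers. The helper name is hypothetical, and the `headroom` default of 10 is an illustrative margin, not a documented value.

```python
from __future__ import annotations


def historical_http_threads(broker_num_connections: list[int], headroom: int = 10) -> int:
    """Suggest druid.server.http.numThreads for each Historical.

    broker_num_connections holds druid.broker.http.numConnections for every Broker.
    headroom is a hypothetical margin that keeps the result above the sum.
    """
    if not broker_num_connections or any(n < 1 for n in broker_num_connections):
        raise ValueError("need at least one Broker with a positive numConnections")
    if headroom < 1:
        raise ValueError("headroom must be positive so the result exceeds the sum")
    return sum(broker_num_connections) + headroom
```

For example, two Brokers with `numConnections=20` each suggest 50 threads per Historical with the default headroom, and adding a third such Broker raises that to 70.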
## Loading data
Congratulations, you now have a Druid cluster! The next step is to learn about recommended ways to load data into
Druid based on your use case. Read more about [loading data](../ingestion/index.md).