In production, we recommend deploying multiple Master servers and multiple Query servers in a fault-tolerant configuration based on your specific fault-tolerance needs, but you can get started quickly with one Master and one Query server and add more servers later.
If you do not have an existing Druid cluster, and wish to start running Druid in a clustered deployment, this guide provides an example clustered deployment with pre-made configurations.
The Coordinator and Overlord processes are responsible for handling the metadata and coordination needs of your cluster. They can be colocated together on the same server.
Example Master server configurations that have been sized for this hardware can be found under `conf/druid/cluster/master`.
#### Data Server
Historicals and MiddleManagers can be colocated on the same server to handle the actual data in your cluster. These servers benefit greatly from CPU, RAM,
and SSDs.
In this example, we will be deploying the equivalent of two AWS [i3.4xlarge](https://aws.amazon.com/ec2/instance-types/i3/) instances.
This hardware offers:
- 16 vCPUs
- 122 GB RAM
- 2 * 1.9TB SSD storage
Example Data server configurations that have been sized for this hardware can be found under `conf/druid/cluster/data`.
Example Query server configurations that have been sized for this hardware can be found under `conf/druid/cluster/query`.
#### Other Hardware Sizes
The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster.
You can choose smaller/larger hardware or less/more servers for your specific needs and constraints.
If your use case has complex scaling requirements, you can also choose to not co-locate Druid processes (e.g., standalone Historical servers).
The information in the [basic cluster tuning guide](../operations/basic-cluster-tuning.html) can help with your decision-making process and with sizing your configurations.
### Migrating from a Single-Server Deployment
If you have an existing single-server deployment, such as the ones from the [single-server deployment examples](../operations/single-server.html), and you wish to migrate to a clustered deployment of similar scale, the following section contains guidelines for choosing equivalent hardware using the Master/Data/Query server organization.
#### Master Server
The main considerations for the Master server are available CPUs and RAM for the Coordinator and Overlord heaps.
Sum up the allocated heap sizes for your Coordinator and Overlord from the single-server deployment, and choose Master server hardware with enough RAM for the combined heaps, with some extra RAM for other processes on the machine.
For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment.
#### Data Server
When choosing Data server hardware for the cluster, the main considerations are available CPUs and RAM, and using SSD storage if feasible.
In a clustered deployment, having multiple Data servers is a good idea for fault-tolerance purposes.
When choosing the Data server hardware, you can choose a split factor `N`, divide the original CPU/RAM of the single-server deployment by `N`, and deploy `N` Data servers of reduced size in the new cluster.
Instructions for adjusting the Historical/MiddleManager configs for the split are described in a later section in this guide.
#### Query Server
The main considerations for the Query server are available CPUs and RAM for the Broker heap + direct memory, and Router heap.
Sum up the allocated memory sizes for your Broker and Router from the single-server deployment, and choose Query server hardware with enough RAM to cover the Broker/Router, with some extra RAM for other processes on the machine.
For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment.
The [basic cluster tuning guide](../operations/basic-cluster-tuning.html) has information on how to calculate Broker/Router memory usage.
We'll be editing the files in `conf/druid/cluster/` in order to get things running.
### Migrating from Single-Server Deployments
In the following sections we will be editing the configs under `conf/druid/cluster`.
If you have an existing single-server deployment, please copy your existing configs to `conf/druid/cluster` to preserve any config changes you have made.
## Configure metadata storage and deep storage
### Migrating from Single-Server Deployments
If you have an existing single-server deployment and you wish to preserve your data across the migration, please follow the instructions at [metadata migration](../operations/metadata-migration.html) and [deep storage migration](../operations/deep-storage-migration.html) before updating your metadata/deep storage configs.
These guides are targeted at single-server deployments that use the Derby metadata store and local deep storage. If you are already using a non-Derby metadata store in your single-server cluster, you can reuse the existing metadata store for the new cluster.
These guides also provide information on migrating segments from local deep storage. A clustered deployment requires distributed deep storage like S3 or HDFS. If your single-server deployment was already using distributed deep storage, you can reuse the existing deep storage for the new cluster.
### Metadata Storage
In `conf/druid/cluster/_common/common.runtime.properties`, replace
"metadata.storage.*" with the address of the machine that you will use as your metadata store:
-`druid.metadata.storage.connector.connectURI`
-`druid.metadata.storage.connector.host`
In a production deployment, we recommend running a dedicated metadata store such as MySQL or PostgreSQL with replication, deployed separately from the Druid servers.
The [MySQL extension](../development/extensions-core/mysql.html) and [PostgreSQL extension](../development/extensions-core/postgresql.html) docs have instructions for extension configuration and initial database setup.
You can also choose to run ZK on the Master servers instead of having a dedicated ZK cluster. If doing so, we recommend deploying 3 Master servers so that you have a ZK quorum.
If you are using an example configuration from [single-server deployment examples](../operations/single-server.html), these examples combine the Coordinator and Overlord processes into one combined process.
Suppose we are migrating from a single-server deployment that had 32 CPU and 256GB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
In the clustered deployment, we can choose a split factor (2 in this example), and deploy 2 Data servers with 16CPU and 128GB RAM each. The areas to scale are the following:
You can copy your existing Broker and Router configs to the directories under `conf/druid/cluster/query`, no modifications are needed, as long as the new hardware is sized accordingly.
### Fresh deployment
If you are using the example cluster described above:
- 1 Master server (m5.2xlarge)
- 2 Data servers (i3.4xlarge)
- 1 Query server (m5.2xlarge)
The configurations under `conf/druid/cluster` have already been sized for this hardware and you do not need to make further modifications for general use cases.
If you have chosen different hardware, the [basic cluster tuning guide](../operations/basic-cluster-tuning.html) can help you size your configurations.
From the distribution root, run the following command to start the Master server:
```
bin/start-cluster-master-no-zk-server
```
### With Zookeeper on Master
If you plan to run ZK on Master servers, first update `conf/zoo.cfg` to reflect how you plan to run ZK. Then log on to your Master servers and install Zookeeper:
If you are doing push-based stream ingestion with Kafka or over HTTP, you can also start Tranquility Server on the Data server.
For large scale production, Data server processes and the Tranquility Server can still be co-located.
If you are running Tranquility (not server) with a stream processor, you can co-locate Tranquility with the stream processor and not require Tranquility Server.
You can add more Query servers as needed based on query load. If you increase the number of Query servers, be sure to adjust the connection pools on your Historicals and Tasks as described in the [basic cluster tuning guide](../operations/basic-cluster-tuning.html).