---
id: cluster
title: Clustered deployment
sidebar_label: Clustered deployment
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

Apache Druid is designed to be deployed as a scalable, fault-tolerant cluster.

In this document, we'll set up a simple cluster and discuss how it can be further configured to meet
your needs.

This simple cluster will feature:

- A Master server to host the Coordinator and Overlord processes
- Two scalable, fault-tolerant Data servers running Historical and MiddleManager processes
- A Query server to host the Druid Broker and Router processes

In production, we recommend deploying multiple Master servers and multiple Query servers, with the number based on your fault-tolerance needs. You can get started quickly with one Master and one Query server and add more servers later.

## Select hardware
### Fresh Deployment

If you do not have an existing Druid cluster and wish to start running Druid in a clustered deployment, this guide provides an example clustered deployment with pre-made configurations.

#### Master server

The Coordinator and Overlord processes are responsible for handling the metadata and coordination needs of your cluster. They can be colocated on the same server.

In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.

This hardware offers:

- 8 vCPUs
- 32 GiB RAM

Example Master server configurations that have been sized for this hardware can be found under `conf/druid/cluster/master`.

#### Data server

Historicals and MiddleManagers can be colocated on the same server to handle the actual data in your cluster. These servers benefit greatly from CPU, RAM,
and SSDs.

In this example, we will be deploying the equivalent of two AWS [i3.4xlarge](https://aws.amazon.com/ec2/instance-types/i3/) instances.

This hardware offers:

- 16 vCPUs
- 122 GiB RAM
- 2 * 1.9TB SSD storage

Example Data server configurations that have been sized for this hardware can be found under `conf/druid/cluster/data`.

#### Query server

Druid Brokers accept queries and farm them out to the rest of the cluster. They also optionally maintain an
in-memory query cache. These servers benefit greatly from CPU and RAM.

In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.

This hardware offers:

- 8 vCPUs
- 32 GiB RAM

You can consider co-locating any open source UIs or query libraries on the same server that the Broker is running on.

Example Query server configurations that have been sized for this hardware can be found under `conf/druid/cluster/query`.

#### Other Hardware Sizes
The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster.

You can choose smaller/larger hardware or fewer/more servers for your specific needs and constraints.

If your use case has complex scaling requirements, you can also choose to not co-locate Druid processes (e.g., standalone Historical servers).

The information in the [basic cluster tuning guide](../operations/basic-cluster-tuning.md) can help with your decision-making process and with sizing your configurations.

### Migrating from a single-server deployment

If you have an existing single-server deployment, such as the ones from the [single-server deployment examples](../operations/single-server.md), and you wish to migrate to a clustered deployment of similar scale, the following section contains guidelines for choosing equivalent hardware using the Master/Data/Query server organization.

#### Master server
The main considerations for the Master server are available CPUs and RAM for the Coordinator and Overlord heaps.
Sum up the allocated heap sizes for your Coordinator and Overlord from the single-server deployment, and choose Master server hardware with enough RAM for the combined heaps, with some extra RAM for other processes on the machine.
For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment.
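
The sizing rule above can be sketched as a small shell calculation. This is only an illustration: the heap sizes and core count are hypothetical inputs, and the 25% headroom for other processes is an assumption, not a figure from this guide.

```bash
#!/usr/bin/env bash
# Rough Master server sizing from single-server settings.
# All inputs are hypothetical; substitute the values from your own deployment.

# Usage: master_sizing COORDINATOR_HEAP_GIB OVERLORD_HEAP_GIB SINGLE_SERVER_CORES
master_sizing() {
  local coordinator_heap_gib=$1
  local overlord_heap_gib=$2
  local single_server_cores=$3

  # RAM: the combined heaps plus headroom (assumed here to be 25%, rounded up).
  local combined_heap_gib=$(( coordinator_heap_gib + overlord_heap_gib ))
  local ram_gib=$(( (combined_heap_gib * 125 + 99) / 100 ))

  # CPU: approximately one quarter of the single-server core count, at least 1.
  local cores=$(( single_server_cores / 4 ))
  (( cores < 1 )) && cores=1

  echo "min_ram_gib=${ram_gib} cores=${cores}"
}

# Example: 12 GiB Coordinator heap, 12 GiB Overlord heap, 32-core single server.
master_sizing 12 12 32
```

Treat the result as a lower bound when picking an instance type, and round up to the nearest size your provider offers.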
#### Data server
When choosing Data server hardware for the cluster, the main considerations are available CPUs and RAM, and using SSD storage if feasible.
In a clustered deployment, having multiple Data servers is a good idea for fault-tolerance purposes.
When choosing the Data server hardware, you can choose a split factor `N`, divide the original CPU/RAM of the single-server deployment by `N`, and deploy `N` Data servers of reduced size in the new cluster.
Instructions for adjusting the Historical/MiddleManager configs for the split are described in a later section in this guide.
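
As a quick illustration of the split-factor arithmetic, the following sketch divides a single server's resources across `N` Data servers. The function name and input values are hypothetical, and integer division is used, so pick a split factor that divides your hardware evenly.

```bash
#!/usr/bin/env bash
# Divide single-server CPU/RAM across N Data servers (integer division).

# Usage: split_data_server TOTAL_CORES TOTAL_RAM_GIB SPLIT_FACTOR
split_data_server() {
  local total_cores=$1 total_ram_gib=$2 n=$3
  if (( n < 1 )); then
    echo "split factor must be at least 1" >&2
    return 1
  fi
  echo "servers=${n} cores_each=$(( total_cores / n )) ram_gib_each=$(( total_ram_gib / n ))"
}

# Example: a 32-core, 256 GiB single server split across 2 Data servers.
split_data_server 32 256 2
```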
#### Query server
The main considerations for the Query server are available CPUs and RAM for the Broker heap + direct memory, and Router heap.
Sum up the allocated memory sizes for your Broker and Router from the single-server deployment, and choose Query server hardware with enough RAM to cover the Broker/Router, with some extra RAM for other processes on the machine.
For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment.

The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has information on how to calculate Broker/Router memory usage.

## Select OS
We recommend running your favorite Linux distribution. You will also need:

* [Java 8u92+, 11, or 17](../operations/java.md)
* Python 2 or Python 3

:::info
If needed, you can specify where to find Java using the environment variables
`DRUID_JAVA_HOME` or `JAVA_HOME`. For more details, run the `bin/verify-java` script.
:::
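
The following sketch shows one way to check which Java home would be picked up before starting Druid. It assumes that `DRUID_JAVA_HOME` takes precedence over `JAVA_HOME` when both are set; the helper function and the JDK path in the comments are hypothetical, so confirm the actual behavior with `bin/verify-java`.

```bash
#!/usr/bin/env bash
# Print the Java home that would be used, or fail if neither variable is set.
# Assumption: DRUID_JAVA_HOME is preferred over JAVA_HOME when both are set.
resolve_java_home() {
  if [ -n "${DRUID_JAVA_HOME:-}" ]; then
    echo "$DRUID_JAVA_HOME"
  elif [ -n "${JAVA_HOME:-}" ]; then
    echo "$JAVA_HOME"
  else
    echo "neither DRUID_JAVA_HOME nor JAVA_HOME is set" >&2
    return 1
  fi
}

# Example (hypothetical path): point Druid at a specific JDK, then verify it
# from the distribution root.
# export DRUID_JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
# bin/verify-java
```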

For information about installing Java, see the documentation for your OS package manager. If your Ubuntu-based OS does not have a recent enough version of Java, WebUpd8 offers [packages for those
OSes](http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html).

## Download the distribution

First, download and unpack the release archive. It's best to do this on a single machine at first,
since you will be editing the configurations and then copying the modified distribution out to all
of your servers.

[Download](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz)
the {{DRUIDVERSION}} release.

Extract Druid by running the following commands in your terminal:

```bash
tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
cd apache-druid-{{DRUIDVERSION}}
```

In the package, you should find:

* `LICENSE` and `NOTICE` files
* `bin/*` - scripts related to the [single-machine quickstart](index.md)
* `conf/druid/cluster/*` - template configurations for a clustered setup
* `extensions/*` - core Druid extensions
* `hadoop-dependencies/*` - Druid Hadoop dependencies
* `lib/*` - libraries and dependencies for core Druid
* `quickstart/*` - files related to the [single-machine quickstart](index.md)

We'll be editing the files in `conf/druid/cluster/` in order to get things running.

### Migrating from Single-Server Deployments
In the following sections we will be editing the configs under `conf/druid/cluster`.

If you have an existing single-server deployment, please copy your existing configs to `conf/druid/cluster` to preserve any config changes you have made.

## Configure metadata storage and deep storage
### Migrating from Single-Server Deployments

If you have an existing single-server deployment and you wish to preserve your data across the migration, please follow the instructions at [metadata migration](../operations/metadata-migration.md) and [deep storage migration](../operations/deep-storage-migration.md) before updating your metadata/deep storage configs.

These guides are targeted at single-server deployments that use the Derby metadata store and local deep storage. If you are already using a non-Derby metadata store in your single-server cluster, you can reuse the existing metadata store for the new cluster.

These guides also provide information on migrating segments from local deep storage. A clustered deployment requires distributed deep storage like S3 or HDFS. If your single-server deployment was already using distributed deep storage, you can reuse the existing deep storage for the new cluster.

### Metadata storage

In `conf/druid/cluster/_common/common.runtime.properties`, update the following
`druid.metadata.storage.*` properties with the address of the machine that you will use as your metadata store:

- `druid.metadata.storage.connector.connectURI`
- `druid.metadata.storage.connector.host`

In a production deployment, we recommend running a dedicated metadata store such as MySQL or PostgreSQL with replication, deployed separately from the Druid servers.

The [MySQL extension](../development/extensions-core/mysql.md) and [PostgreSQL extension](../development/extensions-core/postgresql.md) docs have instructions for extension configuration and initial database setup.

### Deep storage

Druid relies on a distributed filesystem or large object (blob) store for data storage. The most
commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if
you already have a Hadoop deployment).

#### S3

In `conf/druid/cluster/_common/common.runtime.properties`,

- Add "druid-s3-extensions" to `druid.extensions.loadList`.
- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".
- Uncomment and configure appropriate values in the "For S3" sections of "Deep Storage" and
"Indexing service logs".

After this, you should have made the following changes:

```
druid.extensions.loadList=["druid-s3-extensions"]
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments
druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=...
druid.s3.secretKey=...
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
```

Please see the [S3 extension](../development/extensions-core/s3.md) documentation for more info.

#### HDFS

In `conf/druid/cluster/_common/common.runtime.properties`,

- Add "druid-hdfs-storage" to `druid.extensions.loadList`.
- Comment out the configurations for local storage under "Deep Storage" and "Indexing service logs".
- Uncomment and configure appropriate values in the "For HDFS" sections of "Deep Storage" and
"Indexing service logs".

After this, you should have made the following changes:

```
druid.extensions.loadList=["druid-hdfs-storage"]
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
```
Also,

- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,
mapred-site.xml) on the classpath of your Druid processes. You can do this by copying them into
`conf/druid/cluster/_common/`.

Please see the [HDFS extension](../development/extensions-core/hdfs.md) documentation for more info.

<a name="hadoop"></a>

## Configure for connecting to Hadoop (optional)

If you will be loading data from a Hadoop cluster, then at this point you should configure Druid to be aware
of your cluster:

- Update `druid.indexer.task.hadoopWorkingPath` in `conf/druid/cluster/middleManager/runtime.properties` to
a path on HDFS that you'd like to use for temporary files required during the indexing process.
`druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing` is a common choice.

- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml,
mapred-site.xml) on the classpath of your Druid processes. You can do this by copying them into
`conf/druid/cluster/_common/core-site.xml`, `conf/druid/cluster/_common/hdfs-site.xml`, and so on.
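
Copying the Hadoop configuration files could look like the sketch below. The helper function and the `/etc/hadoop/conf` source path are assumptions for illustration; use the directory where your Hadoop client configuration actually lives.

```bash
#!/usr/bin/env bash
# Copy the Hadoop configuration XMLs into Druid's common config directory.
# Files that are missing from the source directory are skipped with a warning.

# Usage: copy_hadoop_conf HADOOP_CONF_DIR DRUID_COMMON_DIR
copy_hadoop_conf() {
  local src=$1 dst=$2 name
  for name in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
    if [ -f "${src}/${name}" ]; then
      cp "${src}/${name}" "${dst}/${name}"
      echo "copied ${name}"
    else
      echo "skipped ${name} (not found in ${src})" >&2
    fi
  done
}

# Example, run from the distribution root (the source path is hypothetical):
# copy_hadoop_conf /etc/hadoop/conf conf/druid/cluster/_common
```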

Note that you don't need to use HDFS deep storage in order to load data from Hadoop. For example, if
your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you
are loading data using Hadoop or Elastic MapReduce.

For more info, please see the [Hadoop-based ingestion](../ingestion/hadoop.md) page.

## Configure ZooKeeper connection

In a production cluster, we recommend using a dedicated ZK cluster in a quorum, deployed separately from the Druid servers.

In `conf/druid/cluster/_common/common.runtime.properties`, set
`druid.zk.service.host` to a [connection string](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html)
containing a comma-separated list of host:port pairs, each corresponding to a ZooKeeper server in your ZK quorum
(e.g. "127.0.0.1:4545" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002").
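
As an illustration, the connection string can be assembled from a list of hosts that share one client port. The helper function and the `zk*.example.com` host names are hypothetical; 2181 is the ZooKeeper port listed later in this guide.

```bash
#!/usr/bin/env bash
# Build a ZooKeeper connection string (host:port,host:port,...) from a port
# and one or more host names.

# Usage: zk_connect_string PORT HOST [HOST...]
zk_connect_string() {
  local port=$1
  shift
  local result="" host
  for host in "$@"; do
    # Prepend a comma to every entry except the first.
    result+="${result:+,}${host}:${port}"
  done
  echo "$result"
}

# Print the property line to place in common.runtime.properties.
echo "druid.zk.service.host=$(zk_connect_string 2181 zk1.example.com zk2.example.com zk3.example.com)"
```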

You can also choose to run ZK on the Master servers instead of having a dedicated ZK cluster. If doing so, we recommend deploying 3 Master servers so that you have a ZK quorum.

## Configuration Tuning
### Migrating from a Single-Server Deployment
#### Master

If you are using an example configuration from [single-server deployment examples](../operations/single-server.md), these examples combine the Coordinator and Overlord processes into one combined process.

The example configs under `conf/druid/cluster/master/coordinator-overlord` also combine the Coordinator and Overlord processes.

You can copy your existing `coordinator-overlord` configs from the single-server deployment to `conf/druid/cluster/master/coordinator-overlord`.

#### Data

Suppose we are migrating from a single-server deployment that had 32 CPUs and 256 GiB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:

Historical (Single-server)

```
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numMergeBuffers=8
druid.processing.numThreads=31
```

MiddleManager (Single-server)

```
druid.worker.capacity=8
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
```

In the clustered deployment, we can choose a split factor (2 in this example), and deploy 2 Data servers with 16 CPUs and 128 GiB RAM each. The areas to scale are the following:

Historical

- `druid.processing.numThreads`: Set to `(num_cores - 1)` based on the new hardware
- `druid.processing.numMergeBuffers`: Divide the old value from the single-server deployment by the split factor
- `druid.processing.buffer.sizeBytes`: Keep this unchanged

MiddleManager:

- `druid.worker.capacity`: Divide the old value from the single-server deployment by the split factor
- `druid.indexer.fork.property.druid.processing.numMergeBuffers`: Keep this unchanged
- `druid.indexer.fork.property.druid.processing.buffer.sizeBytes`: Keep this unchanged
- `druid.indexer.fork.property.druid.processing.numThreads`: Keep this unchanged

The resulting configs after the split:

New Historical (on 2 Data servers)

```
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
```

New MiddleManager (on 2 Data servers)

```
druid.worker.capacity=4
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
```
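
The scaling rules in this section can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, it prints just the properties that change, and it uses integer division, so it assumes the split factor divides the old values evenly.

```bash
#!/usr/bin/env bash
# Derive the per-server Historical and MiddleManager values that change
# when a single server is split into N Data servers.

# Usage: split_data_configs OLD_CORES OLD_MERGE_BUFFERS OLD_WORKER_CAPACITY SPLIT_FACTOR
split_data_configs() {
  local old_cores=$1 old_merge_buffers=$2 old_capacity=$3 n=$4
  local new_cores=$(( old_cores / n ))

  # Historical: threads follow the new core count, merge buffers are divided.
  echo "druid.processing.numThreads=$(( new_cores - 1 ))"
  echo "druid.processing.numMergeBuffers=$(( old_merge_buffers / n ))"

  # MiddleManager: only the worker capacity is divided.
  echo "druid.worker.capacity=$(( old_capacity / n ))"
}

# Example from this section: 32 cores, 8 merge buffers, capacity 8, split factor 2.
split_data_configs 32 8 8 2
```

Properties marked "Keep this unchanged" above are deliberately left out of the output, since they carry over as-is.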
#### Query
You can copy your existing Broker and Router configs to the directories under `conf/druid/cluster/query`. No modifications are needed, as long as the new hardware is sized accordingly.
### Fresh deployment
If you are using the example cluster described above:
- 1 Master server (m5.2xlarge)
- 2 Data servers (i3.4xlarge)
- 1 Query server (m5.2xlarge)
The configurations under `conf/druid/cluster` have already been sized for this hardware and you do not need to make further modifications for general use cases.

If you have chosen different hardware, the [basic cluster tuning guide](../operations/basic-cluster-tuning.md) can help you size your configurations.

## Open ports (if using a firewall)
If you're using a firewall or some other system that only allows traffic on specific ports, allow
inbound connections on the following:
### Master Server
- 1527 (Derby metadata store; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
- 8081 (Coordinator)
- 8090 (Overlord)
### Data Server
- 8083 (Historical)
- 8091, 8100–8199 (Druid MiddleManager; you may need higher than port 8199 if you have a very high `druid.worker.capacity`)
### Query Server
- 8082 (Broker)
- 8088 (Router, if used)
:::info
In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,
rather than on the Master server.
:::
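
To script firewall changes, it can help to keep the port lists above in one place. The sketch below only prints the ports for each server role; the function name is hypothetical, and the step that actually opens the ports is left to whatever firewall tool you use.

```bash
#!/usr/bin/env bash
# Print the inbound ports listed in this section for a given server role.
# The MiddleManager task range is printed as a single "8100-8199" token.

# Usage: druid_ports master|data|query
druid_ports() {
  case "$1" in
    master) echo "1527 2181 8081 8090" ;;
    data)   echo "8083 8091 8100-8199" ;;
    query)  echo "8082 8088" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# List the ports for every role.
for role in master data query; do
  echo "${role}: $(druid_ports "$role")"
done
```

Remember to drop 1527 and 2181 from the Master list if you run a separate metadata store and ZooKeeper cluster.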
## Start Master Server
Copy the Druid distribution and your edited configurations to your Master server.

If you have been editing the configurations on your local machine, you can use *rsync* to copy them:

```bash
rsync -az apache-druid-{{DRUIDVERSION}}/ MASTER_SERVER:apache-druid-{{DRUIDVERSION}}/
```

### No ZooKeeper on Master

From the distribution root, run the following command to start the Master server:
```
bin/start-cluster-master-no-zk-server
```
### With ZooKeeper on Master

If you plan to run ZK on Master servers, first update `conf/zoo.cfg` to reflect how you plan to run ZK. Then, you
can start the Master server processes together with ZK using:

```
bin/start-cluster-master-with-zk-server
```
:::info
In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
:::
## Start Data Server

Copy the Druid distribution and your edited configurations to your Data servers.

From the distribution root, run the following command to start the Data server:

```
bin/start-cluster-data-server
```
You can add more Data servers as needed.
:::info
For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.
This also allows you to take advantage of Druid's built-in MiddleManager autoscaling facility.
:::
## Start Query Server

Copy the Druid distribution and your edited configurations to your Query servers.

From the distribution root, run the following command to start the Query server:

```
bin/start-cluster-query-server
```

You can add more Query servers as needed based on query load. If you increase the number of Query servers, be sure to adjust the connection pools on your Historicals and Tasks as described in the [basic cluster tuning guide](../operations/basic-cluster-tuning.md).

## Loading data

Congratulations, you now have a Druid cluster! The next step is to learn about recommended ways to load data into
Druid based on your use case. Read more about [loading data](../ingestion/index.md).