---
id: rolling-updates
title: "Rolling updates"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->


For rolling Apache Druid cluster updates with no downtime, we recommend updating Druid processes in the
following order:

1. Historical
2. Middle Manager and Indexer (if any)
3. Broker
4. Router
5. Overlord (Note that you can upgrade the Overlord before any Middle Manager processes if you use [autoscaling-based replacement](#autoscaling-based-replacement).)
6. Coordinator (or merged Coordinator+Overlord)

If you need to do a rolling downgrade, reverse the order and start with the Coordinator processes.
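
As a minimal sketch, the ordering above can be captured in a small Python helper that an orchestration script might consult. The names `UPDATE_ORDER` and `rollout_order` are hypothetical and are not part of Druid.

```python
# Recommended rolling-update order, copied from the list above.
UPDATE_ORDER = [
    "Historical",
    "Middle Manager and Indexer",
    "Broker",
    "Router",
    "Overlord",
    "Coordinator",
]


def rollout_order(downgrade=False):
    """Return the process types in the order they should be rolled.

    For a rolling downgrade the order is reversed, so it starts with
    the Coordinator.
    """
    if downgrade:
        return list(reversed(UPDATE_ORDER))
    return list(UPDATE_ORDER)
```

This sketch does not model the autoscaling exception, where the Overlord can be upgraded before the Middle Managers.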

For information about the latest release, see [Druid releases](https://github.com/apache/druid/releases).

## Historical

Historical processes can be updated one at a time. On startup, each Historical process needs time to memory-map
all the segments it was serving before the update. This typically takes a few seconds to
a few minutes, depending on the hardware of the host. As long as you leave a sufficient delay between
updates (greater than the time required to start a single process), you can update the entire
Historical cluster in a rolling fashion.
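
One way to enforce that delay is to wait for each process to report ready before moving on to the next host. The following Python sketch assumes you supply two callables of your own: `update_host`, which updates and restarts one host, and `is_ready`, which returns `True` once that host is serving again. Both callables, the `rolling_update` name, and the default timings are hypothetical.

```python
import time


def rolling_update(hosts, update_host, is_ready,
                   poll_seconds=5.0, timeout_seconds=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Update hosts one at a time, waiting for each to become ready.

    update_host(host) and is_ready(host) are caller-supplied callables.
    Raises TimeoutError if a host is not ready within timeout_seconds.
    """
    for host in hosts:
        update_host(host)
        deadline = clock() + timeout_seconds
        # Do not touch the next host until this one is serving again.
        while not is_ready(host):
            if clock() >= deadline:
                raise TimeoutError(
                    f"{host} was not ready within {timeout_seconds} seconds"
                )
            sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters is only a convenience for testing the loop without real waiting.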

## Overlord

Overlord processes can be updated one at a time in a rolling fashion.

## Middle Managers/Indexers

Middle Manager and Indexer processes run both batch and real-time indexing tasks. Generally, you want to update
Middle Managers in such a way that real-time indexing tasks do not fail. There are three strategies for
doing that.

### Rolling restart (restore-based)

Middle Managers can be updated one at a time in a rolling fashion when you set
`druid.indexer.task.restoreTasksOnRestart=true`. In this case, indexing tasks that support restoring
will restore their state on Middle Manager restart, and will not fail.

Currently, only real-time tasks support restoring, so other indexing tasks will fail and
need to be resubmitted.

### Rolling restart (graceful-termination-based)

Middle Managers can be gracefully terminated using the "disable" API. This works for all task types,
even tasks that are not restorable.

To prepare a Middle Manager for update, send a POST request to
`<Middle_Manager_IP:PORT>/druid/worker/v1/disable`. The Overlord then stops sending tasks to
this Middle Manager. Tasks that have already started run to completion. To check the current state,
send a GET request to `<Middle_Manager_IP:PORT>/druid/worker/v1/enabled`.

To view all existing tasks, send a GET request to `<Middle_Manager_IP:PORT>/druid/worker/v1/tasks`.
When this list is empty, you can safely update the Middle Manager. After the Middle Manager starts
back up, it is automatically enabled again. You can also manually enable Middle Managers by POSTing
to `<Middle_Manager_IP:PORT>/druid/worker/v1/enable`.
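
The disable-then-wait sequence above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a supported tool: the helper names `http_json` and `drain_middle_manager` are hypothetical, `base_url` is assumed to include the scheme (for example `http://<Middle_Manager_IP:PORT>`), and the tasks endpoint is assumed to return a JSON array.

```python
import json
import time
import urllib.request


def http_json(method, url):
    """Send a request with an empty body and decode the JSON response."""
    data = b"" if method == "POST" else None
    req = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def drain_middle_manager(base_url, request=http_json,
                         poll_seconds=30.0, sleep=time.sleep):
    """Disable a Middle Manager, then block until it has no tasks left."""
    # Stop the Overlord from assigning new tasks to this Middle Manager.
    request("POST", f"{base_url}/druid/worker/v1/disable")
    while True:
        # Assumption: the response is a JSON array of running tasks.
        tasks = request("GET", f"{base_url}/druid/worker/v1/tasks")
        if not tasks:
            return  # The list is empty, so the update can proceed.
        sleep(poll_seconds)
```

Once `drain_middle_manager` returns, the Middle Manager can be updated and restarted. The sketch has no timeout, so a long-running task keeps it waiting indefinitely.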

### Autoscaling-based replacement

If autoscaling is enabled on your Overlord, then Overlord processes can launch new Middle Manager processes
en masse and then gracefully terminate old ones as their tasks finish. This process is configured by
setting `druid.indexer.runner.minWorkerVersion=#{VERSION}`. Increase the `VERSION` value each time you
update your Overlord process. Doing so triggers a mass launch of new Middle Managers.

You also need to set `druid.indexer.autoscale.workerVersion=#{VERSION}`.
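
As a rough illustration, a deployment script could render both properties from a single version value so that they cannot drift apart. The helper name `worker_version_properties` is hypothetical, and the sketch assumes both properties take the same value, as the shared `VERSION` placeholder above suggests.

```python
def worker_version_properties(version):
    """Render both worker-version properties with the same value."""
    return [
        f"druid.indexer.runner.minWorkerVersion={version}",
        f"druid.indexer.autoscale.workerVersion={version}",
    ]


if __name__ == "__main__":
    # Example: print the two lines for version 2.
    print("\n".join(worker_version_properties(2)))
```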

## Standalone Real-time

Standalone real-time processes can be updated one at a time in a rolling fashion.

## Broker

Broker processes can be updated one at a time in a rolling fashion. Allow some delay between
updates, because Brokers must load the entire state of the cluster before they return valid
results.

## Coordinator

Coordinator processes can be updated one at a time in a rolling fashion.