mirror of https://github.com/apache/druid.git
81 lines
3.1 KiB
Markdown
81 lines
3.1 KiB
Markdown
---
|
|
layout: doc_page
|
|
---
|
|
|
|
Rolling Updates
|
|
===============
|
|
|
|
For rolling Druid cluster updates with no downtime, we recommend updating Druid nodes in the
|
|
following order:
|
|
|
|
1. Historical
|
|
2. Overlord (if any)
|
|
3. Middle Manager (if any)
|
|
4. Standalone Real-time (if any)
|
|
5. Broker
|
|
6. Coordinator
|
|
|
|
## Historical
|
|
|
|
Historical nodes can be updated one at a time. Each Historical node has a startup time to memory map
|
|
all the segments it was serving before the update. The startup time typically takes a few seconds to
|
|
a few minutes, depending on the hardware of the node. As long as each Historical node is updated
|
|
with a sufficient delay (greater than the time required to start a single node), you can rolling
|
|
update the entire Historical cluster.
|
|
|
|
## Overlord
|
|
|
|
Overlord nodes can be updated one at a time in a rolling fashion.
|
|
|
|
## Middle Managers
|
|
|
|
Middle Managers run both batch and real-time indexing tasks. Generally you want to update Middle
|
|
Managers in such a way that real-time indexing tasks do not fail. There are three strategies for
|
|
doing that.
|
|
|
|
### Rolling restart (restore-based)
|
|
|
|
Middle Managers can be updated one at a time in a rolling fashion when you set
|
|
`druid.indexer.task.restoreTasksOnRestart=true`. In this case, indexing tasks that support restoring
|
|
will restore their state on Middle Manager restart, and will not fail.
|
|
|
|
Currently, only realtime tasks support restoring, so non-realtime indexing tasks will fail and will
|
|
need to be resubmitted.
|
|
|
|
### Rolling restart (graceful-termination-based)
|
|
|
|
Middle Managers can be gracefully terminated using the "disable" API. This works for all task types,
|
|
even tasks that are not restorable.
|
|
|
|
To prepare a Middle Manager for update, send a POST request to
|
|
`<MiddleManager_IP:PORT>/druid/worker/v1/disable`. The Overlord will now no longer send tasks to
|
|
this Middle Manager. Tasks that have already started will run to completion.
|
|
|
|
To view all existing tasks, send a GET request to `<MiddleManager_IP:PORT>/druid/worker/v1/tasks`.
|
|
When this list is empty, you can safely update the Middle Manager. After the Middle Manager starts
|
|
back up, it is automatically enabled again. You can also manually enable Middle Managers by POSTing
|
|
to `<MiddleManager_IP:PORT>/druid/worker/v1/enable`.
|
|
|
|
### Autoscaling-based replacement
|
|
|
|
If autoscaling is enabled on your Overlord, then Overlord nodes can launch new Middle Manager nodes
|
|
en masse and then gracefully terminate old ones as their tasks finish. This process is configured by
|
|
setting `druid.indexer.runner.minWorkerVersion=#{VERSION}`. Each time you update your overlord node,
|
|
the `VERSION` value should be increased, which will trigger a mass launch of new Middle Managers.
|
|
|
|
The config `druid.indexer.autoscale.workerVersion=#{VERSION}` also needs to be set.
|
|
|
|
## Standalone Real-time
|
|
|
|
Standalone real-time nodes can be updated one at a time in a rolling fashion.
|
|
|
|
## Broker
|
|
|
|
Broker nodes can be updated one at a time in a rolling fashion. There needs to be some delay between
|
|
updating each node as brokers must load the entire state of the cluster before they return valid
|
|
results.
|
|
|
|
## Coordinator
|
|
|
|
Coordinator nodes can be updated one at a time in a rolling fashion.
|