
[[cluster-reroute]]
== Cluster Reroute

The reroute command lets you explicitly execute a cluster reroute
allocation that includes specific commands. For example, a shard can
be moved from one node to another, an allocation can be canceled, or
an unassigned shard can be explicitly allocated to a specific node.

Here is a short example of a simple reroute API call:

[source,js]
--------------------------------------------------
POST /_cluster/reroute
{
    "commands" : [
        {
            "move" : {
                "index" : "test", "shard" : 0,
                "from_node" : "node1", "to_node" : "node2"
            }
        },
        {
          "allocate_replica" : {
                "index" : "test", "shard" : 1,
                "node" : "node3"
          }
        }
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:doc tests run with only a single node]

Keep in mind that after an allocation command is processed, the cluster
will aim to re-balance itself back to an even state. For example, if the
cluster is in an even state and a command moves a shard from `node1` to
`node2`, then another shard will be moved from `node2` to `node1` to even
things out.

The cluster can be set to disable allocations, which means that only the
explicit allocations will be performed. Only once all commands have been
applied will the cluster aim to re-balance its state.
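
The paragraph above does not name the setting involved. As a minimal sketch,
assuming the setting meant is `cluster.routing.allocation.enable` (an
assumption, not stated in this section), allocations could be disabled
before issuing explicit commands like this:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative example]

Setting the value back to `all` would re-enable automatic allocation.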

Another option is to run the commands in `dry_run` mode (as a URI flag, or in
the request body). This applies the commands to the current cluster state
and returns the resulting cluster state after the commands (and
re-balancing) have been applied, without changing the actual cluster state.
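
As a sketch of the URI-flag form, the `move` command from the first example
could be simulated as follows. The index and node names (`test`, `node1`,
`node2`) are the same placeholders used above:

[source,js]
--------------------------------------------------
POST /_cluster/reroute?dry_run=true
{
    "commands" : [
        {
            "move" : {
                "index" : "test", "shard" : 0,
                "from_node" : "node1", "to_node" : "node2"
            }
        }
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:doc tests run with only a single node]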

If the `explain` parameter is specified, a detailed explanation of why the
commands could or could not be executed is returned.
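
For instance, a request along these lines asks for an explanation of an
`allocate_replica` command. The index and node names are placeholders:

[source,js]
--------------------------------------------------
POST /_cluster/reroute?explain=true
{
    "commands" : [
        {
            "allocate_replica" : {
                "index" : "test", "shard" : 1,
                "node" : "node3"
            }
        }
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:doc tests run with only a single node]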

The commands supported are:

`move`::
    Move a started shard from one node to another node. Accepts
    `index` and `shard` for index name and shard number, `from_node` for the
    node to move the shard from, and `to_node` for the node to move the
    shard to.

`cancel`::
    Cancel allocation of a shard (or recovery). Accepts `index`
    and `shard` for index name and shard number, and `node` for the node to
    cancel the shard allocation on. It also accepts the `allow_primary` flag to
    explicitly specify that it is allowed to cancel allocation for a primary
    shard. The `cancel` command can be used to force resynchronization of
    existing replicas from the primary shard by canceling them and allowing
    them to be reinitialized through the standard reallocation process.

`allocate_replica`::
    Allocate an unassigned replica shard to a node. Accepts the
    `index` and `shard` for index name and shard number, and `node` to
    allocate the shard to. Takes <<modules-cluster,allocation deciders>> into account.
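
As an illustration of the `cancel` command described above, the following
sketch cancels the allocation of a replica of shard `0` of a placeholder
index `test` on a placeholder node `node2`, so that it is reinitialized from
the primary. Because `allow_primary` is `false`, the request is not allowed
to cancel a primary shard:

[source,js]
--------------------------------------------------
POST /_cluster/reroute
{
    "commands" : [
        {
            "cancel" : {
                "index" : "test", "shard" : 0,
                "node" : "node2",
                "allow_primary" : false
            }
        }
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:doc tests run with only a single node]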

Two more commands are available that allow the allocation of a primary shard
to a node. Use these commands with extreme care, however, as primary
shard allocation is usually handled fully automatically by Elasticsearch.
Reasons why a primary shard cannot be automatically allocated include the following:

- A new index was created but there is no node which satisfies the allocation deciders.
- An up-to-date shard copy of the data cannot be found on the current data nodes in
the cluster. To prevent data loss, the system does not automatically promote a stale
shard copy to primary.

[float]
=== Retry failed shards

The cluster will attempt to allocate a shard a maximum of
`index.allocation.max_retries` times in a row (defaults to `5`), before giving
up and leaving the shard unallocated. This scenario can be caused by
structural problems such as having an analyzer which refers to a stopwords
file which doesn't exist on all nodes.

Once the problem has been corrected, allocation can be manually retried by
calling the <<cluster-reroute,`reroute`>> API with `?retry_failed`, which
will attempt a single retry round for these shards.
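
A retry request of this kind needs no body:

[source,js]
--------------------------------------------------
POST /_cluster/reroute?retry_failed=true
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative example]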

[float]
=== Forced allocation on unrecoverable errors

The following two commands are dangerous and may result in data loss. They are
meant to be used in cases where the original data cannot be recovered and the cluster
administrator accepts the loss. If you have suffered a temporary issue that has been
fixed, please see the `retry_failed` flag described above.

`allocate_stale_primary`::
    Allocate a primary shard to a node that holds a stale copy. Accepts the
    `index` and `shard` for index name and shard number, and `node` to
    allocate the shard to. Using this command may lead to data loss
    for the provided shard ID. If a node which has the good copy of the
    data rejoins the cluster later on, that data will be overwritten with
    the data of the stale copy that was forcefully allocated with this
    command. To ensure that these implications are well understood,
    this command requires the special field `accept_data_loss` to be
    explicitly set to `true` for it to work.

`allocate_empty_primary`::
    Allocate an empty primary shard to a node. Accepts the
    `index` and `shard` for index name and shard number, and `node` to
    allocate the shard to. Using this command leads to a complete loss
    of all data that was indexed into this shard, if it was previously
    started. If a node which has a copy of the data rejoins the cluster
    later on, that data will be deleted!
    To ensure that these implications are well understood,
    this command requires the special field `accept_data_loss` to be
    explicitly set to `true` for it to work.
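
As a sketch of what a forced allocation request looks like, the following
allocates a stale copy of shard `0` as the primary. The index name `test`
and the node name `node1` are placeholders. Running a request like this
accepts the data loss described above:

[source,js]
--------------------------------------------------
POST /_cluster/reroute
{
    "commands" : [
        {
            "allocate_stale_primary" : {
                "index" : "test", "shard" : 0,
                "node" : "node1",
                "accept_data_loss" : true
            }
        }
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:doc tests run with only a single node]

The `allocate_empty_primary` command accepts the same fields.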