From 84acb65ca18c9d53faea1d02a69f400bf094b871 Mon Sep 17 00:00:00 2001
From: Clinton Gormley
Date: Tue, 30 Jun 2015 13:53:04 +0200
Subject: [PATCH] Docs: Documented delayed allocation settings

Relates to: #11712
---
 docs/reference/cluster/health.asciidoc        |  30 +++--
 .../index-modules/allocation.asciidoc         | 127 +-----------------
 .../index-modules/allocation/delayed.asciidoc |  91 +++++++++++++
 .../allocation/filtering.asciidoc             |  97 +++++++++++++
 .../allocation/total_shards.asciidoc          |  26 ++++
 5 files changed, 238 insertions(+), 133 deletions(-)
 create mode 100644 docs/reference/index-modules/allocation/delayed.asciidoc
 create mode 100644 docs/reference/index-modules/allocation/filtering.asciidoc
 create mode 100644 docs/reference/index-modules/allocation/total_shards.asciidoc

diff --git a/docs/reference/cluster/health.asciidoc b/docs/reference/cluster/health.asciidoc
index a58a0924fce..7d9bdc1b041 100644
--- a/docs/reference/cluster/health.asciidoc
+++ b/docs/reference/cluster/health.asciidoc
@@ -7,18 +7,22 @@ of the cluster.
 [source,js]
 --------------------------------------------------
 $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
-{
-  "cluster_name" : "testcluster",
-  "status" : "green",
-  "timed_out" : false,
-  "number_of_nodes" : 2,
-  "number_of_data_nodes" : 2,
-  "active_primary_shards" : 5,
-  "active_shards" : 10,
-  "relocating_shards" : 0,
-  "initializing_shards" : 0,
+{
+  "cluster_name" : "testcluster",
+  "status" : "green",
+  "timed_out" : false,
+  "number_of_nodes" : 2,
+  "number_of_data_nodes" : 2,
+  "active_primary_shards" : 5,
+  "active_shards" : 10,
+  "relocating_shards" : 0,
+  "initializing_shards" : 0,
   "unassigned_shards" : 0,
-  "number_of_pending_tasks" : 0
+  "delayed_unassigned_shards": 0,
+  "number_of_pending_tasks" : 0,
+  "number_of_in_flight_fetch": 0,
+  "task_max_waiting_in_queue_millis": 0,
+  "active_shards_percent_as_number": 100
 }
 --------------------------------------------------
 
@@ -61,14 +65,14 @@ The cluster health API accepts the following request parameters:
 `wait_for_status`::
     One of `green`, `yellow` or `red`. Will wait (until the timeout provided)
     until the status of the cluster changes to the one
-    provided or better, i.e. `green` > `yellow` > `red`. By default, will not
+    provided or better, i.e. `green` > `yellow` > `red`. By default, will not
    wait for any status.
 
 `wait_for_relocating_shards`::
     A number controlling to how many relocating shards to wait for. Usually
     will be `0` to indicate to wait till all relocations have happened.
     Defaults to not wait.
-    
+
 `wait_for_active_shards`::
     A number controlling to how many active shards to wait for. Defaults to
     not wait.
diff --git a/docs/reference/index-modules/allocation.asciidoc b/docs/reference/index-modules/allocation.asciidoc
index 4cc07060b62..c0a94c85eb5 100644
--- a/docs/reference/index-modules/allocation.asciidoc
+++ b/docs/reference/index-modules/allocation.asciidoc
@@ -2,130 +2,17 @@
 == Index Shard Allocation
 
 This module provides per-index settings to control the allocation of shards to
-nodes.
+nodes:
 
-[float]
-[[shard-allocation-filtering]]
-=== Shard Allocation Filtering
+* <<shard-allocation-filtering>>: Controlling which shards are allocated to which nodes.
+* <<delayed-allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
+* <<allocation-total-shards>>: A hard limit on the number of shards from the same index per node.
 
-Shard allocation filtering allows you to specify which nodes are allowed
-to host the shards of a particular index.
+include::allocation/filtering.asciidoc[]
 
-NOTE: The per-index shard allocation filters explained below work in
-conjunction with the cluster-wide allocation filters explained in
-<>.
+include::allocation/delayed.asciidoc[]
 
-It is possible to assign arbitrary metadata attributes to each node at
-startup. For instance, nodes could be assigned a `rack` and a `group`
-attribute as follows:
-
-[source,sh]
-------------------------
-bin/elasticsearch --node.rack rack1 --node.size big <1>
-------------------------
-<1> These attribute settings can also be specfied in the `elasticsearch.yml` config file.
-
-These metadata attributes can be used with the
-`index.routing.allocation.*` settings to allocate an index to a particular
-group of nodes. For instance, we can move the index `test` to either `big` or
-`medium` nodes as follows:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include.size": "big,medium"
-}
-------------------------
-// AUTOSENSE
-
-Alternatively, we can move the index `test` away from the `small` nodes with
-an `exclude` rule:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.exclude.size": "small"
-}
-------------------------
-// AUTOSENSE
-
-Multiple rules can be specified, in which case all conditions must be
-satisfied. For instance, we could move the index `test` to `big` nodes in
-`rack1` with the following:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include.size": "big",
-  "index.routing.allocation.include.rack": "rack1"
-}
-------------------------
-// AUTOSENSE
-
-NOTE: If some conditions cannot be satisfied then shards will not be moved.
-
-The following settings are _dynamic_, allowing live indices to be moved from
-one set of nodes to another:
-
-`index.routing.allocation.include.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has at least one of the
-    comma-separated values.
-
-`index.routing.allocation.require.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has _all_ of the
-    comma-separated values.
-
-`index.routing.allocation.exclude.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has _none_ of the
-    comma-separated values.
-
-These special attributes are also supported:
-
-[horizontal]
-`_name`:: Match nodes by node name
-`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
-`_host`:: Match nodes by hostname
-
-All attribute values can be specified with wildcards, eg:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include._ip": "192.168.2.*"
-}
-------------------------
-// AUTOSENSE
-
-[float]
-=== Total Shards Per Node
-
-The cluster-level shard allocator tries to spread the shards of a single index
-across as many nodes as possible. However, depending on how many shards and
-indices you have, and how big they are, it may not always be possible to spread
-shards evenly.
-
-The following _dynamic_ setting allows you to specify a hard limit on the total
-number of shards from a single index allowed per node:
-
-`index.routing.allocation.total_shards_per_node`::
-
-    The maximum number of shards (replicas and primaries) that will be
-    allocated to a single node. Defaults to unbounded.
-
-[WARNING]
-=======================================
-This setting imposes a hard limit which can result in some shards not
-being allocated.
-
-Use with caution.
-=======================================
+include::allocation/total_shards.asciidoc[]
diff --git a/docs/reference/index-modules/allocation/delayed.asciidoc b/docs/reference/index-modules/allocation/delayed.asciidoc
new file mode 100644
index 00000000000..31f5b8092f3
--- /dev/null
+++ b/docs/reference/index-modules/allocation/delayed.asciidoc
@@ -0,0 +1,91 @@
+[[delayed-allocation]]
+=== Delaying allocation when a node leaves
+
+When a node leaves the cluster for whatever reason, intentional or otherwise,
+the master reacts by:
+
+* Promoting a replica shard to primary to replace any primaries that were on the node.
+* Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
+* Rebalancing shards evenly across the remaining nodes.
+
+These actions are intended to protect the cluster against data loss by
+ensuring that every shard is fully replicated as soon as possible.
+
+Even though we throttle concurrent recoveries both at the
+<<recovery,node level>> and at the <<shards-allocation,cluster level>>, this
+``shard-shuffle'' can still put a lot of extra load on the cluster which
+may not be necessary if the missing node is likely to return soon. Imagine
+this scenario:
+
+* Node 5 loses network connectivity.
+* The master promotes a replica shard to primary for each primary that was on Node 5.
+* The master allocates new replicas to other nodes in the cluster.
+* Each new replica makes an entire copy of the primary shard across the network.
+* More shards are moved to different nodes to rebalance the cluster.
+* Node 5 returns after a few minutes.
+* The master rebalances the cluster by allocating shards to Node 5.
+
+If the master had just waited for a few minutes, then the missing shards could
+have been re-allocated to Node 5 with the minimum of network traffic. This
+process would be even quicker for idle shards (shards not receiving indexing
+requests) which have been automatically <<indices-synced-flush,sync-flushed>>.
+
+The allocation of replica shards which become unassigned because a node has
+left can be delayed with the `index.unassigned.node_left.delayed_timeout`
+dynamic setting, which defaults to `0` (reassign shards immediately).
+
+This setting can be updated on a live index (or on all indices):
+
+[source,js]
+------------------------------
+PUT /_all/_settings
+{
+  "settings": {
+    "index.unassigned.node_left.delayed_timeout": "5m"
+  }
+}
+------------------------------
+// AUTOSENSE
+
+With delayed allocation enabled, the above scenario changes to look like this:
+
+* Node 5 loses network connectivity.
+* The master promotes a replica shard to primary for each primary that was on Node 5.
+* The master logs a message that allocation of unassigned shards has been delayed, and for how long.
+* The cluster remains yellow because there are unassigned replica shards.
+* Node 5 returns after a few minutes, before the `timeout` expires.
+* The missing replicas are re-allocated to Node 5 (and sync-flushed shards recover almost immediately).
+
+NOTE: This setting will not affect the promotion of replicas to primaries, nor
+will it affect the assignment of replicas that have not been assigned
+previously.
+
+==== Monitoring delayed unassigned shards
+
+The number of shards whose allocation has been delayed by this timeout setting
+can be viewed with the <<cluster-health,cluster health API>>:
+
+[source,js]
+------------------------------
+GET _cluster/health <1>
+------------------------------
+<1> This request will return a `delayed_unassigned_shards` value.
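+
+While you watch this number, the delay itself can still be adjusted. As a
+purely illustrative sketch (the index name `test` and the `10m` value are
+assumptions, not taken from the scenario above), the same dynamic setting
+could be applied to a single index rather than to `_all`:
+
+[source,js]
+------------------------------
+PUT /test/_settings
+{
+  "settings": {
+    "index.unassigned.node_left.delayed_timeout": "10m"
+  }
+}
+------------------------------
+// AUTOSENSE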
+
+==== Removing a node permanently
+
+If a node is not going to return and you would like Elasticsearch to allocate
+the missing shards immediately, just update the timeout to zero:
+
+
+[source,js]
+------------------------------
+PUT /_all/_settings
+{
+  "settings": {
+    "index.unassigned.node_left.delayed_timeout": "0"
+  }
+}
+------------------------------
+// AUTOSENSE
+
+You can reset the timeout as soon as the missing shards have started to recover.
diff --git a/docs/reference/index-modules/allocation/filtering.asciidoc b/docs/reference/index-modules/allocation/filtering.asciidoc
new file mode 100644
index 00000000000..d5e30fb76bb
--- /dev/null
+++ b/docs/reference/index-modules/allocation/filtering.asciidoc
@@ -0,0 +1,97 @@
+[[shard-allocation-filtering]]
+=== Shard Allocation Filtering
+
+Shard allocation filtering allows you to specify which nodes are allowed
+to host the shards of a particular index.
+
+NOTE: The per-index shard allocation filters explained below work in
+conjunction with the cluster-wide allocation filters explained in
+<<shards-allocation>>.
+
+It is possible to assign arbitrary metadata attributes to each node at
+startup. For instance, nodes could be assigned a `rack` and a `size`
+attribute as follows:
+
+[source,sh]
+------------------------
+bin/elasticsearch --node.rack rack1 --node.size big <1>
+------------------------
+<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.
+
+These metadata attributes can be used with the
+`index.routing.allocation.*` settings to allocate an index to a particular
+group of nodes. For instance, we can move the index `test` to either `big` or
+`medium` nodes as follows:
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.include.size": "big,medium"
+}
+------------------------
+// AUTOSENSE
+
+Alternatively, we can move the index `test` away from the `small` nodes with
+an `exclude` rule:
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.exclude.size": "small"
+}
+------------------------
+// AUTOSENSE
+
+Multiple rules can be specified, in which case all conditions must be
+satisfied. For instance, we could move the index `test` to `big` nodes in
+`rack1` with the following:
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.include.size": "big",
+  "index.routing.allocation.include.rack": "rack1"
+}
+------------------------
+// AUTOSENSE
+
+NOTE: If some conditions cannot be satisfied then shards will not be moved.
+
+The following settings are _dynamic_, allowing live indices to be moved from
+one set of nodes to another:
+
+`index.routing.allocation.include.{attribute}`::
+
+    Assign the index to a node whose `{attribute}` has at least one of the
+    comma-separated values.
+
+`index.routing.allocation.require.{attribute}`::
+
+    Assign the index to a node whose `{attribute}` has _all_ of the
+    comma-separated values.
+
+`index.routing.allocation.exclude.{attribute}`::
+
+    Assign the index to a node whose `{attribute}` has _none_ of the
+    comma-separated values.
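+
+As an illustrative sketch only, reusing the `size` and `rack` attributes from
+the examples above, `require` rules could be combined to insist that the index
+`test` is allocated only to nodes that are both `big` and in `rack1`:
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.require.size": "big",
+  "index.routing.allocation.require.rack": "rack1"
+}
+------------------------
+// AUTOSENSE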
+
+These special attributes are also supported:
+
+[horizontal]
+`_name`:: Match nodes by node name
+`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
+`_host`:: Match nodes by hostname
+
+All attribute values can be specified with wildcards, eg:
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.include._ip": "192.168.2.*"
+}
+------------------------
+// AUTOSENSE
diff --git a/docs/reference/index-modules/allocation/total_shards.asciidoc b/docs/reference/index-modules/allocation/total_shards.asciidoc
new file mode 100644
index 00000000000..3e1b3ab16e8
--- /dev/null
+++ b/docs/reference/index-modules/allocation/total_shards.asciidoc
@@ -0,0 +1,26 @@
+[[allocation-total-shards]]
+=== Total Shards Per Node
+
+The cluster-level shard allocator tries to spread the shards of a single index
+across as many nodes as possible. However, depending on how many shards and
+indices you have, and how big they are, it may not always be possible to spread
+shards evenly.
+
+The following _dynamic_ setting allows you to specify a hard limit on the total
+number of shards from a single index allowed per node:
+
+`index.routing.allocation.total_shards_per_node`::
+
+    The maximum number of shards (replicas and primaries) that will be
+    allocated to a single node. Defaults to unbounded.
+
+[WARNING]
+=======================================
+This setting imposes a hard limit which can result in some shards not
+being allocated.
+
+Use with caution.
+=======================================
+
+
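+If you do decide to use it, the limit can be updated on a live index like any
+other dynamic setting. The following is only a sketch (the index name `test`
+and the value `5` are illustrative, not recommendations):
+
+[source,json]
+------------------------
+PUT test/_settings
+{
+  "index.routing.allocation.total_shards_per_node": 5
+}
+------------------------
+// AUTOSENSE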