parent b5e958b2e0
commit 84acb65ca1

@@ -18,7 +18,11 @@ $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
   "relocating_shards" : 0,
   "initializing_shards" : 0,
   "unassigned_shards" : 0,
-  "number_of_pending_tasks" : 0
+  "delayed_unassigned_shards": 0,
+  "number_of_pending_tasks" : 0,
+  "number_of_in_flight_fetch": 0,
+  "task_max_waiting_in_queue_millis": 0,
+  "active_shards_percent_as_number": 100
 }
 --------------------------------------------------
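
For reference, the generic `filter_path` response-filtering parameter can pull just the newly documented fields out of the health response. A minimal sketch, assuming a node listening on `localhost:9200` and a release that supports response filtering:

[source,sh]
--------------------------------------------------
# Return only the delayed-shard count and the active-shard percentage
curl -XGET 'http://localhost:9200/_cluster/health?pretty&filter_path=delayed_unassigned_shards,active_shards_percent_as_number'
--------------------------------------------------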

@@ -2,130 +2,17 @@
 == Index Shard Allocation
 
 This module provides per-index settings to control the allocation of shards to
-nodes.
+nodes:
 
-[float]
-[[shard-allocation-filtering]]
-=== Shard Allocation Filtering
-
-Shard allocation filtering allows you to specify which nodes are allowed
-to host the shards of a particular index.
-
-NOTE: The per-index shard allocation filters explained below work in
-conjunction with the cluster-wide allocation filters explained in
-<<shards-allocation>>.
-
-It is possible to assign arbitrary metadata attributes to each node at
-startup. For instance, nodes could be assigned a `rack` and a `group`
-attribute as follows:
-
-[source,sh]
-------------------------
-bin/elasticsearch --node.rack rack1 --node.size big <1>
-------------------------
-<1> These attribute settings can also be specfied in the `elasticsearch.yml` config file.
-
-These metadata attributes can be used with the
-`index.routing.allocation.*` settings to allocate an index to a particular
-group of nodes. For instance, we can move the index `test` to either `big` or
-`medium` nodes as follows:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include.size": "big,medium"
-}
-------------------------
-// AUTOSENSE
-
-Alternatively, we can move the index `test` away from the `small` nodes with
-an `exclude` rule:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.exclude.size": "small"
-}
-------------------------
-// AUTOSENSE
-
-Multiple rules can be specified, in which case all conditions must be
-satisfied. For instance, we could move the index `test` to `big` nodes in
-`rack1` with the following:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include.size": "big",
-  "index.routing.allocation.include.rack": "rack1"
-}
-------------------------
-// AUTOSENSE
-
-NOTE: If some conditions cannot be satisfied then shards will not be moved.
-
-The following settings are _dynamic_, allowing live indices to be moved from
-one set of nodes to another:
-
-`index.routing.allocation.include.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has at least one of the
-    comma-separated values.
-
-`index.routing.allocation.require.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has _all_ of the
-    comma-separated values.
-
-`index.routing.allocation.exclude.{attribute}`::
-
-    Assign the index to a node whose `{attribute}` has _none_ of the
-    comma-separated values.
-
-These special attributes are also supported:
-
-[horizontal]
-`_name`:: Match nodes by node name
-`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
-`_host`:: Match nodes by hostname
-
-All attribute values can be specified with wildcards, eg:
-
-[source,json]
-------------------------
-PUT test/_settings
-{
-  "index.routing.allocation.include._ip": "192.168.2.*"
-}
-------------------------
-// AUTOSENSE
-
-[float]
-=== Total Shards Per Node
-
-The cluster-level shard allocator tries to spread the shards of a single index
-across as many nodes as possible. However, depending on how many shards and
-indices you have, and how big they are, it may not always be possible to spread
-shards evenly.
-
-The following _dynamic_ setting allows you to specify a hard limit on the total
-number of shards from a single index allowed per node:
-
-`index.routing.allocation.total_shards_per_node`::
-
-    The maximum number of shards (replicas and primaries) that will be
-    allocated to a single node. Defaults to unbounded.
-
-[WARNING]
-=======================================
-This setting imposes a hard limit which can result in some shards not
-being allocated.
-
-Use with caution.
-=======================================
+* <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
+* <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
+* <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.
+
+include::allocation/filtering.asciidoc[]
+
+include::allocation/delayed.asciidoc[]
+
+include::allocation/total_shards.asciidoc[]

@@ -0,0 +1,91 @@
[[delayed-allocation]]
=== Delaying allocation when a node leaves

When a node leaves the cluster for whatever reason, intentional or otherwise,
the master reacts by:

* Promoting a replica shard to primary to replace any primaries that were on the node.
* Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
* Rebalancing shards evenly across the remaining nodes.

These actions are intended to protect the cluster against data loss by
ensuring that every shard is fully replicated as soon as possible.

Even though we throttle concurrent recoveries both at the
<<recovery,node level>> and at the <<shards-allocation,cluster level>>, this
``shard-shuffle'' can still put a lot of extra load on the cluster which
may not be necessary if the missing node is likely to return soon. Imagine
this scenario:

* Node 5 loses network connectivity.
* The master promotes a replica shard to primary for each primary that was on Node 5.
* The master allocates new replicas to other nodes in the cluster.
* Each new replica makes an entire copy of the primary shard across the network.
* More shards are moved to different nodes to rebalance the cluster.
* Node 5 returns after a few minutes.
* The master rebalances the cluster by allocating shards to Node 5.

If the master had just waited for a few minutes, then the missing shards could
have been re-allocated to Node 5 with the minimum of network traffic. This
process would be even quicker for idle shards (shards not receiving indexing
requests) which have been automatically <<indices-synced-flush,sync-flushed>>.

The allocation of replica shards which become unassigned because a node has
left can be delayed with the `index.unassigned.node_left.delayed_timeout`
dynamic setting, which defaults to `0` (reassign shards immediately).

This setting can be updated on a live index (or on all indices):

[source,js]
------------------------------
PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}
------------------------------
// AUTOSENSE
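
The same update can be applied to one index at a time rather than to `_all`. A minimal curl sketch, assuming a node on `localhost:9200` and a hypothetical index named `test`:

[source,sh]
------------------------------
# Delay re-allocation of this index's shards for five minutes after a node leaves
curl -XPUT 'http://localhost:9200/test/_settings' -d '
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'
------------------------------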

With delayed allocation enabled, the above scenario changes to look like this:

* Node 5 loses network connectivity.
* The master promotes a replica shard to primary for each primary that was on Node 5.
* The master logs a message that allocation of unassigned shards has been delayed, and for how long.
* The cluster remains yellow because there are unassigned replica shards.
* Node 5 returns after a few minutes, before the `timeout` expires.
* The missing replicas are re-allocated to Node 5 (and sync-flushed shards recover almost immediately).

NOTE: This setting will not affect the promotion of replicas to primaries, nor
will it affect the assignment of replicas that have not been assigned
previously.

==== Monitoring delayed unassigned shards

The number of shards whose allocation has been delayed by this timeout setting
can be viewed with the <<cluster-health,cluster health API>>:

[source,js]
------------------------------
GET _cluster/health <1>
------------------------------
<1> This request will return a `delayed_unassigned_shards` value.
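
From the command line, the same check might look like this. A sketch, assuming a node on `localhost:9200`:

[source,sh]
------------------------------
# Print the number of unassigned shards whose allocation is currently delayed
curl -XGET 'http://localhost:9200/_cluster/health?pretty' | grep delayed_unassigned_shards
------------------------------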

==== Removing a node permanently

If a node is not going to return and you would like Elasticsearch to allocate
the missing shards immediately, just update the timeout to zero:

[source,js]
------------------------------
PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "0"
  }
}
------------------------------
// AUTOSENSE

You can reset the timeout as soon as the missing shards have started to recover.
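
For instance, to put the delay back once recovery is under way, the earlier value could be restored. A sketch, assuming a node on `localhost:9200`; `5m` simply mirrors the example above rather than being a recommended value:

[source,sh]
------------------------------
# Restore the allocation delay after the immediate re-allocation has begun
curl -XPUT 'http://localhost:9200/_all/_settings' -d '
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'
------------------------------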

@@ -0,0 +1,97 @@
[[shard-allocation-filtering]]
=== Shard Allocation Filtering

Shard allocation filtering allows you to specify which nodes are allowed
to host the shards of a particular index.

NOTE: The per-index shard allocation filters explained below work in
conjunction with the cluster-wide allocation filters explained in
<<shards-allocation>>.

It is possible to assign arbitrary metadata attributes to each node at
startup. For instance, nodes could be assigned a `rack` and a `group`
attribute as follows:

[source,sh]
------------------------
bin/elasticsearch --node.rack rack1 --node.size big <1>
------------------------
<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.
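
A sketch of the same attributes set in the config file instead of on the command line, written here as a shell snippet appending to `config/elasticsearch.yml`; the keys simply mirror the flags above:

[source,sh]
------------------------
# Persist the node attributes in the config file
cat >> config/elasticsearch.yml <<'EOF'
node.rack: rack1
node.size: big
EOF
------------------------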

These metadata attributes can be used with the
`index.routing.allocation.*` settings to allocate an index to a particular
group of nodes. For instance, we can move the index `test` to either `big` or
`medium` nodes as follows:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big,medium"
}
------------------------
// AUTOSENSE

Alternatively, we can move the index `test` away from the `small` nodes with
an `exclude` rule:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.exclude.size": "small"
}
------------------------
// AUTOSENSE

Multiple rules can be specified, in which case all conditions must be
satisfied. For instance, we could move the index `test` to `big` nodes in
`rack1` with the following:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big",
  "index.routing.allocation.include.rack": "rack1"
}
------------------------
// AUTOSENSE

NOTE: If some conditions cannot be satisfied then shards will not be moved.

The following settings are _dynamic_, allowing live indices to be moved from
one set of nodes to another:

`index.routing.allocation.include.{attribute}`::

    Assign the index to a node whose `{attribute}` has at least one of the
    comma-separated values.

`index.routing.allocation.require.{attribute}`::

    Assign the index to a node whose `{attribute}` has _all_ of the
    comma-separated values (see the sketch after this list).

`index.routing.allocation.exclude.{attribute}`::

    Assign the index to a node whose `{attribute}` has _none_ of the
    comma-separated values.
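
The `require` form is not illustrated above. A minimal curl sketch, assuming a node on `localhost:9200` and the same hypothetical index `test` and attributes as the earlier examples. With `require`, a node must match every listed rule before it may host the shards:

[source,sh]
------------------------
# Only nodes that have BOTH size=big AND rack=rack1 may hold shards of `test`
curl -XPUT 'http://localhost:9200/test/_settings' -d '
{
  "index.routing.allocation.require.size": "big",
  "index.routing.allocation.require.rack": "rack1"
}'
------------------------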

These special attributes are also supported:

[horizontal]
`_name`:: Match nodes by node name
`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname

All attribute values can be specified with wildcards, e.g.:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
------------------------
// AUTOSENSE
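
The special attributes combine with `exclude` and `require` in the same way. A sketch that keeps the shards of `test` off one node, assuming a node on `localhost:9200` and a hypothetical node named `node-1`:

[source,sh]
------------------------
# Move shards of `test` away from the node named node-1
curl -XPUT 'http://localhost:9200/test/_settings' -d '
{
  "index.routing.allocation.exclude._name": "node-1"
}'
------------------------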

@@ -0,0 +1,26 @@
[[allocation-total-shards]]
=== Total Shards Per Node

The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and
indices you have, and how big they are, it may not always be possible to spread
shards evenly.

The following _dynamic_ setting allows you to specify a hard limit on the total
number of shards from a single index allowed per node:

`index.routing.allocation.total_shards_per_node`::

    The maximum number of shards (replicas and primaries) that will be
    allocated to a single node. Defaults to unbounded.

[WARNING]
=======================================
This setting imposes a hard limit which can result in some shards not
being allocated.

Use with caution.
=======================================
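
To make the limit concrete: a minimal curl sketch, assuming a node on `localhost:9200` and a hypothetical index `test`, that caps the index at two of its shards (primary or replica) per node:

[source,sh]
------------------------
# No node may hold more than two shards belonging to `test`
curl -XPUT 'http://localhost:9200/test/_settings' -d '
{
  "index.routing.allocation.total_shards_per_node": 2
}'
------------------------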