[DOCS] Reworked the shard allocation filtering info. (#36456)

* [DOCS] Reworked the shard allocation filtering info. Closes #36079

* Added multiple index allocation settings example back.

* Removed extraneous space
debadair 2018-12-11 07:44:57 -08:00 committed by GitHub
parent c3a6d1998a
commit c9e03e6ead
GPG Key ID: 4AEE18F83AFDEB23 (no known key found for this signature in database)
6 changed files with 177 additions and 171 deletions

View File

@@ -1,29 +1,54 @@
[[shard-allocation-filtering]]
=== Index-level shard allocation filtering

You can use shard allocation filters to control where {es} allocates shards of
a particular index. These per-index filters are applied in conjunction with
<<allocation-filtering, cluster-wide allocation filtering>> and
<<allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, and `_host` attributes.
<<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
between phases.

The `cluster.routing.allocation` settings are dynamic, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is
possible to do so without breaking another routing constraint, such as never
allocating a primary and replica shard on the same node.

For example, you could use a custom node attribute to indicate a node's
performance characteristics and use shard allocation filtering to route shards
for a particular index to the most appropriate class of hardware.

[float]
[[index-allocation-filters]]
==== Enabling index-level shard allocation filtering

To filter based on a custom node attribute:

. Specify the filter characteristics with a custom node attribute in each
node's `elasticsearch.yml` configuration file. For example, if you have `small`,
`medium`, and `big` nodes, you could add a `size` attribute to filter based
on node size.
+
[source,yaml]
--------------------------------------------------------
node.attr.size: medium
--------------------------------------------------------
+
You can also set custom attributes when you start a node:
+
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.size=medium
--------------------------------------------------------

. Add a routing allocation filter to the index. The `index.routing.allocation`
settings support three types of filters: `include`, `exclude`, and `require`.
For example, to tell {es} to allocate shards from the `test` index to either
`big` or `medium` nodes, use `index.routing.allocation.include`.
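For illustration, a sketch of what such a filter update might look like; the request body is an assumption based on the `index.routing.allocation.include.{attribute}` pattern and the `size` attribute from the example above:

[source,js]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big,medium"
}
------------------------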
@@ -33,24 +58,11 @@ PUT test/_settings

If you specify multiple filters, all conditions must be satisfied for shards to
be relocated. For example, to move the `test` index to `big` nodes in `rack1`,
you could specify both filters.
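A sketch of combining the two filters, assuming a custom `rack` attribute alongside `size` (the attribute name `rack` is an assumption; only the value `rack1` appears above):

[source,js]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big",
  "index.routing.allocation.include.rack": "rack1"
}
------------------------

Both conditions must match a node before any shard of `test` is allocated to it.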
@@ -62,10 +74,9 @@ PUT test/_settings

[float]
[[index-allocation-settings]]
==== Index allocation filter settings

`index.routing.allocation.include.{attribute}`::
@@ -82,7 +93,7 @@ one set of nodes to another:
Assign the index to a node whose `{attribute}` has _none_ of the
comma-separated values.

The index allocation settings support the following built-in attributes:

[horizontal]
`_name`:: Match nodes by node name
@@ -91,7 +102,7 @@ These special attributes are also supported:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname

You can use wildcards when specifying attribute values.
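As an illustration, a sketch of a wildcard filter on the built-in `_ip` attribute (the address pattern is an assumption):

[source,js]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
------------------------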

View File

@@ -1,5 +1,5 @@
[[allocation-total-shards]]
=== Total shards per node

The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and

@@ -28,6 +28,3 @@ allocated.
Use with caution.
=======================================
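A sketch of the kind of per-node cap this section describes, assuming the `index.routing.allocation.total_shards_per_node` setting (the index name and limit value are illustrative):

[source,js]
------------------------
PUT test/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
------------------------

If the limit is too low for the number of nodes available, some shards may remain unassigned, which is why the section advises caution.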

View File

@@ -1,114 +1,110 @@
[[allocation-awareness]]
=== Shard allocation awareness

You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimise the risk of losing all shard copies in the event of a failure.

When shard allocation awareness is enabled with the
`cluster.routing.allocation.awareness.attributes` setting, shards are only
allocated to nodes that have values set for the specified awareness
attributes. If you use multiple awareness attributes, {es} considers
each attribute separately when allocating shards.

The allocation awareness settings can be configured in
`elasticsearch.yml` and updated dynamically with the
<<cluster-update-settings,cluster-update-settings>> API.

{es} prefers using shards in the same location (with the same
awareness attribute values) to process search or GET requests. Using local
shards is usually faster than crossing rack or zone boundaries.

NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
unbalanced and there are a lot of replicas, replica shards might be left
unassigned.

[float]
[[enabling-awareness]]
==== Enabling shard allocation awareness

To enable shard allocation awareness:

. Specify the location of each node with a custom node attribute. For example,
if you want Elasticsearch to distribute shards across different racks, you might
set an awareness attribute called `rack_id` in each node's `elasticsearch.yml`
config file.
+
[source,yaml]
--------------------------------------------------------
node.attr.rack_id: rack_one
--------------------------------------------------------
+
You can also set custom attributes when you start a node:
+
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one
--------------------------------------------------------

. Tell {es} to take one or more awareness attributes into account when
allocating shards by setting
`cluster.routing.allocation.awareness.attributes` in *every* master-eligible
node's `elasticsearch.yml` config file.
+
--
[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id <1>
--------------------------------------------------------
<1> Specify multiple attributes as a comma-separated list.
--
+
You can also use the
<<cluster-update-settings,cluster-update-settings>> API to set or update
a cluster's awareness attributes.
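A sketch of setting the awareness attributes dynamically through the cluster settings API; the `rack_id` attribute follows the example above, and the use of a `persistent` setting is an assumption:

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
--------------------------------------------------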
With this example configuration, if you start two nodes with
`node.attr.rack_id` set to `rack_one` and create an index with 5 primary
shards and 1 replica of each primary, all primaries and replicas are
allocated across the two nodes.
If you add two nodes with `node.attr.rack_id` set to `rack_two`,
{es} moves shards to the new nodes, ensuring (if possible)
that no two copies of the same shard are in the same rack.

If `rack_two` fails and takes down both its nodes, by default {es}
allocates the lost shard copies to nodes in `rack_one`. To prevent multiple
copies of a particular shard from being allocated in the same location, you can
enable forced awareness.

[float]
[[forced-awareness]]
==== Forced awareness

By default, if one location fails, Elasticsearch assigns all of the missing
replica shards to the remaining locations. While you might have sufficient
resources across all locations to host your primary and replica shards, a single
location might be unable to host *ALL* of the shards.

To prevent a single location from being overloaded in the event of a failure,
you can set `cluster.routing.allocation.awareness.force` so no replicas are
allocated until nodes are available in another location.

For example, if you have an awareness attribute called `zone` and configure nodes
in `zone1` and `zone2`, you can use forced awareness to prevent Elasticsearch
from allocating replicas if only one zone is available:

[source,yaml]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
-------------------------------------------------------------------
<1> Specify all possible values for the awareness attribute.

With this example configuration, if you start two nodes with `node.attr.zone` set
to `zone1` and create an index with 5 shards and 1 replica, Elasticsearch creates
the index and allocates the 5 primary shards but no replicas. Replicas are
only allocated once nodes with `node.attr.zone` set to `zone2` are available.
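Because the `cluster.routing.allocation.awareness.*` settings can be updated dynamically on a live cluster, the same configuration can be applied through the cluster settings API. A sketch, assuming the `zone` attribute above and a `persistent` setting:

[source,js]
-------------------------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zone1,zone2"
  }
}
-------------------------------------------------------------------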

View File

@@ -1,13 +1,37 @@
[[allocation-filtering]]
=== Cluster-level shard allocation filtering

You can use cluster-level shard allocation filters to control where {es}
allocates shards from any index. These cluster-wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_ip`, and `_host` attributes.

The `cluster.routing.allocation` settings are dynamic, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is
possible to do so without breaking another routing constraint, such as never
allocating a primary and replica shard on the same node.

The most common use case for cluster-level shard allocation filtering is when
you want to decommission a node. To move shards off of a node prior to shutting
it down, you could create a filter that excludes the node by its IP address:

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
  }
}
--------------------------------------------------
// CONSOLE

[float]
[[cluster-routing-settings]]
==== Cluster routing settings

`cluster.routing.allocation.include.{attribute}`::
@@ -24,36 +48,14 @@ refers to an arbitrary node attribute.:
Do not allocate shards to a node whose `{attribute}` has _any_ of the
comma-separated values.

The cluster allocation settings support the following built-in attributes:

[horizontal]
`_name`:: Match nodes by node names
`_ip`:: Match nodes by IP addresses (the IP address associated with the hostname)
`_host`:: Match nodes by hostnames

You can use wildcards when specifying attribute values.
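As an illustration, a sketch of a wildcard value on the built-in `_ip` attribute, following the exclude filter shown earlier (the address pattern is an assumption):

[source,js]
------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "192.168.2.*"
  }
}
------------------------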

View File

@@ -1,5 +1,5 @@
[[disk-allocator]]
=== Disk-based shard allocation

Elasticsearch considers the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards away
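The disk-based decisions described here are driven by watermark settings. A sketch of adjusting them dynamically, assuming the standard `cluster.routing.allocation.disk.watermark.low` and `...high` settings (the percentage values are illustrative):

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
--------------------------------------------------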

View File

@@ -1,12 +1,12 @@
[[shards-allocation]]
=== Cluster-level shard allocation

Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
when nodes are added or removed.

[float]
=== Shard allocation settings

The following _dynamic_ settings may be used to control shard allocation and recovery:
@@ -59,7 +59,7 @@ one of the active allocation ids in the cluster state.
setting only applies if multiple nodes are started on the same machine.

[float]
=== Shard rebalancing settings

The following _dynamic_ settings may be used to control the rebalancing of
shards across the cluster:
@@ -98,7 +98,7 @@ Specify when shard rebalancing is allowed:
or <<forced-awareness,forced awareness>>.

[float]
=== Shard balancing heuristics

The following settings are used together to determine where to place each
shard. The cluster is balanced when no allowed rebalancing operation can bring the weight
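As an illustration, a sketch of tuning one of these heuristics dynamically; the `cluster.routing.allocation.balance.threshold` setting is assumed to be among the balancing settings this section lists, and the value is illustrative:

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.balance.threshold": 1.0
  }
}
--------------------------------------------------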