[DOCS] Reworked the shard allocation filtering info. (#36456)

* [DOCS] Reworked the shard allocation filtering info. Closes #36079

* Added multiple index allocation settings example back.

* Removed extraneous space
debadair 2018-12-11 07:44:57 -08:00 committed by GitHub
parent c3a6d1998a
commit c9e03e6ead
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 177 additions and 171 deletions

@@ -1,29 +1,54 @@
[[shard-allocation-filtering]]
=== Shard Allocation Filtering
=== Index-level shard allocation filtering
Shard allocation filtering allows you to specify which nodes are allowed
to host the shards of a particular index.
You can use shard allocation filters to control where {es} allocates shards of
a particular index. These per-index filters are applied in conjunction with
<<allocation-filtering, cluster-wide allocation filtering>> and
<<allocation-awareness, allocation awareness>>.
NOTE: The per-index shard allocation filters explained below work in
conjunction with the cluster-wide allocation filters explained in
<<shards-allocation>>.
Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, and `_host` attributes.
<<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
between phases.
It is possible to assign arbitrary metadata attributes to each node at
startup. For instance, nodes could be assigned a `rack` and a `size`
attribute as follows:
The `index.routing.allocation` settings are dynamic, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is
possible to do so without breaking another routing constraint, such as never
allocating a primary and replica shard on the same node.
For example, you could use a custom node attribute to indicate a node's
performance characteristics and use shard allocation filtering to route shards
for a particular index to the most appropriate class of hardware.
[float]
[[index-allocation-filters]]
==== Enabling index-level shard allocation filtering
To filter based on a custom node attribute:
. Specify the filter characteristics with a custom node attribute in each
node's `elasticsearch.yml` configuration file. For example, if you have `small`,
`medium`, and `big` nodes, you could add a `size` attribute to filter based
on node size.
+
[source,yaml]
--------------------------------------------------------
node.attr.size: medium
--------------------------------------------------------
+
You can also set custom attributes when you start a node:
+
[source,sh]
------------------------
bin/elasticsearch -Enode.attr.rack=rack1 -Enode.attr.size=big <1>
------------------------
<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.
These metadata attributes can be used with the
`index.routing.allocation.*` settings to allocate an index to a particular
group of nodes. For instance, we can move the index `test` to either `big` or
`medium` nodes as follows:
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.size=medium
--------------------------------------------------------
. Add a routing allocation filter to the index. The `index.routing.allocation`
settings support three types of filters: `include`, `exclude`, and `require`.
For example, to tell {es} to allocate shards from the `test` index to either
`big` or `medium` nodes, use `index.routing.allocation.include`:
+
[source,js]
------------------------
PUT test/_settings
@@ -33,24 +58,11 @@ PUT test/_settings
------------------------
// CONSOLE
// TEST[s/^/PUT test\n/]
Alternatively, we can move the index `test` away from the `small` nodes with
an `exclude` rule:
[source,js]
------------------------
PUT test/_settings
{
"index.routing.allocation.exclude.size": "small"
}
------------------------
// CONSOLE
// TEST[s/^/PUT test\n/]
Multiple rules can be specified, in which case all conditions must be
satisfied. For instance, we could move the index `test` to `big` nodes in
`rack1` with the following:
+
If you specify multiple filters, all conditions must be satisfied for shards to
be relocated. For example, to move the `test` index to `big` nodes in `rack1`,
you could specify:
+
[source,js]
------------------------
PUT test/_settings
@@ -62,10 +74,9 @@ PUT test/_settings
// CONSOLE
// TEST[s/^/PUT test\n/]
NOTE: If some conditions cannot be satisfied, then shards will not be moved.
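The include, exclude, and require semantics can be sketched in Python. This is a simplified model and not the {es} implementation. It rests on these assumptions:

* `include` passes when at least one listed value matches.
* `require` passes only when every listed value matches.
* `exclude` fails when any listed value matches.
* A node must pass every filter type that is set.

The `node_allowed` and `simple_match` helpers and the node names are hypothetical, and `*` is treated as the only wildcard.

```python
import re


def simple_match(pattern: str, value: str) -> bool:
    """Wildcard match in which `*` stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def node_allowed(node_attrs: dict, filters: dict) -> bool:
    """Return True if a node passes every allocation filter that is set.

    `filters` uses keys such as "include.size" or "require.rack" (the part
    after `index.routing.allocation.`) mapped to comma-separated values.
    """
    results = {"include": [], "exclude": [], "require": []}
    for key, raw_values in filters.items():
        filter_type, attribute = key.split(".", 1)
        node_value = node_attrs.get(attribute)
        for pattern in (v.strip() for v in raw_values.split(",") if v.strip()):
            matched = node_value is not None and simple_match(pattern, node_value)
            results[filter_type].append(matched)
    if results["include"] and not any(results["include"]):
        return False  # include: at least one listed value must match
    if results["require"] and not all(results["require"]):
        return False  # require: every listed value must match
    if any(results["exclude"]):
        return False  # exclude: no listed value may match
    return True


# Hypothetical nodes with custom `size` and `rack` attributes.
nodes = {
    "node-1": {"size": "big", "rack": "rack1"},
    "node-2": {"size": "medium", "rack": "rack2"},
    "node-3": {"size": "small", "rack": "rack1"},
}
filters = {"require.size": "big", "require.rack": "rack1"}
allowed = [name for name, attrs in nodes.items() if node_allowed(attrs, filters)]
print(allowed)  # ['node-1']
```

With both `require` filters set, only the `big` node in `rack1` qualifies. This matches the goal of moving the `test` index to `big` nodes in `rack1`.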
The following settings are _dynamic_, allowing live indices to be moved from
one set of nodes to another:
[float]
[[index-allocation-settings]]
==== Index allocation filter settings
`index.routing.allocation.include.{attribute}`::
@@ -82,7 +93,7 @@ one set of nodes to another:
Assign the index to a node whose `{attribute}` has _none_ of the
comma-separated values.
These special attributes are also supported:
The index allocation settings support the following built-in attributes:
[horizontal]
`_name`:: Match nodes by node name
@@ -91,7 +102,7 @@ These special attributes are also supported:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname
All attribute values can be specified with wildcards, e.g.:
You can use wildcards when specifying attribute values, for example:
[source,js]
------------------------

@@ -1,5 +1,5 @@
[[allocation-total-shards]]
=== Total Shards Per Node
=== Total shards per node
The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and
@@ -28,6 +28,3 @@ allocated.
Use with caution.
=======================================

@@ -1,114 +1,110 @@
[[allocation-awareness]]
=== Shard Allocation Awareness
=== Shard allocation awareness
When running nodes on multiple VMs on the same physical server, on multiple
racks, or across multiple zones or domains, it is more likely that two nodes on
the same physical server, in the same rack, or in the same zone or domain will
crash at the same time, rather than two unrelated nodes crashing
simultaneously.
You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimise the risk of losing all shard copies in the event of a failure.
If Elasticsearch is _aware_ of the physical configuration of your hardware, it
can ensure that the primary shard and its replica shards are spread across
different physical servers, racks, or zones, to minimise the risk of losing
all shard copies at the same time.
When shard allocation awareness is enabled with the
`cluster.routing.allocation.awareness.attributes` setting, shards are only
allocated to nodes that have values set for the specified awareness
attributes. If you use multiple awareness attributes, {es} considers
each attribute separately when allocating shards.
The shard allocation awareness settings allow you to tell Elasticsearch about
your hardware configuration.
As an example, let's assume we have several racks. When we start a node, we
can tell it which rack it is in by assigning it an arbitrary metadata
attribute called `rack_id` -- we could use any attribute name. For example:
[source,sh]
----------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one <1>
----------------------
<1> This setting could also be specified in the `elasticsearch.yml` config file.
Now, we need to set up _shard allocation awareness_ by telling Elasticsearch
which attributes to use. This can be configured in the `elasticsearch.yml`
file on *all* master-eligible nodes, or it can be set (and changed) with the
The allocation awareness settings can be configured in
`elasticsearch.yml` and updated dynamically with the
<<cluster-update-settings,cluster-update-settings>> API.
For our example, we'll set the value in the config file:
{es} prefers using shards in the same location (with the same
awareness attribute values) to process search or GET requests. Using local
shards is usually faster than crossing rack or zone boundaries.
NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
unbalanced and there are a lot of replicas, replica shards might be left
unassigned.
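The NOTE above can be made concrete with a rough estimate. This Python sketch makes two assumptions:

* Each awareness attribute value may hold at most ceil(copies / number of values) copies of a shard.
* No node holds two copies of the same shard.

It approximates the {es} allocation logic but is not that logic. `unassigned_copies` is a hypothetical helper.

```python
import math


def unassigned_copies(total_copies: int, nodes_per_location: dict) -> int:
    """Estimate how many copies of one shard stay unassigned.

    Assumed model: each location (awareness attribute value) holds at most
    ceil(total_copies / number_of_locations) copies, and each node holds at
    most one copy of the same shard.
    """
    per_location_limit = math.ceil(total_copies / len(nodes_per_location))
    placeable = sum(min(per_location_limit, count) for count in nodes_per_location.values())
    return max(0, total_copies - placeable)


# One primary plus five replicas gives 6 copies spread over two racks.
print(unassigned_copies(6, {"rack_one": 3, "rack_two": 3}))  # 0
print(unassigned_copies(6, {"rack_one": 2, "rack_two": 4}))  # 1
```

In the unbalanced case each rack may hold three copies. Because `rack_one` has only two nodes, one replica is left unassigned even though `rack_two` has a spare node.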
[float]
[[enabling-awareness]]
==== Enabling shard allocation awareness
To enable shard allocation awareness:
. Specify the location of each node with a custom node attribute. For example,
if you want Elasticsearch to distribute shards across different racks, you might
set an awareness attribute called `rack_id` in each node's `elasticsearch.yml`
config file.
+
[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
node.attr.rack_id: rack_one
--------------------------------------------------------
+
You can also set custom attributes when you start a node:
+
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one
--------------------------------------------------------
With this config in place, let's say we start two nodes with
`node.attr.rack_id` set to `rack_one`, and we create an index with 5 primary
shards and 1 replica of each primary. All primaries and replicas are
. Tell {es} to take one or more awareness attributes into account when
allocating shards by setting
`cluster.routing.allocation.awareness.attributes` in *every* master-eligible
node's `elasticsearch.yml` config file.
+
--
[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id <1>
--------------------------------------------------------
<1> Specify multiple attributes as a comma-separated list.
--
+
You can also use the
<<cluster-update-settings,cluster-update-settings>> API to set or update
a cluster's awareness attributes.
With this example configuration, if you start two nodes with
`node.attr.rack_id` set to `rack_one` and create an index with 5 primary
shards and 1 replica of each primary, all primaries and replicas are
allocated across the two nodes.
Now, if we start two more nodes with `node.attr.rack_id` set to `rack_two`,
Elasticsearch will move shards across to the new nodes, ensuring (if possible)
that no two copies of the same shard will be in the same rack. However if
`rack_two` were to fail, taking down both of its nodes, Elasticsearch will
still allocate the lost shard copies to nodes in `rack_one`.
If you add two nodes with `node.attr.rack_id` set to `rack_two`,
{es} moves shards to the new nodes, ensuring (if possible)
that no two copies of the same shard are in the same rack.
.Prefer local shards
*********************************************
When executing search or GET requests, with shard awareness enabled,
Elasticsearch will prefer using local shards -- shards in the same awareness
group -- to execute the request. This is usually faster than crossing between
racks or across zone boundaries.
*********************************************
Multiple awareness attributes can be specified, in which case each attribute
is considered separately when deciding where to allocate the shards.
[source,yaml]
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
NOTE: When using awareness attributes, shards will not be allocated to nodes
that don't have values set for those attributes.
NOTE: The number of primary and replica copies of a shard allocated on a
specific group of nodes with the same awareness attribute value is determined
by the number of attribute values. When the number of nodes in each group is
unbalanced and there are many replicas, replica shards may be left unassigned.
If `rack_two` fails and takes down both its nodes, by default {es}
allocates the lost shard copies to nodes in `rack_one`. To prevent multiple
copies of a particular shard from being allocated in the same location, you can
enable forced awareness.
[float]
[[forced-awareness]]
=== Forced Awareness
==== Forced awareness
Imagine that you have two zones and enough hardware across the two zones to
host all of your primary and replica shards. But perhaps the hardware in a
single zone, while sufficient to host half the shards, would be unable to host
*ALL* the shards.
By default, if one location fails, Elasticsearch assigns all of the missing
replica shards to the remaining locations. While you might have sufficient
resources across all locations to host your primary and replica shards, a single
location might be unable to host *ALL* of the shards.
With ordinary awareness, if one zone lost contact with the other zone,
Elasticsearch would assign all of the missing replica shards to a single zone.
But in this example, this sudden extra load would cause the hardware in the
remaining zone to be overloaded.
To prevent a single location from being overloaded in the event of a failure,
you can set `cluster.routing.allocation.awareness.force` so no replicas are
allocated until nodes are available in another location.
Forced awareness solves this problem by *NEVER* allowing copies of the same
shard to be allocated to the same zone.
For example, let's say we have an awareness attribute called `zone`, and we
know we are going to have two zones, `zone1` and `zone2`. Here is how we can
force awareness on a node:
For example, if you have an awareness attribute called `zone` and configure nodes
in `zone1` and `zone2`, you can use forced awareness to prevent Elasticsearch
from allocating replicas if only one zone is available:
[source,yaml]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
-------------------------------------------------------------------
<1> We must list all possible values that the `zone` attribute can have.
Now, if we start 2 nodes with `node.attr.zone` set to `zone1` and create an
index with 5 shards and 1 replica, the index will be created, but only the 5
primary shards will be allocated (with no replicas). Only when we start more
nodes with `node.attr.zone` set to `zone2` will the replicas be allocated.
The `cluster.routing.allocation.awareness.*` settings can all be updated
dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
<1> Specify all possible values for the awareness attribute.
With this example configuration, if you start two nodes with `node.attr.zone` set
to `zone1` and create an index with 5 shards and 1 replica, Elasticsearch creates
the index and allocates the 5 primary shards but no replicas. Replicas are
only allocated once nodes with `node.attr.zone` set to `zone2` are available.
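Forced awareness can be sketched by counting forced values as locations even when no node carries them. This is a simplified Python model, an assumption rather than the {es} allocator. `place_copies` greedily assigns the copies of one shard to nodes under two rules:

* Each node gets at most one copy.
* Each value gets at most ceil(copies / number of values) copies.

Without forced values, a location that loses all its nodes drops out of the count. The remaining location can then take every copy, as in the `rack_two` failure case.

```python
import math


def place_copies(copies: int, nodes: dict, forced_values=()) -> list:
    """Greedily place the copies of one shard (primary first) on nodes.

    `nodes` maps a node name to its awareness attribute value, e.g. a zone.
    Assumed model: a node holds at most one copy of the shard, and a value
    holds at most ceil(copies / number_of_values) copies, where the count of
    values includes forced values that currently have no nodes.
    Returns the names of the nodes that received a copy.
    """
    values = set(nodes.values()) | set(forced_values)
    if not values:
        return []
    per_value_limit = math.ceil(copies / len(values))
    used = {value: 0 for value in values}
    placed = []
    for name, value in sorted(nodes.items()):
        if len(placed) == copies:
            break
        if used[value] < per_value_limit:
            used[value] += 1
            placed.append(name)
    return placed


# One primary and one replica (2 copies); both nodes are in zone1.
zone1_only = {"node-a": "zone1", "node-b": "zone1"}
print(place_copies(2, zone1_only))  # ['node-a', 'node-b']
print(place_copies(2, zone1_only, forced_values=("zone1", "zone2")))  # ['node-a']
```

With forced values, only the primary is placed until a `zone2` node joins. Adding a hypothetical `"node-c": "zone2"` entry to the map lets the replica land there.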

@@ -1,13 +1,37 @@
[[allocation-filtering]]
=== Shard Allocation Filtering
=== Cluster-level shard allocation filtering
While <<index-modules-allocation>> provides *per-index* settings to control the
allocation of shards to nodes, cluster-level shard allocation filtering allows
you to allow or disallow the allocation of shards from *any* index to
particular nodes.
You can use cluster-level shard allocation filters to control where {es}
allocates shards from any index. These cluster-wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<allocation-awareness, allocation awareness>>.
The available _dynamic_ cluster settings are as follows, where `{attribute}`
refers to an arbitrary node attribute:
Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_ip`, and `_host` attributes.
The `cluster.routing.allocation` settings are dynamic, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is
possible to do so without breaking another routing constraint, such as never
allocating a primary and replica shard on the same node.
The most common use case for cluster-level shard allocation filtering is when
you want to decommission a node. To move shards off of a node prior to shutting
it down, you could create a filter that excludes the node by its IP address:
[source,js]
--------------------------------------------------
PUT _cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}
--------------------------------------------------
// CONSOLE
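The same decommissioning request can be issued from a script. This Python sketch only builds the request, using the standard library. The following are assumptions for illustration:

* the base URL `http://localhost:9200`
* the absence of authentication
* the `decommission_request` helper

Several addresses are joined into one comma-separated value.

```python
import json
import urllib.request


def decommission_request(ips, base_url="http://localhost:9200"):
    """Build (but do not send) a cluster settings update excluding nodes by IP.

    The base URL is a hypothetical local, unsecured cluster; several
    addresses are joined into one comma-separated value.
    """
    body = {"transient": {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = decommission_request(["10.0.0.1"])
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/_cluster/settings
```

Passing the returned object to `urllib.request.urlopen` would send it to the cluster.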
[float]
[[cluster-routing-settings]]
==== Cluster routing settings
`cluster.routing.allocation.include.{attribute}`::
@@ -24,36 +48,14 @@ refers to an arbitrary node attribute.:
Do not allocate shards to a node whose `{attribute}` has _any_ of the
comma-separated values.
These special attributes are also supported:
The cluster allocation settings support the following built-in attributes:
[horizontal]
`_name`:: Match nodes by node names
`_ip`:: Match nodes by IP addresses (the IP address associated with the hostname)
`_host`:: Match nodes by hostnames
The typical use case for cluster-wide shard allocation filtering is when you
want to decommission a node, and you would like to move the shards from that
node to other nodes in the cluster before shutting it down.
For instance, we could decommission a node using its IP address as follows:
[source,js]
--------------------------------------------------
PUT _cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}
--------------------------------------------------
// CONSOLE
NOTE: Shards will only be relocated if it is possible to do so without
breaking another routing constraint, such as never allocating a primary and
replica shard to the same node.
In addition to listing multiple values as a comma-separated list, all
attribute values can be specified with wildcards, e.g.:
You can use wildcards when specifying attribute values, for example:
[source,js]
------------------------

@@ -1,5 +1,5 @@
[[disk-allocator]]
=== Disk-based Shard Allocation
=== Disk-based shard allocation
Elasticsearch considers the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards away

@@ -1,12 +1,12 @@
[[shards-allocation]]
=== Cluster Level Shard Allocation
=== Cluster-level shard allocation
Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
when nodes are added or removed.
[float]
=== Shard Allocation Settings
=== Shard allocation settings
The following _dynamic_ settings may be used to control shard allocation and recovery:
@@ -59,7 +59,7 @@ one of the active allocation ids in the cluster state.
setting only applies if multiple nodes are started on the same machine.
[float]
=== Shard Rebalancing Settings
=== Shard rebalancing settings
The following _dynamic_ settings may be used to control the rebalancing of
shards across the cluster:
@@ -98,7 +98,7 @@ Specify when shard rebalancing is allowed:
or <<forced-awareness,forced awareness>>.
[float]
=== Shard Balancing Heuristics
=== Shard balancing heuristics
The following settings are used together to determine where to place each
shard. The cluster is balanced when no allowed rebalancing operation can bring the weight