[[allocation-awareness]]
=== Shard Allocation Awareness

When running nodes on multiple VMs on the same physical server, on multiple
racks, or across multiple zones or domains, it is more likely that two nodes on
the same physical server, in the same rack, or in the same zone or domain will
crash at the same time, rather than two unrelated nodes crashing
simultaneously.

If Elasticsearch is _aware_ of the physical configuration of your hardware, it
can ensure that the primary shard and its replica shards are spread across
different physical servers, racks, or zones, to minimise the risk of losing
all shard copies at the same time.

The shard allocation awareness settings allow you to tell Elasticsearch about
your hardware configuration.

As an example, let's assume we have several racks. When we start a node, we
can tell it which rack it is in by assigning it an arbitrary metadata
attribute called `rack_id` -- we could use any attribute name. For example:

[source,sh]
----------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one <1>
----------------------
<1> This setting could also be specified in the `elasticsearch.yml` config file.
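
For example, the same attribute could be set in `elasticsearch.yml` like this:

[source,yaml]
----------------------------
node.attr.rack_id: rack_one
----------------------------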

Now, we need to set up _shard allocation awareness_ by telling Elasticsearch
which attributes to use. This can be configured in the `elasticsearch.yml`
file on *all* master-eligible nodes, or it can be set (and changed) with the
<<cluster-update-settings,cluster-update-settings>> API.

For our example, we'll set the value in the config file:

[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------

With this config in place, let's say we start two nodes with
`node.attr.rack_id` set to `rack_one`, and we create an index with 5 primary
shards and 1 replica of each primary. All primaries and replicas are
allocated across the two nodes.
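
For instance, such an index could be created with a request along these lines
(a sketch that assumes a node reachable on `localhost:9200` and a hypothetical
index name `test`):

[source,sh]
----------------------
curl -XPUT 'localhost:9200/test' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
----------------------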

Now, if we start two more nodes with `node.attr.rack_id` set to `rack_two`,
Elasticsearch will move shards across to the new nodes, ensuring (if possible)
that no two copies of the same shard will be in the same rack. However, if
`rack_two` were to fail, taking down both of its nodes, Elasticsearch will
still allocate the lost shard copies to nodes in `rack_one`.
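
One way to check where the copies have landed is the cat shards API, which
lists the node holding each shard copy (again assuming a node reachable on
`localhost:9200`):

[source,sh]
----------------------
curl -XGET 'localhost:9200/_cat/shards?v'
----------------------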

.Prefer local shards
*********************************************

When executing search or GET requests, with shard awareness enabled,
Elasticsearch will prefer using local shards -- shards in the same awareness
group -- to execute the request. This is usually faster than crossing between
racks or across zone boundaries.

*********************************************

Multiple awareness attributes can be specified, in which case each attribute
is considered separately when deciding where to allocate the shards.

[source,yaml]
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------

NOTE: When using awareness attributes, shards will not be allocated to nodes
that don't have values set for those attributes.

NOTE: The number of copies of each shard that can be allocated to a specific
group of nodes sharing the same awareness attribute value is determined by the
number of attribute values. When the number of nodes across groups is
unbalanced and there are many replicas, replica shards may be left unassigned.

[float]
[[forced-awareness]]
=== Forced Awareness

Imagine that you have two zones and enough hardware across the two zones to
host all of your primary and replica shards. But perhaps the hardware in a
single zone, while sufficient to host half the shards, would be unable to host
*ALL* the shards.

With ordinary awareness, if one zone lost contact with the other zone,
Elasticsearch would assign all of the missing replica shards to a single zone.
But in this example, this sudden extra load would cause the hardware in the
remaining zone to be overloaded.

Forced awareness solves this problem by *NEVER* allowing copies of the same
shard to be allocated to the same zone.

For example, let's say we have an awareness attribute called `zone`, and we
know we are going to have two zones, `zone1` and `zone2`. Here is how we can
force awareness on a node:

[source,yaml]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
<1> We must list all possible values that the `zone` attribute can have.

Now, if we start 2 nodes with `node.attr.zone` set to `zone1` and create an
index with 5 shards and 1 replica, the index will be created, but only the 5
primary shards will be allocated (with no replicas). Only when we start more
nodes with `node.attr.zone` set to `zone2` will the replicas be allocated.
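
For example, a `zone1` node could be started with the same `-E` syntax used
for `rack_id` above:

[source,sh]
----------------------
./bin/elasticsearch -Enode.attr.zone=zone1
----------------------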

The `cluster.routing.allocation.awareness.*` settings can all be updated
dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
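
For example, a sketch of changing the awareness attributes at runtime through
that API (using the `persistent` section; `transient` would work as well):

[source,sh]
----------------------
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id,zone"
  }
}'
----------------------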