[[index-modules-allocation]]
== Index Shard Allocation

[float]
[[shard-allocation-filtering]]
=== Shard Allocation Filtering

Shard allocation filtering controls which nodes the shards of an index can be
allocated to, based on include/exclude filters. The filters can be set both on
the index level and on the cluster level. Let's start with an example of
setting a filter on the index level:

Let's say we have 4 nodes, each with a specific attribute called `tag`
associated with it (the attribute can have any name). Each
node has a specific value associated with `tag`. Node 1 has a setting
`node.tag: value1`, Node 2 a setting of `node.tag: value2`, and so on.

We can create an index that will only be deployed on nodes that have `tag`
set to `value1` or `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------

On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` value of `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------

`index.routing.allocation.require.*` can be used to
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include`, which will
include a node if ANY rule matches.

The `include`, `exclude` and `require` values can contain simple
wildcards, for example, `value1*`. Additionally, the special attribute
names `_ip`, `_name`, `_id` and `_host` can be used to match by node
IP address, name, id or host name, respectively.

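The matching rules above can be summarized in a small Python model. This is only a sketch under simplifying assumptions, not Elasticsearch's actual implementation: the helper names `simple_match`, `rule_matches` and `node_allowed` are hypothetical, only the `*` wildcard is handled, each node attribute holds a single string value, a comma-separated rule is treated as matching when any of its patterns matches, and the special `_ip`, `_name`, `_id` and `_host` attributes are not modeled.

```python
import re


def simple_match(pattern, value):
    """Match `value` against a pattern in which `*` stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def rule_matches(node_attrs, attr, patterns):
    """True if the node's value for `attr` matches any of the comma-separated patterns."""
    value = node_attrs.get(attr)
    if value is None:
        return False
    return any(simple_match(p.strip(), value) for p in patterns.split(","))


def node_allowed(node_attrs, include=None, exclude=None, require=None):
    """Simplified model (an assumption): may a shard be allocated to this node?"""
    include, exclude, require = include or {}, exclude or {}, require or {}
    # require: ALL rules must match
    if not all(rule_matches(node_attrs, a, p) for a, p in require.items()):
        return False
    # exclude: a single matching rule rejects the node
    if any(rule_matches(node_attrs, a, p) for a, p in exclude.items()):
        return False
    # include: if include rules exist, ANY one of them must match
    if include and not any(rule_matches(node_attrs, a, p) for a, p in include.items()):
        return False
    return True


# The four example nodes, tagged value1 .. value4
nodes = {f"node{i}": {"tag": f"value{i}"} for i in range(1, 5)}
eligible = [name for name, attrs in nodes.items()
            if node_allowed(attrs, include={"tag": "value1,value2"})]
print(eligible)  # ['node1', 'node2']
```
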
A node can have several attributes associated with it, and
both the attribute name and value are controlled in the node settings. For
example, here is a sample of several node configurations:

[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------

In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------

The provided settings can also be updated in real time using the update
settings API, which makes it possible to "move" indices (shards) around in real time.

Cluster-wide filtering can also be defined, and updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on its `_ip`
address:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
--------------------------------------------------

[float]
=== Total Shards Per Node

The `index.routing.allocation.total_shards_per_node` setting controls
how many shards in total (replicas and primaries) of an index can be allocated to a single node.
It can be dynamically set on a live index using the update index
settings API.

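As an illustration of what this limit means, the check below counts only the shard copies of the index in question. It is a minimal sketch, not Elasticsearch code: the function name `can_allocate` and the list-of-pairs representation of the current allocation are assumptions made for this example.

```python
def can_allocate(assignments, index, node, total_shards_per_node):
    """Return True if `node` may take one more shard copy of `index`.

    `assignments` is a list of (index_name, node_name) pairs, one per shard
    copy (primary or replica) that is already allocated.
    """
    on_node = sum(1 for i, n in assignments if i == index and n == node)
    return on_node < total_shards_per_node


# Hypothetical allocation: index "test" limited to 2 shard copies per node
assignments = [
    ("test", "node1"),
    ("test", "node1"),
    ("test", "node2"),
    ("other", "node2"),  # shards of other indices do not count toward the limit
]
print(can_allocate(assignments, "test", "node1", 2))  # False
print(can_allocate(assignments, "test", "node2", 2))  # True
```
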
[float]
[[disk]]
=== Disk-based Shard Allocation

Disk-based shard allocation is enabled from version 1.3.0 onward.

Elasticsearch can be configured to prevent shard
allocation on nodes depending on disk usage for the node. This
functionality is enabled by default, and can be changed either in the
configuration file, or dynamically using:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : false
    }
}'
--------------------------------------------------

Once enabled, Elasticsearch uses two watermarks to decide whether
shards should be allocated to a node or can remain on it.

`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 85%, meaning ES will not
allocate new shards to nodes once they have more than 85% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.

`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 90%, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.

NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of
high and low. For example, it makes sense to set the low watermark to 10gb
and the high watermark to 5gb, but not the other way around.

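The difference between percentage and byte watermarks can be made concrete with a short Python sketch. It only models the comparison described above and is not how Elasticsearch implements it: the names `parse_bytes` and `watermark_exceeded` are hypothetical, and the unit suffixes are assumed to be powers of 1024.

```python
GB = 1024 ** 3
# Assumption: size suffixes are binary multiples (kb = 1024 bytes, and so on)
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4, "b": 1}


def parse_bytes(text):
    """Convert a value such as '500mb' or '10gb' into a number of bytes."""
    text = text.strip().lower()
    # Check two-letter suffixes before the bare 'b'
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    raise ValueError(f"unrecognised size: {text!r}")


def watermark_exceeded(watermark, total_bytes, free_bytes):
    """True if the node has crossed the watermark.

    Percentages are compared against USED space, byte values against FREE space.
    """
    watermark = watermark.strip()
    if watermark.endswith("%"):
        used_percent = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_percent > float(watermark[:-1])
    return free_bytes < parse_bytes(watermark)


# A 100 GB disk with 12 GB free is 88% used
print(watermark_exceeded("85%", 100 * GB, 12 * GB))  # True: above the low watermark
print(watermark_exceeded("90%", 100 * GB, 12 * GB))  # False: below the high watermark

# With byte values the low watermark is the LARGER number
print(watermark_exceeded("10gb", 100 * GB, 8 * GB))  # True: less than 10 GB free
print(watermark_exceeded("5gb", 100 * GB, 8 * GB))   # False: more than 5 GB free
```
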
Both watermark settings can be changed dynamically using the cluster
settings API. By default, Elasticsearch will retrieve information
about the disk usage of the nodes every 30 seconds. This can be
changed with the `cluster.info.update.interval` setting.

Here is an example that sets the low watermark to no more than 80% of the disk used, the
high watermark to at least 50 gigabytes free, and refreshes the disk usage information
every minute:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.watermark.low" : "80%",
        "cluster.routing.allocation.disk.watermark.high" : "50gb",
        "cluster.info.update.interval" : "1m"
    }
}'
--------------------------------------------------

By default, Elasticsearch will take into account shards that are currently being
relocated to the target node when computing a node's disk usage. This can be
changed by setting the `cluster.routing.allocation.disk.include_relocations`
setting to `false` (defaults to `true`). Taking relocating shards' sizes into
account may, however, mean that the disk usage for a node is incorrectly
estimated on the high side, since the relocation could be 90% complete and a
recently retrieved disk usage would include the total size of the relocating
shard as well as the space already used by the running relocation.

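The overestimate described above can be worked through with some illustrative numbers. The figures and the function name `estimated_usage` are made up for this sketch, which simply adds the full size of each incoming shard to the most recently reported disk usage.

```python
GB = 1024 ** 3


def estimated_usage(reported_used_bytes, incoming_shard_sizes, include_relocations=True):
    """Disk usage estimate for a node, optionally adding shards relocating to it."""
    if include_relocations:
        return reported_used_bytes + sum(incoming_shard_sizes)
    return reported_used_bytes


# Hypothetical node: 50 GB of existing data, plus a 10 GB shard that is 90% copied
other_data = 50 * GB
shard_size = 10 * GB
already_copied = 9 * GB

# The reported usage already contains the 9 GB copied so far
reported_used = other_data + already_copied

estimate = estimated_usage(reported_used, [shard_size])
final_usage = other_data + shard_size

print(estimate // GB)                  # 69
print(final_usage // GB)               # 60
print((estimate - final_usage) // GB)  # 9, the part of the shard counted twice
```
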