
[[index-modules-allocation]]
== Index Shard Allocation

[float]
=== Shard Allocation Filtering

Shard allocation filtering controls which nodes the shards of an index
can be allocated to, based on include/exclude filters. The filters can
be set both on the index level and on the cluster level. Let's start
with an example of setting them on the index level:

Let's say we have 4 nodes, each with a specific attribute called `tag`
associated with it (the attribute can have any name). Each node has a
specific value associated with `tag`: Node 1 has the setting
`node.tag: value1`, Node 2 the setting `node.tag: value2`, and so on.

We can create an index that will only be deployed on nodes that have
`tag` set to `value1` or `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------

On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` of value `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------

`index.routing.allocation.require.*` can be used to 
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include` which will
include a node if ANY rule matches.
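
As a minimal sketch of two `require` rules used together, the helper
below applies both to one index. The helper name
`require_rack_and_size`, the attribute names `rack` and `size`, and
their values are all invented for this example.

```shell
# Sketch only: `rack` and `size` are hypothetical node attribute names, and
# `require_rack_and_size` is a helper invented for this example.
require_rack_and_size() {
    # $1 = index name. A node must have BOTH node.rack: rack1 and
    # node.size: big to be eligible for shards of this index.
    curl -XPUT "localhost:9200/$1/_settings" -d '{
    "index.routing.allocation.require.rack" : "rack1",
    "index.routing.allocation.require.size" : "big"
}'
}
```

Calling `require_rack_and_size test` would apply both rules to the
`test` index. A node that has only one of the two attribute values would
not be eligible.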

The `include`, `exclude` and `require` values can contain simple
wildcards, for example, `value1*`. A special attribute name called
`_ip` can be used to match on node IP addresses.
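
To illustrate how a comma-separated list of values with wildcards can be
read, here is a small model written in shell. It is an illustration
only, not Elasticsearch's implementation, and the function name
`matches_any` is invented for this example. Shell `case` patterns also
treat `?` and `[...]` specially, which this sketch does not try to
suppress.

```shell
# Illustrative model only, not Elasticsearch's implementation.
# matches_any VALUE PATTERNS returns 0 when VALUE matches at least one of the
# comma-separated PATTERNS; `*` in a pattern matches any run of characters.
matches_any() {
    value=$1
    old_ifs=$IFS
    IFS=,
    set -f                      # keep `*` from expanding to file names
    for pattern in $2; do
        case $value in
            $pattern) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}
```

With this model, `matches_any value12 'value1*'` succeeds, while
`matches_any value3 'value1*,value2'` fails.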

A node can have several attributes associated with it, and both the
attribute name and value are controlled in the node settings. For
example, here is a sample of several node attribute settings:

[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------

In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------

These settings can be updated on a live index using the update settings
API, which makes it possible to "move" indices (shards) around in real
time.

Cluster-wide filtering can also be defined, and updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on its IP
address, using `_ip`:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
--------------------------------------------------

[float]
=== Total Shards Per Node

The `index.routing.allocation.total_shards_per_node` setting controls
how many shards of an index, in total, can be allocated to a single
node. It can be dynamically set on a live index using the update index
settings API.
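
A minimal sketch of setting this limit on a live index follows. The
helper name `limit_shards_per_node` is invented for this example, and
the host and port are assumed to match the other examples on this page.

```shell
# Sketch only: `limit_shards_per_node` is a helper invented for this example.
limit_shards_per_node() {
    # $1 = index name, $2 = maximum number of shards of that index per node
    curl -XPUT "localhost:9200/$1/_settings" -d "{
    \"index.routing.allocation.total_shards_per_node\" : $2
}"
}
```

For example, `limit_shards_per_node test 2` would ask for at most two
shards of the `test` index on any single node.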

[float]
=== Disk-based Shard Allocation

In 0.90.4 and later, Elasticsearch can be configured to prevent shard
allocation on nodes depending on their disk usage. This functionality
is disabled by default, and can be enabled either in the configuration
file, or dynamically using:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : true
    }
}'
--------------------------------------------------

Once enabled, Elasticsearch uses two watermarks to decide whether
shards can be allocated to a node and whether they can remain on it.

`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.

`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.
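
To make the two defaults concrete, the following sketch models the
decision for a node with a given percentage of disk used. It only
illustrates the rules described above and is not Elasticsearch's
implementation. The function name `disk_decision` and its output labels
are invented for this example, and it handles whole percentages only.

```shell
# Illustrative model of the default watermarks (70% low, 85% high).
# disk_decision USED_PERCENT prints one of:
#   allocate       - at or below the low watermark, new shards are allowed
#   no-new-shards  - above the low watermark, existing shards may stay
#   relocate       - above the high watermark, shards should move away
disk_decision() {
    used=$1 low=70 high=85
    if [ "$used" -gt "$high" ]; then
        echo relocate
    elif [ "$used" -gt "$low" ]; then
        echo no-new-shards
    else
        echo allocate
    fi
}
```

For example, `disk_decision 80` prints `no-new-shards`: the node is
above the low watermark but has not yet passed the high one.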

Both watermark settings can be changed dynamically using the cluster
settings API. By default, Elasticsearch retrieves information about the
disk usage of the nodes every 30 seconds. This interval can be changed
with the `cluster.info.update.interval` setting.
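
As a sketch of changing both watermarks in one request, the helper below
sends them as transient cluster settings. The helper name
`set_disk_watermarks` is invented for this example, and passing the
values as JSON strings is an assumption of this sketch.

```shell
# Sketch only: `set_disk_watermarks` is a helper invented for this example.
set_disk_watermarks() {
    # $1 = low watermark, $2 = high watermark; each is either a ratio such
    # as 0.75 or an absolute free-space value such as 500mb.
    curl -XPUT localhost:9200/_cluster/settings -d "{
    \"transient\" : {
        \"cluster.routing.allocation.disk.watermark.low\" : \"$1\",
        \"cluster.routing.allocation.disk.watermark.high\" : \"$2\"
    }
}"
}
```

For example, `set_disk_watermarks 0.75 0.90` uses the ratio form, and
`set_disk_watermarks 1gb 500mb` uses absolute free-space values.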