[[disk-allocator]]
=== Disk-based Shard Allocation

Elasticsearch factors in the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards
away from that node.

Below are the settings that can be configured in the `elasticsearch.yml` config
file or updated dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API:

`cluster.routing.allocation.disk.threshold_enabled`::

Defaults to `true`. Set to `false` to disable the disk allocation decider.

`cluster.routing.allocation.disk.watermark.low`::

Controls the low watermark for disk usage. It defaults to 85%, meaning ES will
not allocate new shards to nodes once they have more than 85% disk used. It
can also be set to an absolute byte value (like `500mb`) to prevent ES from
allocating shards if less than the configured amount of space is available.

`cluster.routing.allocation.disk.watermark.high`::

Controls the high watermark. It defaults to 90%, meaning ES will attempt to
relocate shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark) to
relocate shards once less than the configured amount of space is available on
the node.

`cluster.routing.allocation.disk.watermark.floodstage`::

Controls the flood stage watermark. It defaults to 95%, meaning ES enforces a
read-only index block (`index.blocks.read_only_allow_delete`) on every index
that has one or more shards allocated on a node that has at least one disk
exceeding the flood stage. This is a last resort to prevent nodes from running
out of disk space. The index block must be released manually once there is
enough disk space available to allow indexing operations to continue.

An example of resetting the read-only index block on the `twitter` index:

[source,js]
--------------------------------------------------
PUT /twitter/_settings
{
  "index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]

NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.
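
For example, the `10gb`/`5gb` combination from the note above could be applied
with the following settings update (the values are purely illustrative; pick
thresholds that suit your disks):

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "10gb",
    "cluster.routing.allocation.disk.watermark.high": "5gb"
  }
}
--------------------------------------------------
// CONSOLE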

`cluster.info.update.interval`::

How often Elasticsearch should check on disk usage for each node in the
cluster. Defaults to `30s`.

`cluster.routing.allocation.disk.include_relocations`::

Defaults to `true`, which means that Elasticsearch will take into account
shards that are currently being relocated to the target node when computing a
node's disk usage. Taking relocating shards' sizes into account may, however,
mean that the disk usage for a node is incorrectly estimated on the high side,
since the relocation could be 90% complete and a recently retrieved disk usage
would include the total size of the relocating shard as well as the space
already used by the running relocation.
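
The boolean settings above can be updated dynamically as well. The following
sketch (illustrative only; disabling the disk allocation decider is rarely
advisable in production) turns off the decider and excludes relocating shards
from the disk usage calculation:

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false,
    "cluster.routing.allocation.disk.include_relocations": false
  }
}
--------------------------------------------------
// CONSOLE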

An example of updating the low watermark to no more than 80% of the disk size, a
high watermark of at least 50 gigabytes free, and updating the information about
the cluster every minute:

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "80%",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.info.update.interval": "1m"
  }
}
--------------------------------------------------
// CONSOLE
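
To check how close each node currently is to these watermarks, the per-node
disk usage that the decider works from can be inspected, for instance with the
cat allocation API (shown here only as an illustrative query):

[source,js]
--------------------------------------------------
GET _cat/allocation?v
--------------------------------------------------
// CONSOLE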

NOTE: Prior to 2.0.0, when using multiple data paths, the disk threshold
decider only factored in the usage across all data paths (if you had two
data paths, one with 50b out of 100b free (50% used) and another with
10b out of 50b free (80% used), it would see the node's disk usage as 90b
out of 150b). In 2.0.0, the minimum and maximum disk usages are tracked
separately.