Add indexing pressure documentation (#59456)
This commit adds documentation about the new indexing pressure memory limit setting and the exposure of these metrics in node stats.
This commit is contained in:
parent
0a73225cd8
commit
ceb54ed655
|
@ -58,6 +58,9 @@ using metrics.
|
|||
`http`::
|
||||
HTTP connection information.
|
||||
|
||||
`indexing_pressure`::
|
||||
Statistics about the node's indexing load and related rejections.
|
||||
|
||||
`indices`::
|
||||
Indices stats about size, document count, indexing and deletion times,
|
||||
search times, field cache size, merges and flushes.
|
||||
|
@ -2085,6 +2088,82 @@ Number of failed operations for the processor.
|
|||
=======
|
||||
======
|
||||
|
||||
[[cluster-nodes-stats-api-response-body-indexing-pressure]]
|
||||
`indexing_pressure`::
|
||||
(object)
|
||||
Contains <<index-modules-indexing-pressure,indexing pressure>> statistics for the node.
|
||||
+
|
||||
.Properties of `indexing_pressure`
|
||||
[%collapsible%open]
|
||||
======
|
||||
`total`::
|
||||
(object)
|
||||
Contains statistics for the cumulative indexing load since the node started.
|
||||
+
|
||||
.Properties of `<total>`
|
||||
[%collapsible%open]
|
||||
=======
|
||||
`combined_coordinating_and_primary_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating or primary stage. This
|
||||
value is not the sum of `coordinating_bytes` and `primary_bytes`, as a node can reuse
|
||||
the coordinating bytes if the primary stage is executed locally.
|
||||
|
||||
`coordinating_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating stage.
|
||||
|
||||
`primary_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the primary stage.
|
||||
|
||||
`all_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating, primary, or replica stage.
|
||||
|
||||
`coordinating_rejections`::
|
||||
(integer)
|
||||
Number of indexing requests rejected in the coordinating stage.
|
||||
|
||||
`primary_rejections`::
|
||||
(integer)
|
||||
Number of indexing requests rejected in the primary stage.
|
||||
|
||||
`replica_rejections`::
|
||||
(integer)
|
||||
Number of indexing requests rejected in the replica stage.
|
||||
=======
|
||||
`current`::
|
||||
(object)
|
||||
Contains statistics for the current indexing load.
|
||||
+
|
||||
.Properties of `<current>`
|
||||
[%collapsible%open]
|
||||
=======
|
||||
`combined_coordinating_and_primary_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating or primary stage. This
|
||||
value is not the sum of `coordinating_bytes` and `primary_bytes`, as a node can reuse
|
||||
the coordinating bytes if the primary stage is executed locally.
|
||||
|
||||
`coordinating_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating stage.
|
||||
|
||||
`primary_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the primary stage.
|
||||
|
||||
`replica_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the replica stage.
|
||||
|
||||
`all_bytes`::
|
||||
(integer)
|
||||
Bytes consumed by indexing requests in the coordinating, primary, or replica stage.
|
||||
=======
|
||||
======
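
The note on `combined_coordinating_and_primary_bytes` can be illustrated with a toy model. This is a minimal sketch and not the {es} implementation: the `StageCounters` class, its method names, and the `local_to_coordinator` flag are hypothetical, and the bookkeeping rule is an assumption inferred from the field descriptions above.

```python
from dataclasses import dataclass


@dataclass
class StageCounters:
    """Hypothetical per-node counters named after the documented fields."""

    coordinating_bytes: int = 0
    primary_bytes: int = 0
    combined_coordinating_and_primary_bytes: int = 0

    def start_coordinating(self, size: int) -> None:
        # A coordinating request always adds new bytes to the node.
        self.coordinating_bytes += size
        self.combined_coordinating_and_primary_bytes += size

    def start_primary(self, size: int, local_to_coordinator: bool) -> None:
        self.primary_bytes += size
        if not local_to_coordinator:
            # Assumption: only primary work received from another node
            # adds to the combined counter. A local primary stage reuses
            # the bytes already counted for the coordinating stage.
            self.combined_coordinating_and_primary_bytes += size


counters = StageCounters()
counters.start_coordinating(100)
counters.start_primary(100, local_to_coordinator=True)
print(counters)
```

In this model, a 100-byte request that is coordinated and executed as a primary on the same node counts 100 bytes in each per-stage counter but only 100 bytes in the combined counter.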
|
||||
|
||||
[[cluster-nodes-stats-api-response-body-adaptive-selection]]
|
||||
`adaptive_selection`::
|
||||
(object)
|
||||
|
|
|
@ -20,12 +20,15 @@ responsible for replicating the operation to the other copies.
|
|||
The purpose of this section is to give a high-level overview of the Elasticsearch replication model and discuss the implications
|
||||
it has for various interactions between write and read operations.
|
||||
|
||||
[float]
|
||||
[discrete]
|
||||
[[basic-write-model]]
|
||||
==== Basic write model
|
||||
|
||||
Every indexing operation in Elasticsearch is first resolved to a replication group using <<index-routing,routing>>,
|
||||
typically based on the document ID. Once the replication group has been determined,
|
||||
the operation is forwarded internally to the current _primary shard_ of the group. The primary shard is responsible
|
||||
typically based on the document ID. Once the replication group has been determined, the operation is forwarded
|
||||
internally to the current _primary shard_ of the group. This stage of indexing is referred to as the _coordinating stage_.
|
||||
|
||||
The next stage of indexing is the _primary stage_, performed on the primary shard. The primary shard is responsible
|
||||
for validating the operation and forwarding it to the other replicas. Since replicas can be offline, the primary
|
||||
is not required to replicate to all replicas. Instead, Elasticsearch maintains a list of shard copies that should
|
||||
receive the operation. This list is called the _in-sync copies_ and is maintained by the master node. As the name implies,
|
||||
|
@ -42,6 +45,14 @@ The primary shard follows this basic flow:
|
|||
. Once all replicas have successfully performed the operation and responded to the primary, the primary acknowledges the successful
|
||||
completion of the request to the client.
|
||||
|
||||
Each in-sync replica copy performs the indexing operation locally so that it has a copy. This stage of indexing is the
|
||||
_replica stage_.
|
||||
|
||||
These indexing stages (coordinating, primary, and replica) are sequential. To enable internal retries, the lifetime of each stage
|
||||
encompasses the lifetime of each subsequent stage. For example, the coordinating stage is not complete until each primary
|
||||
stage, which may be spread out across different primary shards, has completed. Each primary stage will not complete until the
|
||||
in-sync replicas have finished indexing the docs locally and responded to the replica requests.
|
||||
|
||||
[float]
|
||||
===== Failure handling
|
||||
|
||||
|
|
|
@ -281,6 +281,10 @@ Other index settings are available in index modules:
|
|||
|
||||
Control over the retention of a history of operations in the index.
|
||||
|
||||
<<index-modules-indexing-pressure,Indexing pressure>>::
|
||||
|
||||
Configure indexing back pressure limits.
|
||||
|
||||
[float]
|
||||
[[x-pack-index-settings]]
|
||||
=== [xpack]#{xpack} index settings#
|
||||
|
@ -311,3 +315,5 @@ include::index-modules/translog.asciidoc[]
|
|||
include::index-modules/history-retention.asciidoc[]
|
||||
|
||||
include::index-modules/index-sorting.asciidoc[]
|
||||
|
||||
include::index-modules/indexing-pressure.asciidoc[]
|
||||
|
|
|
@ -0,0 +1,73 @@
|
|||
[[index-modules-indexing-pressure]]
|
||||
== Indexing pressure
|
||||
|
||||
Indexing documents into {es} introduces system load in the form of memory and
|
||||
CPU load. Each indexing operation includes coordinating, primary, and replica
|
||||
stages. These stages can be performed across multiple nodes in a cluster.
|
||||
|
||||
Indexing pressure can build up through external operations, such as indexing
|
||||
requests, or internal mechanisms, such as recoveries and {ccr}. If too much
|
||||
indexing work is introduced into the system, the cluster can become saturated.
|
||||
This can adversely impact other operations, such as search, cluster
|
||||
coordination, and background processing.
|
||||
|
||||
To prevent these issues, {es} internally monitors indexing load. When the load
|
||||
exceeds certain limits, new indexing work is rejected.
|
||||
|
||||
[discrete]
|
||||
[[indexing-stages]]
|
||||
=== Indexing stages
|
||||
|
||||
External indexing operations go through three stages: coordinating, primary, and
|
||||
replica. See <<basic-write-model>>.
|
||||
|
||||
[discrete]
|
||||
[[memory-limits]]
|
||||
=== Memory limits
|
||||
|
||||
The `indexing_pressure.memory.limit` node setting restricts the number of bytes
|
||||
available for outstanding indexing requests. This setting defaults to 10% of
|
||||
the heap.
|
||||
|
||||
At the beginning of each indexing stage, {es} accounts for the
|
||||
bytes consumed by an indexing request. This accounting is only released at the
|
||||
end of the indexing stage. This means that upstream stages will account for the
|
||||
request overhead until all downstream stages are complete. For example, the
|
||||
coordinating request will remain accounted for until primary and replica
|
||||
stages are complete. The primary request will remain accounted for until each
|
||||
in-sync replica has responded to enable replica retries if necessary.
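
The nested lifetimes described above can be sketched with context managers. This is a simplified, single-process model, assuming all three stages run on one node. The `outstanding` dictionary and the `stage` and `index_request` helpers are hypothetical names, not part of {es}.

```python
from contextlib import contextmanager

# Hypothetical per-stage accounting of outstanding bytes on one node.
outstanding = {"coordinating": 0, "primary": 0, "replica": 0}
events = []


@contextmanager
def stage(name, size):
    # Account for the request at the beginning of the stage.
    outstanding[name] += size
    events.append(("acquire", name))
    try:
        yield
    finally:
        # Release the accounting only when the stage ends, which is
        # after every downstream stage has completed.
        outstanding[name] -= size
        events.append(("release", name))


def index_request(size):
    with stage("coordinating", size):
        with stage("primary", size):
            with stage("replica", size):
                # All three stages are accounted for at this point.
                return dict(outstanding)


peak = index_request(100)
print(peak)
print(events)
```

The releases occur in reverse order (replica, primary, coordinating), which mirrors how an upstream stage stays accounted for until its downstream stages finish.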
|
||||
|
||||
A node will start rejecting new indexing work at the coordinating or primary
|
||||
stage when the number of outstanding coordinating, primary, and replica indexing
|
||||
bytes exceeds the configured limit.
|
||||
|
||||
A node will start rejecting new indexing work at the replica stage when the
|
||||
number of outstanding replica indexing bytes exceeds 1.5x the configured limit.
|
||||
This design means that as indexing pressure builds on nodes, they will naturally
|
||||
stop accepting coordinating and primary work in favor of outstanding replica
|
||||
work.
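
The two rejection rules above can be expressed as a short worked example. This is a sketch under stated assumptions, not the {es} implementation: the `limits` and `should_reject` functions are hypothetical, and the byte counts passed in are assumed to already include the incoming request.

```python
def limits(heap_bytes, limit_percent=10):
    """Return (coordinating/primary limit, replica limit) in bytes."""
    limit = heap_bytes * limit_percent // 100
    replica_limit = limit * 3 // 2  # 1.5x the configured limit
    return limit, replica_limit


def should_reject(stage_name, coordinating_and_primary_bytes, replica_bytes, limit):
    """Decide whether new work at the given stage would be rejected."""
    if stage_name in ("coordinating", "primary"):
        # All outstanding indexing bytes count against the limit.
        return coordinating_and_primary_bytes + replica_bytes > limit
    if stage_name == "replica":
        # Only replica bytes count, against a 1.5x higher threshold.
        return replica_bytes > limit * 3 // 2
    raise ValueError(f"unknown stage: {stage_name}")


# Example: a node with a 1 GiB heap and the default 10% limit.
limit, replica_limit = limits(1024 ** 3)
print(limit, replica_limit)
print(should_reject("coordinating", 100_000_000, 10_000_000, limit))
print(should_reject("replica", 100_000_000, 110_000_000, limit))
```

With these numbers, the node rejects new coordinating work because 110,000,000 outstanding bytes exceed the limit, while replica work is still accepted because the replica bytes remain below the 1.5x threshold.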
|
||||
|
||||
The `indexing_pressure.memory.limit` setting's 10% default limit is generously
|
||||
sized. You should only change it after careful consideration. Only indexing
|
||||
requests contribute to this limit. This means there is additional indexing
|
||||
overhead (buffers, listeners, etc.) which also requires heap space. Other
|
||||
components of {es} also require memory. Setting this limit too high can deny
|
||||
operating memory to other operations and components.
|
||||
|
||||
[discrete]
|
||||
[[indexing-pressure-monitoring]]
|
||||
=== Monitoring
|
||||
|
||||
You can use the
|
||||
<<cluster-nodes-stats-api-response-body-indexing-pressure,node stats API>> to
|
||||
retrieve indexing pressure metrics.
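
As an illustration, the following sketch sums the rejection counters per node from a node stats response. It assumes the response has been fetched separately (for example, by requesting the `indexing_pressure` metric from the node stats API) and parsed into a dictionary. The `rejection_summary` function, the node ID `node-1`, and the sample values are hypothetical. The field names follow the response body described for `indexing_pressure`.

```python
def rejection_summary(node_stats):
    """Map each node ID to its total number of rejected indexing requests."""
    summary = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        total = node.get("indexing_pressure", {}).get("total", {})
        summary[node_id] = sum(
            total.get(f"{stage_name}_rejections", 0)
            for stage_name in ("coordinating", "primary", "replica")
        )
    return summary


# Hypothetical, abbreviated response used only for illustration.
sample_response = {
    "nodes": {
        "node-1": {
            "indexing_pressure": {
                "total": {
                    "coordinating_rejections": 3,
                    "primary_rejections": 1,
                    "replica_rejections": 0,
                }
            }
        }
    }
}

print(rejection_summary(sample_response))
```

A rising rejection count on a node suggests that indexing load is regularly reaching the memory limit on that node.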
|
||||
|
||||
[discrete]
|
||||
[[indexing-pressure-settings]]
|
||||
=== Indexing pressure settings
|
||||
|
||||
`indexing_pressure.memory.limit`::
|
||||
Number of outstanding bytes that may be consumed by indexing requests. When
|
||||
this limit is reached or exceeded, the node will reject new coordinating and
|
||||
primary operations. When replica operations consume 1.5x this limit, the node
|
||||
will reject new replica operations. Defaults to 10% of the heap.
|