Documentation changes for wait_for_active_shards (#19581)
Documentation and migration doc changes for introducing `wait_for_active_shards` and removing the write consistency level. Closes #19581
Parent: 3d2a105825
Commit: a21dd80f1b
@@ -128,27 +128,21 @@ field. It automatically follows the behavior of the index / delete
 operation based on the `_parent` / `_routing` mapping.

 [float]
-[[bulk-consistency]]
-=== Write Consistency
+[[bulk-wait-for-active-shards]]
+=== Wait For Active Shards

-When making bulk calls, you can require a minimum number of active
-shards in the partition through the `consistency` parameter. The values
-allowed are `one`, `quorum`, and `all`. It defaults to the node level
-setting of `action.write_consistency`, which in turn defaults to
-`quorum`.
-
-For example, in a N shards with 2 replicas index, there will have to be
-at least 2 active shards within the relevant partition (`quorum`) for
-the operation to succeed. In a N shards with 1 replica scenario, there
-will need to be a single shard active (in this case, `one` and `quorum`
-are the same).
+When making bulk calls, you can set the `wait_for_active_shards`
+parameter to require a minimum number of shard copies to be active
+before starting to process the bulk request. See
+<<index-wait-for-active-shards,here>> for further details and a usage
+example.

 [float]
 [[bulk-refresh]]
 === Refresh

 Control when the changes made by this request are visible to search. See
-<<docs-refresh>>.
+<<docs-refresh,refresh>>.

 [float]
 [[bulk-update]]
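As a sketch of the new parameter in use on the bulk endpoint (the `test` index and document are hypothetical), a bulk call that requires two active copies of each shard before processing might look like:

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_bulk?wait_for_active_shards=2' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
'
--------------------------------------------------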
@@ -144,7 +144,7 @@ POST twitter/_delete_by_query?scroll_size=5000
 === URL Parameters

 In addition to the standard parameters like `pretty`, the Delete By Query API
-also supports `refresh`, `wait_for_completion`, `consistency`, and `timeout`.
+also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, and `timeout`.

 Sending the `refresh` will refresh all shards involved in the delete by query
 once the request completes. This is different than the Delete API's `refresh`
@@ -159,8 +159,9 @@ record of this task as a document at `.tasks/task/${taskId}`. This is yours
 to keep or remove as you see fit. When you are done with it, delete it so
 Elasticsearch can reclaim the space it uses.

-`consistency` controls how many copies of a shard must respond to each write
-request. `timeout` controls how long each write request waits for unavailable
+`wait_for_active_shards` controls how many copies of a shard must be active
+before proceeding with the request. See <<index-wait-for-active-shards,here>>
+for details. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
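For example, the two parameters can be combined on one delete by query call; this sketch (the query and values are arbitrary) deletes every document in the `twitter` index, waiting up to one minute for two active copies of each shard:

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/twitter/_delete_by_query?wait_for_active_shards=2&timeout=1m' -d '{
    "query": {
        "match_all": {}
    }
}'
--------------------------------------------------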
@@ -95,20 +95,14 @@ redirected into the primary shard within that id group, and replicated
 (if needed) to shard replicas within that id group.

 [float]
-[[delete-consistency]]
-=== Write Consistency
+[[delete-wait-for-active-shards]]
+=== Wait For Active Shards

-Control if the operation will be allowed to execute based on the number
-of active shards within that partition (replication group). The values
-allowed are `one`, `quorum`, and `all`. The parameter to set it is
-`consistency`, and it defaults to the node level setting of
-`action.write_consistency` which in turn defaults to `quorum`.
-
-For example, in a N shards with 2 replicas index, there will have to be
-at least 2 active shards within the relevant partition (`quorum`) for
-the operation to succeed. In a N shards with 1 replica scenario, there
-will need to be a single shard active (in this case, `one` and `quorum`
-is the same).
+When making delete requests, you can set the `wait_for_active_shards`
+parameter to require a minimum number of shard copies to be active
+before starting to process the delete request. See
+<<index-wait-for-active-shards,here>> for further details and a usage
+example.

 [float]
 [[delete-refresh]]
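As an illustrative sketch (index, type, and document id are hypothetical), the parameter is passed on the delete request URL:

[source,js]
--------------------------------------------------
curl -XDELETE 'localhost:9200/twitter/tweet/1?wait_for_active_shards=2'
--------------------------------------------------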
@@ -44,10 +44,10 @@ The `_shards` header provides information about the replication process of the i

 The index operation is successful in the case `successful` is at least 1.

-NOTE: Replica shards may not all be started when an indexing operation successfully returns (by default, a quorum is
-required). In that case, `total` will be equal to the total shards based on the index replica settings and
-`successful` will be equal to the number of shards started (primary plus replicas). As there were no failures,
-the `failed` will be 0.
+NOTE: Replica shards may not all be started when an indexing operation successfully returns (by default, only the
+primary is required, but this behavior can be <<index-wait-for-active-shards,changed>>). In that case,
+`total` will be equal to the total shards based on the `number_of_replicas` setting and `successful` will be
+equal to the number of shards started (primary plus replicas). If there were no failures, the `failed` will be 0.

 [float]
 [[index-creation]]
@@ -308,31 +308,68 @@ containing this shard. After the primary shard completes the operation,
 if needed, the update is distributed to applicable replicas.

 [float]
-[[index-consistency]]
-=== Write Consistency
+[[index-wait-for-active-shards]]
+=== Wait For Active Shards

-To prevent writes from taking place on the "wrong" side of a network
-partition, by default, index operations only succeed if a quorum
-(>replicas/2+1) of active shards are available. This default can be
-overridden on a node-by-node basis using the `action.write_consistency`
-setting. To alter this behavior per-operation, the `consistency` request
-parameter can be used.
+To improve the resiliency of writes to the system, indexing operations
+can be configured to wait for a certain number of active shard copies
+before proceeding with the operation. If the requisite number of active
+shard copies are not available, then the write operation must wait and
+retry, until either the requisite shard copies have started or a timeout
+occurs. By default, write operations only wait for the primary shards
+to be active before proceeding (i.e. `wait_for_active_shards=1`).
+This default can be overridden in the index settings dynamically
+by setting `index.write.wait_for_active_shards`. To alter this behavior
+per operation, the `wait_for_active_shards` request parameter can be used.

-Valid write consistency values are `one`, `quorum`, and `all`.
+Valid values are `all` or any positive integer up to the total number
+of configured copies per shard in the index (which is `number_of_replicas+1`).
+Specifying a negative value or a number greater than the number of
+shard copies will throw an error.

-Note, for the case where the number of replicas is 1 (total of 2 copies
-of the data), then the default behavior is to succeed if 1 copy (the primary)
-can perform the write.
+For example, suppose we have a cluster of three nodes, `A`, `B`, and `C`, and
+we create an index `index` with the number of replicas set to 3 (resulting in
+4 shard copies, one more copy than there are nodes). If we
+attempt an indexing operation, by default the operation will only ensure
+the primary copy of each shard is available before proceeding. This means
+that even if `B` and `C` went down, and `A` hosted the primary shard copies,
+the indexing operation would still proceed with only one copy of the data.
+If `wait_for_active_shards` is set on the request to `3` (and all 3 nodes
+are up), then the indexing operation will require 3 active shard copies
+before proceeding, a requirement which should be met because there are 3
+active nodes in the cluster, each one holding a copy of the shard. However,
+if we set `wait_for_active_shards` to `all` (or to `4`, which is the same),
+the indexing operation will not proceed, as we do not have all 4 copies of
+each shard active in the index. The operation will time out
+unless a new node is brought up in the cluster to host the fourth copy of
+the shard.

-The index operation only returns after all *active* shards within the
-replication group have indexed the document (sync replication).
+It is important to note that this setting greatly reduces the chances of
+the write operation not writing to the requisite number of shard copies,
+but it does not completely eliminate the possibility, because this check
+occurs before the write operation commences. Once the write operation
+is underway, it is still possible for replication to fail on any number of
+shard copies but still succeed on the primary. The `_shards` section of the
+write operation's response reveals the number of shard copies on which
+replication succeeded/failed:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards" : {
+        "total" : 2,
+        "failed" : 0,
+        "successful" : 2
+    }
+}
+--------------------------------------------------

 [float]
 [[index-refresh]]
 === Refresh

 Control when the changes made by this request are visible to search. See
-<<docs-refresh>>.
+<<docs-refresh,refresh>>.

 [float]
 [[index-noop]]
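Tying the example above together, an indexing request against the three-node cluster that requires three active shard copies, and gives up after 10 seconds if they are not available, might be issued as follows (a sketch; the document body is arbitrary):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/index/type/1?wait_for_active_shards=3&timeout=10s' -d '{
    "field" : "value"
}'
--------------------------------------------------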
@@ -419,7 +419,7 @@ is sent directly to the remote host without validation or modification.
 === URL Parameters

 In addition to the standard parameters like `pretty`, the Reindex API also
-supports `refresh`, `wait_for_completion`, `consistency`, `timeout`, and
+supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`, and
 `requests_per_second`.

 Sending the `refresh` url parameter will cause all indexes to which the request
@@ -434,8 +434,9 @@ record of this task as a document at `.tasks/task/${taskId}`. This is yours
 to keep or remove as you see fit. When you are done with it, delete it so
 Elasticsearch can reclaim the space it uses.

-`consistency` controls how many copies of a shard must respond to each write
-request. `timeout` controls how long each write request waits for unavailable
+`wait_for_active_shards` controls how many copies of a shard must be active
+before proceeding with the reindexing. See <<index-wait-for-active-shards,here>>
+for details. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
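A reindex call combining these parameters might be sketched as follows (source and destination index names are hypothetical):

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_reindex?wait_for_active_shards=2&timeout=2m' -d '{
    "source": { "index": "old_index" },
    "dest": { "index": "new_index" }
}'
--------------------------------------------------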
@@ -202,7 +202,7 @@ POST twitter/_update_by_query?pipeline=set-foo
 === URL Parameters

 In addition to the standard parameters like `pretty`, the Update By Query API
-also supports `refresh`, `wait_for_completion`, `consistency`, and `timeout`.
+also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, and `timeout`.

 Sending the `refresh` will update all shards in the index being updated when
 the request completes. This is different than the Index API's `refresh`
@@ -216,8 +216,9 @@ record of this task as a document at `.tasks/task/${taskId}`. This is yours
 to keep or remove as you see fit. When you are done with it, delete it so
 Elasticsearch can reclaim the space it uses.

-`consistency` controls how many copies of a shard must respond to each write
-request. `timeout` controls how long each write request waits for unavailable
+`wait_for_active_shards` controls how many copies of a shard must be active
+before proceeding with the request. See <<index-wait-for-active-shards,here>>
+for details. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
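For instance, an update by query over the `twitter` index that requires two active copies of each shard could be sketched as (a body-less call simply picks up mapping changes):

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/twitter/_update_by_query?wait_for_active_shards=2&timeout=1m'
--------------------------------------------------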
@@ -245,9 +245,10 @@ If an alias index routing is specified then it overrides the parent routing and

 Timeout waiting for a shard to become available.

-`consistency`::
+`wait_for_active_shards`::

-The write consistency of the index/delete operation.
+The number of shard copies required to be active before proceeding with the update operation.
+See <<index-wait-for-active-shards,here>> for details.

 `refresh`::
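As a sketch (the document and partial doc are hypothetical), the parameter rides on the update URL like any other URL parameter:

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/test/type1/1/_update?wait_for_active_shards=2' -d '{
    "doc" : { "name" : "new_name" }
}'
--------------------------------------------------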
@@ -260,7 +260,8 @@ And the response:
 --------------------------------------------------
 curl -XPUT 'localhost:9200/customer?pretty'
 {
-  "acknowledged" : true
+  "acknowledged" : true,
+  "shards_acknowledged": true
 }

 curl 'localhost:9200/_cat/indices?v'
@@ -122,3 +122,53 @@ curl -XPUT localhost:9200/test -d '{
 --------------------------------------------------

 <1> `creation_date` is set using epoch time in milliseconds.
+
+[float]
+[[create-index-wait-for-active-shards]]
+=== Wait For Active Shards
+
+By default, index creation will only return a response to the client when the primary copies of
+each shard have been started, or the request times out. The index creation response will indicate
+what happened:
+
+[source,js]
+--------------------------------------------------
+{
+    "acknowledged": true,
+    "shards_acknowledged": true
+}
+--------------------------------------------------
+
+`acknowledged` indicates whether the index was successfully created in the cluster, while
+`shards_acknowledged` indicates whether the requisite number of shard copies were started for
+each shard in the index before timing out. Note that it is still possible for either
+`acknowledged` or `shards_acknowledged` to be `false`, even though the index creation was
+successful. These values simply indicate whether the operation completed before the timeout. If
+`acknowledged` is `false`, then we timed out before the cluster state was updated with the
+newly created index, but it probably will be created sometime soon. If `shards_acknowledged`
+is `false`, then we timed out before the requisite number of shards were started (by default
+just the primaries), even if the cluster state was successfully updated to reflect the newly
+created index (i.e. `acknowledged=true`).
+
+We can change the default of only waiting for the primary shards to start through the index
+setting `index.write.wait_for_active_shards` (note that changing this setting will also affect
+the `wait_for_active_shards` value on all subsequent write operations):
+
+[source,js]
+--------------------------------------------------
+curl -XPUT localhost:9200/test -d '{
+    "settings": {
+        "index.write.wait_for_active_shards": "2"
+    }
+}'
+--------------------------------------------------
+
+or through the request parameter `wait_for_active_shards`:
+
+[source,js]
+--------------------------------------------------
+curl -XPUT localhost:9200/test?wait_for_active_shards=2
+--------------------------------------------------
+
+A detailed explanation of `wait_for_active_shards` and its possible values can be found
+<<index-wait-for-active-shards,here>>.
@@ -126,3 +126,9 @@ POST logs_write/_rollover?dry_run
 --------------------------------------------------
 // CONSOLE

+[float]
+=== Wait For Active Shards
+
+Because the rollover operation creates a new index to rollover to, the
+<<create-index-wait-for-active-shards,wait for active shards>> setting on
+index creation applies to the rollover action as well.
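One way to raise the requirement for the rollover target, sketched on the assumption that the rollover body accepts the same index `settings` as index creation (index names follow the example above; the condition and value are arbitrary):

[source,js]
--------------------------------------------------
POST logs_write/_rollover
{
    "conditions": { "max_age": "7d" },
    "settings": { "index.write.wait_for_active_shards": "2" }
}
--------------------------------------------------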
@@ -136,3 +136,9 @@ shrink process begins. When the shrink operation completes, the shard will
 become `active`. At that point, Elasticsearch will try to allocate any
 replicas and may decide to relocate the primary shard to another node.

+[float]
+=== Wait For Active Shards
+
+Because the shrink operation creates a new index to shrink the shards to,
+the <<create-index-wait-for-active-shards,wait for active shards>> setting
+on index creation applies to the shrink index action as well.
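Similarly for shrink, a sketch assuming the shrink request body accepts target index `settings` (index names are hypothetical):

[source,js]
--------------------------------------------------
POST my_source_index/_shrink/my_target_index
{
    "settings": { "index.write.wait_for_active_shards": "2" }
}
--------------------------------------------------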
@@ -91,6 +91,26 @@ language `org.codehaus.groovy:groovy` artifact.
 error description). This will influence code that uses the `IndexRequest.opType()` or `IndexRequest.create()`
 to index a document only if it doesn't already exist.

+==== writeConsistencyLevel removed on write requests
+
+In previous versions of Elasticsearch, the various write requests had a
+`setWriteConsistencyLevel` method to set the shard consistency level for
+write operations. However, the semantics of write consistency were ambiguous,
+as this was just a pre-operation check to ensure the specified number of
+shards were available before the operation commenced. The write consistency
+level did not guarantee that the data would be replicated to that number
+of copies by the time the operation finished. The `setWriteConsistencyLevel`
+method on these write requests has been changed to `setWaitForActiveShards`,
+which can take a numerical value up to the total number of shard copies or
+`ActiveShardCount.ALL` for all shard copies. The default is to just wait
+for the primary shard to be active before proceeding with the operation.
+See the section on <<index-wait-for-active-shards,wait for active shards>>
+for more details.
+
+This change affects `IndexRequest`, `IndexRequestBuilder`, `BulkRequest`,
+`BulkRequestBuilder`, `UpdateRequest`, `UpdateRequestBuilder`, `DeleteRequest`,
+and `DeleteRequestBuilder`.
+
 ==== Changes to Query Builders

 ===== BoostingQueryBuilder
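A minimal sketch of migrating a Java indexing call, assuming a standard client and the `ActiveShardCount` class named above (the index, type, and `json` source are hypothetical):

[source,java]
--------------------------------------------------
// Before: the removed write consistency API
// client.prepareIndex("index", "type", "1")
//       .setSource(json)
//       .setWriteConsistencyLevel(WriteConsistencyLevel.QUORUM)
//       .get();

// After: wait for all shard copies to be active before the write proceeds
client.prepareIndex("index", "type", "1")
      .setSource(json)
      .setWaitForActiveShards(ActiveShardCount.ALL)
      .get();
--------------------------------------------------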
@@ -361,15 +361,6 @@ Certain versions of the JVM are known to have bugs which can cause index corrupt

 When a node is experiencing network issues, the master detects it and removes the node from the cluster. That causes all ongoing recoveries from and to that node to be stopped, and a new location is found for the relevant shards. However, in the case of a partial network partition, where there are connectivity issues between the source and target nodes of a recovery but not between those nodes and the current master, things may go wrong. While the nodes successfully restore the connection, the ongoing recoveries may have encountered issues. In {GIT}8720[#8720], we added test simulations for these and solved several issues that were flagged by them.

-[float]
-=== Validate quorum before accepting a write request (STATUS: DONE, v1.4.0)
-
-Today, when a node holding a primary shard receives an index request, it checks the local cluster state to see whether a quorum of shards is available before it accepts the request. However, it can take some time before an unresponsive node is removed from the cluster state. We are adding an optional live check, where the primary node tries to contact its replicas to confirm that they are still responding before accepting any changes. See {GIT}6937[#6937].
-
-While the work is going on, we tightened the current checks by bringing them closer to the index code. See {GIT}7873[#7873] (STATUS: DONE, fixed in v1.4.0)
-

 [float]
 === Improving Zen Discovery (STATUS: DONE, v1.4.0.Beta1)