Update resiliency page (#17586)

#14252 , #7572 , #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
This commit is contained in:
Boaz Leskes 2016-04-07 12:17:13 +02:00
parent c0ebce0ba0
commit 8eee28e798
1 changed files with 26 additions and 24 deletions

View File

@ -94,27 +94,6 @@ space. The following issues have been identified:
Other safeguards are tracked in the meta-issue {GIT}11511[#11511]. Other safeguards are tracked in the meta-issue {GIT}11511[#11511].
[float]
=== Loss of documents during network partition (STATUS: ONGOING)
If a network partition separates a node from the master, there is some window of time before the node detects it. The length of the window is dependent on the type of the partition. This window is extremely small if a socket is broken. More adversarial partitions, for example, silently dropping requests without breaking the socket can take longer (up to 3x30s using current defaults).
If the node hosts a primary shard at the moment of partition, and ends up being isolated from the cluster (which could have resulted in {GIT}2488[split-brain] before), some documents that are being indexed into the primary may be lost if they fail to reach one of the allocated replicas (due to the partition) and that replica is later promoted to primary by the master ({GIT}7572[#7572]).
To prevent this situation, the primary needs to wait for the master to acknowledge replica shard failures before acknowledging the write to the client. {GIT}14252[#14252]
[float]
=== Safe primary relocations (STATUS: ONGOING)
When primary relocation completes, a cluster state is propagated that deactivates the old primary and marks the new primary as active. As
cluster state changes are not applied synchronously on all nodes, there can be a time interval where the relocation target has processed the
cluster state and believes to be the active primary and the relocation source has not yet processed the cluster state update and still
believes itself to be the active primary. This means that an index request that gets routed to the new primary does not get replicated to
the old primary (as it has been deactivated from point of view of the new primary). If a subsequent read request gets routed to the old
primary, it cannot see the indexed document. {GIT}15900[#15900]
In the reverse situation where a cluster state update that completes primary relocation is first applied on the relocation source and then
on the relocation target, each of the nodes believes the other to be the active primary. This leads to the issue of indexing requests
chasing the primary being quickly sent back and forth between the nodes, potentially making them both go OOM. {GIT}12573[#12573]
[float] [float]
=== Relocating shards omitted by reporting infrastructure (STATUS: ONGOING) === Relocating shards omitted by reporting infrastructure (STATUS: ONGOING)
@ -132,8 +111,32 @@ We have increased our test coverage to include scenarios tested by Jepsen. We ma
This status page is a start, but we can do a better job of explicitly documenting the processes at work in Elasticsearch, and what happens in the case of each type of failure. The plan is to have a test case that validates each behavior under simulated conditions. Every test will document the expected results, the associated test code and an explicit PASS or FAIL status for each simulated case. This status page is a start, but we can do a better job of explicitly documenting the processes at work in Elasticsearch, and what happens in the case of each type of failure. The plan is to have a test case that validates each behavior under simulated conditions. Every test will document the expected results, the associated test code and an explicit PASS or FAIL status for each simulated case.
== Unreleased
[float] [float]
=== Do not allow stale shards to automatically be promoted to primary (STATUS: ONGOING, v5.0.0) === Loss of documents during network partition (STATUS: UNRELEASED, v5.0.0)
If a network partition separates a node from the master, there is some window of time before the node detects it. The length of the window is dependent on the type of the partition. This window is extremely small if a socket is broken. More adversarial partitions, for example, silently dropping requests without breaking the socket can take longer (up to 3x30s using current defaults).
If the node hosts a primary shard at the moment of partition, and ends up being isolated from the cluster (which could have resulted in {GIT}2488[split-brain] before), some documents that are being indexed into the primary may be lost if they fail to reach one of the allocated replicas (due to the partition) and that replica is later promoted to primary by the master ({GIT}7572[#7572]).
To prevent this situation, the primary needs to wait for the master to acknowledge replica shard failures before acknowledging the write to the client. {GIT}14252[#14252]
[float]
=== Safe primary relocations (STATUS: UNRELEASED, v5.0.0)
When primary relocation completes, a cluster state is propagated that deactivates the old primary and marks the new primary as active. As
cluster state changes are not applied synchronously on all nodes, there can be a time interval where the relocation target has processed the
cluster state and believes to be the active primary and the relocation source has not yet processed the cluster state update and still
believes itself to be the active primary. This means that an index request that gets routed to the new primary does not get replicated to
the old primary (as it has been deactivated from point of view of the new primary). If a subsequent read request gets routed to the old
primary, it cannot see the indexed document. {GIT}15900[#15900]
In the reverse situation where a cluster state update that completes primary relocation is first applied on the relocation source and then
on the relocation target, each of the nodes believes the other to be the active primary. This leads to the issue of indexing requests
chasing the primary being quickly sent back and forth between the nodes, potentially making them both go OOM. {GIT}12573[#12573]
[float]
=== Do not allow stale shards to automatically be promoted to primary (STATUS: UNRELEASED, v5.0.0)
In some scenarios, after the loss of all valid copies, a stale replica shard can be automatically assigned as a primary, preferring old data In some scenarios, after the loss of all valid copies, a stale replica shard can be automatically assigned as a primary, preferring old data
to no data at all ({GIT}14671[#14671]). This can lead to a loss of acknowledged writes if the valid copies are not lost but are rather to no data at all ({GIT}14671[#14671]). This can lead to a loss of acknowledged writes if the valid copies are not lost but are rather
@ -143,7 +146,7 @@ for one of the good shard copies to reappear. In case where all good copies are
stale shard copy. stale shard copy.
[float] [float]
=== Make index creation resilient to index closing and full cluster crashes (STATUS: ONGOING, v5.0.0) === Make index creation resilient to index closing and full cluster crashes (STATUS: UNRELEASED, v5.0.0)
Recovering an index requires a quorum (with an exception for 2) of shard copies to be available to allocate a primary. This means that Recovering an index requires a quorum (with an exception for 2) of shard copies to be available to allocate a primary. This means that
a primary cannot be assigned if the cluster dies before enough shards have been allocated ({GIT}9126[#9126]). The same happens if an index a primary cannot be assigned if the cluster dies before enough shards have been allocated ({GIT}9126[#9126]). The same happens if an index
@ -153,7 +156,6 @@ recover an index in the presence of a single shard copy. Allocation IDs can also
but none of the shards have been started. If such an index was inadvertently closed before at least one shard could be started, a fresh but none of the shards have been started. If such an index was inadvertently closed before at least one shard could be started, a fresh
shard will be allocated upon reopening the index. shard will be allocated upon reopening the index.
== Unreleased
[float] [float]
=== Use two phase commit for Cluster State publishing (STATUS: UNRELEASED, v5.0.0) === Use two phase commit for Cluster State publishing (STATUS: UNRELEASED, v5.0.0)