Update resiliency page for the release of v5 (#21177)

Boaz Leskes 2016-10-28 18:46:54 +02:00 committed by GitHub
parent 9e3eacec35
commit 75ee2bb61d


@ -153,10 +153,10 @@ The new tests are run continuously in our testing farm and are passing. We are a
that no failures are found.
== Unreleased
== Completed
[float]
=== Port Jepsen tests dealing with loss of acknowledged writes to our testing framework (STATUS: UNRELEASED, V5.0.0)
=== Port Jepsen tests dealing with loss of acknowledged writes to our testing framework (STATUS: DONE, V5.0.0)
We have increased our test coverage to include scenarios tested by Jepsen that demonstrate loss of acknowledged writes, as described in
the Elasticsearch related blogs. We make heavy use of randomization to expand on the scenarios that can be tested and to introduce
@ -167,7 +167,7 @@ where the `testAckedIndexing` test was specifically added to check that we don't
[float]
=== Loss of documents during network partition (STATUS: UNRELEASED, v5.0.0)
=== Loss of documents during network partition (STATUS: DONE, v5.0.0)
If a network partition separates a node from the master, there is some window of time before the node detects it. The length of the window depends on the type of partition. The window is extremely small if a socket is broken. More adversarial partitions, for example ones that silently drop requests without breaking the socket, can take longer to detect (up to 3x30s using current defaults).
@ -175,7 +175,7 @@ If the node hosts a primary shard at the moment of partition, and ends up being
To prevent this situation, the primary needs to wait for the master to acknowledge replica shard failures before acknowledging the write to the client. {GIT}14252[#14252]
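
The rule above can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: `Master`, `mark_shard_failed` and `replicate_and_ack` are hypothetical names introduced only for illustration, and the partition is modelled as a simple `reachable` flag.

```python
class Master:
    """Hypothetical stand-in for the elected master (illustration only)."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.failed_shards = set()

    def mark_shard_failed(self, replica):
        # Returns True only if the master received and recorded the failure.
        if not self.reachable:
            return False
        self.failed_shards.add(replica)
        return True


def replicate_and_ack(replica_results, master):
    """Decide whether the primary may acknowledge a write to the client.

    replica_results maps a replica name to True (replication succeeded)
    or False (replication failed).
    """
    for replica, succeeded in replica_results.items():
        if succeeded:
            continue
        # A replica failed: the master must confirm the failure first.
        if not master.mark_shard_failed(replica):
            # The primary may be partitioned away from the master,
            # so the write must not be acknowledged.
            return "rejected"
    return "acknowledged"
```

In this sketch a primary that cannot reach the master never acknowledges a write that failed on a replica, which is the property the fix is meant to provide.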
[float]
=== Safe primary relocations (STATUS: UNRELEASED, v5.0.0)
=== Safe primary relocations (STATUS: DONE, v5.0.0)
When primary relocation completes, a cluster state is propagated that deactivates the old primary and marks the new primary as active. As
cluster state changes are not applied synchronously on all nodes, there can be a time interval where the relocation target has processed the
@ -189,7 +189,7 @@ on the relocation target, each of the nodes believes the other to be the active
chasing the primary being quickly sent back and forth between the nodes, potentially making them both go OOM. {GIT}12573[#12573]
[float]
=== Do not allow stale shards to automatically be promoted to primary (STATUS: UNRELEASED, v5.0.0)
=== Do not allow stale shards to automatically be promoted to primary (STATUS: DONE, v5.0.0)
In some scenarios, after the loss of all valid copies, a stale replica shard can be automatically assigned as a primary, preferring old data
to no data at all ({GIT}14671[#14671]). This can lead to a loss of acknowledged writes if the valid copies are not lost but are rather
@ -199,7 +199,7 @@ for one of the good shard copies to reappear. In case where all good copies are
stale shard copy.
[float]
=== Make index creation resilient to index closing and full cluster crashes (STATUS: UNRELEASED, v5.0.0)
=== Make index creation resilient to index closing and full cluster crashes (STATUS: DONE, v5.0.0)
Recovering an index requires a quorum (with an exception when there are only 2 copies) of shard copies to be available to allocate a primary. This means that
a primary cannot be assigned if the cluster dies before enough shards have been allocated ({GIT}9126[#9126]). The same happens if an index
@ -211,7 +211,7 @@ shard will be allocated upon reopening the index.
[float]
=== Use two phase commit for Cluster State publishing (STATUS: UNRELEASED, v5.0.0)
=== Use two phase commit for Cluster State publishing (STATUS: DONE, v5.0.0)
A master node in Elasticsearch continuously https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#fault-detection[monitors the cluster nodes]
and removes any node from the cluster that doesn't respond to its pings in a timely
@ -225,8 +225,6 @@ a new phase to cluster state publishing where the proposed cluster state is sent
but is not yet committed. Only once enough nodes (`discovery.zen.minimum_master_nodes`) actively acknowledge
the change is it committed and commit messages sent to the nodes. See {GIT}13062[#13062].
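
The two phases described above can be sketched as follows. This is a simplified model, not the actual publishing code: `Node`, `receive_proposal`, `receive_commit` and `publish` are hypothetical names, every node is assumed to be master-eligible, and timeouts are reduced to a `reachable` flag.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node (illustration only)."""
    name: str
    reachable: bool = True
    pending: object = None    # proposed state, received but not applied
    committed: object = None  # last state that was actually applied

    def receive_proposal(self, state):
        # Phase 1: hold the proposed state without applying it, then ack.
        if not self.reachable:
            return False
        self.pending = state
        return True

    def receive_commit(self):
        # Phase 2: apply the state that was held in phase 1.
        if self.reachable and self.pending is not None:
            self.committed = self.pending
            self.pending = None


def publish(state, nodes, minimum_master_nodes):
    """Return True if the state was committed, False if it was abandoned."""
    acks = sum(1 for node in nodes if node.receive_proposal(state))
    if acks < minimum_master_nodes:
        # Not enough acknowledgements: discard the proposal everywhere.
        for node in nodes:
            node.pending = None
        return False
    for node in nodes:
        node.receive_commit()
    return True
```

With three nodes and a threshold of 2, a publish that reaches only one node is abandoned and no node applies the proposed state.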
== Completed
[float]
=== Wait on incoming joins before electing local node as master (STATUS: DONE, v2.0.0)