Do not perform cleanup if Manifest write fails with dirty exception (#40519)

Currently, if the Manifest write is unsuccessful (i.e. a
WriteStateException is thrown), we clean up the newly created metadata
files. However, this is wrong.
Consider the following sequence (caught by CI in
https://github.com/elastic/elasticsearch/issues/39077):

- cluster global data is written **successfully**
- the associated manifest write **fails** (during the fsync, i.e. the
files have already been written)
- deleting (reverting) the manifest file **fails**, so the manifest
remains persisted on disk
- deleting (reverting) the cluster global data **succeeds**

In this case, when trying to load metadata (after the node restarts
because of the dirty WriteStateException), the following exception is
thrown:
```
java.io.IOException: failed to find global metadata [generation: 0]
```
because the manifest file references a missing global metadata file.
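
The sequence above can be replayed against an in-memory stand-in for the
disk. This is only an illustrative sketch: the class, the method names and
the file names `global-0.st` and `manifest-0.st` are assumptions made for
the example, not the actual Elasticsearch code.

```java
import java.util.HashSet;
import java.util.Set;

class FailedCleanupSketch {
    // Replays the four steps against a set of file names standing in for the disk.
    static Set<String> replay() {
        Set<String> disk = new HashSet<>();
        disk.add("global-0.st");          // 1. cluster global data is written successfully
        disk.add("manifest-0.st");        // 2. manifest bytes reach disk, then the fsync fails
        boolean manifestDeleted = false;  // 3. deleting the manifest fails, so it stays
        if (manifestDeleted) {
            disk.remove("manifest-0.st");
        }
        disk.remove("global-0.st");       // 4. deleting the cluster global data succeeds
        return disk;
    }

    // Metadata is loadable only if a surviving manifest still finds its global data.
    static boolean loadable(Set<String> disk) {
        return !disk.contains("manifest-0.st") || disk.contains("global-0.st");
    }

    public static void main(String[] args) {
        Set<String> disk = replay();
        System.out.println(disk + " loadable=" + loadable(disk));
    }
}
```

After the replay only the manifest is left, so loading fails in the same
way as the IOException shown above.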

This commit checks whether the thrown WriteStateException is dirty and,
if it is, skips the cleanup, because the new Manifest file might have
been created while its deletion failed.
In the future, we might add a more fine-grained check: perform the
cleanup if the WriteStateException is dirty but the Manifest deletion
succeeded.
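
The guard can be pictured with the following self-contained sketch. It
models only the dirty flag; `SketchWriteStateException`,
`DirtyRollbackSketch` and the file names are hypothetical stand-ins, not
the real `WriteStateException` or `GatewayMetaState` classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WriteStateException; only the dirty flag is modelled.
class SketchWriteStateException extends Exception {
    private final boolean dirty;

    SketchWriteStateException(boolean dirty) {
        super("manifest write failed, dirty=" + dirty);
        this.dirty = dirty;
    }

    boolean isDirty() {
        return dirty;
    }
}

class DirtyRollbackSketch {
    // Metadata files created by the current write attempt (hypothetical names).
    final List<String> newFiles = new ArrayList<>();

    // Removes every file created by the current attempt.
    void rollback() {
        newFiles.clear();
    }

    // Simulated manifest write: throws when fail is true, with the given dirty flag.
    private void writeManifest(boolean fail, boolean dirty) throws SketchWriteStateException {
        if (fail) {
            throw new SketchWriteStateException(dirty);
        }
        newFiles.add("manifest-1.st");
    }

    void writeAll(boolean fail, boolean dirty) throws SketchWriteStateException {
        newFiles.add("global-1.st");
        try {
            writeManifest(fail, dirty);
        } catch (SketchWriteStateException e) {
            // A dirty failure means the manifest may be on disk and may reference
            // the new files, so they are kept; a clean failure is safe to undo.
            if (e.isDirty() == false) {
                rollback();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        for (boolean dirty : new boolean[] {false, true}) {
            DirtyRollbackSketch sketch = new DirtyRollbackSketch();
            try {
                sketch.writeAll(true, dirty);
            } catch (SketchWriteStateException e) {
                System.out.println(e.getMessage() + " -> kept files: " + sketch.newFiles);
            }
        }
    }
}
```

In both cases the exception is still rethrown; the dirty flag only decides
whether the new files are removed first.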

Closes https://github.com/elastic/elasticsearch/issues/39077

(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
Andrey Ershov 2019-04-01 11:49:05 +03:00
parent 7cc79123df
commit 287e334ef3
2 changed files with 8 additions and 2 deletions

@@ -320,7 +320,14 @@ public class GatewayMetaState implements ClusterStateApplier, CoordinationState.
                 finished = true;
                 return generation;
             } catch (WriteStateException e) {
-                rollback();
+                // if Manifest write results in dirty WriteStateException it's not safe to remove
+                // new metadata files, because if Manifest was actually written to disk and its deletion
+                // fails it will reference these new metadata files.
+                // In the future, we might decide to add more fine grained check to understand if after
+                // WriteStateException Manifest deletion has actually failed.
+                if (e.isDirty() == false) {
+                    rollback();
+                }
                 throw e;
             }
         }

@@ -374,7 +374,6 @@ public class GatewayMetaStateTests extends ESAllocationTestCase {
         return builder.build();
     }
 
-    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/39077")
     public void testAtomicityWithFailures() throws IOException {
         try (NodeEnvironment env = newNodeEnvironment()) {
             MetaStateServiceWithFailures metaStateService =