Do not perform cleanup if Manifest write fails with dirty exception (#40519)
Currently, if the Manifest write is unsuccessful (i.e. a WriteStateException is thrown), we clean up the newly created metadata files. However, this is wrong. Consider the following sequence (caught by CI here https://github.com/elastic/elasticsearch/issues/39077):

- cluster global data is written **successfully**
- the associated manifest write **fails** (during the fsync, i.e. the files have been written)
- deleting (reverting) the manifest files **fails**, so they remain persisted on disk
- deleting (reverting) the cluster global data is **successful**

In this case, when trying to load metadata (after the node restarts because of the dirty WriteStateException), the following exception occurs

```
java.io.IOException: failed to find global metadata [generation: 0]
```

because the manifest file references a missing global metadata file.

This commit checks whether the thrown WriteStateException is dirty and, if it is, performs no cleanup, because a new Manifest file might have been created while its deletion failed. In the future, we might add a more fine-grained check: perform the cleanup if the WriteStateException is dirty but the Manifest deletion was successful.

Closes https://github.com/elastic/elasticsearch/issues/39077

(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
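The failure sequence described in this message can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: `DirtyRollbackSketch`, `SketchWriteStateException`, `FakeDisk`, and the file names are hypothetical stand-ins, not the Elasticsearch classes, and only the `isDirty()` check mirrors the actual change.

```java
import java.util.HashSet;
import java.util.Set;

public class DirtyRollbackSketch {

    /** Hypothetical stand-in for WriteStateException; only the dirty flag is modelled. */
    static class SketchWriteStateException extends Exception {
        private final boolean dirty;

        SketchWriteStateException(boolean dirty) {
            super("simulated write failure, dirty=" + dirty);
            this.dirty = dirty;
        }

        boolean isDirty() {
            return dirty;
        }
    }

    /** In-memory "disk": the set of file names that currently exist. */
    static final class FakeDisk {
        final Set<String> files = new HashSet<>();
    }

    // Illustrative file names, assumed for this sketch only.
    static final String GLOBAL = "global-0.st";
    static final String MANIFEST = "manifest-0.st";

    /** Simulates a manifest write whose fsync fails and whose deletion also fails. */
    static void writeManifest(FakeDisk disk) throws SketchWriteStateException {
        disk.files.add(MANIFEST); // the file content has reached the disk
        // The fsync fails and the manifest cannot be deleted, so the failure is dirty.
        throw new SketchWriteStateException(true);
    }

    /**
     * Runs the sequence and reports whether the on-disk state is loadable,
     * i.e. the manifest (if present) still has its global metadata file.
     */
    static boolean writeAndCheck(boolean guardOnDirty) {
        FakeDisk disk = new FakeDisk();
        disk.files.add(GLOBAL); // global metadata is written successfully
        try {
            writeManifest(disk);
        } catch (SketchWriteStateException e) {
            // Old behaviour: always roll back. New behaviour: only if not dirty.
            if (guardOnDirty == false || e.isDirty() == false) {
                disk.files.remove(GLOBAL); // rollback of the new metadata file
            }
        }
        return disk.files.contains(MANIFEST) == false || disk.files.contains(GLOBAL);
    }

    public static void main(String[] args) {
        System.out.println("always roll back, loadable: " + writeAndCheck(false));
        System.out.println("skip rollback when dirty, loadable: " + writeAndCheck(true));
    }
}
```

With unconditional rollback the manifest survives while the global metadata file is gone, which corresponds to the `failed to find global metadata` error. With the dirty check, both files stay on disk.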
parent 7cc79123df
commit 287e334ef3
@@ -320,7 +320,14 @@ public class GatewayMetaState implements ClusterStateApplier, CoordinationState.
                 finished = true;
                 return generation;
             } catch (WriteStateException e) {
-                rollback();
+                // if Manifest write results in dirty WriteStateException it's not safe to remove
+                // new metadata files, because if Manifest was actually written to disk and its deletion
+                // fails it will reference these new metadata files.
+                // In the future, we might decide to add more fine grained check to understand if after
+                // WriteStateException Manifest deletion has actually failed.
+                if (e.isDirty() == false) {
+                    rollback();
+                }
                 throw e;
             }
         }
@@ -374,7 +374,6 @@ public class GatewayMetaStateTests extends ESAllocationTestCase {
         return builder.build();
     }
 
-    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/39077")
     public void testAtomicityWithFailures() throws IOException {
         try (NodeEnvironment env = newNodeEnvironment()) {
             MetaStateServiceWithFailures metaStateService =