Force flush in FullClusterRestartIT#testRecovery (#46956)

If peer recovery happens after indexing, and indexing flushes some shards
at the end, then the explicit flush in the test will be a noop. The
replicas will then have some uncommitted translog, which is transferred
during peer recovery even though all of those operations are already in
the commit. If such a replica becomes primary (after we restart the
cluster), it will have translog to replay and the test will fail.
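The first failure mode can be made concrete with a toy model. The sketch below is hypothetical and heavily simplified; it is not Elasticsearch's engine code. It assumes that a flush without `force` is skipped when Lucene has no uncommitted changes, and it treats every operation left in the translog as something a promoted copy would replay. `ShardCopy`, `recoveredReplica` and `translogOpsToReplay` are invented names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Elasticsearch code: one shard copy with a
// Lucene commit and a translog, both reduced to lists of sequence numbers.
public class FlushModel {

    static final class ShardCopy {
        final List<Long> committed = new ArrayList<>();
        final List<Long> translog = new ArrayList<>();
        // Assumption: tracks documents written to Lucene since the last commit,
        // independently of what the translog happens to contain.
        boolean hasUncommittedChanges;

        // Simplified flush: without force it is a noop when Lucene has nothing new.
        boolean flush(boolean force) {
            if (!force && !hasUncommittedChanges) {
                return false; // noop: the translog is left untouched
            }
            hasUncommittedChanges = false;
            translog.clear(); // every operation is now covered by the new commit
            return true;
        }

        // Operations this copy would replay if it were promoted or restarted.
        int translogOpsToReplay() {
            return translog.size();
        }
    }

    // A replica after peer recovery: the copied files already contain every
    // operation, and the translog shipped during recovery repeats them.
    static ShardCopy recoveredReplica(long ops) {
        ShardCopy replica = new ShardCopy();
        for (long seqNo = 0; seqNo < ops; seqNo++) {
            replica.committed.add(seqNo);
            replica.translog.add(seqNo);
        }
        replica.hasUncommittedChanges = false;
        return replica;
    }

    public static void main(String[] args) {
        ShardCopy plain = recoveredReplica(5);
        System.out.println("plain flush committed: " + plain.flush(false));
        System.out.println("ops to replay: " + plain.translogOpsToReplay());

        ShardCopy forced = recoveredReplica(5);
        System.out.println("forced flush committed: " + forced.flush(true));
        System.out.println("ops to replay: " + forced.translogOpsToReplay());
    }
}
```

In this model the plain flush reports `false` and leaves 5 operations to replay, while the forced flush reports `true` and leaves 0, which is the behaviour the `force` parameter is meant to guarantee in the test.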

Another issue in this test is that synced_flush is not a replication
action, so the global checkpoint on the replicas might not be up to date.
We need to either wait for the global checkpoint to be synced or call a
replication action to sync it.
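The second issue can be sketched the same way. This is a hypothetical toy model, not Elasticsearch code: it assumes the primary owns the global checkpoint, that each replicated request (including a write) carries the primary's value as it was before that request completed, and that a synced flush carries nothing. It deliberately omits the background global checkpoint sync that Elasticsearch fires after the last write. `Primary`, `Replica`, `write`, `refresh` and `syncedFlush` are invented names standing in for the real actions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Elasticsearch code. The primary owns the global
// checkpoint; a replica only learns it when a replicated request carries it.
public class GlobalCheckpointModel {

    static final class Replica {
        long knownGlobalCheckpoint = -1; // -1 means "nothing known yet" in this sketch
    }

    static final class Primary {
        long globalCheckpoint = -1;
        final List<Replica> replicas = new ArrayList<>();

        // A write is itself replicated: replicas receive the global checkpoint as it
        // was before this write completed, so after the last write they lag by one.
        void write(long seqNo) {
            sendGlobalCheckpointToReplicas();
            globalCheckpoint = seqNo; // assume every copy has now processed seqNo
        }

        // Stand-in for a replication action such as a refresh.
        void refresh() {
            sendGlobalCheckpointToReplicas();
        }

        // Stand-in for a synced flush: not a replication action in this model.
        void syncedFlush() {
            // intentionally sends nothing to the replicas
        }

        private void sendGlobalCheckpointToReplicas() {
            for (Replica replica : replicas) {
                replica.knownGlobalCheckpoint =
                        Math.max(replica.knownGlobalCheckpoint, globalCheckpoint);
            }
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary();
        Replica replica = new Replica();
        primary.replicas.add(replica);
        for (long seqNo = 0; seqNo < 5; seqNo++) {
            primary.write(seqNo);
        }
        System.out.println("primary=" + primary.globalCheckpoint
                + " replica=" + replica.knownGlobalCheckpoint);
        primary.syncedFlush();
        System.out.println("after synced flush: replica=" + replica.knownGlobalCheckpoint);
        primary.refresh();
        System.out.println("after refresh: replica=" + replica.knownGlobalCheckpoint);
    }
}
```

After five writes the model has the primary at 4 and the replica at 3; the synced flush leaves the replica at 3, and only the refresh brings it to 4. That mirrors why the test now issues a `_refresh` before spinning on synced flush.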

Closes #46712
Nhat Nguyen 2019-09-22 16:55:31 -04:00
parent ee4e6b1382
commit e8515d1d13
1 changed file with 6 additions and 1 deletion

@@ -737,6 +737,8 @@ public class FullClusterRestartIT extends AbstractFullClusterRestartTestCase {
         ensureGreen(index);
         // Recovering a synced-flush index from 5.x to 6.x might be subtle as a 5.x index commit does not have all 6.x commit tags.
         if (randomBoolean()) {
+            // needs to call a replication action to sync the global checkpoint from primaries to replicas.
+            assertOK(client().performRequest(new Request("POST", "/" + index + "/_refresh")));
             // We have to spin synced-flush requests here because we fire the global checkpoint sync for the last write operation.
             // A synced-flush request considers the global checkpoint sync as an ongoing operation because it acquires a shard permit.
             assertBusy(() -> {
@@ -751,7 +753,10 @@ public class FullClusterRestartIT extends AbstractFullClusterRestartTestCase {
             });
         } else {
             // Explicitly flush so we're sure to have a bunch of documents in the Lucene index
-            assertOK(client().performRequest(new Request("POST", "/_flush")));
+            Request flushRequest = new Request("POST", "/" + index + "/_flush");
+            flushRequest.addParameter("force", "true");
+            flushRequest.addParameter("wait_if_ongoing", "true");
+            assertOK(client().performRequest(flushRequest));
         }
         if (shouldHaveTranslog) {
             // Update a few documents so we are sure to have a translog
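For reference, the new flush request maps to a REST call scoped to the test index rather than the cluster-wide `/_flush`. `force=true` asks for a flush even when there appear to be no changes to commit, and `wait_if_ongoing=true` makes the request wait for a flush that is already running instead of failing at the shard level. The `FlushEndpoint.describe` helper below is hypothetical (it is not part of the Elasticsearch client) and only renders the method, path and query string; it assumes the index name needs no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Elasticsearch client: renders the HTTP
// method, path and query string that the flush request in the diff corresponds to.
public class FlushEndpoint {

    static String describe(String index, boolean force, boolean waitIfOngoing) {
        // LinkedHashMap keeps the parameters in the order they were added.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("force", Boolean.toString(force));
        params.put("wait_if_ongoing", Boolean.toString(waitIfOngoing));

        // Joins "name=value" pairs with '&' behind a leading '?'.
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return "POST /" + index + "/_flush" + query;
    }

    public static void main(String[] args) {
        System.out.println(describe("my_index", true, true));
    }
}
```

With the sample index name `my_index`, this prints `POST /my_index/_flush?force=true&wait_if_ongoing=true`, the request the test now sends through the low-level client's `addParameter` calls.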