Allow truncation of clean translog (#47866)
Today the `elasticsearch-shard remove-corrupted-data` tool will only truncate a translog it determines to be corrupt. However there may be other cases in which it is desirable to truncate the translog, for instance if an operation in the translog cannot be replayed for some reason other than corruption. This commit adds a `--truncate-clean-translog` option to skip the corruption check on the translog and blindly truncate it.
This commit is contained in:
parent
58289a857d
commit
ba62eb3dce
|
@ -1,22 +1,36 @@
|
|||
[[shard-tool]]
|
||||
== elasticsearch-shard
|
||||
|
||||
In some cases the Lucene index or translog of a shard copy can become
|
||||
corrupted. The `elasticsearch-shard` command enables you to remove corrupted
|
||||
parts of the shard if a good copy of the shard cannot be recovered
|
||||
automatically or restored from backup.
|
||||
In some cases the Lucene index or translog of a shard copy can become corrupted.
|
||||
The `elasticsearch-shard` command enables you to remove corrupted parts of the
|
||||
shard if a good copy of the shard cannot be recovered automatically or restored
|
||||
from backup.
|
||||
|
||||
[WARNING]
|
||||
You will lose the corrupted data when you run `elasticsearch-shard`. This tool
|
||||
should only be used as a last resort if there is no way to recover from another
|
||||
copy of the shard or restore a snapshot.
|
||||
|
||||
When Elasticsearch detects that a shard's data is corrupted, it fails that
|
||||
shard copy and refuses to use it. Under normal conditions, the shard is
|
||||
automatically recovered from another copy. If no good copy of the shard is
|
||||
available and you cannot restore from backup, you can use `elasticsearch-shard`
|
||||
to remove the corrupted data and restore access to any remaining data in
|
||||
unaffected segments.
|
||||
[float]
|
||||
=== Synopsis
|
||||
|
||||
[source,shell]
|
||||
--------------------------------------------------
|
||||
bin/elasticsearch-shard remove-corrupted-data
|
||||
([--index <Index>] [--shard-id <ShardId>] | [--dir <IndexPath>])
|
||||
[--truncate-clean-translog]
|
||||
[-E <KeyValuePair>]
|
||||
[-h, --help] ([-s, --silent] | [-v, --verbose])
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Description
|
||||
|
||||
When {es} detects that a shard's data is corrupted, it fails that shard copy and
|
||||
refuses to use it. Under normal conditions, the shard is automatically recovered
|
||||
from another copy. If no good copy of the shard is available and you cannot
|
||||
restore one from a snapshot, you can use `elasticsearch-shard` to remove the
|
||||
corrupted data and restore access to any remaining data in unaffected segments.
|
||||
|
||||
[WARNING]
|
||||
Stop Elasticsearch before running `elasticsearch-shard`.
|
||||
|
@ -31,7 +45,7 @@ There are two ways to specify the path:
|
|||
translog files.
|
||||
|
||||
[float]
|
||||
=== Removing corrupted data
|
||||
==== Removing corrupted data
|
||||
|
||||
`elasticsearch-shard` analyses the shard copy and provides an overview of the
|
||||
corruption found. To proceed you must then confirm that you want to remove the
|
||||
|
@ -91,7 +105,7 @@ POST /_cluster/reroute
|
|||
]
|
||||
}
|
||||
|
||||
You must accept the possibility of data loss by changing parameter `accept_data_loss` to `true`.
|
||||
You must accept the possibility of data loss by changing the `accept_data_loss` parameter to `true`.
|
||||
|
||||
Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag from /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
|
||||
|
||||
|
@ -99,9 +113,11 @@ Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag from /var/lib/elasticsea
|
|||
|
||||
When you use `elasticsearch-shard` to drop the corrupted data, the shard's
|
||||
allocation ID changes. After restarting the node, you must use the
|
||||
<<cluster-reroute,cluster reroute API>> to tell Elasticsearch to use the new
|
||||
ID. The `elasticsearch-shard` command shows the request that
|
||||
you need to submit.
|
||||
<<cluster-reroute,cluster reroute API>> to tell Elasticsearch to use the new ID.
|
||||
The `elasticsearch-shard` command shows the request that you need to submit.
|
||||
|
||||
You can also use the `-h` option to get a list of all options and parameters
|
||||
that the `elasticsearch-shard` tool supports.
|
||||
|
||||
Finally, you can use the `--truncate-clean-translog` option to truncate the
|
||||
shard's translog even if it does not appear to be corrupt.
|
||||
|
|
|
@ -80,6 +80,7 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
private final OptionSpec<String> folderOption;
|
||||
private final OptionSpec<String> indexNameOption;
|
||||
private final OptionSpec<Integer> shardIdOption;
|
||||
static final String TRUNCATE_CLEAN_TRANSLOG_FLAG = "truncate-clean-translog";
|
||||
|
||||
private final RemoveCorruptedLuceneSegmentsAction removeCorruptedLuceneSegmentsAction;
|
||||
private final TruncateTranslogAction truncateTranslogAction;
|
||||
|
@ -99,6 +100,8 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
.withRequiredArg()
|
||||
.ofType(Integer.class);
|
||||
|
||||
parser.accepts(TRUNCATE_CLEAN_TRANSLOG_FLAG, "Truncate the translog even if it is not corrupt");
|
||||
|
||||
namedXContentRegistry = new NamedXContentRegistry(ClusterModule.getNamedXWriteables());
|
||||
|
||||
removeCorruptedLuceneSegmentsAction = new RemoveCorruptedLuceneSegmentsAction();
|
||||
|
@ -320,8 +323,11 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
terminal.println("");
|
||||
|
||||
////////// Translog
|
||||
// as translog relies on data stored in an index commit - we have to have non unrecoverable index to truncate translog
|
||||
if (indexCleanStatus.v1() != CleanStatus.UNRECOVERABLE) {
|
||||
if (options.has(TRUNCATE_CLEAN_TRANSLOG_FLAG)) {
|
||||
translogCleanStatus = Tuple.tuple(CleanStatus.OVERRIDDEN,
|
||||
"Translog was not analysed and will be truncated due to the --" + TRUNCATE_CLEAN_TRANSLOG_FLAG + " flag");
|
||||
} else if (indexCleanStatus.v1() != CleanStatus.UNRECOVERABLE) {
|
||||
// translog relies on data stored in an index commit so we have to have a recoverable index to check the translog
|
||||
terminal.println("");
|
||||
terminal.println("Opening translog at " + translogPath);
|
||||
terminal.println("");
|
||||
|
@ -344,7 +350,8 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
final CleanStatus translogStatus = translogCleanStatus.v1();
|
||||
|
||||
if (indexStatus == CleanStatus.CLEAN && translogStatus == CleanStatus.CLEAN) {
|
||||
throw new ElasticsearchException("Shard does not seem to be corrupted at " + shardPath.getDataPath());
|
||||
throw new ElasticsearchException("Shard does not seem to be corrupted at " + shardPath.getDataPath()
|
||||
+ " (pass --" + TRUNCATE_CLEAN_TRANSLOG_FLAG + " to truncate the translog anyway)");
|
||||
}
|
||||
|
||||
if (indexStatus == CleanStatus.UNRECOVERABLE) {
|
||||
|
@ -493,7 +500,7 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
terminal.println("");
|
||||
terminal.println("POST /_cluster/reroute\n" + Strings.toString(commands, true, true));
|
||||
terminal.println("");
|
||||
terminal.println("You must accept the possibility of data loss by changing parameter `accept_data_loss` to `true`.");
|
||||
terminal.println("You must accept the possibility of data loss by changing the `accept_data_loss` parameter to `true`.");
|
||||
terminal.println("");
|
||||
}
|
||||
|
||||
|
@ -509,7 +516,8 @@ public class RemoveCorruptedShardDataCommand extends EnvironmentAwareCommand {
|
|||
CLEAN("clean"),
|
||||
CLEAN_WITH_CORRUPTED_MARKER("marked corrupted, but no corruption detected"),
|
||||
CORRUPTED("corrupted"),
|
||||
UNRECOVERABLE("corrupted and unrecoverable");
|
||||
UNRECOVERABLE("corrupted and unrecoverable"),
|
||||
OVERRIDDEN("to be truncated regardless of whether it is corrupt");
|
||||
|
||||
private final String msg;
|
||||
|
||||
|
|
|
@ -56,6 +56,8 @@ import java.util.Set;
|
|||
import java.util.regex.Matcher;
|
||||
import java.util.regex.Pattern;
|
||||
|
||||
import static org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.TRUNCATE_CLEAN_TRANSLOG_FLAG;
|
||||
import static org.hamcrest.Matchers.allOf;
|
||||
import static org.hamcrest.Matchers.containsString;
|
||||
import static org.hamcrest.Matchers.either;
|
||||
import static org.hamcrest.Matchers.equalTo;
|
||||
|
@ -349,6 +351,39 @@ public class RemoveCorruptedShardDataCommandTests extends IndexShardTestCase {
|
|||
shardPath -> assertThat(shardPath.resolveIndex(), equalTo(indexPath)));
|
||||
}
|
||||
|
||||
public void testFailsOnCleanIndex() throws Exception {
|
||||
indexDocs(indexShard, true);
|
||||
closeShards(indexShard);
|
||||
|
||||
final RemoveCorruptedShardDataCommand command = new RemoveCorruptedShardDataCommand();
|
||||
final MockTerminal t = new MockTerminal();
|
||||
final OptionParser parser = command.getParser();
|
||||
|
||||
final OptionSet options = parser.parse("-d", translogPath.toString());
|
||||
t.setVerbosity(Terminal.Verbosity.VERBOSE);
|
||||
assertThat(expectThrows(ElasticsearchException.class, () -> command.execute(t, options, environment)).getMessage(),
|
||||
allOf(containsString("Shard does not seem to be corrupted"), containsString("--" + TRUNCATE_CLEAN_TRANSLOG_FLAG)));
|
||||
assertThat(t.getOutput(), containsString("Lucene index is clean"));
|
||||
assertThat(t.getOutput(), containsString("Translog is clean"));
|
||||
}
|
||||
|
||||
public void testTruncatesCleanTranslogIfRequested() throws Exception {
|
||||
indexDocs(indexShard, true);
|
||||
closeShards(indexShard);
|
||||
|
||||
final RemoveCorruptedShardDataCommand command = new RemoveCorruptedShardDataCommand();
|
||||
final MockTerminal t = new MockTerminal();
|
||||
final OptionParser parser = command.getParser();
|
||||
|
||||
final OptionSet options = parser.parse("-d", translogPath.toString(), "--" + TRUNCATE_CLEAN_TRANSLOG_FLAG);
|
||||
t.addTextInput("y");
|
||||
t.setVerbosity(Terminal.Verbosity.VERBOSE);
|
||||
command.execute(t, options, environment);
|
||||
assertThat(t.getOutput(), containsString("Lucene index is clean"));
|
||||
assertThat(t.getOutput(), containsString("Translog was not analysed and will be truncated"));
|
||||
assertThat(t.getOutput(), containsString("Creating new empty translog"));
|
||||
}
|
||||
|
||||
public void testCleanWithCorruptionMarker() throws Exception {
|
||||
// index some docs in several segments
|
||||
final int numDocs = indexDocs(indexShard, true);
|
||||
|
|
Loading…
Reference in New Issue