483 lines
20 KiB
Plaintext
483 lines
20 KiB
Plaintext
[[node-tool]]
|
|
== elasticsearch-node
|
|
|
|
The `elasticsearch-node` command enables you to perform certain unsafe
|
|
operations on a node that are only possible while it is shut down. This command
|
|
allows you to adjust the <<modules-node,role>> of a node and may be able to
|
|
recover some data after a disaster or start a node even if it is incompatible
|
|
with the data on disk.
|
|
|
|
[float]
|
|
=== Synopsis
|
|
|
|
[source,shell]
|
|
--------------------------------------------------
|
|
bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|override-version
|
|
[--ordinal <Integer>] [-E <KeyValuePair>]
|
|
[-h, --help] ([-s, --silent] | [-v, --verbose])
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
=== Description
|
|
|
|
This tool has four modes:
|
|
|
|
* `elasticsearch-node repurpose` can be used to delete unwanted data from a
|
|
node if it used to be a <<data-node,data node>> or a
|
|
<<master-node,master-eligible node>> but has been repurposed not to have one
|
|
or other of these roles.
|
|
|
|
* `elasticsearch-node unsafe-bootstrap` can be used to perform _unsafe cluster
|
|
bootstrapping_. It forces one of the nodes to form a brand-new cluster on
|
|
its own, using its local copy of the cluster metadata.
|
|
|
|
* `elasticsearch-node detach-cluster` enables you to move nodes from one
|
|
cluster to another. This can be used to move nodes into a new cluster
|
|
created with the `elasticsearch-node unsafe-bootstap` command. If unsafe
|
|
cluster bootstrapping was not possible, it also enables you to move nodes
|
|
into a brand-new cluster.
|
|
|
|
* `elasticsearch-node override-version` enables you to start up a node
|
|
even if the data in the data path was written by an incompatible version of
|
|
{es}. This may sometimes allow you to downgrade to an earlier version of
|
|
{es}.
|
|
|
|
[[node-tool-repurpose]]
|
|
[float]
|
|
==== Changing the role of a node
|
|
|
|
There may be situations where you want to repurpose a node without following
|
|
the <<change-node-role,proper repurposing processes>>. The `elasticsearch-node
|
|
repurpose` tool allows you to delete any excess on-disk data and start a node
|
|
after repurposing it.
|
|
|
|
The intended use is:
|
|
|
|
* Stop the node
|
|
* Update `elasticsearch.yml` by setting `node.master` and `node.data` as
|
|
desired.
|
|
* Run `elasticsearch-node repurpose` on the node
|
|
* Start the node
|
|
|
|
If you run `elasticsearch-node repurpose` on a node with `node.data: false` and
|
|
`node.master: true` then it will delete any remaining shard data on that node,
|
|
but it will leave the index and cluster metadata alone. If you run
|
|
`elasticsearch-node repurpose` on a node with `node.data: false` and
|
|
`node.master: false` then it will delete any remaining shard data and index
|
|
metadata, but it will leave the cluster metadata alone.
|
|
|
|
[WARNING]
|
|
Running this command can lead to data loss for the indices mentioned if the
|
|
data contained is not available on other nodes in the cluster. Only run this
|
|
tool if you understand and accept the possible consequences, and only after
|
|
determining that the node cannot be repurposed cleanly.
|
|
|
|
The tool provides a summary of the data to be deleted and asks for confirmation
|
|
before making any changes. You can get detailed information about the affected
|
|
indices and shards by passing the verbose (`-v`) option.
|
|
|
|
[float]
|
|
==== Recovering data after a disaster
|
|
|
|
Sometimes {es} nodes are temporarily stopped, perhaps because of the need to
|
|
perform some maintenance activity or perhaps because of a hardware failure.
|
|
After you resolve the temporary condition and restart the node,
|
|
it will rejoin the cluster and continue normally. Depending on your
|
|
configuration, your cluster may be able to remain completely available even
|
|
while one or more of its nodes are stopped.
|
|
|
|
Sometimes it might not be possible to restart a node after it has stopped. For
|
|
example, the node's host may suffer from a hardware problem that cannot be
|
|
repaired. If the cluster is still available then you can start up a fresh node
|
|
on another host and {es} will bring this node into the cluster in place of the
|
|
failed node.
|
|
|
|
Each node stores its data in the data directories defined by the
|
|
<<path-settings,`path.data` setting>>. This means that in a disaster you can
|
|
also restart a node by moving its data directories to another host, presuming
|
|
that those data directories can be recovered from the faulty host.
|
|
|
|
{es} <<modules-discovery-quorums,requires a response from a majority of the
|
|
master-eligible nodes>> in order to elect a master and to update the cluster
|
|
state. This means that if you have three master-eligible nodes then the cluster
|
|
will remain available even if one of them has failed. However if two of the
|
|
three master-eligible nodes fail then the cluster will be unavailable until at
|
|
least one of them is restarted.
|
|
|
|
In very rare circumstances it may not be possible to restart enough nodes to
|
|
restore the cluster's availability. If such a disaster occurs, you should
|
|
build a new cluster from a recent snapshot and re-import any data that was
|
|
ingested since that snapshot was taken.
|
|
|
|
However, if the disaster is serious enough then it may not be possible to
|
|
recover from a recent snapshot either. Unfortunately in this case there is no
|
|
way forward that does not risk data loss, but it may be possible to use the
|
|
`elasticsearch-node` tool to construct a new cluster that contains some of the
|
|
data from the failed cluster.
|
|
|
|
[[node-tool-override-version]]
|
|
[float]
|
|
==== Bypassing version checks
|
|
|
|
The data that {es} writes to disk is designed to be read by the current version
|
|
and a limited set of future versions. It cannot generally be read by older
|
|
versions, nor by versions that are more than one major version newer. The data
|
|
stored on disk includes the version of the node that wrote it, and {es} checks
|
|
that it is compatible with this version when starting up.
|
|
|
|
In rare circumstances it may be desirable to bypass this check and start up an
|
|
{es} node using data that was written by an incompatible version. This may not
|
|
work if the format of the stored data has changed, and it is a risky process
|
|
because it is possible for the format to change in ways that {es} may
|
|
misinterpret, silently leading to data loss.
|
|
|
|
To bypass this check, you can use the `elasticsearch-node override-version`
|
|
tool to overwrite the version number stored in the data path with the current
|
|
version, causing {es} to believe that it is compatible with the on-disk data.
|
|
|
|
[[node-tool-unsafe-bootstrap]]
|
|
[float]
|
|
===== Unsafe cluster bootstrapping
|
|
|
|
If there is at least one remaining master-eligible node, but it is not possible
|
|
to restart a majority of them, then the `elasticsearch-node unsafe-bootstrap`
|
|
command will unsafely override the cluster's <<modules-discovery-voting,voting
|
|
configuration>> as if performing another
|
|
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.
|
|
The target node can then form a new cluster on its own by using
|
|
the cluster metadata held locally on the target node.
|
|
|
|
[WARNING]
|
|
These steps can lead to arbitrary data loss since the target node may not hold the latest cluster
|
|
metadata, and this out-of-date metadata may make it impossible to use some or
|
|
all of the indices in the cluster.
|
|
|
|
Since unsafe bootstrapping forms a new cluster containing a single node, once
|
|
you have run it you must use the <<node-tool-detach-cluster,`elasticsearch-node
|
|
detach-cluster` tool>> to migrate any other surviving nodes from the failed
|
|
cluster into this new cluster.
|
|
|
|
When you run the `elasticsearch-node unsafe-bootstrap` tool it will analyse the
|
|
state of the node and ask for confirmation before taking any action. Before
|
|
asking for confirmation it reports the term and version of the cluster state on
|
|
the node on which it runs as follows:
|
|
|
|
[source,txt]
|
|
----
|
|
Current node cluster state (term, version) pair is (4, 12)
|
|
----
|
|
|
|
If you have a choice of nodes on which to run this tool then you should choose
|
|
one with a term that is as large as possible. If there is more than one
|
|
node with the same term, pick the one with the largest version.
|
|
This information identifies the node with the freshest cluster state, which minimizes the
|
|
quantity of data that might be lost. For example, if the first node reports
|
|
`(4, 12)` and a second node reports `(5, 3)`, then the second node is preferred
|
|
since its term is larger. However if the second node reports `(3, 17)` then
|
|
the first node is preferred since its term is larger. If the second node
|
|
reports `(4, 10)` then it has the same term as the first node, but has a
|
|
smaller version, so the first node is preferred.
|
|
|
|
[WARNING]
|
|
Running this command can lead to arbitrary data loss. Only run this tool if you
|
|
understand and accept the possible consequences and have exhausted all other
|
|
possibilities for recovery of your cluster.
|
|
|
|
The sequence of operations for using this tool are as follows:
|
|
|
|
1. Make sure you have really lost access to at least half of the
|
|
master-eligible nodes in the cluster, and they cannot be repaired or recovered
|
|
by moving their data paths to healthy hardware.
|
|
2. Stop **all** remaining nodes.
|
|
3. Choose one of the remaining master-eligible nodes to become the new elected
|
|
master as described above.
|
|
4. On this node, run the `elasticsearch-node unsafe-bootstrap` command as shown
|
|
below. Verify that the tool reported `Master node was successfully
|
|
bootstrapped`.
|
|
5. Start this node and verify that it is elected as the master node.
|
|
6. Run the <<node-tool-detach-cluster,`elasticsearch-node detach-cluster`
|
|
tool>>, described below, on every other node in the cluster.
|
|
7. Start all other nodes and verify that each one joins the cluster.
|
|
8. Investigate the data in the cluster to discover if any was lost during this
|
|
process.
|
|
|
|
When you run the tool it will make sure that the node that is being used to
|
|
bootstrap the cluster is not running. It is important that all other
|
|
master-eligible nodes are also stopped while this tool is running, but the tool
|
|
does not check this.
|
|
|
|
The message `Master node was successfully bootstrapped` does not mean that
|
|
there has been no data loss, it just means that tool was able to complete its
|
|
job.
|
|
|
|
[[node-tool-detach-cluster]]
|
|
[float]
|
|
===== Detaching nodes from their cluster
|
|
|
|
It is unsafe for nodes to move between clusters, because different clusters
|
|
have completely different cluster metadata. There is no way to safely merge the
|
|
metadata from two clusters together.
|
|
|
|
To protect against inadvertently joining the wrong cluster, each cluster
|
|
creates a unique identifier, known as the _cluster UUID_, when it first starts
|
|
up. Every node records the UUID of its cluster and refuses to join a
|
|
cluster with a different UUID.
|
|
|
|
However, if a node's cluster has permanently failed then it may be desirable to
|
|
try and move it into a new cluster. The `elasticsearch-node detach-cluster`
|
|
command lets you detach a node from its cluster by resetting its cluster UUID.
|
|
It can then join another cluster with a different UUID.
|
|
|
|
For example, after unsafe cluster bootstrapping you will need to detach all the
|
|
other surviving nodes from their old cluster so they can join the new,
|
|
unsafely-bootstrapped cluster.
|
|
|
|
Unsafe cluster bootstrapping is only possible if there is at least one
|
|
surviving master-eligible node. If there are no remaining master-eligible nodes
|
|
then the cluster metadata is completely lost. However, the individual data
|
|
nodes also contain a copy of the index metadata corresponding with their
|
|
shards. This sometimes allows a new cluster to import these shards as
|
|
<<modules-gateway-dangling-indices,dangling indices>>. You can sometimes
|
|
recover some indices after the loss of all master-eligible nodes in a cluster
|
|
by creating a new cluster and then using the `elasticsearch-node
|
|
detach-cluster` command to move any surviving nodes into this new cluster.
|
|
|
|
There is a risk of data loss when importing a dangling index because data nodes
|
|
may not have the most recent copy of the index metadata and do not have any
|
|
information about <<docs-replication,which shard copies are in-sync>>. This
|
|
means that a stale shard copy may be selected to be the primary, and some of
|
|
the shards may be incompatible with the imported mapping.
|
|
|
|
[WARNING]
|
|
Execution of this command can lead to arbitrary data loss. Only run this tool
|
|
if you understand and accept the possible consequences and have exhausted all
|
|
other possibilities for recovery of your cluster.
|
|
|
|
The sequence of operations for using this tool are as follows:
|
|
|
|
1. Make sure you have really lost access to every one of the master-eligible
|
|
nodes in the cluster, and they cannot be repaired or recovered by moving their
|
|
data paths to healthy hardware.
|
|
2. Start a new cluster and verify that it is healthy. This cluster may comprise
|
|
one or more brand-new master-eligible nodes, or may be an unsafely-bootstrapped
|
|
cluster formed as described above.
|
|
3. Stop **all** remaining data nodes.
|
|
4. On each data node, run the `elasticsearch-node detach-cluster` tool as shown
|
|
below. Verify that the tool reported `Node was successfully detached from the
|
|
cluster`.
|
|
5. If necessary, configure each data node to
|
|
<<modules-discovery-hosts-providers,discover the new cluster>>.
|
|
6. Start each data node and verify that it has joined the new cluster.
|
|
7. Wait for all recoveries to have completed, and investigate the data in the
|
|
cluster to discover if any was lost during this process.
|
|
|
|
The message `Node was successfully detached from the cluster` does not mean
|
|
that there has been no data loss, it just means that tool was able to complete
|
|
its job.
|
|
|
|
|
|
[float]
|
|
=== Parameters
|
|
|
|
`repurpose`:: Delete excess data when a node's roles are changed.
|
|
|
|
`unsafe-bootstrap`:: Specifies to unsafely bootstrap this node as a new
|
|
one-node cluster.
|
|
|
|
`detach-cluster`:: Specifies to unsafely detach this node from its cluster so
|
|
it can join a different cluster.
|
|
|
|
`override-version`:: Overwrites the version number stored in the data path so
|
|
that a node can start despite being incompatible with the on-disk data.
|
|
|
|
`--ordinal <Integer>`:: If there is <<max-local-storage-nodes,more than one
|
|
node sharing a data path>> then this specifies which node to target. Defaults
|
|
to `0`, meaning to use the first node in the data path.
|
|
|
|
`-E <KeyValuePair>`:: Configures a setting.
|
|
|
|
`-h, --help`:: Returns all of the command parameters.
|
|
|
|
`-s, --silent`:: Shows minimal output.
|
|
|
|
`-v, --verbose`:: Shows verbose output.
|
|
|
|
[float]
|
|
=== Examples
|
|
|
|
[float]
|
|
==== Repurposing a node as a dedicated master node (master: true, data: false)
|
|
|
|
In this example, a former data node is repurposed as a dedicated master node.
|
|
First update the node's settings to `node.master: true` and `node.data: false`
|
|
in its `elasticsearch.yml` config file. Then run the `elasticsearch-node
|
|
repurpose` command to find and remove excess shard data:
|
|
|
|
[source,txt]
|
|
----
|
|
node$ ./bin/elasticsearch-node repurpose
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
Found 2 shards in 2 indices to clean up
|
|
Use -v to see list of paths and indices affected
|
|
Node is being re-purposed as master and no-data. Clean-up of shard data will be performed.
|
|
Do you want to proceed?
|
|
Confirm [y/N] y
|
|
Node successfully repurposed to master and no-data.
|
|
----
|
|
|
|
[float]
|
|
==== Repurposing a node as a coordinating-only node (master: false, data: false)
|
|
|
|
In this example, a node that previously held data is repurposed as a
|
|
coordinating-only node. First update the node's settings to `node.master:
|
|
false` and `node.data: false` in its `elasticsearch.yml` config file. Then run
|
|
the `elasticsearch-node repurpose` command to find and remove excess shard data
|
|
and index metadata:
|
|
|
|
[source,txt]
|
|
----
|
|
node$./bin/elasticsearch-node repurpose
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
Found 2 indices (2 shards and 2 index meta data) to clean up
|
|
Use -v to see list of paths and indices affected
|
|
Node is being re-purposed as no-master and no-data. Clean-up of index data will be performed.
|
|
Do you want to proceed?
|
|
Confirm [y/N] y
|
|
Node successfully repurposed to no-master and no-data.
|
|
----
|
|
|
|
[float]
|
|
==== Unsafe cluster bootstrapping
|
|
|
|
Suppose your cluster had five master-eligible nodes and you have permanently
|
|
lost three of them, leaving two nodes remaining.
|
|
|
|
* Run the tool on the first remaining node, but answer `n` at the confirmation
|
|
step.
|
|
|
|
[source,txt]
|
|
----
|
|
node_1$ ./bin/elasticsearch-node unsafe-bootstrap
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
Current node cluster state (term, version) pair is (4, 12)
|
|
|
|
You should only run this tool if you have permanently lost half or more
|
|
of the master-eligible nodes in this cluster, and you cannot restore the
|
|
cluster from a snapshot. This tool can cause arbitrary data loss and its
|
|
use should be your last resort. If you have multiple surviving master
|
|
eligible nodes, you should run this tool on the node with the highest
|
|
cluster state (term, version) pair.
|
|
|
|
Do you want to proceed?
|
|
|
|
Confirm [y/N] n
|
|
----
|
|
|
|
* Run the tool on the second remaining node, and again answer `n` at the
|
|
confirmation step.
|
|
|
|
[source,txt]
|
|
----
|
|
node_2$ ./bin/elasticsearch-node unsafe-bootstrap
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
Current node cluster state (term, version) pair is (5, 3)
|
|
|
|
You should only run this tool if you have permanently lost half or more
|
|
of the master-eligible nodes in this cluster, and you cannot restore the
|
|
cluster from a snapshot. This tool can cause arbitrary data loss and its
|
|
use should be your last resort. If you have multiple surviving master
|
|
eligible nodes, you should run this tool on the node with the highest
|
|
cluster state (term, version) pair.
|
|
|
|
Do you want to proceed?
|
|
|
|
Confirm [y/N] n
|
|
----
|
|
|
|
* Since the second node has a greater term it has a fresher cluster state, so
|
|
it is better to unsafely bootstrap the cluster using this node:
|
|
|
|
[source,txt]
|
|
----
|
|
node_2$ ./bin/elasticsearch-node unsafe-bootstrap
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
Current node cluster state (term, version) pair is (5, 3)
|
|
|
|
You should only run this tool if you have permanently lost half or more
|
|
of the master-eligible nodes in this cluster, and you cannot restore the
|
|
cluster from a snapshot. This tool can cause arbitrary data loss and its
|
|
use should be your last resort. If you have multiple surviving master
|
|
eligible nodes, you should run this tool on the node with the highest
|
|
cluster state (term, version) pair.
|
|
|
|
Do you want to proceed?
|
|
|
|
Confirm [y/N] y
|
|
Master node was successfully bootstrapped
|
|
----
|
|
|
|
[float]
|
|
==== Detaching nodes from their cluster
|
|
|
|
After unsafely bootstrapping a new cluster, run the `elasticsearch-node
|
|
detach-cluster` command to detach all remaining nodes from the failed cluster
|
|
so they can join the new cluster:
|
|
|
|
[source, txt]
|
|
----
|
|
node_3$ ./bin/elasticsearch-node detach-cluster
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
You should only run this tool if you have permanently lost all of the
|
|
master-eligible nodes in this cluster and you cannot restore the cluster
|
|
from a snapshot, or you have already unsafely bootstrapped a new cluster
|
|
by running `elasticsearch-node unsafe-bootstrap` on a master-eligible
|
|
node that belonged to the same cluster as this node. This tool can cause
|
|
arbitrary data loss and its use should be your last resort.
|
|
|
|
Do you want to proceed?
|
|
|
|
Confirm [y/N] y
|
|
Node was successfully detached from the cluster
|
|
----
|
|
|
|
[float]
|
|
==== Bypassing version checks
|
|
|
|
Run the `elasticsearch-node override-version` command to overwrite the version
|
|
stored in the data path so that a node can start despite being incompatible
|
|
with the data stored in the data path:
|
|
|
|
[source, txt]
|
|
----
|
|
node$ ./bin/elasticsearch-node override-version
|
|
|
|
WARNING: Elasticsearch MUST be stopped before running this tool.
|
|
|
|
This data path was last written by Elasticsearch version [x.x.x] and may no
|
|
longer be compatible with Elasticsearch version [y.y.y]. This tool will bypass
|
|
this compatibility check, allowing a version [y.y.y] node to start on this data
|
|
path, but a version [y.y.y] node may not be able to read this data or may read
|
|
it incorrectly leading to data loss.
|
|
|
|
You should not use this tool. Instead, continue to use a version [x.x.x] node
|
|
on this data path. If necessary, you can use reindex-from-remote to copy the
|
|
data from here into an older cluster.
|
|
|
|
Do you want to proceed?
|
|
|
|
Confirm [y/N] y
|
|
Successfully overwrote this node's metadata to bypass its version compatibility checks.
|
|
----
|