2018-12-20 08:02:44 -05:00
|
|
|
[[modules-discovery-adding-removing-nodes]]
|
|
|
|
=== Adding and removing nodes
|
|
|
|
|
|
|
|
As nodes are added or removed Elasticsearch maintains an optimal level of fault
|
|
|
|
tolerance by automatically updating the cluster's _voting configuration_, which
|
|
|
|
is the set of <<master-node,master-eligible nodes>> whose responses are counted
|
|
|
|
when making decisions such as electing a new master or committing a new cluster
|
|
|
|
state.
|
|
|
|
|
|
|
|
It is recommended to have a small and fixed number of master-eligible nodes in a
|
|
|
|
cluster, and to scale the cluster up and down by adding and removing
|
|
|
|
master-ineligible nodes only. However there are situations in which it may be
|
|
|
|
desirable to add or remove some master-eligible nodes to or from a cluster.
|
|
|
|
|
2019-01-07 12:11:14 -05:00
|
|
|
[[modules-discovery-adding-nodes]]
|
2018-12-20 08:02:44 -05:00
|
|
|
==== Adding master-eligible nodes
|
|
|
|
|
2018-12-21 14:24:48 -05:00
|
|
|
If you wish to add some nodes to your cluster, simply configure the new nodes
|
|
|
|
to find the existing cluster and start them up. Elasticsearch adds the new nodes
|
|
|
|
to the voting configuration if it is appropriate to do so.
|
|
|
|
|
|
|
|
During master election or when joining an existing formed cluster, a node
|
|
|
|
sends a join request to the master in order to be officially added to the
|
|
|
|
cluster. You can use the `cluster.join.timeout` setting to configure how long a
|
|
|
|
node waits after sending a request to join a cluster. Its default value is `30s`.
|
|
|
|
See <<modules-discovery-settings>>.
|
2018-12-20 08:02:44 -05:00
|
|
|
|
2019-01-07 12:11:14 -05:00
|
|
|
[[modules-discovery-removing-nodes]]
|
2018-12-20 08:02:44 -05:00
|
|
|
==== Removing master-eligible nodes
|
|
|
|
|
|
|
|
When removing master-eligible nodes, it is important not to remove too many all
|
|
|
|
at the same time. For instance, if there are currently seven master-eligible
|
|
|
|
nodes and you wish to reduce this to three, it is not possible simply to stop
|
|
|
|
four of the nodes at once: to do so would leave only three nodes remaining,
|
|
|
|
which is less than half of the voting configuration, which means the cluster
|
|
|
|
cannot take any further actions.
|
|
|
|
|
2019-06-03 12:20:47 -04:00
|
|
|
More precisely, if you shut down half or more of the master-eligible nodes all
|
|
|
|
at the same time then the cluster will normally become unavailable. If this
|
|
|
|
happens then you can bring the cluster back online by starting the removed
|
|
|
|
nodes again.
|
|
|
|
|
2018-12-20 08:02:44 -05:00
|
|
|
As long as there are at least three master-eligible nodes in the cluster, as a
|
|
|
|
general rule it is best to remove nodes one-at-a-time, allowing enough time for
|
|
|
|
the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
|
|
|
|
configuration and adapt the fault tolerance level to the new set of nodes.
|
|
|
|
|
|
|
|
If there are only two master-eligible nodes remaining then neither node can be
|
2019-06-03 12:20:47 -04:00
|
|
|
safely removed since both are required to reliably make progress. To remove one
|
|
|
|
of these nodes you must first inform {es} that it should not be part of the
|
|
|
|
voting configuration, and that the voting power should instead be given to the
|
|
|
|
other node. You can then take the excluded node offline without preventing the
|
|
|
|
other node from making progress. A node which is added to a voting
|
|
|
|
configuration exclusion list still works normally, but {es} tries to remove it
|
|
|
|
from the voting configuration so its vote is no longer required. Importantly,
|
|
|
|
{es} will never automatically move a node on the voting exclusions list back
|
|
|
|
into the voting configuration. Once an excluded node has been successfully
|
2018-12-20 08:02:44 -05:00
|
|
|
auto-reconfigured out of the voting configuration, it is safe to shut it down
|
|
|
|
without affecting the cluster's master-level availability. A node can be added
|
2019-06-03 12:20:47 -04:00
|
|
|
to the voting configuration exclusion list using the
|
|
|
|
<<voting-config-exclusions>> API. For example:
|
2018-12-20 08:02:44 -05:00
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
2019-06-03 12:20:47 -04:00
|
|
|
# Add node to voting configuration exclusions list and wait for the system
|
|
|
|
# to auto-reconfigure the node out of the voting configuration up to the
|
|
|
|
# default timeout of 30 seconds
|
2018-12-20 08:02:44 -05:00
|
|
|
POST /_cluster/voting_config_exclusions/node_name
|
|
|
|
|
|
|
|
# Add node to voting configuration exclusions list and wait for
|
|
|
|
# auto-reconfiguration up to one minute
|
|
|
|
POST /_cluster/voting_config_exclusions/node_name?timeout=1m
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
// TEST[skip:this would break the test cluster if executed]
|
|
|
|
|
|
|
|
The node that should be added to the exclusions list is specified using
|
|
|
|
<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
|
|
|
|
voting configuration exclusions API fails, you can safely retry it. Only a
|
|
|
|
successful response guarantees that the node has actually been removed from the
|
2019-01-29 06:43:04 -05:00
|
|
|
voting configuration and will not be reinstated. If it's the active master that
|
|
|
|
was removed from the voting configuration, then it will abdicate to another
|
|
|
|
master-eligible node that's still in the voting configuration, if such a node
|
|
|
|
is available.
|
2018-12-20 08:02:44 -05:00
|
|
|
|
|
|
|
Although the voting configuration exclusions API is most useful for down-scaling
|
|
|
|
a two-node to a one-node cluster, it is also possible to use it to remove
|
|
|
|
multiple master-eligible nodes all at the same time. Adding multiple nodes to
|
|
|
|
the exclusions list has the system try to auto-reconfigure all of these nodes
|
|
|
|
out of the voting configuration, allowing them to be safely shut down while
|
|
|
|
keeping the cluster available. In the example described above, shrinking a
|
|
|
|
seven-master-node cluster down to only have three master nodes, you could add
|
|
|
|
four nodes to the exclusions list, wait for confirmation, and then shut them
|
|
|
|
down simultaneously.
|
|
|
|
|
|
|
|
NOTE: Voting exclusions are only required when removing at least half of the
|
|
|
|
master-eligible nodes from a cluster in a short time period. They are not
|
|
|
|
required when removing master-ineligible nodes, nor are they required when
|
|
|
|
removing fewer than half of the master-eligible nodes.
|
|
|
|
|
|
|
|
Adding an exclusion for a node creates an entry for that node in the voting
|
|
|
|
configuration exclusions list, which has the system automatically try to
|
|
|
|
reconfigure the voting configuration to remove that node and prevents it from
|
|
|
|
returning to the voting configuration once it has removed. The current list of
|
|
|
|
exclusions is stored in the cluster state and can be inspected as follows:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
|
2018-12-21 14:24:48 -05:00
|
|
|
This list is limited in size by the `cluster.max_voting_config_exclusions`
|
|
|
|
setting, which defaults to `10`. See <<modules-discovery-settings>>. Since
|
|
|
|
voting configuration exclusions are persistent and limited in number, they must
|
|
|
|
be cleaned up. Normally an exclusion is added when performing some maintenance
|
|
|
|
on the cluster, and the exclusions should be cleaned up when the maintenance is
|
|
|
|
complete. Clusters should have no voting configuration exclusions in normal
|
|
|
|
operation.
|
2018-12-20 08:02:44 -05:00
|
|
|
|
|
|
|
If a node is excluded from the voting configuration because it is to be shut
|
|
|
|
down permanently, its exclusion can be removed after it is shut down and removed
|
|
|
|
from the cluster. Exclusions can also be cleared if they were created in error
|
|
|
|
or were only required temporarily:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
# Wait for all the nodes with voting configuration exclusions to be removed from
|
|
|
|
# the cluster and then remove all the exclusions, allowing any node to return to
|
|
|
|
# the voting configuration in the future.
|
|
|
|
DELETE /_cluster/voting_config_exclusions
|
|
|
|
|
|
|
|
# Immediately remove all the voting configuration exclusions, allowing any node
|
|
|
|
# to return to the voting configuration in the future.
|
|
|
|
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|