[DOCS] Adds overview and API ref for cluster voting configurations (#36954)

@@ -104,3 +104,5 @@ include::cluster/tasks.asciidoc[]
include::cluster/nodes-hot-threads.asciidoc[]
include::cluster/allocation-explain.asciidoc[]
include::cluster/voting-exclusions.asciidoc[]

@@ -0,0 +1,76 @@
[[voting-config-exclusions]]
== Voting configuration exclusions API
++++
<titleabbrev>Voting Configuration Exclusions</titleabbrev>
++++

Adds or removes master-eligible nodes from the
<<modules-discovery-voting,voting configuration exclusion list>>.

[float]
=== Request

`POST _cluster/voting_config_exclusions/<node_name>` +

`DELETE _cluster/voting_config_exclusions`

[float]
=== Path parameters

`node_name`::
A <<cluster-nodes,node filter>> that identifies {es} nodes.

[float]
=== Description

By default, if there are more than three master-eligible nodes in the cluster
and you remove fewer than half of the master-eligible nodes in the cluster at
once, the <<modules-discovery-voting,voting configuration>> automatically
shrinks.

If you want to shrink the voting configuration to contain fewer than three nodes
or to remove half or more of the master-eligible nodes in the cluster at once,
you must use this API to remove departed nodes from the voting configuration
manually. The API adds an entry for each excluded node to the voting
configuration exclusions list. The cluster then tries to reconfigure the voting
configuration to remove those nodes and to prevent them from returning.

If the API fails, you can safely retry it. Only a successful response
guarantees that the node has been removed from the voting configuration and
will not be reinstated.

NOTE: Voting exclusions are required only when you remove at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes or when removing fewer than half
of the master-eligible nodes.

The <<modules-discovery-settings,`cluster.max_voting_config_exclusions`
setting>> limits the size of the voting configuration exclusion list. The
default value is `10`. Because voting configuration exclusions are persistent
and limited in number, you must clear the voting configuration exclusions list
once the exclusions are no longer required.
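
To see which exclusions are currently in force before clearing the list, you
can inspect the cluster state. The
`metadata.cluster_coordination.voting_config_exclusions` path used below is an
assumption about where the exclusions appear in the cluster state; treat it as
a sketch rather than a documented response field:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------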

There is also a
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>,
which is set to `true` by default. If it is set to `false`, you must use this
API to maintain the voting configuration.

For more information, see <<modules-discovery-removing-nodes>>.
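
If you are unsure whether automatic shrinking is enabled, one way to check is to
read the setting back together with its default value. This is only a sketch
that relies on the standard `include_defaults` and `filter_path` query
parameters of the cluster get settings API:

[source,js]
--------------------------------------------------
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.auto_shrink_voting_configuration
--------------------------------------------------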

[float]
=== Examples

Add `nodeId1` to the voting configuration exclusions list:

[source,js]
--------------------------------------------------
POST /_cluster/voting_config_exclusions/nodeId1
--------------------------------------------------
// CONSOLE
// TEST[catch:bad_request]
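
Because `<node_name>` is a <<cluster-nodes,node filter>>, it should also be
possible to exclude several nodes in a single request by passing a
comma-separated list. The node names below are placeholders:

[source,js]
--------------------------------------------------
POST /_cluster/voting_config_exclusions/node_name_1,node_name_2
--------------------------------------------------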

Remove all exclusions from the list:

[source,js]
--------------------------------------------------
DELETE /_cluster/voting_config_exclusions
--------------------------------------------------
// CONSOLE

@@ -13,6 +13,16 @@ module. This module is divided into the following sections:
unknown, such as when a node has just started up or when the previous
master has failed.

<<modules-discovery-quorums>>::

This section describes how {es} uses a quorum-based voting mechanism to
make decisions even if some nodes are unavailable.

<<modules-discovery-voting>>::

This section describes the concept of voting configurations, which {es}
automatically updates as nodes leave and join the cluster.

<<modules-discovery-bootstrap-cluster>>::

Bootstrapping a cluster is required when an Elasticsearch cluster starts up

@@ -40,11 +50,10 @@ module. This module is divided into the following sections:
Cluster state publishing is the process by which the elected master node
updates the cluster state on all the other nodes in the cluster.

<<cluster-fault-detection>>::

{es} performs health checks to detect and remove faulty nodes.

<<modules-discovery-settings,Settings>>::

There are settings that enable users to influence the discovery, cluster

@@ -52,14 +61,16 @@ module. This module is divided into the following sections:

include::discovery/discovery.asciidoc[]

include::discovery/quorums.asciidoc[]

include::discovery/voting.asciidoc[]

include::discovery/bootstrapping.asciidoc[]

include::discovery/adding-removing-nodes.asciidoc[]

include::discovery/publishing.asciidoc[]

include::discovery/fault-detection.asciidoc[]

include::discovery/discovery-settings.asciidoc[]

@@ -12,6 +12,7 @@ cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.

[[modules-discovery-adding-nodes]]
==== Adding master-eligible nodes

If you wish to add some nodes to your cluster, simply configure the new nodes

@@ -24,6 +25,7 @@ cluster. You can use the `cluster.join.timeout` setting to configure how long a
node waits after sending a request to join a cluster. Its default value is `30s`.
See <<modules-discovery-settings>>.

[[modules-discovery-removing-nodes]]
==== Removing master-eligible nodes

When removing master-eligible nodes, it is important not to remove too many all

@@ -50,7 +52,7 @@ will never automatically move a node on the voting exclusions list back into the
voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the <<voting-config-exclusions>> API. For example:

[source,js]
--------------------------------------------------

@@ -3,6 +3,15 @@

Discovery and cluster formation are affected by the following settings:

`cluster.auto_shrink_voting_configuration`::

Controls whether the <<modules-discovery-voting,voting configuration>>
sheds departed nodes automatically, as long as it still contains at least 3
nodes. The default value is `true`. If set to `false`, the voting
configuration never shrinks automatically and you must remove departed
nodes manually with the <<voting-config-exclusions,voting configuration
exclusions API>>.

[[master-election-settings]]`cluster.election.back_off_time`::

Sets the amount to increase the upper bound on the wait before an election

@@ -152,9 +161,11 @@ APIs are not blocked and can run on any available node.

Provides a list of master-eligible nodes in the cluster. The list contains
either an array of hosts or a comma-delimited string. Each value has the
format `host:port` or `host`, where `port` defaults to the setting
`transport.profiles.default.port`. Note that IPv6 hosts must be bracketed.
The default value is `127.0.0.1, [::1]`. See <<unicast.hosts>>.

`discovery.zen.ping.unicast.hosts.resolve_timeout`::

Sets the amount of time to wait for DNS lookups on each round of discovery.
This is specified as a <<time-units, time unit>> and defaults to `5s`.

@@ -2,8 +2,9 @@
=== Cluster fault detection

The elected master periodically checks each of the nodes in the cluster to
ensure that they are still connected and healthy. Each node in the cluster also
periodically checks the health of the elected master. These checks are known
respectively as _follower checks_ and _leader checks_.

Elasticsearch allows these checks to occasionally fail or timeout without
taking any action. It considers a node to be faulty only after a number of

@@ -16,4 +17,4 @@ and retry setting values and attempts to remove the node from the cluster.
Similarly, if a node detects that the elected master has disconnected, this
situation is treated as an immediate failure. The node bypasses the timeout and
retry settings and restarts its discovery phase to try and find or elect a new
master.

@@ -18,13 +18,13 @@ cluster. In many cases you can do this simply by starting or stopping the nodes
as required. See <<modules-discovery-adding-removing-nodes>>.

As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's <<modules-discovery-voting,voting
configuration>>, which is the set of master-eligible nodes whose responses are
counted when making decisions such as electing a new master or committing a new
cluster state. A decision is made only after more than half of the nodes in the
voting configuration have responded. Usually the voting configuration is the
same as the set of all the master-eligible nodes that are currently in the
cluster. However, there are some situations in which they may be different.

To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more

@@ -38,46 +38,6 @@ cluster-state update that adjusts the voting configuration to match, and this
can take a short time to complete. It is important to wait for this adjustment
to complete before removing more nodes from the cluster.

[float]
==== Master elections

@@ -104,92 +64,3 @@ and then started again then it will automatically recover, such as during a
action with the APIs described here in these cases, because the set of master
nodes is not changing permanently.

@@ -0,0 +1,140 @@
[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

IMPORTANT: To ensure the cluster remains available, you **must not stop half or
more of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available, the cluster can work normally. For
example, if there are three or four master-eligible nodes, the cluster can
tolerate one unavailable node. If there are two or fewer master-eligible nodes,
they must all remain available.

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<modules-discovery-adding-removing-nodes>>.

The current voting configuration is stored in the cluster state so you can
inspect its current contents as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE
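
The response contains the node IDs that make up the current voting
configuration. The body below is only an illustrative sketch with placeholder
node IDs, not output from a real cluster:

[source,js]
--------------------------------------------------
{
  "metadata": {
    "cluster_coordination": {
      "last_committed_config": [
        "node_id_1",
        "node_id_2",
        "node_id_3"
      ]
    }
  }
}
--------------------------------------------------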

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes or does not
include some available nodes. In these situations, the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.
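
For example, assuming the setting is dynamic and can therefore be updated
through the cluster settings API, automatic shrinking could be switched off with
a request like the sketch below. Leaving the setting at its default of `true` is
recommended:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "cluster.auto_shrink_voting_configuration": false
  }
}
--------------------------------------------------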

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is
the default and recommended value) and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.

There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only its availability in the event of the failure of some of its
nodes and the administrative tasks that must be performed as nodes join and
leave the cluster.

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes and so the cluster would not be able to make any progress.

For instance, if there are four master-eligible nodes in the cluster and the
voting configuration contains all of them, any quorum-based decision requires
votes from at least three of them. This situation means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.

[float]
==== Setting the initial voting configuration

When a brand-new cluster starts up for the first time, it must elect its first
master node. To do this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. It is also
important to note that the bootstrap configuration must come from outside the
cluster: there is no safe way for the cluster to determine the bootstrap
configuration correctly on its own.

If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong, and it is impossible to
merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course, if each node is started manually
then it's unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it's certainly possible to get into this situation,
particularly if the orchestrator is not resilient to failures such as network
partitions.

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster will have stored to disk all the information that is required
when they restart.