[DOCS] Merges list of discovery and cluster formation settings (#36909)
This commit is contained in:
parent
c8a8391dfa
commit
33e9cf3892
|
@ -40,22 +40,15 @@ module. This module is divided into the following sections:
|
|||
Cluster state publishing is the process by which the elected master node
|
||||
updates the cluster state on all the other nodes in the cluster.
|
||||
|
||||
<<no-master-block>>::
|
||||
|
||||
The no-master block is put in place when there is no known elected master,
|
||||
and can be configured to determine which operations should be rejected when
|
||||
it is in place.
|
||||
|
||||
Advanced settings::
|
||||
|
||||
There are settings that allow advanced users to influence the
|
||||
<<master-election-settings,master election>> and
|
||||
<<fault-detection-settings,fault detection>> processes.
|
||||
|
||||
<<modules-discovery-quorums>>::
|
||||
|
||||
This section describes the detailed design behind the master election and
|
||||
auto-reconfiguration logic.
|
||||
|
||||
<<modules-discovery-settings,Settings>>::
|
||||
|
||||
There are settings that enable users to influence the discovery, cluster
|
||||
formation, master election and fault detection processes.
|
||||
|
||||
include::discovery/discovery.asciidoc[]
|
||||
|
||||
|
@ -65,11 +58,8 @@ include::discovery/adding-removing-nodes.asciidoc[]
|
|||
|
||||
include::discovery/publishing.asciidoc[]
|
||||
|
||||
include::discovery/no-master-block.asciidoc[]
|
||||
|
||||
include::discovery/master-election.asciidoc[]
|
||||
include::discovery/quorums.asciidoc[]
|
||||
|
||||
include::discovery/fault-detection.asciidoc[]
|
||||
|
||||
include::discovery/quorums.asciidoc[]
|
||||
|
||||
include::discovery/discovery-settings.asciidoc[]
|
|
@ -14,9 +14,15 @@ desirable to add or remove some master-eligible nodes to or from a cluster.
|
|||
|
||||
==== Adding master-eligible nodes
|
||||
|
||||
If you wish to add some master-eligible nodes to your cluster, simply configure
|
||||
the new nodes to find the existing cluster and start them up. Elasticsearch will
|
||||
add the new nodes to the voting configuration if it is appropriate to do so.
|
||||
If you wish to add some nodes to your cluster, simply configure the new nodes
|
||||
to find the existing cluster and start them up. Elasticsearch adds the new nodes
|
||||
to the voting configuration if it is appropriate to do so.
|
||||
|
||||
During master election or when joining an existing formed cluster, a node
|
||||
sends a join request to the master in order to be officially added to the
|
||||
cluster. You can use the `cluster.join.timeout` setting to configure how long a
|
||||
node waits after sending a request to join a cluster. Its default value is `60s`.
|
||||
See <<modules-discovery-settings>>.
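
As a minimal sketch, the join timeout could be adjusted in the
`elasticsearch.yml` file as follows. The value shown is purely illustrative and
is not a recommendation:

[source,yml]
----------------------------------------------------------------
# Illustrative value only: how long a node waits for a join request to succeed.
cluster.join.timeout: 90s
----------------------------------------------------------------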
|
||||
|
||||
==== Removing master-eligible nodes
|
||||
|
||||
|
@ -93,18 +99,13 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_excl
|
|||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
|
||||
This list is limited in size by the following setting:
|
||||
|
||||
`cluster.max_voting_config_exclusions`::
|
||||
|
||||
Sets a limit on the number of voting configuration exclusions at any one
|
||||
time. Defaults to `10`.
|
||||
|
||||
Since voting configuration exclusions are persistent and limited in number, they
|
||||
must be cleaned up. Normally an exclusion is added when performing some
|
||||
maintenance on the cluster, and the exclusions should be cleaned up when the
|
||||
maintenance is complete. Clusters should have no voting configuration exclusions
|
||||
in normal operation.
|
||||
This list is limited in size by the `cluster.max_voting_config_exclusions`
|
||||
setting, which defaults to `10`. See <<modules-discovery-settings>>. Since
|
||||
voting configuration exclusions are persistent and limited in number, they must
|
||||
be cleaned up. Normally an exclusion is added when performing some maintenance
|
||||
on the cluster, and the exclusions should be cleaned up when the maintenance is
|
||||
complete. Clusters should have no voting configuration exclusions in normal
|
||||
operation.
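
For illustration only, and assuming the limit is configured in the
`elasticsearch.yml` file, the setting might look like the following sketch. The
value `20` is a hypothetical example; the documented default is `10`:

[source,yml]
----------------------------------------------------------------
# Hypothetical value: allow up to 20 voting configuration exclusions at once.
cluster.max_voting_config_exclusions: 20
----------------------------------------------------------------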
|
||||
|
||||
If a node is excluded from the voting configuration because it is to be shut
|
||||
down permanently, its exclusion can be removed after it is shut down and removed
|
||||
|
|
|
@ -7,19 +7,13 @@ more of the master-eligible nodes in the cluster. This is known as _cluster
|
|||
bootstrapping_. This is only required the very first time the cluster starts
|
||||
up: nodes that have already joined a cluster store this information in their
|
||||
data folder and freshly-started nodes that are joining an existing cluster
|
||||
obtain this information from the cluster's elected master. This information is
|
||||
given using this setting:
|
||||
obtain this information from the cluster's elected master.
|
||||
|
||||
`cluster.initial_master_nodes`::
|
||||
|
||||
Sets a list of the <<node.name,node names>> or transport addresses of the
|
||||
initial set of master-eligible nodes in a brand-new cluster. By default
|
||||
this list is empty, meaning that this node expects to join a cluster that
|
||||
has already been bootstrapped.
|
||||
|
||||
This setting can be given on the command line or in the `elasticsearch.yml`
|
||||
configuration file when starting up a master-eligible node. Once the cluster
|
||||
has formed this setting is no longer required and is ignored. It need not be set
|
||||
The initial set of master-eligible nodes is defined in the
|
||||
<<initial_master_nodes,`cluster.initial_master_nodes` setting>>. When you
|
||||
start a master-eligible node, you can provide this setting on the command line
|
||||
or in the `elasticsearch.yml` file. After the cluster has formed, this setting
|
||||
is no longer required and is ignored. It need not be set
|
||||
on master-ineligible nodes, nor on master-eligible nodes that are started to
|
||||
join an existing cluster. Note that master-eligible nodes should use storage
|
||||
that persists across restarts. If they do not, and
|
||||
|
|
|
@ -0,0 +1,160 @@
|
|||
[[modules-discovery-settings]]
|
||||
=== Discovery and cluster formation settings
|
||||
|
||||
Discovery and cluster formation are affected by the following settings:
|
||||
|
||||
[[master-election-settings]]`cluster.election.back_off_time`::
|
||||
|
||||
Sets the amount to increase the upper bound on the wait before an election
|
||||
on each election failure. Note that this is _linear_ backoff. This defaults
|
||||
to `100ms`.
|
||||
|
||||
`cluster.election.duration`::
|
||||
|
||||
Sets how long each election is allowed to take before a node considers it to
|
||||
have failed and schedules a retry. This defaults to `500ms`.
|
||||
|
||||
`cluster.election.initial_timeout`::
|
||||
|
||||
Sets the upper bound on how long a node will wait initially, or after the
|
||||
elected master fails, before attempting its first election. This defaults
|
||||
to `100ms`.
|
||||
|
||||
|
||||
`cluster.election.max_timeout`::
|
||||
|
||||
Sets the maximum upper bound on how long a node will wait before attempting
|
||||
a first election, so that a network partition that lasts for a long time
|
||||
does not result in excessively sparse elections. This defaults to `10s`.
|
||||
|
||||
[[fault-detection-settings]]`cluster.fault_detection.follower_check.interval`::
|
||||
|
||||
Sets how long the elected master waits between follower checks to each
|
||||
other node in the cluster. Defaults to `1s`.
|
||||
|
||||
`cluster.fault_detection.follower_check.timeout`::
|
||||
|
||||
Sets how long the elected master waits for a response to a follower check
|
||||
before considering it to have failed. Defaults to `30s`.
|
||||
|
||||
`cluster.fault_detection.follower_check.retry_count`::
|
||||
|
||||
Sets how many consecutive follower check failures must occur to each node
|
||||
before the elected master considers that node to be faulty and removes it
|
||||
from the cluster. Defaults to `3`.
|
||||
|
||||
`cluster.fault_detection.leader_check.interval`::
|
||||
|
||||
Sets how long each node waits between checks of the elected master.
|
||||
Defaults to `1s`.
|
||||
|
||||
`cluster.fault_detection.leader_check.timeout`::
|
||||
|
||||
Sets how long each node waits for a response to a leader check from the
|
||||
elected master before considering it to have failed. Defaults to `30s`.
|
||||
|
||||
`cluster.fault_detection.leader_check.retry_count`::
|
||||
|
||||
Sets how many consecutive leader check failures must occur before a node
|
||||
considers the elected master to be faulty and attempts to find or elect a
|
||||
new master. Defaults to `3`.
|
||||
|
||||
`cluster.follower_lag.timeout`::
|
||||
|
||||
Sets how long the master node waits to receive acknowledgements for cluster
|
||||
state updates from lagging nodes. The default value is `90s`. If a node does
|
||||
not successfully apply the cluster state update within this period of time,
|
||||
it is considered to have failed and is removed from the cluster. See
|
||||
<<cluster-state-publishing>>.
|
||||
|
||||
`cluster.initial_master_nodes`::
|
||||
|
||||
Sets a list of the <<node.name,node names>> or transport addresses of the
|
||||
initial set of master-eligible nodes in a brand-new cluster. By default
|
||||
this list is empty, meaning that this node expects to join a cluster that
|
||||
has already been bootstrapped. See <<initial_master_nodes>>.
|
||||
|
||||
`cluster.join.timeout`::
|
||||
|
||||
Sets how long a node will wait after sending a request to join a cluster
|
||||
before it considers the request to have failed and retries. Defaults to
|
||||
`60s`.
|
||||
|
||||
`cluster.max_voting_config_exclusions`::
|
||||
|
||||
Sets a limit on the number of voting configuration exclusions at any one
|
||||
time. The default value is `10`. See
|
||||
<<modules-discovery-adding-removing-nodes>>.
|
||||
|
||||
`cluster.publish.timeout`::
|
||||
|
||||
Sets how long the master node waits for each cluster state update to be
|
||||
completely published to all nodes. The default value is `30s`. If this
|
||||
period of time elapses, the cluster state change is rejected. See
|
||||
<<cluster-state-publishing>>.
|
||||
|
||||
`discovery.cluster_formation_warning_timeout`::
|
||||
|
||||
Sets how long a node will try to form a cluster before logging a warning
|
||||
that the cluster did not form. Defaults to `10s`. If a cluster has not
|
||||
formed after `discovery.cluster_formation_warning_timeout` has elapsed then
|
||||
the node will log a warning message that starts with the phrase `master not discovered` which describes the current state of the discovery process.
|
||||
|
||||
`discovery.find_peers_interval`::
|
||||
|
||||
Sets how long a node will wait before attempting another discovery round.
|
||||
Defaults to `1s`.
|
||||
|
||||
`discovery.probe.connect_timeout`::
|
||||
|
||||
Sets how long to wait when attempting to connect to each address. Defaults
|
||||
to `3s`.
|
||||
|
||||
`discovery.probe.handshake_timeout`::
|
||||
|
||||
Sets how long to wait when attempting to identify the remote node via a
|
||||
handshake. Defaults to `1s`.
|
||||
|
||||
`discovery.request_peers_timeout`::
|
||||
Sets how long a node will wait after asking its peers again before
|
||||
considering the request to have failed. Defaults to `3s`.
|
||||
|
||||
`discovery.zen.hosts_provider`::
|
||||
Specifies which type of <<built-in-hosts-providers,hosts provider>> provides
|
||||
the list of seed nodes. By default, it is the
|
||||
<<settings-based-hosts-provider,settings-based hosts provider>>.
|
||||
|
||||
[[no-master-block]]`discovery.zen.no_master_block`::
|
||||
Specifies which operations are rejected when there is no active master in a
|
||||
cluster. This setting has two valid values:
|
||||
+
|
||||
--
|
||||
`all`::: All operations on the node (both read and write operations) are rejected.
|
||||
This also applies to API cluster state read or write operations, like the get
|
||||
index settings, put mapping, and cluster state APIs.
|
||||
|
||||
`write`::: (default) Write operations are rejected. Read operations succeed,
|
||||
based on the last known cluster configuration. This situation may result in
|
||||
partial reads of stale data as this node may be isolated from the rest of the
|
||||
cluster.
|
||||
|
||||
[NOTE]
|
||||
===============================
|
||||
* The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
|
||||
(for example, cluster stats, node info, and node stats APIs). Requests to these
|
||||
APIs are not blocked and can run on any available node.
|
||||
|
||||
* For the cluster to be fully operational, it must have an active master.
|
||||
===============================
|
||||
--
|
||||
|
||||
`discovery.zen.ping.unicast.hosts`::
|
||||
|
||||
Provides a list of master-eligible nodes in the cluster. The list contains
|
||||
either an array of hosts or a comma-delimited string. Each value has the
|
||||
format `host:port` or `host`, where `port` defaults to the setting `transport.profiles.default.port`. Note that IPv6 hosts must be bracketed.
|
||||
The default value is `127.0.0.1, [::1]`. See <<unicast.hosts>>.
|
||||
|
||||
`discovery.zen.ping.unicast.hosts.resolve_timeout`::
|
||||
|
||||
Sets the amount of time to wait for DNS lookups on each round of discovery. This is specified as a <<time-units, time unit>> and defaults to `5s`.
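
The settings in this list are typically placed in the `elasticsearch.yml` file.
The following fragment is a sketch only: the election and DNS values simply
restate the defaults listed above, the `no_master_block` value is chosen to
show the non-default option, and the host addresses are hypothetical
placeholders.

[source,yml]
----------------------------------------------------------------
# Election scheduling; these values restate the documented defaults.
cluster.election.initial_timeout: 100ms
cluster.election.back_off_time: 100ms
cluster.election.max_timeout: 10s
cluster.election.duration: 500ms

# Reject reads as well as writes while there is no elected master
# (the documented default is `write`).
discovery.zen.no_master_block: all

# Hypothetical seed addresses. The port is optional, and IPv6 hosts
# must be bracketed (quoted here so that YAML reads them as strings).
discovery.zen.ping.unicast.hosts:
  - 192.0.2.10:9300
  - 192.0.2.11
  - "[2001:db8::1]:9300"

# Documented default for DNS lookups on each round of discovery.
discovery.zen.ping.unicast.hosts.resolve_timeout: 5s
----------------------------------------------------------------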
|
|
@ -82,9 +82,10 @@ gives a convenient mechanism for an Elasticsearch instance that is run in a
|
|||
Docker container to be dynamically supplied with a list of IP addresses to
|
||||
connect to when those IP addresses may not be known at node startup.
|
||||
|
||||
To enable file-based discovery, configure the `file` hosts provider as follows:
|
||||
To enable file-based discovery, configure the `file` hosts provider as follows
|
||||
in the `elasticsearch.yml` file:
|
||||
|
||||
[source,txt]
|
||||
[source,yml]
|
||||
----------------------------------------------------------------
|
||||
discovery.zen.hosts_provider: file
|
||||
----------------------------------------------------------------
|
||||
|
@ -150,39 +151,3 @@ a hosts provider that uses the Azure Classic API find a list of seed nodes.
|
|||
|
||||
The {plugins}/discovery-gce.html[GCE discovery plugin] adds a hosts provider
|
||||
that uses the GCE API to find a list of seed nodes.
|
||||
|
||||
[float]
|
||||
==== Discovery settings
|
||||
|
||||
The discovery process is controlled by the following settings.
|
||||
|
||||
`discovery.find_peers_interval`::
|
||||
|
||||
Sets how long a node will wait before attempting another discovery round.
|
||||
Defaults to `1s`.
|
||||
|
||||
`discovery.request_peers_timeout`::
|
||||
|
||||
Sets how long a node will wait after asking its peers again before
|
||||
considering the request to have failed. Defaults to `3s`.
|
||||
|
||||
`discovery.probe.connect_timeout`::
|
||||
|
||||
Sets how long to wait when attempting to connect to each address. Defaults
|
||||
to `3s`.
|
||||
|
||||
`discovery.probe.handshake_timeout`::
|
||||
|
||||
Sets how long to wait when attempting to identify the remote node via a
|
||||
handshake. Defaults to `1s`.
|
||||
|
||||
`discovery.cluster_formation_warning_timeout`::
|
||||
|
||||
Sets how long a node will try to form a cluster before logging a warning
|
||||
that the cluster did not form. Defaults to `10s`.
|
||||
|
||||
If a cluster has not formed after `discovery.cluster_formation_warning_timeout`
|
||||
has elapsed then the node will log a warning message that starts with the phrase
|
||||
`master not discovered` which describes the current state of the discovery
|
||||
process.
|
||||
|
||||
|
|
|
@ -1,52 +1,19 @@
|
|||
[[fault-detection-settings]]
|
||||
=== Cluster fault detection settings
|
||||
[[cluster-fault-detection]]
|
||||
=== Cluster fault detection
|
||||
|
||||
An elected master periodically checks each of the nodes in the cluster in order
|
||||
to ensure that they are still connected and healthy, and in turn each node in
|
||||
the cluster periodically checks the health of the elected master. These checks
|
||||
The elected master periodically checks each of the nodes in the cluster to
|
||||
ensure that they are still connected and healthy. Each node in the cluster also periodically checks the health of the elected master. These checks
|
||||
are known respectively as _follower checks_ and _leader checks_.
|
||||
|
||||
Elasticsearch allows for these checks occasionally to fail or time out without
|
||||
taking any action, and will only consider a node to be truly faulty after a
|
||||
number of consecutive checks have failed. The following settings control the
|
||||
behaviour of fault detection.
|
||||
|
||||
`cluster.fault_detection.follower_check.interval`::
|
||||
|
||||
Sets how long the elected master waits between follower checks to each
|
||||
other node in the cluster. Defaults to `1s`.
|
||||
|
||||
`cluster.fault_detection.follower_check.timeout`::
|
||||
|
||||
Sets how long the elected master waits for a response to a follower check
|
||||
before considering it to have failed. Defaults to `30s`.
|
||||
|
||||
`cluster.fault_detection.follower_check.retry_count`::
|
||||
|
||||
Sets how many consecutive follower check failures must occur to each node
|
||||
before the elected master considers that node to be faulty and removes it
|
||||
from the cluster. Defaults to `3`.
|
||||
|
||||
`cluster.fault_detection.leader_check.interval`::
|
||||
|
||||
Sets how long each node waits between checks of the elected master.
|
||||
Defaults to `1s`.
|
||||
|
||||
`cluster.fault_detection.leader_check.timeout`::
|
||||
|
||||
Sets how long each node waits for a response to a leader check from the
|
||||
elected master before considering it to have failed. Defaults to `30s`.
|
||||
|
||||
`cluster.fault_detection.leader_check.retry_count`::
|
||||
|
||||
Sets how many consecutive leader check failures must occur before a node
|
||||
considers the elected master to be faulty and attempts to find or elect a
|
||||
new master. Defaults to `3`.
|
||||
|
||||
If the elected master detects that a node has disconnected then this is treated
|
||||
as an immediate failure, bypassing the timeouts and retries listed above, and
|
||||
the master attempts to remove the node from the cluster. Similarly, if a node
|
||||
detects that the elected master has disconnected then this is treated as an
|
||||
immediate failure, bypassing the timeouts and retries listed above, and the
|
||||
follower restarts its discovery phase to try and find or elect a new master.
|
||||
Elasticsearch allows these checks to occasionally fail or time out without
|
||||
taking any action. It considers a node to be faulty only after a number of
|
||||
consecutive checks have failed. You can control fault detection behavior with
|
||||
<<modules-discovery-settings,`cluster.fault_detection.*` settings>>.
|
||||
|
||||
If the elected master detects that a node has disconnected, however, this
|
||||
situation is treated as an immediate failure. The master bypasses the timeout
|
||||
and retry setting values and attempts to remove the node from the cluster.
|
||||
Similarly, if a node detects that the elected master has disconnected, this
|
||||
situation is treated as an immediate failure. The node bypasses the timeout and
|
||||
retry settings and restarts its discovery phase to try and find or elect a new
|
||||
master.
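
As a rough sketch, the `cluster.fault_detection.*` settings could be written
out explicitly in the `elasticsearch.yml` file as shown below. The values
restate the defaults given in the settings list, so this fragment is
illustrative rather than a tuning recommendation:

[source,yml]
----------------------------------------------------------------
# Follower checks: sent by the elected master to each other node.
cluster.fault_detection.follower_check.interval: 1s
cluster.fault_detection.follower_check.timeout: 30s
cluster.fault_detection.follower_check.retry_count: 3

# Leader checks: sent by each node to the elected master.
cluster.fault_detection.leader_check.interval: 1s
cluster.fault_detection.leader_check.timeout: 30s
cluster.fault_detection.leader_check.retry_count: 3
----------------------------------------------------------------

These timeouts and retry counts apply only to failed or slow checks. A detected
disconnection is treated as an immediate failure, as described above.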
|
|
@ -1,40 +0,0 @@
|
|||
[[master-election-settings]]
|
||||
=== Master election settings
|
||||
|
||||
The following settings control the scheduling of elections.
|
||||
|
||||
`cluster.election.initial_timeout`::
|
||||
|
||||
Sets the upper bound on how long a node will wait initially, or after the
|
||||
elected master fails, before attempting its first election. This defaults
|
||||
to `100ms`.
|
||||
|
||||
`cluster.election.back_off_time`::
|
||||
|
||||
Sets the amount to increase the upper bound on the wait before an election
|
||||
on each election failure. Note that this is _linear_ backoff. This defaults
|
||||
to `100ms`.
|
||||
|
||||
`cluster.election.max_timeout`::
|
||||
|
||||
Sets the maximum upper bound on how long a node will wait before attempting
|
||||
a first election, so that a network partition that lasts for a long time
|
||||
does not result in excessively sparse elections. This defaults to `10s`.
|
||||
|
||||
`cluster.election.duration`::
|
||||
|
||||
Sets how long each election is allowed to take before a node considers it to
|
||||
have failed and schedules a retry. This defaults to `500ms`.
|
||||
|
||||
[float]
|
||||
==== Joining an elected master
|
||||
|
||||
During master election, or when joining an existing formed cluster, a node will
|
||||
send a join request to the master in order to be officially added to the
|
||||
cluster. This join process can be configured with the following settings.
|
||||
|
||||
`cluster.join.timeout`::
|
||||
|
||||
Sets how long a node will wait after sending a request to join a cluster
|
||||
before it considers the request to have failed and retries. Defaults to
|
||||
`60s`.
|
|
@ -1,22 +0,0 @@
|
|||
[[no-master-block]]
|
||||
=== No master block settings
|
||||
|
||||
For the cluster to be fully operational, it must have an active master. The
|
||||
`discovery.zen.no_master_block` setting controls what operations should be
|
||||
rejected when there is no active master.
|
||||
|
||||
The `discovery.zen.no_master_block` setting has two valid values:
|
||||
|
||||
[horizontal]
|
||||
`all`:: All operations on the node--i.e. both read & writes--will be rejected.
|
||||
This also applies for api cluster state read or write operations, like the get
|
||||
index settings, put mapping and cluster state api.
|
||||
`write`:: (default) Write operations will be rejected. Read operations will
|
||||
succeed, based on the last known cluster configuration. This may result in
|
||||
partial reads of stale data as this node may be isolated from the rest of the
|
||||
cluster.
|
||||
|
||||
The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
|
||||
(for example cluster stats, node info, and node stats APIs). Requests to these
|
||||
APIs will not be blocked and can run on any available node.
|
||||
|
|
@ -109,7 +109,9 @@ nodes is not changing permanently.
|
|||
|
||||
Nodes may join or leave the cluster, and Elasticsearch reacts by automatically
|
||||
making corresponding changes to the voting configuration in order to ensure that
|
||||
the cluster is as resilient as possible. The default auto-reconfiguration
|
||||
the cluster is as resilient as possible.
|
||||
|
||||
The default auto-reconfiguration
|
||||
behaviour is expected to give the best results in most situations. The current
|
||||
voting configuration is stored in the cluster state so you can inspect its
|
||||
current contents as follows:
|
||||
|
|
|
@ -1,5 +1,8 @@
|
|||
[[discovery-settings]]
|
||||
=== Discovery and cluster formation settings
|
||||
=== Important discovery and cluster formation settings
|
||||
++++
|
||||
<titleabbrev>Discovery and cluster formation settings</titleabbrev>
|
||||
++++
|
||||
|
||||
There are two important discovery and cluster formation settings that should be
|
||||
configured before going to production so that nodes in the cluster can discover
|
||||
|
@ -58,3 +61,5 @@ cluster.initial_master_nodes:
|
|||
<4> Initial master nodes can also be identified by their IP address.
|
||||
<5> If multiple master nodes share an IP address then the port must be used to
|
||||
disambiguate them.
|
||||
|
||||
For more information, see <<modules-discovery-settings>>.
|