[Zen2] Update documentation for Zen2 (#34714)

This commit overhauls the documentation of discovery and cluster coordination,
removing mention of the Zen Discovery module and replacing it with docs for the
new cluster coordination mechanism introduced in 7.0.

Relates #32006
This commit is contained in:
David Turner 2018-12-20 13:02:44 +00:00 committed by GitHub
parent 08bcd83757
commit 1a23417aeb
27 changed files with 986 additions and 415 deletions

View File

@ -1,8 +1,8 @@
[[discovery]]
== Discovery Plugins
Discovery plugins extend Elasticsearch by adding new discovery mechanisms that
can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].
Discovery plugins extend Elasticsearch by adding new hosts providers that can be
used to extend the {ref}/modules-discovery.html[cluster formation module].
[float]
==== Core discovery plugins
@ -11,22 +11,24 @@ The core discovery plugins are:
<<discovery-ec2,EC2 discovery>>::
The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery.
The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API]
for unicast discovery.
<<discovery-azure-classic,Azure Classic discovery>>::
The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery.
The Azure Classic discovery plugin uses the Azure Classic API for unicast
discovery.
<<discovery-gce,GCE discovery>>::
The Google Compute Engine discovery plugin uses the GCE API for unicast discovery.
The Google Compute Engine discovery plugin uses the GCE API for unicast
discovery.
[float]
==== Community contributed discovery plugins
A number of discovery plugins have been contributed by our community:
* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
* https://github.com/fabric8io/elasticsearch-cloud-kubernetes[Kubernetes Discovery Plugin] (by Jimmi Dyson, http://fabric8.io[fabric8])
include::discovery-ec2.asciidoc[]

View File

@ -11,6 +11,7 @@ See also <<release-highlights>> and <<es-release-notes>>.
* <<breaking_70_aggregations_changes>>
* <<breaking_70_cluster_changes>>
* <<breaking_70_discovery_changes>>
* <<breaking_70_indices_changes>>
* <<breaking_70_mappings_changes>>
* <<breaking_70_search_changes>>
@ -44,6 +45,7 @@ Elasticsearch 6.x in order to be readable by Elasticsearch 7.x.
include::migrate_7_0/aggregations.asciidoc[]
include::migrate_7_0/analysis.asciidoc[]
include::migrate_7_0/cluster.asciidoc[]
include::migrate_7_0/discovery.asciidoc[]
include::migrate_7_0/indices.asciidoc[]
include::migrate_7_0/mappings.asciidoc[]
include::migrate_7_0/search.asciidoc[]

View File

@ -25,12 +25,3 @@ Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.max_shards_per_node` cluster
setting, to prevent accidental operations that would destabilize the cluster.
More information can be found in the <<misc-cluster,documentation for that setting>>.
[float]
==== Discovery configuration is required in production
Production deployments of Elasticsearch now require at least one of the following settings
to be specified in the `elasticsearch.yml` configuration file:
- `discovery.zen.ping.unicast.hosts`
- `discovery.zen.hosts_provider`
- `cluster.initial_master_nodes`

View File

@ -0,0 +1,40 @@
[float]
[[breaking_70_discovery_changes]]
=== Discovery changes
[float]
==== Cluster bootstrapping is required if discovery is configured
The first time a cluster is started, `cluster.initial_master_nodes` must be set
to perform cluster bootstrapping. It should contain the names of the
master-eligible nodes in the initial cluster and be defined on every
master-eligible node in the cluster. See <<discovery-settings,the discovery
settings summary>> for an example, and the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping reference
documentation>> describes this setting in more detail.
The `discovery.zen.minimum_master_nodes` setting is required during a rolling
upgrade from 6.x, but can be removed in all other circumstances.
[float]
==== Removing master-eligible nodes sometimes requires voting exclusions
If you wish to remove half or more of the master-eligible nodes from a cluster,
you must first exclude the affected nodes from the voting configuration using
the <<modules-discovery-adding-removing-nodes,voting config exclusions API>>.
If you remove fewer than half of the master-eligible nodes at the same time,
voting exclusions are not required. If you remove only master-ineligible nodes
such as data-only nodes or coordinating-only nodes, voting exclusions are not
required. Likewise, if you add nodes to the cluster, voting exclusions are not
required.
[float]
==== Discovery configuration is required in production
Production deployments of Elasticsearch now require at least one of the
following settings to be specified in the `elasticsearch.yml` configuration
file:
- `discovery.zen.ping.unicast.hosts`
- `discovery.zen.hosts_provider`
- `cluster.initial_master_nodes`
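For example, a minimal sketch of a production configuration for a three-node
cluster might specify both the seed hosts and the initial master nodes in
`elasticsearch.yml`; the addresses and node names below are placeholders:
[source,yaml]
--------------------------------------------------
# Placeholder seed addresses used for discovery
discovery.zen.ping.unicast.hosts:
- 10.0.10.101
- 10.0.10.102
- 10.0.10.103
# Placeholder node names used to bootstrap the cluster the first time it starts
cluster.initial_master_nodes:
- master-a
- master-b
- master-c
--------------------------------------------------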

View File

@ -18,14 +18,14 @@ These settings can be dynamically updated on a live cluster with the
The modules in this section are:
<<modules-cluster,Cluster-level routing and shard allocation>>::
<<modules-discovery,Discovery and cluster formation>>::
How nodes discover each other, elect a master and form a cluster.
<<modules-cluster,Shard allocation and cluster-level routing>>::
Settings to control where, when, and how shards are allocated to nodes.
<<modules-discovery,Discovery>>::
How nodes discover each other to form a cluster.
<<modules-gateway,Gateway>>::
How many nodes need to join the cluster before recovery can start.
@ -85,10 +85,10 @@ The modules in this section are:
--
include::modules/cluster.asciidoc[]
include::modules/discovery.asciidoc[]
include::modules/cluster.asciidoc[]
include::modules/gateway.asciidoc[]
include::modules/http.asciidoc[]

View File

@ -1,5 +1,5 @@
[[modules-cluster]]
== Cluster
== Shard allocation and cluster-level routing
One of the main roles of the master is to decide which shards to allocate to
which nodes, and when to move shards between nodes in order to rebalance the

View File

@ -1,30 +1,75 @@
[[modules-discovery]]
== Discovery
== Discovery and cluster formation
The discovery module is responsible for discovering nodes within a
cluster, as well as electing a master node.
The discovery and cluster formation module is responsible for discovering
nodes, electing a master, forming a cluster, and publishing the cluster state
each time it changes. It is integrated with other modules. For example, all
communication between nodes is done using the <<modules-transport,transport>>
module. This module is divided into the following sections:
Note, Elasticsearch is a peer to peer based system, nodes communicate
with one another directly if operations are delegated / broadcast. All
the main APIs (index, delete, search) do not communicate with the master
node. The responsibility of the master node is to maintain the global
cluster state, and act if nodes join or leave the cluster by reassigning
shards. Each time a cluster state is changed, the state is made known to
the other nodes in the cluster (the manner depends on the actual
discovery implementation).
<<modules-discovery-hosts-providers>>::
[float]
=== Settings
Discovery is the process where nodes find each other when the master is
unknown, such as when a node has just started up or when the previous
master has failed.
The `cluster.name` allows to create separated clusters from one another.
The default value for the cluster name is `elasticsearch`, though it is
recommended to change this to reflect the logical group name of the
cluster running.
<<modules-discovery-bootstrap-cluster>>::
include::discovery/azure.asciidoc[]
Bootstrapping a cluster is required when an Elasticsearch cluster starts up
for the very first time. In <<dev-vs-prod-mode,development mode>>, with no
discovery settings configured, this is automatically performed by the nodes
themselves. As this auto-bootstrapping is
<<modules-discovery-quorums,inherently unsafe>>, running a node in
<<dev-vs-prod-mode,production mode>> requires bootstrapping to be
explicitly configured via the
<<modules-discovery-bootstrap-cluster,`cluster.initial_master_nodes`
setting>>.
include::discovery/ec2.asciidoc[]
<<modules-discovery-adding-removing-nodes,Adding and removing master-eligible nodes>>::
include::discovery/gce.asciidoc[]
It is recommended to have a small and fixed number of master-eligible nodes
in a cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However there are situations in which it may
be desirable to add or remove some master-eligible nodes to or from a
cluster. This section describes the process for adding or removing
master-eligible nodes, including the extra steps that need to be performed
when removing more than half of the master-eligible nodes at the same time.
<<cluster-state-publishing>>::
Cluster state publishing is the process by which the elected master node
updates the cluster state on all the other nodes in the cluster.
<<no-master-block>>::
The no-master block is put in place when there is no known elected master,
and can be configured to determine which operations should be rejected when
it is in place.
Advanced settings::
There are settings that allow advanced users to influence the
<<master-election-settings,master election>> and
<<fault-detection-settings,fault detection>> processes.
<<modules-discovery-quorums>>::
This section describes the detailed design behind the master election and
auto-reconfiguration logic.
include::discovery/discovery.asciidoc[]
include::discovery/bootstrapping.asciidoc[]
include::discovery/adding-removing-nodes.asciidoc[]
include::discovery/publishing.asciidoc[]
include::discovery/no-master-block.asciidoc[]
include::discovery/master-election.asciidoc[]
include::discovery/fault-detection.asciidoc[]
include::discovery/quorums.asciidoc[]
include::discovery/zen.asciidoc[]

View File

@ -0,0 +1,125 @@
[[modules-discovery-adding-removing-nodes]]
=== Adding and removing nodes
As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by automatically updating the cluster's _voting configuration_, which
is the set of <<master-node,master-eligible nodes>> whose responses are counted
when making decisions such as electing a new master or committing a new cluster
state.
It is recommended to have a small and fixed number of master-eligible nodes in a
cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.
==== Adding master-eligible nodes
If you wish to add some master-eligible nodes to your cluster, simply configure
the new nodes to find the existing cluster and start them up. Elasticsearch will
add the new nodes to the voting configuration if it is appropriate to do so.
==== Removing master-eligible nodes
When removing master-eligible nodes, it is important not to remove too many at
the same time. For instance, if there are currently seven master-eligible
nodes and you wish to reduce this to three, it is not possible simply to stop
four of the nodes at once: doing so would leave only three nodes, fewer than
half of the seven-node voting configuration, and the cluster would be unable to
take any further actions.
As long as there are at least three master-eligible nodes in the cluster, as a
general rule it is best to remove nodes one-at-a-time, allowing enough time for
the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
configuration and adapt the fault tolerance level to the new set of nodes.
If there are only two master-eligible nodes remaining then neither node can be
safely removed since both are required to reliably make progress. You must first
inform Elasticsearch that one of the nodes should not be part of the voting
configuration, and that the voting power should instead be given to other nodes.
You can then take the excluded node offline without preventing the other node
from making progress. A node which is added to a voting configuration exclusion
list still works normally, but Elasticsearch tries to remove it from the voting
configuration so its vote is no longer required. Importantly, Elasticsearch
will never automatically move a node on the voting exclusions list back into the
voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the following API:
[source,js]
--------------------------------------------------
# Add node to voting configuration exclusions list and wait for the system to
# auto-reconfigure the node out of the voting configuration up to the default
# timeout of 30 seconds
POST /_cluster/voting_config_exclusions/node_name
# Add node to voting configuration exclusions list and wait for
# auto-reconfiguration up to one minute
POST /_cluster/voting_config_exclusions/node_name?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:this would break the test cluster if executed]
The node that should be added to the exclusions list is specified using
<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
voting configuration exclusions API fails, you can safely retry it. Only a
successful response guarantees that the node has actually been removed from the
voting configuration and will not be reinstated.
Although the voting configuration exclusions API is most useful for down-scaling
a two-node to a one-node cluster, it is also possible to use it to remove
multiple master-eligible nodes all at the same time. Adding multiple nodes to
the exclusions list causes the system to try to auto-reconfigure all of these
nodes out of the voting configuration, allowing them to be safely shut down while
keeping the cluster available. In the example described above, shrinking a
seven-master-node cluster down to only have three master nodes, you could add
four nodes to the exclusions list, wait for confirmation, and then shut them
down simultaneously.
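For illustration, a sketch of that procedure might look as follows; the node
names `master-d`, `master-e`, `master-f` and `master-g` are hypothetical, and
<<cluster-nodes,node filters>> accept a comma-separated list of names:
[source,js]
--------------------------------------------------
# Exclude four hypothetical master-eligible nodes in one call, waiting up to
# five minutes for them to be reconfigured out of the voting configuration
POST /_cluster/voting_config_exclusions/master-d,master-e,master-f,master-g?timeout=5m
--------------------------------------------------
// CONSOLE
// TEST[skip:this would break the test cluster if executed]
Once such a call returns successfully, the excluded nodes can be shut down
together without affecting the cluster's master-level availability.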
NOTE: Voting exclusions are only required when removing at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes, nor are they required when
removing fewer than half of the master-eligible nodes.
Adding an exclusion for a node creates an entry for that node in the voting
configuration exclusions list, which causes the system to try to reconfigure
the voting configuration to remove that node automatically and prevents it
from returning to the voting configuration once it has been removed. The
current list of
exclusions is stored in the cluster state and can be inspected as follows:
[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------
// CONSOLE
This list is limited in size by the following setting:
`cluster.max_voting_config_exclusions`::
Sets a limit on the number of voting configuration exclusions at any one
time. Defaults to `10`.
Since voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration exclusions
in normal operation.
If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and removed
from the cluster. Exclusions can also be cleared if they were created in error
or were only required temporarily:
[source,js]
--------------------------------------------------
# Wait for all the nodes with voting configuration exclusions to be removed from
# the cluster and then remove all the exclusions, allowing any node to return to
# the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions
# Immediately remove all the voting configuration exclusions, allowing any node
# to return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
--------------------------------------------------
// CONSOLE

View File

@ -1,5 +0,0 @@
[[modules-discovery-azure-classic]]
=== Azure Classic Discovery
Azure classic discovery allows to use the Azure Classic APIs to perform automatic discovery (similar to multicast).
It is available as a plugin. See {plugins}/discovery-azure-classic.html[discovery-azure-classic] for more information.

View File

@ -0,0 +1,105 @@
[[modules-discovery-bootstrap-cluster]]
=== Bootstrapping a cluster
Starting an Elasticsearch cluster for the very first time requires the initial
set of <<master-node,master-eligible nodes>> to be explicitly defined on one or
more of the master-eligible nodes in the cluster. This is known as _cluster
bootstrapping_. This is only required the very first time the cluster starts
up: nodes that have already joined a cluster store this information in their
data folder and freshly-started nodes that are joining an existing cluster
obtain this information from the cluster's elected master. This information is
given using this setting:
`cluster.initial_master_nodes`::
Sets a list of the <<node.name,node names>> or transport addresses of the
initial set of master-eligible nodes in a brand-new cluster. By default
this list is empty, meaning that this node expects to join a cluster that
has already been bootstrapped.
This setting can be given on the command line or in the `elasticsearch.yml`
configuration file when starting up a master-eligible node. Once the cluster
has formed this setting is no longer required and is ignored. It need not be set
on master-ineligible nodes, nor on master-eligible nodes that are started to
join an existing cluster. Note that master-eligible nodes should use storage
that persists across restarts. If they do not, and
`cluster.initial_master_nodes` is set, and a full cluster restart occurs, then
another brand-new cluster will form and this may result in data loss.
It is technically sufficient to set `cluster.initial_master_nodes` on a single
master-eligible node in the cluster, and only to mention that single node in the
setting's value, but this provides no fault tolerance before the cluster has
fully formed. It is therefore better to bootstrap using at least three
master-eligible nodes, each with a `cluster.initial_master_nodes` setting
containing all three nodes.
NOTE: In alpha releases, all listed master-eligible nodes are required to be
discovered before bootstrapping can take place. This requirement will be relaxed
in production-ready releases.
WARNING: You must set `cluster.initial_master_nodes` to the same list of nodes
on each node on which it is set in order to be sure that only a single cluster
forms during bootstrapping and therefore to avoid the risk of data loss.
For a cluster with 3 master-eligible nodes (with <<node.name,node names>>
`master-a`, `master-b` and `master-c`) the configuration will look as follows:
[source,yaml]
--------------------------------------------------
cluster.initial_master_nodes:
- master-a
- master-b
- master-c
--------------------------------------------------
Alternatively the IP addresses or hostnames (<<node.name,if node name defaults
to the host name>>) can be used. If there is more than one Elasticsearch node
with the same IP address or hostname then the transport ports must also be given
to specify exactly which node is meant:
[source,yaml]
--------------------------------------------------
cluster.initial_master_nodes:
- 10.0.10.101
- 10.0.10.102:9300
- 10.0.10.102:9301
- master-node-hostname
--------------------------------------------------
Like all node settings, it is also possible to specify the initial set of master
nodes on the command-line that is used to start Elasticsearch:
[source,bash]
--------------------------------------------------
$ bin/elasticsearch -Ecluster.initial_master_nodes=master-a,master-b,master-c
--------------------------------------------------
[float]
==== Choosing a cluster name
The <<cluster.name,`cluster.name`>> setting enables you to create multiple
clusters which are separated from each other. Nodes verify that they agree on
their cluster name when they first connect to each other, and Elasticsearch
will only form a cluster from nodes that all have the same cluster name. The
default value for the cluster name is `elasticsearch`, but it is recommended to
change this to reflect the logical name of the cluster.
[float]
==== Auto-bootstrapping in development mode
If the cluster is running with a completely default configuration then it will
automatically bootstrap a cluster based on the nodes that could be discovered to
be running on the same host within a short time after startup. This means that
by default it is possible to start up several nodes on a single machine and have
them automatically form a cluster which is very useful for development
environments and experimentation. However, since nodes may not always
successfully discover each other quickly enough this automatic bootstrapping
cannot be relied upon and cannot be used in production deployments.
If any of the following settings are configured then auto-bootstrapping will not
take place, and you must configure `cluster.initial_master_nodes` as described
in the <<modules-discovery-bootstrap-cluster,section on cluster bootstrapping>>:
* `discovery.zen.hosts_provider`
* `discovery.zen.ping.unicast.hosts`
* `cluster.initial_master_nodes`

View File

@ -0,0 +1,195 @@
[[modules-discovery-hosts-providers]]
=== Discovery
Discovery is the process by which the cluster formation module finds other
nodes with which to form a cluster. This process runs when you start an
Elasticsearch node or when a node believes the master node failed and continues
until the master node is found or a new master node is elected.
This process starts with a list of _seed_ addresses from one or more
<<built-in-hosts-providers,hosts providers>>, together with the addresses of
any master-eligible nodes that were in the last known cluster. The process
operates in two phases: First, each node probes the seed addresses by
connecting to each address and attempting to identify the node to which it is
connected. Secondly it shares with the remote node a list of all of its known
master-eligible peers and the remote node responds with _its_ peers in turn.
The node then probes all the new nodes that it just discovered, requests their
peers, and so on.
If the node is not master-eligible then it continues this discovery process
until it has discovered an elected master node. If no elected master is
discovered then the node will retry after `discovery.find_peers_interval` which
defaults to `1s`.
If the node is master-eligible then it continues this discovery process until it
has either discovered an elected master node or else it has discovered enough
masterless master-eligible nodes to complete an election. If neither of these
occur quickly enough then the node will retry after
`discovery.find_peers_interval` which defaults to `1s`.
[[built-in-hosts-providers]]
==== Hosts providers
By default the cluster formation module offers two hosts providers to configure
the list of seed nodes: a _settings_-based and a _file_-based hosts provider.
It can be extended to support cloud environments and other forms of hosts
providers via {plugins}/discovery.html[discovery plugins]. Hosts providers are
configured using the `discovery.zen.hosts_provider` setting, which defaults to
the _settings_-based hosts provider. Multiple hosts providers can be specified
as a list.
[float]
[[settings-based-hosts-provider]]
===== Settings-based hosts provider
The settings-based hosts provider uses a node setting to configure a static list
of hosts to use as seed nodes. These hosts can be specified as hostnames or IP
addresses; hosts specified as hostnames are resolved to IP addresses during each
round of discovery. Note that if you are in an environment where DNS resolutions
vary with time, you might need to adjust your <<networkaddress-cache-ttl,JVM
security settings>>.
The list of hosts is set using the <<unicast.hosts,`discovery.zen.ping.unicast.hosts`>> static
setting. For example:
[source,yaml]
--------------------------------------------------
discovery.zen.ping.unicast.hosts:
- 192.168.1.10:9300
- 192.168.1.11 <1>
- seeds.mydomain.com <2>
--------------------------------------------------
<1> The port will default to `transport.profiles.default.port` and fallback to
`transport.port` if not specified.
<2> A hostname that resolves to multiple IP addresses will try all resolved
addresses.
Additionally, the `discovery.zen.ping.unicast.hosts.resolve_timeout` configures
the amount of time to wait for DNS lookups on each round of discovery. This is
specified as a <<time-units, time unit>> and defaults to 5s.
Unicast discovery uses the <<modules-transport,transport>> module to perform the
discovery.
[float]
[[file-based-hosts-provider]]
===== File-based hosts provider
The file-based hosts provider configures a list of hosts via an external file.
Elasticsearch reloads this file when it changes, so that the list of seed nodes
can change dynamically without needing to restart each node. For example, this
gives a convenient mechanism for an Elasticsearch instance that is run in a
Docker container to be dynamically supplied with a list of IP addresses to
connect to when those IP addresses may not be known at node startup.
To enable file-based discovery, configure the `file` hosts provider as follows:
[source,txt]
----------------------------------------------------------------
discovery.zen.hosts_provider: file
----------------------------------------------------------------
Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in the format described
below. Any time a change is made to the `unicast_hosts.txt` file the new changes
will be picked up by Elasticsearch and the new hosts list will be used.
Note that the file-based discovery plugin augments the unicast hosts list in
`elasticsearch.yml`. If there are valid unicast host entries in
`discovery.zen.ping.unicast.hosts`, they are used in addition to those
supplied in `unicast_hosts.txt`.
The `discovery.zen.ping.unicast.hosts.resolve_timeout` setting also applies to
DNS lookups for nodes specified by address via file-based discovery. This is
specified as a <<time-units, time unit>> and defaults to 5s.
The format of the file is to specify one node entry per line. Each node entry
consists of the host (host name or IP address) and an optional transport port
number. If the port number is specified, it must come immediately after the
host (on the same line) separated by a `:`. If the port number is not
specified, a default value of 9300 is used.
For example, this is a `unicast_hosts.txt` file for a cluster with four
nodes that participate in unicast discovery, some of which are not running on
the default port:
[source,txt]
----------------------------------------------------------------
10.10.10.5
10.10.10.6:9305
10.10.10.5:10005
# an IPv6 address
[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301
----------------------------------------------------------------
Host names are allowed instead of IP addresses (similar to
`discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in
brackets with the port coming after the brackets.
You can also add comments to this file. Each comment must appear on its own
line, starting with `#` (i.e. comments cannot start in the middle of a line).
[float]
[[ec2-hosts-provider]]
===== EC2 hosts provider
The {plugins}/discovery-ec2.html[EC2 discovery plugin] adds a hosts provider
that uses the https://github.com/aws/aws-sdk-java[AWS API] to find a list of
seed nodes.
[float]
[[azure-classic-hosts-provider]]
===== Azure Classic hosts provider
The {plugins}/discovery-azure-classic.html[Azure Classic discovery plugin] adds
a hosts provider that uses the Azure Classic API to find a list of seed nodes.
[float]
[[gce-hosts-provider]]
===== Google Compute Engine hosts provider
The {plugins}/discovery-gce.html[GCE discovery plugin] adds a hosts provider
that uses the GCE API to find a list of seed nodes.
[float]
==== Discovery settings
The discovery process is controlled by the following settings.
`discovery.find_peers_interval`::
Sets how long a node will wait before attempting another discovery round.
Defaults to `1s`.
`discovery.request_peers_timeout`::
Sets how long a node will wait after asking its peers again before
considering the request to have failed. Defaults to `3s`.
`discovery.probe.connect_timeout`::
Sets how long to wait when attempting to connect to each address. Defaults
to `3s`.
`discovery.probe.handshake_timeout`::
Sets how long to wait when attempting to identify the remote node via a
handshake. Defaults to `1s`.
`discovery.cluster_formation_warning_timeout`::
Sets how long a node will try to form a cluster before logging a warning
that the cluster did not form. Defaults to `10s`.
If a cluster has not formed after `discovery.cluster_formation_warning_timeout`
has elapsed then the node will log a warning message that starts with the phrase
`master not discovered` which describes the current state of the discovery
process.
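As an illustrative sketch only, these settings can be overridden in
`elasticsearch.yml` like any other node settings; the values shown here are
arbitrary examples rather than recommendations:
[source,yaml]
--------------------------------------------------
# Probe known addresses every 5 seconds instead of the default 1 second
discovery.find_peers_interval: 5s
# Log a "master not discovered" warning after 1 minute instead of 10 seconds
discovery.cluster_formation_warning_timeout: 1m
--------------------------------------------------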

View File

@ -1,4 +0,0 @@
[[modules-discovery-ec2]]
=== EC2 Discovery
EC2 discovery is available as a plugin. See {plugins}/discovery-ec2.html[discovery-ec2] for more information.

View File

@ -0,0 +1,52 @@
[[fault-detection-settings]]
=== Cluster fault detection settings
An elected master periodically checks each of the nodes in the cluster in order
to ensure that they are still connected and healthy, and in turn each node in
the cluster periodically checks the health of the elected master. These checks
are known respectively as _follower checks_ and _leader checks_.
Elasticsearch allows these checks occasionally to fail or time out without
taking any action, and will only consider a node to be truly faulty after a
number of consecutive checks have failed. The following settings control the
behaviour of fault detection.
`cluster.fault_detection.follower_check.interval`::
Sets how long the elected master waits between follower checks to each
other node in the cluster. Defaults to `1s`.
`cluster.fault_detection.follower_check.timeout`::
Sets how long the elected master waits for a response to a follower check
before considering it to have failed. Defaults to `30s`.
`cluster.fault_detection.follower_check.retry_count`::
Sets how many consecutive follower check failures must occur to each node
before the elected master considers that node to be faulty and removes it
from the cluster. Defaults to `3`.
`cluster.fault_detection.leader_check.interval`::
Sets how long each node waits between checks of the elected master.
Defaults to `1s`.
`cluster.fault_detection.leader_check.timeout`::
Sets how long each node waits for a response to a leader check from the
elected master before considering it to have failed. Defaults to `30s`.
`cluster.fault_detection.leader_check.retry_count`::
Sets how many consecutive leader check failures must occur before a node
considers the elected master to be faulty and attempts to find or elect a
new master. Defaults to `3`.
If the elected master detects that a node has disconnected then this is treated
as an immediate failure, bypassing the timeouts and retries listed above, and
the master attempts to remove the node from the cluster. Similarly, if a node
detects that the elected master has disconnected then this is treated as an
immediate failure, bypassing the timeouts and retries listed above, and the
follower restarts its discovery phase to try and find or elect a new master.
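As a sketch, the fault detection settings above can be adjusted in
`elasticsearch.yml`; the values below are purely illustrative and not
recommendations:
[source,yaml]
--------------------------------------------------
# Check each follower every 2 seconds and tolerate 5 consecutive failures
# before removing a node from the cluster (illustrative values)
cluster.fault_detection.follower_check.interval: 2s
cluster.fault_detection.follower_check.retry_count: 5
# Allow the elected master 60 seconds to respond to each leader check
cluster.fault_detection.leader_check.timeout: 60s
--------------------------------------------------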

View File

@ -1,6 +0,0 @@
[[modules-discovery-gce]]
=== Google Compute Engine Discovery
Google Compute Engine (GCE) discovery allows to use the GCE APIs to perform automatic discovery (similar to multicast).
It is available as a plugin. See {plugins}/discovery-gce.html[discovery-gce] for more information.

View File

@ -0,0 +1,40 @@
[[master-election-settings]]
=== Master election settings
The following settings control the scheduling of elections.
`cluster.election.initial_timeout`::
Sets the upper bound on how long a node will wait initially, or after the
elected master fails, before attempting its first election. This defaults
to `100ms`.
`cluster.election.back_off_time`::
Sets the amount to increase the upper bound on the wait before an election
on each election failure. Note that this is _linear_ backoff. This defaults
to `100ms`.
`cluster.election.max_timeout`::
Sets the maximum upper bound on how long a node will wait before attempting
a first election, so that a network partition that lasts for a long time
does not result in excessively sparse elections. This defaults to `10s`.
`cluster.election.duration`::
Sets how long each election is allowed to take before a node considers it to
have failed and schedules a retry. This defaults to `500ms`.
[float]
==== Joining an elected master
During master election, or when joining an existing formed cluster, a node will
send a join request to the master in order to be officially added to the
cluster. This join process can be configured with the following settings.
`cluster.join.timeout`::
Sets how long a node will wait after sending a request to join a cluster
before it considers the request to have failed and retries. Defaults to
`60s`.
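As an illustrative sketch, these election and join settings can be tuned in
`elasticsearch.yml`; the values below are examples only:
[source,yaml]
--------------------------------------------------
# Allow the election back-off to grow to 30 seconds on an unreliable network
cluster.election.max_timeout: 30s
# Wait up to 2 minutes for a join request instead of the default 60 seconds
cluster.join.timeout: 2m
--------------------------------------------------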

View File

@ -0,0 +1,22 @@
[[no-master-block]]
=== No master block settings
For the cluster to be fully operational, it must have an active master. The
`discovery.zen.no_master_block` setting controls what operations should be
rejected when there is no active master.
The `discovery.zen.no_master_block` setting has two valid values:
[horizontal]
`all`:: All operations on the node--i.e. both reads and writes--will be
rejected. This also applies to API calls that read or write the cluster state,
such as the get index settings, put mapping, and cluster state APIs.
`write`:: (default) Write operations will be rejected. Read operations will
succeed, based on the last known cluster configuration. This may result in
partial reads of stale data as this node may be isolated from the rest of the
cluster.
The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
(for example cluster stats, node info, and node stats APIs). Requests to these
APIs will not be blocked and can run on any available node.
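For example, a node that should reject reads as well as writes while there is
no elected master could be configured as follows; this is a sketch, and `write`
remains the default:
[source,yaml]
--------------------------------------------------
# Reject both read and write operations while there is no elected master
discovery.zen.no_master_block: all
--------------------------------------------------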

View File

@ -0,0 +1,42 @@
[[cluster-state-publishing]]
=== Publishing the cluster state
The master node is the only node in a cluster that can make changes to the
cluster state. The master node processes one batch of cluster state updates at
a time, computing the required changes and publishing the updated cluster state
to all the other nodes in the cluster. Each publication starts with the master
broadcasting the updated cluster state to all nodes in the cluster. Each node
responds with an acknowledgement but does not yet apply the newly-received
state. Once the master has collected acknowledgements from enough
master-eligible nodes, the new cluster state is said to be _committed_ and the
master broadcasts another message instructing nodes to apply the now-committed
state. Each node receives this message, applies the updated state, and then
sends a second acknowledgement back to the master.
The master allows a limited amount of time for each cluster state update to be
completely published to all nodes. It is defined by the
`cluster.publish.timeout` setting, which defaults to `30s`, measured from the
time the publication started. If this time is reached before the new cluster
state is committed then the cluster state change is rejected and the master
considers itself to have failed. It stands down and starts trying to elect a
new master.
If the new cluster state is committed before `cluster.publish.timeout` has
elapsed, the master node considers the change to have succeeded. It waits until
the timeout has elapsed or until it has received acknowledgements that each
node in the cluster has applied the updated state, and then starts processing
and publishing the next cluster state update. If some acknowledgements have not
been received (i.e. some nodes have not yet confirmed that they have applied
the current update), these nodes are said to be _lagging_ since their cluster
states have fallen behind the master's latest state. The master waits for the
lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`,
which defaults to `90s`. If a node has still not successfully applied the
cluster state update within this time then it is considered to have failed and
is removed from the cluster.
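As an illustrative sketch, both timeouts can be overridden in
`elasticsearch.yml`; the values below are examples, not recommendations:
[source,yaml]
--------------------------------------------------
# Fail a publication that has not committed within 60 seconds (default 30s)
cluster.publish.timeout: 60s
# Remove nodes whose applied cluster state lags the master by over 3 minutes
cluster.follower_lag.timeout: 3m
--------------------------------------------------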
NOTE: Elasticsearch is a peer-to-peer based system, in which nodes communicate
with one another directly. The high-throughput APIs (index, delete, search) do
not normally interact with the master node. The responsibility of the master
node is to maintain the global cluster state and reassign shards when nodes
join or leave the cluster. Each time the cluster state is changed, the new
state is published to all nodes in the cluster as described above.

View File

@ -0,0 +1,193 @@
[[modules-discovery-quorums]]
=== Quorum-based decision making
Electing a master node and changing the cluster state are the two fundamental
tasks that master-eligible nodes must work together to perform. It is important
that these activities work robustly even if some nodes have failed.
Elasticsearch achieves this robustness by considering each action to have
succeeded on receipt of responses from a _quorum_, which is a subset of the
master-eligible nodes in the cluster. The advantage of requiring only a subset
of the nodes to respond is that it means some of the nodes can fail without
preventing the cluster from making progress. The quorums are carefully chosen so
the cluster does not have a "split brain" scenario where it's partitioned into
two pieces such that each piece may make decisions that are inconsistent with
those of the other piece.
Elasticsearch allows you to add and remove master-eligible nodes to a running
cluster. In many cases you can do this simply by starting or stopping the nodes
as required. See <<modules-discovery-adding-removing-nodes>>.
As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's _voting configuration_, which is the set of
master-eligible nodes whose responses are counted when making decisions such as
electing a new master or committing a new cluster state. A decision is made only
after more than half of the nodes in the voting configuration have responded.
Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.
To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available the cluster can still work normally.
This means that if there are three or four master-eligible nodes, the cluster
can tolerate one of them being unavailable. If there are two or fewer
master-eligible nodes, they must all remain available.
After a node has joined or left the cluster the elected master must issue a
cluster-state update that adjusts the voting configuration to match, and this
can take a short time to complete. It is important to wait for this adjustment
to complete before removing more nodes from the cluster.
[float]
==== Setting the initial quorum
When a brand-new cluster starts up for the first time, it must elect its first
master node. To do this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.
It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. It is also
important to note that the bootstrap configuration must come from outside the
cluster: there is no safe way for the cluster to determine the bootstrap
configuration correctly on its own.
If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong and it is impossible to
merge them together later.
NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course if each node is started manually
then it's unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it's certainly possible to get into this situation--
particularly if the orchestrator is not resilient to failures such as network
partitions.
The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster will have stored to disk all the information that is required
when they restart.
[float]
==== Master elections
Elasticsearch uses an election process to agree on an elected master node, both
at startup and if the existing elected master fails. Any master-eligible node
can start an election, and normally the first election that takes place will
succeed. Elections usually only fail when two nodes both happen to start their
elections at about the same time, so elections are scheduled randomly on each
node to reduce the probability of this happening. Nodes will retry elections
until a master is elected, backing off on failure, so that eventually an
election will succeed (with arbitrarily high probability). The scheduling of
master elections is controlled by the <<master-election-settings,master
election settings>>.
[float]
==== Cluster maintenance, rolling restarts and migrations
Many cluster maintenance tasks involve temporarily shutting down one or more
nodes and then starting them back up again. By default Elasticsearch can remain
available if one of its master-eligible nodes is taken offline, such as during a
<<rolling-upgrades,rolling restart>>. Furthermore, if multiple nodes are stopped
and then started again then it will automatically recover, such as during a
<<restart-upgrade,full cluster restart>>. There is no need to take any further
action with the APIs described here in these cases, because the set of master
nodes is not changing permanently.
[float]
==== Automatic changes to the voting configuration
Nodes may join or leave the cluster, and Elasticsearch reacts by automatically
making corresponding changes to the voting configuration in order to ensure that
the cluster is as resilient as possible. The default auto-reconfiguration
behaviour is expected to give the best results in most situations. The current
voting configuration is stored in the cluster state so you can inspect its
current contents as follows:
[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE
NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes, or does not
include some available nodes, and in these situations the voting configuration
differs from the set of available master-eligible nodes in the cluster.
Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.
It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the following setting:
`cluster.auto_shrink_voting_configuration`::
Defaults to `true`, meaning that the voting configuration will automatically
shrink, shedding departed nodes, as long as it still contains at least 3
nodes. If set to `false`, the voting configuration never automatically
shrinks; departed nodes must be removed manually using the
<<modules-discovery-adding-removing-nodes,voting configuration exclusions API>>.
NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the
recommended and default setting, and there are at least three master-eligible
nodes in the cluster, then Elasticsearch remains capable of processing
cluster-state updates as long as all but one of its master-eligible nodes are
healthy.
There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If this
setting is set to `false` then departed nodes must be removed from the voting
configuration manually, using the
<<modules-discovery-adding-removing-nodes,voting exclusions API>>, to achieve
the desired level of resilience.
No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only the cluster's availability in the event of the failure of
some of its nodes, and the administrative tasks that must be performed as nodes
join and leave the cluster.
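For instance, an operator who prefers to manage the voting configuration
entirely by hand could disable automatic shrinking in `elasticsearch.yml`; this
is a sketch, and the default of `true` is generally recommended:
[source,yaml]
--------------------------------------------------
# Never shrink the voting configuration automatically; departed nodes must be
# removed using the voting configuration exclusions API
cluster.auto_shrink_voting_configuration: false
--------------------------------------------------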
[float]
==== Even numbers of master-eligible nodes
There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the master-eligible
nodes' votes were counted, neither side would contain a strict majority of the
nodes and so the cluster would not be able to make any progress.
For instance if there are four master-eligible nodes in the cluster and the
voting configuration contained all of them, any quorum-based decision would
require votes from at least three of them. This situation means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.

View File

@ -1,226 +0,0 @@
[[modules-discovery-zen]]
=== Zen Discovery
Zen discovery is the built-in, default, discovery module for Elasticsearch. It
provides unicast and file-based discovery, and can be extended to support cloud
environments and other forms of discovery via plugins.
Zen discovery is integrated with other modules, for example, all communication
between nodes is done using the <<modules-transport,transport>> module.
It is separated into several sub modules, which are explained below:
[float]
[[ping]]
==== Ping
This is the process where a node uses the discovery mechanisms to find other
nodes.
[float]
[[discovery-seed-nodes]]
==== Seed nodes
Zen discovery uses a list of _seed_ nodes in order to start off the discovery
process. At startup, or when electing a new master, Elasticsearch tries to
connect to each seed node in its list, and holds a gossip-like conversation with
them to find other nodes and to build a complete picture of the cluster. By
default there are two methods for configuring the list of seed nodes: _unicast_
and _file-based_. It is recommended that the list of seed nodes comprises the
list of master-eligible nodes in the cluster.
[float]
[[unicast]]
===== Unicast
Unicast discovery configures a static list of hosts for use as seed nodes.
These hosts can be specified as hostnames or IP addresses; hosts specified as
hostnames are resolved to IP addresses during each round of pinging. Note that
if you are in an environment where DNS resolutions vary with time, you might
need to adjust your <<networkaddress-cache-ttl,JVM security settings>>.
The list of hosts is set using the `discovery.zen.ping.unicast.hosts` static
setting. This is either an array of hosts or a comma-delimited string. Each
value should be in the form of `host:port` or `host` (where `port` defaults to
the setting `transport.profiles.default.port` falling back to
`transport.port` if not set). Note that IPv6 hosts must be bracketed. The
default for this setting is `127.0.0.1, [::1]`
Additionally, the `discovery.zen.ping.unicast.resolve_timeout` configures the
amount of time to wait for DNS lookups on each round of pinging. This is
specified as a <<time-units, time unit>> and defaults to 5s.
Unicast discovery uses the <<modules-transport,transport>> module to perform the
discovery.
[float]
[[file-based-hosts-provider]]
===== File-based
In addition to hosts provided by the static `discovery.zen.ping.unicast.hosts`
setting, it is possible to provide a list of hosts via an external file.
Elasticsearch reloads this file when it changes, so that the list of seed nodes
can change dynamically without needing to restart each node. For example, this
gives a convenient mechanism for an Elasticsearch instance that is run in a
Docker container to be dynamically supplied with a list of IP addresses to
connect to for Zen discovery when those IP addresses may not be known at node
startup.
To enable file-based discovery, configure the `file` hosts provider as follows:
[source,txt]
----------------------------------------------------------------
discovery.zen.hosts_provider: file
----------------------------------------------------------------
Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in the format described
below. Any time a change is made to the `unicast_hosts.txt` file the new
changes will be picked up by Elasticsearch and the new hosts list will be used.
Note that the file-based discovery plugin augments the unicast hosts list in
`elasticsearch.yml`: if there are valid unicast host entries in
`discovery.zen.ping.unicast.hosts` then they will be used in addition to those
supplied in `unicast_hosts.txt`.
The `discovery.zen.ping.unicast.resolve_timeout` setting also applies to DNS
lookups for nodes specified by address via file-based discovery. This is
specified as a <<time-units, time unit>> and defaults to 5s.
The format of the file is to specify one node entry per line. Each node entry
consists of the host (host name or IP address) and an optional transport port
number. If the port number is specified, is must come immediately after the
host (on the same line) separated by a `:`. If the port number is not
specified, a default value of 9300 is used.
For example, this is an example of `unicast_hosts.txt` for a cluster with four
nodes that participate in unicast discovery, some of which are not running on
the default port:
[source,txt]
----------------------------------------------------------------
10.10.10.5
10.10.10.6:9305
10.10.10.5:10005
# an IPv6 address
[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301
----------------------------------------------------------------
Host names are allowed instead of IP addresses (similar to
`discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in
brackets with the port coming after the brackets.
It is also possible to add comments to this file. All comments must appear on
their lines starting with `#` (i.e. comments cannot start in the middle of a
line).
[float]
[[master-election]]
==== Master Election
As part of the ping process a master of the cluster is either elected or joined
to. This is done automatically. The `discovery.zen.ping_timeout` (which defaults
to `3s`) determines how long the node will wait before deciding on starting an
election or joining an existing cluster. Three pings will be sent over this
timeout interval. In case where no decision can be reached after the timeout,
the pinging process restarts. In slow or congested networks, three seconds
might not be enough for a node to become aware of the other nodes in its
environment before making an election decision. Increasing the timeout should
be done with care in that case, as it will slow down the election process. Once
a node decides to join an existing formed cluster, it will send a join request
to the master (`discovery.zen.join_timeout`) with a timeout defaulting at 20
times the ping timeout.
When the master node stops or has encountered a problem, the cluster nodes start
pinging again and will elect a new master. This pinging round also serves as a
protection against (partial) network failures where a node may unjustly think
that the master has failed. In this case the node will simply hear from other
nodes about the currently active master.
If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from
nodes that are not master eligible (nodes where `node.master` is `false`) are
ignored during master election; the default value is `false`.
Nodes can be excluded from becoming a master by setting `node.master` to
`false`.
The `discovery.zen.minimum_master_nodes` sets the minimum number of master
eligible nodes that need to join a newly elected master in order for an election
to complete and for the elected node to accept its mastership. The same setting
controls the minimum number of active master eligible nodes that should be a
part of any active cluster. If this requirement is not met the active master
node will step down and a new master election will begin.
This setting must be set to a <<minimum_master_nodes,quorum>> of your master
eligible nodes. It is recommended to avoid having only two master eligible
nodes, since a quorum of two is two. Therefore, a loss of either master eligible
node will result in an inoperable cluster.
[float]
[[fault-detection]]
==== Fault Detection
There are two fault detection processes running. The first is by the master, to
ping all the other nodes in the cluster and verify that they are alive. And on
the other end, each node pings to master to verify if its still alive or an
election process needs to be initiated.
The following settings control the fault detection process using the
`discovery.zen.fd` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`ping_interval` |How often a node gets pinged. Defaults to `1s`.
|`ping_timeout` |How long to wait for a ping response, defaults to
`30s`.
|`ping_retries` |How many ping failures / timeouts cause a node to be
considered failed. Defaults to `3`.
|=======================================================================
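As an illustrative sketch (not a recommendation), a node on a slow or congested
network might relax these defaults along the following lines:
[source,yaml]
--------------------------------------------------
discovery.zen.fd.ping_interval: 5s  # ping other nodes every 5 seconds instead of every 1s
discovery.zen.fd.ping_timeout: 60s  # allow up to a minute for each ping response
discovery.zen.fd.ping_retries: 5    # mark a node as failed after 5 failures or timeouts
--------------------------------------------------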
[float]
==== Cluster state updates
The master node is the only node in a cluster that can make changes to the
cluster state. The master node processes one cluster state update at a time,
applies the required changes and publishes the updated cluster state to all the
other nodes in the cluster. Each node receives the publish message and
acknowledges it, but does *not* yet apply it. If the master does not receive
acknowledgement from at least `discovery.zen.minimum_master_nodes` nodes within
a certain time (controlled by the `discovery.zen.commit_timeout` setting, which
defaults to 30 seconds) the cluster state change is rejected.
Once enough nodes have responded, the cluster state is committed and a message
is sent to all the nodes. The nodes then proceed to apply the new cluster state
to their internal state. The master node waits for all nodes to respond, up to
a timeout, before moving on to process the next update in the queue. The
`discovery.zen.publish_timeout` setting defaults to 30 seconds and is measured
from the moment publishing started. Both timeout settings can be changed
dynamically through the <<cluster-update-settings,cluster update settings
API>>.
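As a sketch of such a dynamic change on a cluster that still uses Zen
discovery, both timeouts could be raised in a single settings update:
[source,js]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "discovery.zen.commit_timeout": "45s",
    "discovery.zen.publish_timeout": "45s"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:these settings only apply to Zen1 discovery]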
[float]
[[no-master-block]]
==== No master block
For the cluster to be fully operational, it must have an active master and the
number of running master-eligible nodes must satisfy the
`discovery.zen.minimum_master_nodes` setting, if set. The
`discovery.zen.no_master_block` setting controls which operations should be
rejected when there is no active master.
The `discovery.zen.no_master_block` setting has two valid options:
[horizontal]
`all`:: All operations on the node--i.e. both reads and writes--will be
rejected. This also applies to API calls that read or write the cluster state,
such as the get index settings, put mapping and cluster state APIs.
`write`:: (default) Write operations will be rejected. Read operations will
succeed, based on the last known cluster configuration. This may result in
partial reads of stale data as this node may be isolated from the rest of the
cluster.
The `discovery.zen.no_master_block` setting doesn't apply to node-based APIs
(for example the cluster stats, node info and node stats APIs). Requests to
these APIs are not blocked and can run on any available node.
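For example, a node that should also reject reads while there is no active
master could set the block to `all` in its `elasticsearch.yml`; this is only a
sketch of the setting, and `write` remains the default:
[source,yaml]
--------------------------------------------------
# Reject both read and write operations while no master is elected
discovery.zen.no_master_block: all
--------------------------------------------------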
View File
@ -19,7 +19,7 @@ purpose:
<<master-node,Master-eligible node>>::
A node that has `node.master` set to `true` (default), which makes it eligible
to be <<modules-discovery-zen,elected as the _master_ node>>, which controls
to be <<modules-discovery,elected as the _master_ node>>, which controls
the cluster.
<<data-node,Data node>>::
@ -69,7 +69,7 @@ and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.
Any master-eligible node (all nodes by default) may be elected to become the
master node by the <<modules-discovery-zen,master election process>>.
master node by the <<modules-discovery,master election process>>.
IMPORTANT: Master nodes must have access to the `data/` directory (just like
`data` nodes) as this is where the cluster state is persisted between node restarts.
@ -105,74 +105,6 @@ NOTE: These settings apply only when {xpack} is not installed. To create a
dedicated master-eligible node when {xpack} is installed, see <<modules-node-xpack,{xpack} node settings>>.
endif::include-xpack[]
[float]
[[split-brain]]
==== Avoiding split brain with `minimum_master_nodes`
To prevent data loss, it is vital to configure the
`discovery.zen.minimum_master_nodes` setting (which defaults to `1`) so that
each master-eligible node knows the _minimum number of master-eligible nodes_
that must be visible in order to form a cluster.
To explain, imagine that you have a cluster consisting of two master-eligible
nodes. A network failure breaks communication between these two nodes. Each
node sees one master-eligible node... itself. With `minimum_master_nodes` set
to the default of `1`, this is sufficient to form a cluster. Each node elects
itself as the new master (thinking that the other master-eligible node has
died) and the result is two clusters, or a _split brain_. These two nodes
will never rejoin until one node is restarted. Any data that has been written
to the restarted node will be lost.
Now imagine that you have a cluster with three master-eligible nodes, and
`minimum_master_nodes` set to `2`. If a network split separates one node from
the other two nodes, the side with one node cannot see enough master-eligible
nodes and will realise that it cannot elect itself as master. The side with
two nodes will elect a new master (if needed) and continue functioning
correctly. As soon as the network split is resolved, the single node will
rejoin the cluster and start serving requests again.
This setting should be set to a _quorum_ of master-eligible nodes:
(master_eligible_nodes / 2) + 1
In other words, if there are three master-eligible nodes, then minimum master
nodes should be set to `(3 / 2) + 1` or `2`:
[source,yaml]
----------------------------
discovery.zen.minimum_master_nodes: 2 <1>
----------------------------
<1> Defaults to `1`.
To be able to remain available when one of the master-eligible nodes fails,
clusters should have at least three master-eligible nodes, with
`minimum_master_nodes` set accordingly. A <<rolling-upgrades,rolling upgrade>>,
performed without any downtime, also requires at least three master-eligible
nodes to avoid the possibility of data loss if a network split occurs while the
upgrade is in progress.
This setting can also be changed dynamically on a live cluster with the
<<cluster-update-settings,cluster update settings API>>:
[source,js]
----------------------------
PUT _cluster/settings
{
"transient": {
"discovery.zen.minimum_master_nodes": 2
}
}
----------------------------
// CONSOLE
// TEST[skip:Test use Zen2 now so we can't test Zen1 behaviour here]
TIP: An advantage of splitting the master and data roles between dedicated
nodes is that you can have just three master-eligible nodes and set
`minimum_master_nodes` to `2`. You never have to change this setting, no
matter how many dedicated data nodes you add to the cluster.
[float]
[[data-node]]
=== Data Node
View File
@ -597,9 +597,8 @@ if the new cluster doesn't contain nodes with appropriate attributes that a rest
index will not be successfully restored unless these index allocation settings are changed during restore operation.
The restore operation also checks that restored persistent settings are compatible with the current cluster to avoid accidentally
restoring an incompatible settings such as `discovery.zen.minimum_master_nodes` and as a result disable a smaller cluster until the
required number of master eligible nodes is added. If you need to restore a snapshot with incompatible persistent settings, try
restoring it without the global cluster state.
restoring incompatible settings. If you need to restore a snapshot with incompatible persistent settings, try restoring it without
the global cluster state.
[float]
=== Snapshot status
View File
@ -560,3 +560,19 @@ See <<api-definitions>>.
The standard token filter has been removed.
[role="exclude",id="modules-discovery-azure-classic"]
See <<azure-classic-hosts-provider>>.
[role="exclude",id="modules-discovery-ec2"]
See <<ec2-hosts-provider>>.
[role="exclude",id="modules-discovery-gce"]
See <<gce-hosts-provider>>.
[role="exclude",id="modules-discovery-zen"]
Zen discovery is replaced by the <<modules-discovery,discovery and cluster
formation module>>.
View File
@ -108,7 +108,7 @@ services:
image: {docker-image}
environment:
- node.name=es01
- discovery.zen.minimum_master_nodes=2
- cluster.initial_master_nodes=es01,es02
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD <1>
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- xpack.license.self_generated.type=trial <2>
@ -133,9 +133,9 @@ services:
image: {docker-image}
environment:
- node.name=es02
- discovery.zen.minimum_master_nodes=2
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- discovery.zen.ping.unicast.hosts=es01
- cluster.initial_master_nodes=es01,es02
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- xpack.license.self_generated.type=trial
- xpack.security.enabled=true
View File
@ -21,6 +21,7 @@ Elasticsearch from running with incompatible settings. These checks are
documented individually.
[float]
[[dev-vs-prod-mode]]
=== Development vs. production mode
By default, Elasticsearch binds to loopback addresses for <<modules-http,HTTP>>
View File
@ -1,22 +1,43 @@
[[discovery-settings]]
=== Discovery settings
=== Discovery and cluster formation settings
Elasticsearch uses a custom discovery implementation called "Zen Discovery" for
node-to-node clustering and master election. There are two important discovery
settings that should be configured before going to production.
There are two important discovery and cluster formation settings that should be
configured before going to production so that nodes in the cluster can discover
each other and elect a master node.
[float]
[[unicast.hosts]]
==== `discovery.zen.ping.unicast.hosts`
Out of the box, without any network configuration, Elasticsearch will bind to
the available loopback addresses and will scan ports 9300 to 9305 to try to
connect to other nodes running on the same server. This provides an auto-
the available loopback addresses and will scan local ports 9300 to 9305 to try
to connect to other nodes running on the same server. This provides an auto-
clustering experience without having to do any configuration.
When the moment comes to form a cluster with nodes on other servers, you have to
provide a seed list of other nodes in the cluster that are likely to be live and
contactable. This can be specified as follows:
When the moment comes to form a cluster with nodes on other servers, you must
use the `discovery.zen.ping.unicast.hosts` setting to provide a seed list of
other nodes in the cluster that are master-eligible and likely to be live and
contactable. This setting should normally contain the addresses of all the
master-eligible nodes in the cluster.
This setting contains either an array of hosts or a comma-delimited string. Each
value should be in the form of `host:port` or `host` (where `port` defaults to
the setting `transport.profiles.default.port` falling back to `transport.port`
if not set). Note that IPv6 hosts must be bracketed. The default for this
setting is `127.0.0.1, [::1]`.
[float]
[[initial_master_nodes]]
==== `cluster.initial_master_nodes`
When you start a brand new Elasticsearch cluster for the very first time, there
is a <<modules-discovery-bootstrap-cluster,cluster bootstrapping>> step, which
determines the set of master-eligible nodes whose votes are counted in the very
first election. In <<dev-vs-prod-mode,development mode>>, with no discovery
settings configured, this step is automatically performed by the nodes
themselves. As this auto-bootstrapping is <<modules-discovery-quorums,inherently
unsafe>>, when you start a brand new cluster in <<dev-vs-prod-mode,production
mode>>, you must explicitly list the names or IP addresses of the
master-eligible nodes whose votes should be counted in the very first election.
This list is set using the `cluster.initial_master_nodes` setting.
[source,yaml]
--------------------------------------------------
@ -24,35 +45,16 @@ discovery.zen.ping.unicast.hosts:
- 192.168.1.10:9300
- 192.168.1.11 <1>
- seeds.mydomain.com <2>
cluster.initial_master_nodes:
- master-node-a <3>
- 192.168.1.12 <4>
- 192.168.1.13:9301 <5>
--------------------------------------------------
<1> The port will default to `transport.profiles.default.port` and fallback to
`transport.port` if not specified.
<2> A hostname that resolves to multiple IP addresses will try all resolved
addresses.
[float]
[[minimum_master_nodes]]
==== `discovery.zen.minimum_master_nodes`
To prevent data loss, it is vital to configure the
`discovery.zen.minimum_master_nodes` setting so that each master-eligible node
knows the _minimum number of master-eligible nodes_ that must be visible in
order to form a cluster.
Without this setting, a cluster that suffers a network failure is at risk of
having the cluster split into two independent clusters -- a split brain -- which
will lead to data loss. A more detailed explanation is provided in
<<split-brain>>.
To avoid a split brain, this setting should be set to a _quorum_ of
master-eligible nodes:
(master_eligible_nodes / 2) + 1
In other words, if there are three master-eligible nodes, then minimum master
nodes should be set to `(3 / 2) + 1` or `2`:
[source,yaml]
--------------------------------------------------
discovery.zen.minimum_master_nodes: 2
--------------------------------------------------
<2> If a hostname resolves to multiple IP addresses then the node will attempt to
discover other nodes at all resolved addresses.
<3> Initial master nodes can be identified by their <<node.name,node name>>.
<4> Initial master nodes can also be identified by their IP address.
<5> If multiple master nodes share an IP address then the port must be used to
disambiguate them.
View File
@ -142,12 +142,12 @@ endif::[]
Instructions for installing it can be found on the
https://docs.docker.com/compose/install/#install-using-pip[Docker Compose webpage].
The node `elasticsearch` listens on `localhost:9200` while `elasticsearch2`
talks to `elasticsearch` over a Docker network.
The node `es01` listens on `localhost:9200` while `es02`
talks to `es01` over a Docker network.
This example also uses
https://docs.docker.com/engine/tutorials/dockervolumes[Docker named volumes],
called `esdata1` and `esdata2` which will be created if not already present.
called `esdata01` and `esdata02` which will be created if not already present.
[[docker-prod-cluster-composefile]]
`docker-compose.yml`:
@ -163,10 +163,12 @@ ifeval::["{release-state}"!="unreleased"]
--------------------------------------------
version: '2.2'
services:
elasticsearch:
es01:
image: {docker-image}
container_name: elasticsearch
container_name: es01
environment:
- node.name=es01
- cluster.initial_master_nodes=es01,es02
- cluster.name=docker-cluster
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
@ -175,32 +177,34 @@ services:
soft: -1
hard: -1
volumes:
- esdata1:/usr/share/elasticsearch/data
- esdata01:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- esnet
elasticsearch2:
es02:
image: {docker-image}
container_name: elasticsearch2
container_name: es02
environment:
- node.name=es02
- discovery.zen.ping.unicast.hosts=es01
- cluster.initial_master_nodes=es01,es02
- cluster.name=docker-cluster
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- "discovery.zen.ping.unicast.hosts=elasticsearch"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- esdata2:/usr/share/elasticsearch/data
- esdata02:/usr/share/elasticsearch/data
networks:
- esnet
volumes:
esdata1:
esdata01:
driver: local
esdata2:
esdata02:
driver: local
networks:
View File
@ -59,10 +59,14 @@ If you have dedicated master nodes, start them first and wait for them to
form a cluster and elect a master before proceeding with your data nodes.
You can check progress by looking at the logs.
As soon as the <<master-election,minimum number of master-eligible nodes>>
have discovered each other, they form a cluster and elect a master. At
that point, you can use <<cat-health,`_cat/health`>> and
<<cat-nodes,`_cat/nodes`>> to monitor nodes joining the cluster:
If upgrading from a 6.x cluster, you must
<<modules-discovery-bootstrap-cluster,configure cluster bootstrapping>> by
setting the `cluster.initial_master_nodes` setting.
As soon as enough master-eligible nodes have discovered each other, they form a
cluster and elect a master. At that point, you can use
<<cat-health,`_cat/health`>> and <<cat-nodes,`_cat/nodes`>> to monitor nodes
joining the cluster:
[source,sh]
--------------------------------------------------