[[modules-discovery-zen]]
=== Zen Discovery

Zen discovery is the built-in, default discovery module for Elasticsearch. It
provides unicast and file-based discovery, and can be extended to support cloud
environments and other forms of discovery via plugins.

Zen discovery is integrated with other modules. For example, all communication
between nodes is done using the <<modules-transport,transport>> module.

It is separated into several submodules, which are explained below:

[float]
[[ping]]
==== Ping

This is the process where a node uses the discovery mechanisms to find other
nodes.

[float]
[[discovery-seed-nodes]]
==== Seed nodes

Zen discovery uses a list of _seed_ nodes in order to start off the discovery
process. At startup, or when electing a new master, Elasticsearch tries to
connect to each seed node in its list, and holds a gossip-like conversation with
them to find other nodes and to build a complete picture of the cluster. By
default there are two methods for configuring the list of seed nodes: _unicast_
and _file-based_. It is recommended that the list of seed nodes comprises the
list of master-eligible nodes in the cluster.

[float]
[[unicast]]
===== Unicast

Unicast discovery configures a static list of hosts for use as seed nodes.
These hosts can be specified as hostnames or IP addresses; hosts specified as
hostnames are resolved to IP addresses during each round of pinging. Note that
if you are in an environment where DNS resolutions vary with time, you might
need to adjust your <<networkaddress-cache-ttl,JVM security settings>>.

The list of hosts is set using the `discovery.zen.ping.unicast.hosts` static
setting. This is either an array of hosts or a comma-delimited string. Each
value should be in the form of `host:port` or `host` (where `port` defaults to
the setting `transport.profiles.default.port`, falling back to
`transport.port` if not set). Note that IPv6 hosts must be bracketed. The
default for this setting is `127.0.0.1, [::1]`.

Additionally, the `discovery.zen.ping.unicast.resolve_timeout` setting
configures the amount of time to wait for DNS lookups on each round of pinging.
This is specified as a <<time-units, time unit>> and defaults to 5s.

Unicast discovery uses the <<modules-transport,transport>> module to perform the
discovery.

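
As a minimal sketch of the settings described above, an `elasticsearch.yml`
fragment might look like the following. The hostname and addresses are
hypothetical placeholders, and the `resolve_timeout` value only illustrates
overriding the 5s default.

[source,yaml]
----------------------------------------------------------------
# Hypothetical seed hosts; replace with your own master-eligible nodes.
discovery.zen.ping.unicast.hosts:
   - es-master-1.example.com      # no port given: the default transport port is used
   - 10.10.10.6:9305              # explicit host:port
   - "[2001:db8::1]:9301"         # IPv6 addresses must be bracketed
# Illustrative value: wait up to 10s for DNS lookups on each round of pinging.
discovery.zen.ping.unicast.resolve_timeout: 10s
----------------------------------------------------------------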
[float]
[[file-based-hosts-provider]]
===== File-based

In addition to hosts provided by the static `discovery.zen.ping.unicast.hosts`
setting, it is possible to provide a list of hosts via an external file.
Elasticsearch reloads this file when it changes, so that the list of seed nodes
can change dynamically without needing to restart each node. For example, this
gives a convenient mechanism for an Elasticsearch instance that is run in a
Docker container to be dynamically supplied with a list of IP addresses to
connect to for Zen discovery when those IP addresses may not be known at node
startup.

To enable file-based discovery, configure the `file` hosts provider as follows:

[source,txt]
----------------------------------------------------------------
discovery.zen.hosts_provider: file
----------------------------------------------------------------

Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in the format described
below. Any time a change is made to the `unicast_hosts.txt` file, the new
changes will be picked up by Elasticsearch and the new hosts list will be used.

Note that the file-based discovery plugin augments the unicast hosts list in
`elasticsearch.yml`: if there are valid unicast host entries in
`discovery.zen.ping.unicast.hosts` then they will be used in addition to those
supplied in `unicast_hosts.txt`.

The `discovery.zen.ping.unicast.resolve_timeout` setting also applies to DNS
lookups for nodes specified by address via file-based discovery. This is
specified as a <<time-units, time unit>> and defaults to 5s.

The format of the file is to specify one node entry per line. Each node entry
consists of the host (host name or IP address) and an optional transport port
number. If the port number is specified, it must come immediately after the
host (on the same line) separated by a `:`. If the port number is not
specified, a default value of 9300 is used.

For example, the following `unicast_hosts.txt` is for a cluster with four
nodes that participate in unicast discovery, some of which are not running on
the default port:

[source,txt]
----------------------------------------------------------------
10.10.10.5
10.10.10.6:9305
10.10.10.5:10005
# an IPv6 address
[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301
----------------------------------------------------------------

Host names are allowed instead of IP addresses (similar to
`discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in
brackets with the port coming after the brackets.

It is also possible to add comments to this file. All comments must appear on
their own lines starting with `#` (i.e. comments cannot start in the middle of a
line).

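
To illustrate how the two sources of seed hosts combine, the following
`elasticsearch.yml` sketch enables the `file` hosts provider while keeping one
static entry. The address is a hypothetical placeholder; with this
configuration the node would use that host in addition to whatever is listed in
`unicast_hosts.txt`.

[source,yaml]
----------------------------------------------------------------
discovery.zen.hosts_provider: file
# Hypothetical static seed host, used in addition to the unicast_hosts.txt entries.
discovery.zen.ping.unicast.hosts: ["10.10.10.7:9300"]
----------------------------------------------------------------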
[float]
[[master-election]]
==== Master Election

As part of the ping process, a master of the cluster is either elected or
joined. This is done automatically. The `discovery.zen.ping_timeout` setting
(which defaults to `3s`) determines how long the node will wait before deciding
whether to start an election or join an existing cluster. Three pings are sent
over this timeout interval. If no decision can be reached after the timeout,
the pinging process restarts. In slow or congested networks, three seconds
might not be enough for a node to become aware of the other nodes in its
environment before making an election decision. Increase the timeout with care
in that case, as doing so slows down the election process. Once a node decides
to join an existing cluster, it sends a join request to the master. The timeout
for this request is set by `discovery.zen.join_timeout` and defaults to 20
times the ping timeout.

When the master node stops or has encountered a problem, the cluster nodes start
pinging again and will elect a new master. This pinging round also serves as a
protection against (partial) network failures where a node may mistakenly think
that the master has failed. In this case the node will simply hear from other
nodes about the currently active master.

If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from
nodes that are not master eligible (nodes where `node.master` is `false`) are
ignored during master election; the default value is `false`.

Nodes can be excluded from becoming a master by setting `node.master` to
`false`.

The `discovery.zen.minimum_master_nodes` setting sets the minimum number of
master eligible nodes that need to join a newly elected master in order for an
election to complete and for the elected node to accept its mastership. The same
setting controls the minimum number of active master eligible nodes that should
be a part of any active cluster. If this requirement is not met, the active
master node will step down and a new master election will begin.

This setting must be set to a <<minimum_master_nodes,quorum>> of your master
eligible nodes. It is recommended to avoid having only two master eligible
nodes, since a quorum of two is two. Therefore, a loss of either master eligible
node will result in an inoperable cluster.

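
As a sketch, assume a hypothetical cluster with three master eligible nodes. A
quorum is then `(3 / 2) + 1 = 2` using integer division, so the corresponding
`elasticsearch.yml` fragment could look like this. The `ping_timeout` line only
shows where that setting would be raised on a slow network; its value is
illustrative.

[source,yaml]
----------------------------------------------------------------
# Assumes three master eligible nodes: quorum = (3 / 2) + 1 = 2 (integer division).
discovery.zen.minimum_master_nodes: 2
# Illustrative only: allow more time than the default 3s on a slow network.
discovery.zen.ping_timeout: 5s
----------------------------------------------------------------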
[float]
[[fault-detection]]
==== Fault Detection

There are two fault detection processes running. The first is run by the
master, which pings all the other nodes in the cluster to verify that they are
alive. In the other direction, each node pings the master to verify that it is
still alive or whether an election process needs to be initiated.

The following settings control the fault detection process using the
`discovery.zen.fd` prefix:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`ping_interval` |How often a node gets pinged. Defaults to `1s`.

|`ping_timeout` |How long to wait for a ping response. Defaults to
`30s`.

|`ping_retries` |How many ping failures / timeouts cause a node to be
considered failed. Defaults to `3`.
|=======================================================================

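
For illustration, the table entries above map to the following fully prefixed
keys in `elasticsearch.yml`. The values shown are simply the defaults listed in
the table, written out as a sketch of the syntax.

[source,yaml]
----------------------------------------------------------------
# Defaults from the table above, written with the full discovery.zen.fd prefix.
discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3
----------------------------------------------------------------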
[float]
==== Cluster state updates

The master node is the only node in a cluster that can make changes to the
cluster state. The master node processes one cluster state update at a time,
applies the required changes and publishes the updated cluster state to all the
other nodes in the cluster. Each node receives the publish message and
acknowledges it, but does *not* yet apply it. If the master does not receive
acknowledgement from at least `discovery.zen.minimum_master_nodes` nodes within
a certain time (controlled by the `discovery.zen.commit_timeout` setting, which
defaults to 30 seconds), the cluster state change is rejected.

Once enough nodes have responded, the cluster state is committed and a message
is sent to all the nodes. The nodes then proceed to apply the new cluster
state to their internal state. The master node waits for all nodes to respond,
up to a timeout, before processing the next updates in the queue.
The `discovery.zen.publish_timeout` setting defaults to 30 seconds and is
measured from the moment publishing starts. Both timeout settings can be
changed dynamically through the <<cluster-update-settings,cluster update
settings API>>.

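
As a sketch of the two timeouts, the following fragment writes them out with
their default values. It assumes the settings are placed in `elasticsearch.yml`;
since both are dynamic, they could equally be changed at runtime through the
cluster update settings API.

[source,yaml]
----------------------------------------------------------------
# How long the master waits for enough acknowledgements before rejecting a change.
discovery.zen.commit_timeout: 30s
# How long the master waits, from the start of publishing, for all nodes to respond.
discovery.zen.publish_timeout: 30s
----------------------------------------------------------------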
[float]
[[no-master-block]]
==== No master block

For the cluster to be fully operational, it must have an active master and the
number of running master eligible nodes must satisfy the
`discovery.zen.minimum_master_nodes` setting if set. The
`discovery.zen.no_master_block` setting controls what operations should be
rejected when there is no active master.

The `discovery.zen.no_master_block` setting has two valid options:

[horizontal]
`all`:: All operations on the node (both reads and writes) will be rejected.
This also applies to API cluster state read or write operations, such as the get
index settings, put mapping and cluster state APIs.
`write`:: (default) Write operations will be rejected. Read operations will
succeed, based on the last known cluster configuration. This may result in
partial reads of stale data as this node may be isolated from the rest of the
cluster.

The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
(for example cluster stats, node info and node stats APIs). Requests to these
APIs will not be blocked and can run on any available node.
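
As a minimal sketch, a node that should refuse reads as well as writes while
there is no active master could be configured as follows. Whether this stricter
option suits a given deployment is an assumption to check against how much
stale data its clients can tolerate.

[source,yaml]
----------------------------------------------------------------
# Reject both reads and writes while there is no active master (default is write).
discovery.zen.no_master_block: all
----------------------------------------------------------------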