157 lines
8.0 KiB
Plaintext
157 lines
8.0 KiB
Plaintext
[[modules-discovery-zen]]
|
|
=== Zen Discovery
|
|
|
|
The zen discovery is the built in discovery module for elasticsearch and
|
|
the default. It provides unicast discovery, but can be extended to
|
|
support cloud environments and other forms of discovery.
|
|
|
|
The zen discovery is integrated with other modules, for example, all
|
|
communication between nodes is done using the
|
|
<<modules-transport,transport>> module.
|
|
|
|
It is separated into several sub modules, which are explained below:
|
|
|
|
[float]
|
|
[[ping]]
|
|
==== Ping
|
|
|
|
This is the process where a node uses the discovery mechanisms to find
|
|
other nodes.
|
|
|
|
[float]
|
|
[[unicast]]
|
|
===== Unicast
|
|
|
|
Unicast discovery requires a list of hosts to use that will act as gossip routers. These hosts can be specified as
|
|
hostnames or IP addresses; hosts specified as hostnames are resolved to IP addresses during each round of pinging. Note
|
|
that with the Java security manager in place, the JVM defaults to caching positive hostname resolutions indefinitely.
|
|
This can be modified by adding
|
|
http://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html[`networkaddress.cache.ttl=<timeout>`] to your
|
|
http://docs.oracle.com/javase/8/docs/technotes/guides/security/PolicyFiles.html[Java security policy]. Any hosts that
|
|
fail to resolve will be logged. Note also that with the Java security manager in place, the JVM defaults to caching
|
|
negative hostname resolutions for ten seconds. This can be modified by adding
|
|
http://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html[`networkaddress.cache.negative.ttl=<timeout>`]
|
|
to your http://docs.oracle.com/javase/8/docs/technotes/guides/security/PolicyFiles.html[Java security policy].
|
|
|
|
It is recommended that the unicast hosts list be maintained as the list of
|
|
master-eligible nodes in the cluster.
|
|
|
|
Unicast discovery provides the following settings with the `discovery.zen.ping.unicast` prefix:
|
|
|
|
[cols="<,<",options="header",]
|
|
|=======================================================================
|
|
|Setting |Description
|
|
|`hosts` |Either an array setting or a comma delimited setting. Each
|
|
value should be in the form of `host:port` or `host` (where `port` defaults to the setting `transport.profiles.default.port`
|
|
falling back to `transport.tcp.port` if not set). Note that IPv6 hosts must be bracketed. Defaults to `127.0.0.1, [::1]`
|
|
|`hosts.resolve_timeout` |The amount of time to wait for DNS lookups on each round of pinging. Specified as
|
|
<<time-units, time units>>. Defaults to 5s.
|
|
|=======================================================================
|
|
|
|
The unicast discovery uses the <<modules-transport,transport>> module to perform the discovery.
|
|
|
|
[float]
|
|
[[master-election]]
|
|
==== Master Election
|
|
|
|
As part of the ping process a master of the cluster is either
|
|
elected or joined to. This is done automatically. The
|
|
`discovery.zen.ping_timeout` (which defaults to `3s`) determines how long the node
|
|
will wait before deciding on starting an election or joining an existing cluster.
|
|
Three pings will be sent over this timeout interval. In case where no decision can be
|
|
reached after the timeout, the pinging process restarts.
|
|
In slow or congested networks, three seconds might not be enough for a node to become
|
|
aware of the other nodes in its environment before making an election decision.
|
|
Increasing the timeout should be done with care in that case, as it will slow down the
|
|
election process.
|
|
Once a node decides to join an existing formed cluster, it
|
|
will send a join request to the master (`discovery.zen.join_timeout`)
|
|
with a timeout defaulting at 20 times the ping timeout.
|
|
|
|
When the master node stops or has encountered a problem, the cluster nodes
|
|
start pinging again and will elect a new master. This pinging round also
|
|
serves as a protection against (partial) network failures where a node may unjustly
|
|
think that the master has failed. In this case the node will simply hear from
|
|
other nodes about the currently active master.
|
|
|
|
If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from nodes that are not master
|
|
eligible (nodes where `node.master` is `false`) are ignored during master election; the default value is
|
|
`false`.
|
|
|
|
Nodes can be excluded from becoming a master by setting `node.master` to `false`.
|
|
|
|
The `discovery.zen.minimum_master_nodes` sets the minimum
|
|
number of master eligible nodes that need to join a newly elected master in order for an election to
|
|
complete and for the elected node to accept its mastership. The same setting controls the minimum number of
|
|
active master eligible nodes that should be a part of any active cluster. If this requirement is not met the
|
|
active master node will step down and a new master election will begin.
|
|
|
|
This setting must be set to a <<minimum_master_nodes,quorum>> of your master
|
|
eligible nodes. It is recommended to avoid having only two master eligible
|
|
nodes, since a quorum of two is two. Therefore, a loss of either master
|
|
eligible node will result in an inoperable cluster.
|
|
|
|
[float]
|
|
[[fault-detection]]
|
|
==== Fault Detection
|
|
|
|
There are two fault detection processes running. The first is by the
|
|
master, to ping all the other nodes in the cluster and verify that they
|
|
are alive. And on the other end, each node pings to master to verify if
|
|
its still alive or an election process needs to be initiated.
|
|
|
|
The following settings control the fault detection process using the
|
|
`discovery.zen.fd` prefix:
|
|
|
|
[cols="<,<",options="header",]
|
|
|=======================================================================
|
|
|Setting |Description
|
|
|`ping_interval` |How often a node gets pinged. Defaults to `1s`.
|
|
|
|
|`ping_timeout` |How long to wait for a ping response, defaults to
|
|
`30s`.
|
|
|
|
|`ping_retries` |How many ping failures / timeouts cause a node to be
|
|
considered failed. Defaults to `3`.
|
|
|=======================================================================
|
|
|
|
[float]
|
|
==== Cluster state updates
|
|
|
|
The master node is the only node in a cluster that can make changes to the
|
|
cluster state. The master node processes one cluster state update at a time,
|
|
applies the required changes and publishes the updated cluster state to all
|
|
the other nodes in the cluster. Each node receives the publish message, acknowledges
|
|
it, but does *not* yet apply it. If the master does not receive acknowledgement from
|
|
at least `discovery.zen.minimum_master_nodes` nodes within a certain time (controlled by
|
|
the `discovery.zen.commit_timeout` setting and defaults to 30 seconds) the cluster state
|
|
change is rejected.
|
|
|
|
Once enough nodes have responded, the cluster state is committed and a message will
|
|
be sent to all the nodes. The nodes then proceed to apply the new cluster state to their
|
|
internal state. The master node waits for all nodes to respond, up to a timeout, before
|
|
going ahead processing the next updates in the queue. The `discovery.zen.publish_timeout` is
|
|
set by default to 30 seconds and is measured from the moment the publishing started. Both
|
|
timeout settings can be changed dynamically through the <<cluster-update-settings,cluster update settings api>>
|
|
|
|
[float]
|
|
[[no-master-block]]
|
|
==== No master block
|
|
|
|
For the cluster to be fully operational, it must have an active master and the
|
|
number of running master eligible nodes must satisfy the
|
|
`discovery.zen.minimum_master_nodes` setting if set. The
|
|
`discovery.zen.no_master_block` settings controls what operations should be
|
|
rejected when there is no active master.
|
|
|
|
The `discovery.zen.no_master_block` setting has two valid options:
|
|
|
|
[horizontal]
|
|
`all`:: All operations on the node--i.e. both read & writes--will be rejected. This also applies for api cluster state
|
|
read or write operations, like the get index settings, put mapping and cluster state api.
|
|
`write`:: (default) Write operations will be rejected. Read operations will succeed, based on the last known cluster configuration.
|
|
This may result in partial reads of stale data as this node may be isolated from the rest of the cluster.
|
|
|
|
The `discovery.zen.no_master_block` setting doesn't apply to nodes-based apis (for example cluster stats, node info and
|
|
node stats apis). Requests to these apis will not be blocked and can run on any available node.
|