2020-06-05 12:08:45 -04:00

[[high-availability-cluster-design]]
== Designing for resilience

Distributed systems like {es} are designed to keep working even if some of
their components have failed. As long as there are enough well-connected
nodes to take over their responsibilities, an {es} cluster can continue
operating normally if some of its nodes are unavailable or disconnected.

There is a limit to how small a resilient cluster can be. All {es} clusters
require:

* One <<modules-discovery-quorums,elected master node>>
* At least one node for each <<modules-node,role>>
* At least one copy of every <<scalability,shard>>

A resilient cluster requires redundancy for every required cluster component.
This means a resilient cluster must have:

* At least three master-eligible nodes
* At least two nodes of each role
* At least two copies of each shard (one primary and one or more replicas)

A resilient cluster needs three master-eligible nodes so that if one of
them fails then the remaining two still form a majority and can hold a
successful election.
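
The majority arithmetic behind this recommendation can be sketched in a few
lines of Python. This is only an illustration: the helper names `majority` and
`tolerated_failures` are hypothetical and are not part of any {es} API.

```python
def majority(master_eligible: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can fail while a majority still remains."""
    return master_eligible - majority(master_eligible)


for count in (1, 2, 3):
    # Prints: 1 1 0, then 2 2 0, then 3 2 1
    print(count, majority(count), tolerated_failures(count))
```

Under this model, one or two master-eligible nodes tolerate no failures, while
three tolerate the loss of one.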

Similarly, redundancy of nodes of each role means that if a node for a
particular role fails, another node can take on its responsibilities.

Finally, a resilient cluster should have at least two copies of each shard. If
one copy fails then there should be another good copy to take over. {es}
automatically rebuilds any failed shard copies on the remaining nodes in order
to restore the cluster to full health after a failure.

Failures temporarily reduce the total capacity of your cluster. In addition,
after a failure the cluster must perform additional background activities to
restore itself to health. You should make sure that your cluster has the
capacity to handle your workload even if some nodes fail.
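
As a rough way to reason about this headroom, the following Python sketch
estimates the load on each surviving node after a failure. It assumes, purely
for illustration, that the workload spreads evenly across the surviving nodes,
and it ignores the extra recovery work mentioned above. The function name
`load_after_failure` is hypothetical.

```python
def load_after_failure(nodes: int, failed: int, load_per_node: float) -> float:
    """Estimate per-node load once the survivors absorb the whole workload.

    load_per_node is the fraction of a node's capacity in use before the
    failure (for example 0.6 for 60%). Assumes an even spread of work.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no nodes left to carry the workload")
    return load_per_node * nodes / survivors


# Three nodes at 60% each, one fails: the two survivors run at roughly 90%.
print(round(load_after_failure(3, 1, 0.6), 2))
```

A result above `1.0` suggests that the remaining nodes could not carry the
workload, even before accounting for recovery activity.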

Depending on your needs and budget, an {es} cluster can consist of a single
node, hundreds of nodes, or any number in between. When designing a smaller
cluster, you should typically focus on making it resilient to single-node
failures. Designers of larger clusters must also consider cases where multiple
nodes fail at the same time. The following pages give some recommendations for
building resilient clusters of various sizes:

* <<high-availability-cluster-small-clusters>>
* <<high-availability-cluster-design-large-clusters>>

[[high-availability-cluster-small-clusters]]
=== Resilience in small clusters

In smaller clusters, it is most important to be resilient to single-node
failures. This section gives some guidance on making your cluster as resilient
as possible to the failure of an individual node.

[[high-availability-cluster-design-one-node]]
==== One-node clusters

If your cluster consists of one node, that single node must do everything.
To accommodate this, {es} assigns nodes every role by default.

A single-node cluster is not resilient. If the node fails, the cluster will
stop working. Because there are no replicas in a one-node cluster, you cannot
store your data redundantly. However, by default at least one replica is
required for a <<cluster-health,`green` cluster health status>>. To ensure your
cluster can report a `green` status, override the default by setting
<<dynamic-index-settings,`index.number_of_replicas`>> to `0` on every index.
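
As an illustration of this override, the following Python sketch builds, but
does not send, an update index settings request that sets
`index.number_of_replicas`. The helper name `replica_settings_request` is
hypothetical, and the `_all` target is an assumption that selects every
existing index. Indices created later would still need the setting applied.

```python
import json
from typing import Tuple


def replica_settings_request(target: str, replicas: int) -> Tuple[str, str, str]:
    """Return (method, path, body) for an update index settings request."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{target}/_settings", body


method, path, body = replica_settings_request("_all", 0)
# Prints: PUT /_all/_settings {"index": {"number_of_replicas": 0}}
print(method, path, body)
```

The same helper with `replicas=1` produces the request that suits a two-node
cluster.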

If the node fails, you may need to restore an older copy of any lost indices
from a <<modules-snapshots,snapshot>>.

Because they are not resilient to any failures, we do not recommend using
one-node clusters in production.

[[high-availability-cluster-design-two-nodes]]
==== Two-node clusters

If you have two nodes, we recommend they both be data nodes. You should also
ensure every shard is stored redundantly on both nodes by setting
<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index.
This is the default number of replicas but may be overridden by an
<<indices-templates,index template>>. <<dynamic-index-settings,Auto-expand
replicas>> can also achieve the same thing, but it's not necessary to use this
feature in such a small cluster.

We recommend you set `node.master: false` on one of your two nodes so that it is
not <<master-node,master-eligible>>. This means you can be certain which of your
nodes is the elected master of the cluster. The cluster can tolerate the loss of
the other, master-ineligible node. If you don't set `node.master: false` on one
node, both nodes are master-eligible. This means both nodes are required for a
master election. Since the election will fail if either node is unavailable,
your cluster cannot reliably tolerate the loss of either node.

By default, each node is assigned every role. Apart from master eligibility, we
recommend you assign both nodes all other roles. If one node fails, the other
node can handle its tasks.

You should avoid sending client requests to just one of your nodes. If you do
and this node fails, such requests will not receive responses even if the
remaining node is a healthy cluster on its own. Ideally, you should balance your
client requests across both nodes. A good way to do this is to specify the
addresses of both nodes when configuring the client to connect to your cluster.
Alternatively, you can use a resilient load balancer to balance client requests
across the nodes in your cluster.
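
To illustrate why configuring more than one address helps, here is a minimal
Python sketch of client-side rotation with failover. It is not how any
particular {es} client is implemented. The `NodeRotation` class, the
`fake_transport` function and the node addresses are all hypothetical, and the
transport is a stand-in for a real HTTP call.

```python
from typing import Callable, List


class NodeRotation:
    """Round-robin over node addresses, moving on when a node is unreachable."""

    def __init__(self, addresses: List[str]) -> None:
        if not addresses:
            raise ValueError("at least one node address is required")
        self._addresses = list(addresses)
        self._next = 0

    def send(self, request: str, transport: Callable[[str, str], str]) -> str:
        """Try each node at most once, starting from the next one in rotation."""
        last_error = None
        for _ in range(len(self._addresses)):
            address = self._addresses[self._next]
            self._next = (self._next + 1) % len(self._addresses)
            try:
                return transport(address, request)
            except ConnectionError as error:
                last_error = error  # this node is unreachable, try the next one
        raise ConnectionError("no node responded") from last_error


def fake_transport(address: str, request: str) -> str:
    # Stand-in for a real HTTP call; pretends that node-1 has failed.
    if address == "http://node-1:9200":
        raise ConnectionError(address)
    return f"{address} handled {request}"


pool = NodeRotation(["http://node-1:9200", "http://node-2:9200"])
# Prints: http://node-2:9200 handled GET /_cluster/health
print(pool.send("GET /_cluster/health", fake_transport))
```

With only `http://node-1:9200` configured, the same request would fail even
though the second node is still healthy.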

Because it's not resilient to failures, we do not recommend deploying a two-node
cluster in production.

[[high-availability-cluster-design-two-nodes-plus]]
==== Two-node clusters with a tiebreaker

Because master elections are majority-based, the two-node cluster described
above is tolerant to the loss of one of its nodes but not the
other one. You cannot configure a two-node cluster so that it can tolerate
the loss of _either_ node because this is theoretically impossible. You might
expect that if either node fails then {es} can elect the remaining node as the
master, but it is impossible to tell the difference between the failure of a
remote node and a mere loss of connectivity between the nodes. If both nodes
were capable of running independent elections, a loss of connectivity would
lead to a {wikipedia}/Split-brain_(computing)[split-brain
problem] and therefore data loss. {es} avoids this and
protects your data by electing neither node as master until that node can be
sure that it has the latest cluster state and that there is no other master in
the cluster. This could result in the cluster having no master until
connectivity is restored.

You can solve this problem by adding a third node and making all three nodes
master-eligible. A <<modules-discovery-quorums,master election>> requires only
two of the three master-eligible nodes. This means the cluster can tolerate the
loss of any single node. This third node acts as a tiebreaker in cases where the
two original nodes are disconnected from each other. You can reduce the resource
requirements of this extra node by making it a <<voting-only-node,dedicated
voting-only master-eligible node>>, also known as a dedicated tiebreaker.
Because it has no other roles, a dedicated tiebreaker does not need to be as
powerful as the other two nodes. It will not perform any searches nor coordinate
any client requests and cannot be elected as the master of the cluster.

The two original nodes should not be voting-only master-eligible nodes since a
resilient cluster requires at least three master-eligible nodes, at least two
of which are not voting-only master-eligible nodes. If two of your three nodes
are voting-only master-eligible nodes then the elected master must be the third
node. This node then becomes a single point of failure.
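
The reasoning above can be checked with a small Python model. It is a
deliberately simplified sketch that assumes a fixed set of voters and only asks
whether the surviving nodes hold a majority and include a node that may become
master. The `Node` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    master_eligible: bool = True
    voting_only: bool = False


def can_elect_master(all_nodes: List[Node], available: List[Node]) -> bool:
    """True if the available nodes hold a voting majority and include a
    master-eligible node that is not voting-only."""
    voters = [n for n in all_nodes if n.master_eligible]
    present = [n for n in available if n.master_eligible]
    has_majority = len(present) > len(voters) // 2
    has_candidate = any(not n.voting_only for n in present)
    return has_majority and has_candidate


def single_points_of_failure(nodes: List[Node]) -> List[str]:
    """Names of nodes whose loss alone leaves the cluster without a master."""
    return [
        lost.name
        for lost in nodes
        if not can_elect_master(nodes, [n for n in nodes if n.name != lost.name])
    ]


recommended = [Node("a"), Node("b"), Node("tiebreaker", voting_only=True)]
too_many_voting_only = [Node("a"), Node("b", voting_only=True),
                        Node("tiebreaker", voting_only=True)]
print(single_points_of_failure(recommended))           # prints []
print(single_points_of_failure(too_many_voting_only))  # prints ['a']
```

In this model the recommended layout has no single point of failure, whereas
making two of the three nodes voting-only leaves node `a` as the only possible
master.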

We recommend assigning both non-tiebreaker nodes all other roles. This creates
redundancy by ensuring any task in the cluster can be handled by either node.

You should not send any client requests to the dedicated tiebreaker node.
You should also avoid sending client requests to just one of the other two
nodes. If you do, and this node fails, then any requests will not
receive responses, even if the remaining nodes form a healthy cluster. Ideally,
you should balance your client requests across both of the non-tiebreaker
nodes. You can do this by specifying the addresses of both nodes
when configuring your client to connect to your cluster. Alternatively, you can
use a resilient load balancer to balance client requests across the appropriate
nodes in your cluster. The {ess-trial}[Elastic Cloud] service
provides such a load balancer.

A two-node cluster with an additional tiebreaker node is the smallest possible
cluster that is suitable for production deployments.

[[high-availability-cluster-design-three-nodes]]
==== Three-node clusters

If you have three nodes, we recommend they all be <<data-node,data
nodes>> and every index should have at least one replica. Nodes are data nodes
by default. You may prefer for some indices to have two replicas so that each
node has a copy of each shard in those indices. You should also configure each
node to be <<master-node,master-eligible>> so that any two of them can hold a
master election without needing to communicate with the third node. Nodes are
master-eligible by default. This cluster will be resilient to the loss of any
single node.

You should avoid sending client requests to just one of your nodes. If you do,
and this node fails, then any requests will not receive responses even if the
remaining two nodes form a healthy cluster. Ideally, you should balance your
client requests across all three nodes. You can do this by specifying the
addresses of multiple nodes when configuring your client to connect to your
cluster. Alternatively, you can use a resilient load balancer to balance client
requests across your cluster. The {ess-trial}[Elastic Cloud]
service provides such a load balancer.

[[high-availability-cluster-design-three-plus-nodes]]
==== Clusters with more than three nodes

Once your cluster grows to more than three nodes, you can start to specialise
these nodes according to their responsibilities, allowing you to scale their
resources independently as needed. You can have as many <<data-node,data
nodes>>, <<ingest,ingest nodes>>, <<ml-node,{ml} nodes>>, etc. as needed to
support your workload. As your cluster grows larger, we recommend using
dedicated nodes for each role. This lets you independently scale resources
for each task.

However, it is good practice to limit the number of master-eligible nodes in
the cluster to three. Master nodes do not scale like other node types since
the cluster always elects just one of them as the master of the cluster. If
there are too many master-eligible nodes then master elections may take a
longer time to complete. In larger clusters, we recommend you
configure some of your nodes as dedicated master-eligible nodes and avoid
sending any client requests to these dedicated nodes. Your cluster may become
unstable if the master-eligible nodes are overwhelmed with unnecessary extra
work that could be handled by one of the other nodes.

You may configure one of your master-eligible nodes to be a
<<voting-only-node,voting-only node>> so that it can never be elected as the
master node. For instance, you may have two dedicated master nodes and a third
node that is both a data node and a voting-only master-eligible node. This
third voting-only node will act as a tiebreaker in master elections but will
never become the master itself.

[[high-availability-cluster-design-small-cluster-summary]]
==== Summary

The cluster will be resilient to the loss of any node as long as:

- The <<cluster-health,cluster health status>> is `green`.
- There are at least two data nodes.
- Every index has at least one replica of each shard, in addition to the
primary.
- The cluster has at least three master-eligible nodes, at least two of which
are not voting-only master-eligible nodes.
- Clients are configured to send their requests to more than one node or are
configured to use a load balancer that balances the requests across an
appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such
a load balancer.

[[high-availability-cluster-design-large-clusters]]
=== Resilience in larger clusters

It is not unusual for nodes to share some common infrastructure, such as a power
supply or network router. If so, you should plan for the failure of this
infrastructure and ensure that such a failure would not affect too many of your
nodes. It is common practice to group all the nodes sharing some infrastructure
into _zones_ and to plan for the failure of any whole zone at once.

Your cluster's zones should all be contained within a single data centre. {es}
expects its node-to-node connections to be reliable and have low latency and
high bandwidth. Connections between data centres typically do not meet these
expectations. Although {es} will behave correctly on an unreliable or slow
network, it will not necessarily behave optimally. It may take a considerable
length of time for a cluster to fully recover from a network partition since it
must resynchronize any missing data and rebalance the cluster once the
partition heals. If you want your data to be available in multiple data centres,
deploy a separate cluster in each data centre and use
<<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
clusters together. These features are designed to perform well even if the
cluster-to-cluster connections are less reliable or slower than the network
within each cluster.

After losing a whole zone's worth of nodes, a properly designed cluster may be
functional but running with significantly reduced capacity. You may need
to provision extra nodes to restore acceptable performance in your
cluster when handling such a failure.

For resilience against whole-zone failures, it is important that there is a copy
of each shard in more than one zone, which can be achieved by placing data
nodes in multiple zones and configuring <<allocation-awareness,shard allocation
awareness>>. You should also ensure that client requests are sent to nodes in
more than one zone.

You should consider all node roles and ensure that each role is split
redundantly across two or more zones. For instance, if you are using
<<ingest,ingest pipelines>> or {ml}, you should have ingest or {ml} nodes in two
or more zones. However, the placement of master-eligible nodes requires a little
more care because a resilient cluster needs at least two of the three
master-eligible nodes in order to function. The following sections explore the
options for placing master-eligible nodes across multiple zones.

[[high-availability-cluster-design-two-zones]]
==== Two-zone clusters

If you have two zones, you should have a different number of
master-eligible nodes in each zone so that the zone with more nodes will
contain a majority of them and will be able to survive the loss of the other
zone. For instance, if you have three master-eligible nodes then you may put
all of them in one zone or you may put two in one zone and the third in the
other zone. You should not place an equal number of master-eligible nodes in
each zone. If you place the same number of master-eligible nodes in each zone,
neither zone has a majority of its own. Therefore, the cluster may not survive
the loss of either zone.
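
The effect of master-eligible node placement can be illustrated with a short
Python sketch. It only counts votes under the majority rule described above and
does not model anything else about the cluster. The function name
`zones_safe_to_lose` and the zone names are hypothetical.

```python
from typing import Dict, List


def zones_safe_to_lose(masters_per_zone: Dict[str, int]) -> List[str]:
    """Zones whose complete loss still leaves a majority of master-eligible nodes."""
    total = sum(masters_per_zone.values())
    return [
        zone
        for zone, count in masters_per_zone.items()
        if total - count > total // 2
    ]


print(zones_safe_to_lose({"zone-a": 2, "zone-b": 1}))  # prints ['zone-b']
print(zones_safe_to_lose({"zone-a": 2, "zone-b": 2}))  # prints []
```

With an uneven split the cluster survives the loss of the smaller zone only,
and with an even split it survives the loss of neither.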

[[high-availability-cluster-design-two-zones-plus]]
==== Two-zone clusters with a tiebreaker

The two-zone deployment described above is tolerant to the loss of one of its
zones but not to the loss of the other one because master elections are
majority-based. You cannot configure a two-zone cluster so that it can tolerate
the loss of _either_ zone because this is theoretically impossible. You might
expect that if either zone fails then {es} can elect a node from the remaining
zone as the master but it is impossible to tell the difference between the
failure of a remote zone and a mere loss of connectivity between the zones. If
both zones were capable of running independent elections then a loss of
connectivity would lead to a
{wikipedia}/Split-brain_(computing)[split-brain problem] and
therefore data loss. {es} avoids this and protects your data by not electing
a node from either zone as master until that node can be sure that it has the
latest cluster state and that there is no other master in the cluster. This may
mean there is no master at all until connectivity is restored.

You can solve this by placing one master-eligible node in each of your two
zones and adding a single extra master-eligible node in an independent third
zone. The extra master-eligible node acts as a tiebreaker in cases
where the two original zones are disconnected from each other. The extra
tiebreaker node should be a <<voting-only-node,dedicated voting-only
master-eligible node>>, also known as a dedicated tiebreaker. A dedicated
tiebreaker need not be as powerful as the other two nodes since it has no other
roles and will not perform any searches nor coordinate any client requests nor
be elected as the master of the cluster.

You should use <<allocation-awareness,shard allocation awareness>> to ensure
that there is a copy of each shard in each zone. This means either zone remains
fully available if the other zone fails.

All master-eligible nodes, including voting-only nodes, are on the critical path
for publishing cluster state updates. Because of this, these nodes require
reasonably fast persistent storage and a reliable, low-latency network
connection to the rest of the cluster. If you add a tiebreaker node in a third
independent zone then you must make sure it has adequate resources and good
connectivity to the rest of the cluster.

[[high-availability-cluster-design-three-zones]]
==== Clusters with three or more zones

If you have three zones then you should have one master-eligible node in each
zone. If you have more than three zones then you should choose three of the
zones and put a master-eligible node in each of these three zones. This will
mean that the cluster can still elect a master even if one of the zones fails.

As always, your indices should have at least one replica in case a node fails.
You should also use <<allocation-awareness,shard allocation awareness>> to
limit the number of copies of each shard in each zone. For instance, if you have
an index with one or two replicas configured then allocation awareness will
ensure that the replicas of the shard are in a different zone from the primary.
This means that a copy of every shard will still be available if one zone
fails. The availability of this shard will not be affected by such a
failure.
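
The outcome described above can be illustrated with a simplified Python model.
This is not how the {es} shard allocator is implemented. It assumes only that
copies of a shard end up spread as evenly as possible across zones, and the
function and zone names are hypothetical.

```python
from typing import Dict, List


def spread_copies(copies: int, zones: List[str]) -> Dict[str, int]:
    """Spread shard copies over zones as evenly as possible (simplified model)."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: 0 for zone in zones}
    for i in range(copies):
        placement[zones[i % len(zones)]] += 1
    return placement


def survives_zone_loss(placement: Dict[str, int]) -> bool:
    """True if every single-zone failure leaves at least one copy elsewhere."""
    total = sum(placement.values())
    return all(total - count >= 1 for count in placement.values())


zones = ["zone-a", "zone-b", "zone-c"]
print(survives_zone_loss(spread_copies(1, zones)))  # primary only: prints False
print(survives_zone_loss(spread_copies(2, zones)))  # one replica: prints True
```

In this model a shard with no replicas is lost along with its zone, while a
shard with at least one replica in another zone remains available.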

[[high-availability-cluster-design-large-cluster-summary]]
==== Summary

The cluster will be resilient to the loss of any zone as long as:

- The <<cluster-health,cluster health status>> is `green`.
- There are at least two zones containing data nodes.
- Every index has at least one replica of each shard, in addition to the
primary.
- Shard allocation awareness is configured to avoid concentrating all copies of
a shard within a single zone.
- The cluster has at least three master-eligible nodes. At least two of these
nodes are not voting-only master-eligible nodes, and they are spread evenly
across at least three zones.
- Clients are configured to send their requests to nodes in more than one zone
or are configured to use a load balancer that balances the requests across an
appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such
a load balancer.