52 lines
3.2 KiB
Plaintext
52 lines
3.2 KiB
Plaintext
[[cluster-state-publishing]]
|
|
=== Publishing the cluster state
|
|
|
|
The master node is the only node in a cluster that can make changes to the
|
|
cluster state. The master node processes one batch of cluster state updates at
|
|
a time, computing the required changes and publishing the updated cluster state
|
|
to all the other nodes in the cluster. Each publication starts with the master
|
|
broadcasting the updated cluster state to all nodes in the cluster. Each node
|
|
responds with an acknowledgement but does not yet apply the newly-received
|
|
state. Once the master has collected acknowledgements from enough
|
|
master-eligible nodes, the new cluster state is said to be _committed_ and the
|
|
master broadcasts another message instructing nodes to apply the now-committed
|
|
state. Each node receives this message, applies the updated state, and then
|
|
sends a second acknowledgement back to the master.
|
|
|
|
The master allows a limited amount of time for each cluster state update to be
|
|
completely published to all nodes. It is defined by the
|
|
`cluster.publish.timeout` setting, which defaults to `30s`, measured from the
|
|
time the publication started. If this time is reached before the new cluster
|
|
state is committed then the cluster state change is rejected and the master
|
|
considers itself to have failed. It stands down and starts trying to elect a
|
|
new master.
|
|
|
|
If the new cluster state is committed before `cluster.publish.timeout` has
|
|
elapsed, the master node considers the change to have succeeded. It waits until
|
|
the timeout has elapsed or until it has received acknowledgements that each
|
|
node in the cluster has applied the updated state, and then starts processing
|
|
and publishing the next cluster state update. If some acknowledgements have not
|
|
been received (i.e. some nodes have not yet confirmed that they have applied
|
|
the current update), these nodes are said to be _lagging_ since their cluster
|
|
states have fallen behind the master's latest state. The master waits for the
|
|
lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`,
|
|
which defaults to `90s`. If a node has still not successfully applied the
|
|
cluster state update within this time then it is considered to have failed and
|
|
is removed from the cluster.
|
|
|
|
Cluster state updates are typically published as diffs to the previous cluster
|
|
state, which reduces the time and network bandwidth needed to publish a cluster
|
|
state update. For example, when updating the mappings for only a subset of the
|
|
indices in the cluster state, only the updates for those indices need to be
|
|
published to the nodes in the cluster, as long as those nodes have the previous
|
|
cluster state. If a node is missing the previous cluster state, for example
|
|
when rejoining a cluster, the master will publish the full cluster state to
|
|
that node so that it can receive future updates as diffs.
|
|
|
|
NOTE: {es} is a peer to peer based system, in which nodes communicate with one
|
|
another directly. The high-throughput APIs (index, delete, search) do not
|
|
normally interact with the master node. The responsibility of the master node
|
|
is to maintain the global cluster state and reassign shards when nodes join or
|
|
leave the cluster. Each time the cluster state is changed, the new state is
|
|
published to all nodes in the cluster as described above.
|