[[modules-node]]
== Node

Any time that you start an instance of Elasticsearch, you are starting a
_node_. A collection of connected nodes is called a
<<modules-cluster,cluster>>. If you are running a single node of Elasticsearch,
then you have a cluster of one node.

Every node in the cluster can handle <<modules-http,HTTP>> and
<<modules-transport,Transport>> traffic by default. The transport layer
is used exclusively for communication between nodes and between nodes and the
{javaclient}/transport-client.html[Java `TransportClient`]; the HTTP layer is
used only by external REST clients.
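
For orientation, the two layers listen on separate ports and are configured
independently. The values below are a hedged sketch reflecting the
conventional defaults; see <<modules-http,HTTP>> and
<<modules-transport,Transport>> for the authoritative settings and default
port ranges:

[source,yaml]
-------------------
http.port: 9200           # HTTP layer, used by external REST clients
transport.tcp.port: 9300  # transport layer, used for node-to-node traffic
-------------------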

All nodes know about all the other nodes in the cluster and can forward client
requests to the appropriate node. In addition, each node serves one or more
purposes:

<<master-node,Master-eligible node>>::

A node that has `node.master` set to `true` (default), which makes it eligible
to be <<modules-discovery-zen,elected as the _master_ node>>, which controls
the cluster.

<<data-node,Data node>>::

A node that has `node.data` set to `true` (default). Data nodes hold data and
perform data-related operations such as CRUD, search, and aggregations.

<<client-node,Client node>>::

A client node has both `node.master` and `node.data` set to `false`. It can
neither hold data nor become the master node. It behaves as a ``smart
router'' and is used to forward cluster-level requests to the master node and
data-related requests (such as search) to the appropriate data nodes.

<<modules-tribe,Tribe node>>::

A tribe node, configured via the `tribe.*` settings, is a special type of
client node that can connect to multiple clusters and perform search and other
operations across all connected clusters.
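
As a minimal sketch (the instance keys `t1` and `t2` and the cluster names
are placeholders; see <<modules-tribe,Tribe node>> for the full syntax and
caveats), a tribe node that joins two clusters might be configured like this:

[source,yaml]
-------------------
tribe:
    t1:
        cluster.name: cluster_one
    t2:
        cluster.name: cluster_two
-------------------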

By default a node is both a master-eligible node and a data node. This is very
convenient for small clusters but, as the cluster grows, it becomes important
to consider separating dedicated master-eligible nodes from dedicated data
nodes.
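
For reference, the default roles described above are equivalent to the
following explicit settings. This is shown only for illustration; both values
are already `true` by default, so you do not need to add these lines:

[source,yaml]
-------------------
node.master: true
node.data: true
-------------------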

[NOTE]
[[coordinating-node]]
.Coordinating node
===============================================

Requests like search requests or bulk-indexing requests may involve data held
on different data nodes. A search request, for example, is executed in two
phases which are coordinated by the node which receives the client request --
the _coordinating node_.

In the _scatter_ phase, the coordinating node forwards the request to the data
nodes which hold the data. Each data node executes the request locally and
returns its results to the coordinating node. In the _gather_ phase, the
coordinating node reduces each data node's results into a single global
result set.

This means that a _client_ node needs to have enough memory and CPU to
handle the gather phase.

===============================================

[float]
[[master-node]]
=== Master Eligible Node

The master node is responsible for lightweight cluster-wide actions such as
creating or deleting an index, tracking which nodes are part of the cluster,
and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.

Any master-eligible node (all nodes by default) may be elected to become the
master node by the <<modules-discovery-zen,master election process>>.

Indexing and searching your data is CPU-, memory-, and I/O-intensive work
which can put pressure on a node's resources. To ensure that your master
node is stable and not under pressure, it is a good idea in a bigger
cluster to split the roles between dedicated master-eligible nodes and
dedicated data nodes.

While master nodes can also behave as <<coordinating-node,coordinating nodes>>
and route search and indexing requests from clients to data nodes, it is
better _not_ to use dedicated master nodes for this purpose. It is important
for the stability of the cluster that master-eligible nodes do as little work
as possible.

To create a dedicated master-eligible node, set:

[source,yaml]
-------------------
node.master: true <1>
node.data: false <2>
-------------------
<1> The `node.master` role is enabled by default.
<2> Disable the `node.data` role (enabled by default).

[float]
[[split-brain]]
==== Avoiding split brain with `minimum_master_nodes`

To prevent data loss, it is vital to configure the
`discovery.zen.minimum_master_nodes` setting (which defaults to `1`) so that
each master-eligible node knows the _minimum number of master-eligible nodes_
that must be visible in order to form a cluster.

To explain, imagine that you have a cluster consisting of two master-eligible
nodes. A network failure breaks communication between these two nodes. Each
node sees one master-eligible node... itself. With `minimum_master_nodes` set
to the default of `1`, this is sufficient to form a cluster. Each node elects
itself as the new master (thinking that the other master-eligible node has
died) and the result is two clusters, or a _split brain_. These two nodes
will never rejoin until one node is restarted. Any data that has been written
to the restarted node will be lost.

Now imagine that you have a cluster with three master-eligible nodes, and
`minimum_master_nodes` set to `2`. If a network split separates one node from
the other two nodes, the side with one node cannot see enough master-eligible
nodes and will realise that it cannot elect itself as master. The side with
two nodes will elect a new master (if needed) and continue functioning
correctly. As soon as the network split is resolved, the single node will
rejoin the cluster and start serving requests again.

This setting should be set to a _quorum_ of master-eligible nodes:

    (master_eligible_nodes / 2) + 1

In other words, if there are three master-eligible nodes, then minimum master
nodes should be set to `(3 / 2) + 1`, rounded down to `2`:

[source,yaml]
----------------------------
discovery.zen.minimum_master_nodes: 2 <1>
----------------------------
<1> Defaults to `1`.

This setting can also be changed dynamically on a live cluster with the
<<cluster-update-settings,cluster update settings API>>:

[source,js]
----------------------------
PUT _cluster/settings
{
    "transient": {
        "discovery.zen.minimum_master_nodes": 2
    }
}
----------------------------
// AUTOSENSE

TIP: An advantage of splitting the master and data roles between dedicated
nodes is that you can have just three master-eligible nodes and set
`minimum_master_nodes` to `2`. You never have to change this setting, no
matter how many dedicated data nodes you add to the cluster.

[float]
[[data-node]]
=== Data Node

Data nodes hold the shards that contain the documents you have indexed. Data
nodes handle data-related operations like CRUD, search, and aggregations.
These operations are I/O-, memory-, and CPU-intensive. It is important to
monitor these resources and to add more data nodes if they are overloaded.

The main benefit of having dedicated data nodes is the separation of the
master and data roles.

To create a dedicated data node, set:

[source,yaml]
-------------------
node.master: false <1>
node.data: true <2>
-------------------
<1> Disable the `node.master` role (enabled by default).
<2> The `node.data` role is enabled by default.

[float]
[[client-node]]
=== Client Node

If you take away the ability to handle master duties and the ability to hold
data, then you are left with a _client_ node that can only route requests,
handle the search reduce phase, and distribute bulk indexing. Essentially,
client nodes behave as smart load balancers.

Standalone client nodes can benefit large clusters by offloading the
coordinating node role from data and master-eligible nodes. Client nodes join
the cluster and receive the full <<cluster-state,cluster state>>, like every
other node, and they use the cluster state to route requests directly to the
appropriate place(s).

WARNING: Adding too many client nodes to a cluster can increase the burden on
the entire cluster because the elected master node must await acknowledgement
of cluster state updates from every node! The benefit of client nodes should
not be overstated -- data nodes can happily serve the same purpose as client
nodes.

To create a dedicated client node, set:

[source,yaml]
-------------------
node.master: false <1>
node.data: false <2>
-------------------
<1> Disable the `node.master` role (enabled by default).
<2> Disable the `node.data` role (enabled by default).

[float]
== Node data path settings

[float]
[[data-path]]
=== `path.data`

Every data and master-eligible node requires access to a data directory where
shards and index and cluster metadata will be stored. The `path.data` setting
defaults to `$ES_HOME/data` but can be configured in the `elasticsearch.yml`
config file as an absolute path or a path relative to `$ES_HOME`, as follows:

[source,yaml]
-----------------------
path.data: /var/elasticsearch/data
-----------------------

Like all node settings, it can also be specified on the command line as:

[source,sh]
-----------------------
./bin/elasticsearch --path.data /var/elasticsearch/data
-----------------------

TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
should be configured to locate the data directory outside the Elasticsearch
home directory, so that the home directory can be deleted without deleting
your data! The RPM and Debian distributions do this for you already.

[float]
[[max-local-storage-nodes]]
=== `node.max_local_storage_nodes`

The <<data-path,data path>> can be shared by multiple nodes, even by nodes
from different clusters. This is very useful for testing failover and
different configurations on your development machine. In production, however,
it is recommended to run only one node of Elasticsearch per server.

To prevent more than one node from sharing the same data path, add this
setting to the `elasticsearch.yml` config file:

[source,yaml]
------------------------------
node.max_local_storage_nodes: 1
------------------------------
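
Conversely, if you deliberately want two or three nodes to share the same
data path on a development machine, the same setting can be raised. This is
purely an illustrative sketch; in production, stick to one node per data path
as recommended above:

[source,yaml]
------------------------------
node.max_local_storage_nodes: 2
------------------------------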

WARNING: Never run different node types (i.e. master, data, client) from the
same data directory. This can lead to unexpected data loss.

[float]
== Other node settings

More node settings can be found in <<modules,Modules>>. Of particular note are
the <<cluster-name,`cluster.name`>>, the <<node-name,`node.name`>>, and the
<<modules-network,network settings>>.
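
For example, a node is often given an explicit cluster name and node name in
`elasticsearch.yml`. The values below are arbitrary placeholders; the linked
sections describe the defaults and related options:

[source,yaml]
-------------------
cluster.name: my_application_cluster
node.name: node-1
-------------------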