470 lines
18 KiB
Plaintext
470 lines
18 KiB
Plaintext
[[modules-node]]
|
|
== Node
|
|
|
|
Any time that you start an instance of Elasticsearch, you are starting a
|
|
_node_. A collection of connected nodes is called a
|
|
<<modules-cluster,cluster>>. If you are running a single node of Elasticsearch,
|
|
then you have a cluster of one node.
|
|
|
|
Every node in the cluster can handle <<modules-http,HTTP>> and
|
|
<<modules-transport,Transport>> traffic by default. The transport layer
|
|
is used exclusively for communication between nodes and the
|
|
{javaclient}/transport-client.html[Java `TransportClient`]; the HTTP layer is
|
|
used only by external REST clients.
|
|
|
|
All nodes know about all the other nodes in the cluster and can forward client
|
|
requests to the appropriate node.
|
|
|
|
By default, a node is all of the following types: master-eligible, data, ingest,
|
|
and machine learning (if available).
|
|
|
|
TIP: As the cluster grows and in particular if you have large {ml} jobs,
|
|
consider separating dedicated master-eligible nodes from dedicated data nodes
|
|
and dedicated {ml} nodes.
|
|
|
|
<<master-node,Master-eligible node>>::
|
|
|
|
A node that has `node.master` set to `true` (default), which makes it eligible
|
|
to be <<modules-discovery,elected as the _master_ node>>, which controls
|
|
the cluster.
|
|
|
|
<<data-node,Data node>>::
|
|
|
|
A node that has `node.data` set to `true` (default). Data nodes hold data and
|
|
perform data related operations such as CRUD, search, and aggregations.
|
|
|
|
<<ingest,Ingest node>>::
|
|
|
|
A node that has `node.ingest` set to `true` (default). Ingest nodes are able
|
|
to apply an <<pipeline,ingest pipeline>> to a document in order to transform
|
|
and enrich the document before indexing. With a heavy ingest load, it makes
|
|
sense to use dedicated ingest nodes and to mark the master and data nodes as
|
|
`node.ingest: false`.
|
|
|
|
<<ml-node,Machine learning node>>::
|
|
|
|
A node that has `xpack.ml.enabled` and `node.ml` set to `true`, which is the
|
|
default behavior in the {es} {default-dist}. If you want to use {ml-features},
|
|
there must be at least one {ml} node in your cluster. For more information about
|
|
{ml-features}, see
|
|
{stack-ov}/xpack-ml.html[Machine learning in the {stack}].
|
|
+
|
|
IMPORTANT: If you use the {oss-dist}, do not set `node.ml`. Otherwise, the node
|
|
fails to start.
|
|
|
|
[NOTE]
|
|
[[coordinating-node]]
|
|
.Coordinating node
|
|
===============================================
|
|
|
|
Requests like search requests or bulk-indexing requests may involve data held
|
|
on different data nodes. A search request, for example, is executed in two
|
|
phases which are coordinated by the node which receives the client request --
|
|
the _coordinating node_.
|
|
|
|
In the _scatter_ phase, the coordinating node forwards the request to the data
|
|
nodes which hold the data. Each data node executes the request locally and
|
|
returns its results to the coordinating node. In the _gather_ phase, the
|
|
coordinating node reduces each data node's results into a single global
|
|
resultset.
|
|
|
|
Every node is implicitly a coordinating node. This means that a node that has
|
|
all three `node.master`, `node.data` and `node.ingest` set to `false` will
|
|
only act as a coordinating node, which cannot be disabled. As a result, such
|
|
a node needs to have enough memory and CPU in order to deal with the gather
|
|
phase.
|
|
|
|
===============================================
|
|
|
|
[float]
|
|
[[master-node]]
|
|
=== Master Eligible Node
|
|
|
|
The master node is responsible for lightweight cluster-wide actions such as
|
|
creating or deleting an index, tracking which nodes are part of the cluster,
|
|
and deciding which shards to allocate to which nodes. It is important for
|
|
cluster health to have a stable master node.
|
|
|
|
Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
|
|
be elected to become the master node by the <<modules-discovery,master election
|
|
process>>.
|
|
|
|
IMPORTANT: Master nodes must have access to the `data/` directory (just like
|
|
`data` nodes) as this is where the cluster state is persisted between node restarts.
|
|
|
|
Indexing and searching your data is CPU-, memory-, and I/O-intensive work
|
|
which can put pressure on a node's resources. To ensure that your master
|
|
node is stable and not under pressure, it is a good idea in a bigger
|
|
cluster to split the roles between dedicated master-eligible nodes and
|
|
dedicated data nodes.
|
|
|
|
While master nodes can also behave as <<coordinating-node,coordinating nodes>>
|
|
and route search and indexing requests from clients to data nodes, it is
|
|
better _not_ to use dedicated master nodes for this purpose. It is important
|
|
for the stability of the cluster that master-eligible nodes do as little work
|
|
as possible.
|
|
|
|
To create a dedicated master-eligible node in the {default-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: true <1>
|
|
node.voting_only: false <2>
|
|
node.data: false <3>
|
|
node.ingest: false <4>
|
|
node.ml: false <5>
|
|
xpack.ml.enabled: true <6>
|
|
cluster.remote.connect: false <7>
|
|
-------------------
|
|
<1> The `node.master` role is enabled by default.
|
|
<2> The `node.voting_only` role is disabled by default.
|
|
<3> Disable the `node.data` role (enabled by default).
|
|
<4> Disable the `node.ingest` role (enabled by default).
|
|
<5> Disable the `node.ml` role (enabled by default).
|
|
<6> The `xpack.ml.enabled` setting is enabled by default.
|
|
<7> Disable {ccs} (enabled by default).
|
|
|
|
To create a dedicated master-eligible node in the {oss-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: true <1>
|
|
node.data: false <2>
|
|
node.ingest: false <3>
|
|
cluster.remote.connect: false <4>
|
|
-------------------
|
|
<1> The `node.master` role is enabled by default.
|
|
<2> Disable the `node.data` role (enabled by default).
|
|
<3> Disable the `node.ingest` role (enabled by default).
|
|
<4> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[voting-only-node]]
|
|
==== Voting-only master-eligible node
|
|
|
|
A voting-only master-eligible node is a node that participates in
|
|
<<modules-discovery,master elections>> but which will not act as the cluster's
|
|
elected master node. In particular, a voting-only node can serve as a tiebreaker
|
|
in elections.
|
|
|
|
It may seem confusing to use the term "master-eligible" to describe a
|
|
voting-only node since such a node is not actually eligible to become the master
|
|
at all. This terminology is an unfortunate consequence of history:
|
|
master-eligible nodes are those nodes that participate in elections and perform
|
|
certain tasks during cluster state publications, and voting-only nodes have the
|
|
same responsibilities even if they can never become the elected master.
|
|
|
|
To configure a master-eligible node as a voting-only node, set the following
|
|
setting:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.voting_only: true <1>
|
|
-------------------
|
|
<1> The default for `node.voting_only` is `false`.
|
|
|
|
IMPORTANT: The `voting_only` role requires the {default-dist} of Elasticsearch
|
|
and is not supported in the {oss-dist}. If you use the {oss-dist} and set
|
|
`node.voting_only` then the node will fail to start. Also note that only
|
|
master-eligible nodes can be marked as voting-only.
|
|
|
|
High availability (HA) clusters require at least three master-eligible nodes, at
|
|
least two of which are not voting-only nodes. Such a cluster will be able to
|
|
elect a master node even if one of the nodes fails.
|
|
|
|
Since voting-only nodes never act as the cluster's elected master, they may
|
|
require require less heap and a less powerful CPU than the true master nodes.
|
|
However all master-eligible nodes, including voting-only nodes, require
|
|
reasonably fast persistent storage and a reliable and low-latency network
|
|
connection to the rest of the cluster, since they are on the critical path for
|
|
<<cluster-state-publishing,publishing cluster state updates>>.
|
|
|
|
Voting-only master-eligible nodes may also fill other roles in your cluster.
|
|
For instance, a node may be both a data node and a voting-only master-eligible
|
|
node. A _dedicated_ voting-only master-eligible nodes is a voting-only
|
|
master-eligible node that fills no other roles in the cluster. To create a
|
|
dedicated voting-only master-eligible node in the {default-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: true <1>
|
|
node.voting_only: true <2>
|
|
node.data: false <3>
|
|
node.ingest: false <4>
|
|
node.ml: false <5>
|
|
xpack.ml.enabled: true <6>
|
|
cluster.remote.connect: false <7>
|
|
-------------------
|
|
<1> The `node.master` role is enabled by default.
|
|
<2> Enable the `node.voting_only` role (disabled by default).
|
|
<3> Disable the `node.data` role (enabled by default).
|
|
<4> Disable the `node.ingest` role (enabled by default).
|
|
<5> Disable the `node.ml` role (enabled by default).
|
|
<6> The `xpack.ml.enabled` setting is enabled by default.
|
|
<7> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[data-node]]
|
|
=== Data Node
|
|
|
|
Data nodes hold the shards that contain the documents you have indexed. Data
|
|
nodes handle data related operations like CRUD, search, and aggregations.
|
|
These operations are I/O-, memory-, and CPU-intensive. It is important to
|
|
monitor these resources and to add more data nodes if they are overloaded.
|
|
|
|
The main benefit of having dedicated data nodes is the separation of the
|
|
master and data roles.
|
|
|
|
To create a dedicated data node in the {default-dist}, set:
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.voting_only: false <2>
|
|
node.data: true <3>
|
|
node.ingest: false <4>
|
|
node.ml: false <5>
|
|
cluster.remote.connect: false <6>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> The `node.voting_only` role is disabled by default.
|
|
<3> The `node.data` role is enabled by default.
|
|
<4> Disable the `node.ingest` role (enabled by default).
|
|
<5> Disable the `node.ml` role (enabled by default).
|
|
<6> Disable {ccs} (enabled by default).
|
|
|
|
To create a dedicated data node in the {oss-dist}, set:
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.data: true <2>
|
|
node.ingest: false <3>
|
|
cluster.remote.connect: false <4>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> The `node.data` role is enabled by default.
|
|
<3> Disable the `node.ingest` role (enabled by default).
|
|
<4> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[node-ingest-node]]
|
|
=== Ingest Node
|
|
|
|
Ingest nodes can execute pre-processing pipelines, composed of one or more
|
|
ingest processors. Depending on the type of operations performed by the ingest
|
|
processors and the required resources, it may make sense to have dedicated
|
|
ingest nodes, that will only perform this specific task.
|
|
|
|
To create a dedicated ingest node in the {default-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.voting_only: false <2>
|
|
node.data: false <3>
|
|
node.ingest: true <4>
|
|
node.ml: false <5>
|
|
cluster.remote.connect: false <6>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> The `node.voting_only` role is disabled by default.
|
|
<3> Disable the `node.data` role (enabled by default).
|
|
<4> The `node.ingest` role is enabled by default.
|
|
<5> Disable the `node.ml` role (enabled by default).
|
|
<6> Disable {ccs} (enabled by default).
|
|
|
|
To create a dedicated ingest node in the {oss-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.data: false <2>
|
|
node.ingest: true <3>
|
|
cluster.remote.connect: false <4>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> Disable the `node.data` role (enabled by default).
|
|
<3> The `node.ingest` role is enabled by default.
|
|
<4> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[coordinating-only-node]]
|
|
=== Coordinating only node
|
|
|
|
If you take away the ability to be able to handle master duties, to hold data,
|
|
and pre-process documents, then you are left with a _coordinating_ node that
|
|
can only route requests, handle the search reduce phase, and distribute bulk
|
|
indexing. Essentially, coordinating only nodes behave as smart load balancers.
|
|
|
|
Coordinating only nodes can benefit large clusters by offloading the
|
|
coordinating node role from data and master-eligible nodes. They join the
|
|
cluster and receive the full <<cluster-state,cluster state>>, like every other
|
|
node, and they use the cluster state to route requests directly to the
|
|
appropriate place(s).
|
|
|
|
WARNING: Adding too many coordinating only nodes to a cluster can increase the
|
|
burden on the entire cluster because the elected master node must await
|
|
acknowledgement of cluster state updates from every node! The benefit of
|
|
coordinating only nodes should not be overstated -- data nodes can happily
|
|
serve the same purpose.
|
|
|
|
To create a dedicated coordinating node in the {default-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.voting_only: false <2>
|
|
node.data: false <3>
|
|
node.ingest: false <4>
|
|
node.ml: false <5>
|
|
cluster.remote.connect: false <6>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> The `node.voting_only` role is disabled by default.
|
|
<3> Disable the `node.data` role (enabled by default).
|
|
<4> Disable the `node.ingest` role (enabled by default).
|
|
<5> Disable the `node.ml` role (enabled by default).
|
|
<6> Disable {ccs} (enabled by default).
|
|
|
|
To create a dedicated coordinating node in the {oss-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.data: false <2>
|
|
node.ingest: false <3>
|
|
cluster.remote.connect: false <4>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> Disable the `node.data` role (enabled by default).
|
|
<3> Disable the `node.ingest` role (enabled by default).
|
|
<4> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[ml-node]]
|
|
=== [xpack]#Machine learning node#
|
|
|
|
The {ml-features} provide {ml} nodes, which run jobs and handle {ml} API
|
|
requests. If `xpack.ml.enabled` is set to true and `node.ml` is set to `false`,
|
|
the node can service API requests but it cannot run jobs.
|
|
|
|
If you want to use {ml-features} in your cluster, you must enable {ml}
|
|
(set `xpack.ml.enabled` to `true`) on all master-eligible nodes. If you have the
|
|
{oss-dist}, do not use these settings.
|
|
|
|
For more information about these settings, see <<ml-settings>>.
|
|
|
|
To create a dedicated {ml} node in the {default-dist}, set:
|
|
|
|
[source,yaml]
|
|
-------------------
|
|
node.master: false <1>
|
|
node.voting_only: false <2>
|
|
node.data: false <3>
|
|
node.ingest: false <4>
|
|
node.ml: true <5>
|
|
xpack.ml.enabled: true <6>
|
|
cluster.remote.connect: false <7>
|
|
-------------------
|
|
<1> Disable the `node.master` role (enabled by default).
|
|
<2> The `node.voting_only` role is disabled by default.
|
|
<3> Disable the `node.data` role (enabled by default).
|
|
<4> Disable the `node.ingest` role (enabled by default).
|
|
<5> The `node.ml` role is enabled by default.
|
|
<6> The `xpack.ml.enabled` setting is enabled by default.
|
|
<7> Disable {ccs} (enabled by default).
|
|
|
|
[float]
|
|
[[change-node-role]]
|
|
=== Changing the role of a node
|
|
|
|
Each data node maintains the following data on disk:
|
|
|
|
* the shard data for every shard allocated to that node,
|
|
* the index metadata corresponding with every shard allocated to that node, and
|
|
* the cluster-wide metadata, such as settings and index templates.
|
|
|
|
Similarly, each master-eligible node maintains the following data on disk:
|
|
|
|
* the index metadata for every index in the cluster, and
|
|
* the cluster-wide metadata, such as settings and index templates.
|
|
|
|
Each node checks the contents of its data path at startup. If it discovers
|
|
unexpected data then it will refuse to start. This is to avoid importing
|
|
unwanted <<modules-gateway-dangling-indices,dangling indices>> which can lead
|
|
to a red cluster health. To be more precise, nodes with `node.data: false` will
|
|
refuse to start if they find any shard data on disk at startup, and nodes with
|
|
both `node.master: false` and `node.data: false` will refuse to start if they
|
|
have any index metadata on disk at startup.
|
|
|
|
It is possible to change the roles of a node by adjusting its
|
|
`elasticsearch.yml` file and restarting it. This is known as _repurposing_ a
|
|
node. In order to satisfy the checks for unexpected data described above, you
|
|
must perform some extra steps to prepare a node for repurposing when setting
|
|
its `node.data` or `node.master` roles to `false`:
|
|
|
|
* If you want to repurpose a data node by changing `node.data` to `false` then
|
|
you should first use an <<allocation-filtering,allocation filter>> to safely
|
|
migrate all the shard data onto other nodes in the cluster.
|
|
|
|
* If you want to repurpose a node to have both `node.master: false` and
|
|
`node.data: false` then it is simplest to start a brand-new node with an
|
|
empty data path and the desired roles. You may find it safest to use an
|
|
<<allocation-filtering,allocation filter>> to migrate the shard data
|
|
elsewhere in the cluster first.
|
|
|
|
If it is not possible to follow these extra steps then you may be able to use
|
|
the <<node-tool-repurpose,`elasticsearch-node repurpose`>> tool to delete any
|
|
excess data that prevents a node from starting.
|
|
|
|
[float]
|
|
== Node data path settings
|
|
|
|
[float]
|
|
[[data-path]]
|
|
=== `path.data`
|
|
|
|
Every data and master-eligible node requires access to a data directory where
|
|
shards and index and cluster metadata will be stored. The `path.data` defaults
|
|
to `$ES_HOME/data` but can be configured in the `elasticsearch.yml` config
|
|
file an absolute path or a path relative to `$ES_HOME` as follows:
|
|
|
|
[source,yaml]
|
|
-----------------------
|
|
path.data: /var/elasticsearch/data
|
|
-----------------------
|
|
|
|
Like all node settings, it can also be specified on the command line as:
|
|
|
|
[source,sh]
|
|
-----------------------
|
|
./bin/elasticsearch -Epath.data=/var/elasticsearch/data
|
|
-----------------------
|
|
|
|
TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
|
|
should be configured to locate the data directory outside the Elasticsearch
|
|
home directory, so that the home directory can be deleted without deleting
|
|
your data! The RPM and Debian distributions do this for you already.
|
|
|
|
|
|
[float]
|
|
[[max-local-storage-nodes]]
|
|
=== `node.max_local_storage_nodes`
|
|
|
|
The <<data-path,data path>> can be shared by multiple nodes, even by nodes from different
|
|
clusters. It is recommended however to only run one node of Elasticsearch using the same data path.
|
|
This setting is deprecated in 7.x and will be removed in version 8.0.
|
|
|
|
By default, Elasticsearch is configured to prevent more than one node from sharing the same data
|
|
path. To allow for more than one node (e.g., on your development machine), use the setting
|
|
`node.max_local_storage_nodes` and set this to a positive integer larger than one.
|
|
|
|
WARNING: Never run different node types (i.e. master, data) from the same data directory. This can
|
|
lead to unexpected data loss.
|
|
|
|
[float]
|
|
== Other node settings
|
|
|
|
More node settings can be found in <<modules,Modules>>. Of particular note are
|
|
the <<cluster.name,`cluster.name`>>, the <<node.name,`node.name`>> and the
|
|
<<modules-network,network settings>>.
|