213 lines
7.9 KiB
Plaintext
213 lines
7.9 KiB
Plaintext
[[important-settings]]
|
|
== Important Elasticsearch configuration
|
|
|
|
While Elasticsearch requires very little configuration, there are a number of
|
|
settings which need to be configured manually and should definitely be
|
|
configured before going into production.
|
|
|
|
* <<path-settings,`path.data` and `path.logs`>>
|
|
* <<cluster.name,`cluster.name`>>
|
|
* <<node.name,`node.name`>>
|
|
* <<bootstrap.memory_lock,`bootstrap.memory_lock`>>
|
|
* <<network.host,`network.host`>>
|
|
* <<unicast.hosts,`discovery.zen.ping.unicast.hosts`>>
|
|
* <<minimum_master_nodes,`discovery.zen.minimum_master_nodes`>>
|
|
* <<heap-dump-path,JVM heap dump path>>
|
|
|
|
[float]
|
|
[[path-settings]]
|
|
=== `path.data` and `path.logs`
|
|
|
|
If you are using the `.zip` or `.tar.gz` archives, the `data` and `logs`
|
|
directories are sub-folders of `$ES_HOME`. If these important folders are
|
|
left in their default locations, there is a high risk of them being deleted
|
|
while upgrading Elasticsearch to a new version.
|
|
|
|
In production use, you will almost certainly want to change the locations of
|
|
the data and log folder:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
path:
|
|
logs: /var/log/elasticsearch
|
|
data: /var/data/elasticsearch
|
|
--------------------------------------------------
|
|
|
|
The RPM and Debian distributions already use custom paths for `data` and
|
|
`logs`.
|
|
|
|
The `path.data` settings can be set to multiple paths, in which case all paths
|
|
will be used to store data (although the files belonging to a single shard
|
|
will all be stored on the same data path):
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
path:
|
|
data:
|
|
- /mnt/elasticsearch_1
|
|
- /mnt/elasticsearch_2
|
|
- /mnt/elasticsearch_3
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[cluster.name]]
|
|
=== `cluster.name`
|
|
|
|
A node can only join a cluster when it shares its `cluster.name` with all the
|
|
other nodes in the cluster. The default name is `elasticsearch`, but you
|
|
should change it to an appropriate name which describes the purpose of the
|
|
cluster.
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
cluster.name: logging-prod
|
|
--------------------------------------------------
|
|
|
|
Make sure that you don't reuse the same cluster names in different
|
|
environments, otherwise you might end up with nodes joining the wrong cluster.
|
|
|
|
[float]
|
|
[[node.name]]
|
|
=== `node.name`
|
|
|
|
By default, Elasticsearch will take the 7 first character of the randomly generated uuid used as the node id.
|
|
Note that the node id is persisted and does not change when a node restarts and therefore the default node name
|
|
will also not change.
|
|
|
|
It is worth configuring a more meaningful name which will also have the
|
|
advantage of persisting after restarting the node:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
node.name: prod-data-2
|
|
--------------------------------------------------
|
|
|
|
The `node.name` can also be set to the server's HOSTNAME as follows:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
node.name: ${HOSTNAME}
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[bootstrap.memory_lock]]
|
|
=== `bootstrap.memory_lock`
|
|
|
|
It is vitally important to the health of your node that none of the JVM is
|
|
ever swapped out to disk. One way of achieving that is set the
|
|
`bootstrap.memory_lock` setting to `true`.
|
|
|
|
For this setting to have effect, other system settings need to be configured
|
|
first. See <<mlockall>> for more details about how to set up memory locking
|
|
correctly.
|
|
|
|
[float]
|
|
[[network.host]]
|
|
=== `network.host`
|
|
|
|
By default, Elasticsearch binds to loopback addresses only -- e.g. `127.0.0.1`
|
|
and `[::1]`. This is sufficient to run a single development node on a server.
|
|
|
|
TIP: In fact, more than one node can be started from the same `$ES_HOME` location
|
|
on a single node. This can be useful for testing Elasticsearch's ability to
|
|
form clusters, but it is not a configuration recommended for production.
|
|
|
|
In order to communicate and to form a cluster with nodes on other servers,
|
|
your node will need to bind to a non-loopback address. While there are many
|
|
<<modules-network,network settings>>, usually all you need to configure is
|
|
`network.host`:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
network.host: 192.168.1.10
|
|
--------------------------------------------------
|
|
|
|
The `network.host` setting also understands some special values such as
|
|
`_local_`, `_site_`, `_global_` and modifiers like `:ip4` and `:ip6`, details
|
|
of which can be found in <<network-interface-values>>.
|
|
|
|
IMPORTANT: As soon you provide a custom setting for `network.host`,
|
|
Elasticsearch assumes that you are moving from development mode to production
|
|
mode, and upgrades a number of system startup checks from warnings to
|
|
exceptions. See <<dev-vs-prod>> for more information.
|
|
|
|
[float]
|
|
[[unicast.hosts]]
|
|
=== `discovery.zen.ping.unicast.hosts`
|
|
|
|
Out of the box, without any network configuration, Elasticsearch will bind to
|
|
the available loopback addresses and will scan ports 9300 to 9305 to try to
|
|
connect to other nodes running on the same server. This provides an auto-
|
|
clustering experience without having to do any configuration.
|
|
|
|
When the moment comes to form a cluster with nodes on other servers, you have
|
|
to provide a seed list of other nodes in the cluster that are likely to be
|
|
live and contactable. This can be specified as follows:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
discovery.zen.ping.unicast.hosts:
|
|
- 192.168.1.10:9300
|
|
- 192.168.1.11 <1>
|
|
- seeds.mydomain.com <2>
|
|
--------------------------------------------------
|
|
<1> The port will default to `transport.profiles.default.port` and fallback to `transport.tcp.port` if not specified.
|
|
<2> A hostname that resolves to multiple IP addresses will try all resolved addresses.
|
|
|
|
[float]
|
|
[[minimum_master_nodes]]
|
|
=== `discovery.zen.minimum_master_nodes`
|
|
|
|
To prevent data loss, it is vital to configure the
|
|
`discovery.zen.minimum_master_nodes` setting so that each master-eligible node
|
|
knows the _minimum number of master-eligible nodes_ that must be visible in
|
|
order to form a cluster.
|
|
|
|
Without this setting, a cluster that suffers a network failure is at risk of
|
|
having the cluster split into two independent clusters -- a split brain --
|
|
which will lead to data loss. A more detailed explanation is provided
|
|
in <<split-brain>>.
|
|
|
|
To avoid a split brain, this setting should be set to a _quorum_ of master-
|
|
eligible nodes:
|
|
|
|
(master_eligible_nodes / 2) + 1
|
|
|
|
In other words, if there are three master-eligible nodes, then minimum master
|
|
nodes should be set to `(3 / 2) + 1` or `2`:
|
|
|
|
[source,yaml]
|
|
--------------------------------------------------
|
|
discovery.zen.minimum_master_nodes: 2
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[heap-dump-path]]
|
|
=== JVM heap dump path
|
|
|
|
The <<rpm,RPM>> and <<deb,Debian>> package distributions default to configuring
|
|
the JVM to dump the heap on out of memory exceptions to
|
|
`/var/lib/elasticsearch`. If this path is not suitable for storing heap dumps,
|
|
you should modify the entry `-XX:HeapDumpPath=/var/lib/elasticsearch` in
|
|
<<jvm-options,`jvm.options`>> to an alternate path. If you specify a filename
|
|
instead of a directory, the JVM will repeatedly use the same file; this is one
|
|
mechanism for preventing heap dumps from accumulating in the heap dump path.
|
|
Alternatively, you can configure a scheduled task via your OS to remove heap
|
|
dumps that are older than a configured age.
|
|
|
|
Note that the archive distributions do not configure the heap dump path by
|
|
default. Instead, the JVM will default to dumping to the working directory for
|
|
the Elasticsearch process. If you wish to configure a heap dump path, you should
|
|
modify the entry `#-XX:HeapDumpPath=/heap/dump/path` in
|
|
<<jvm-options,`jvm.options`>> to remove the comment marker `#` and to specify an
|
|
actual path.
|
|
|
|
[float]
|
|
[[gc-logging]]
|
|
=== GC logging
|
|
|
|
By default, Elasticsearch enables GC logs. These are configured in
|
|
<<jvm-options,`jvm.options`>> and default to the same default location as the
|
|
Elasticsearch logs. The default configuration rotates the logs every 64 MB and
|
|
can consume up to 2 GB of disk space.
|