2016-04-03 10:09:24 -04:00
|
|
|
[[important-settings]]
|
|
|
|
== Important Elasticsearch configuration
|
|
|
|
|
|
|
|
While Elasticsearch requires very little configuration, there are a number of
|
|
|
|
settings which need to be configured manually and should definitely be
|
|
|
|
configured before going into production.
|
|
|
|
|
|
|
|
* <<path-settings,`path.data` and `path.logs`>>
|
|
|
|
* <<cluster.name,`cluster.name`>>
|
|
|
|
* <<node.name,`node.name`>>
|
2016-06-01 16:25:51 -04:00
|
|
|
* <<bootstrap.memory_lock,`bootstrap.memory_lock`>>
|
2016-04-03 10:09:24 -04:00
|
|
|
* <<network.host,`network.host`>>
|
|
|
|
* <<unicast.hosts,`discovery.zen.ping.unicast.hosts`>>
|
|
|
|
* <<minimum_master_nodes,`discovery.zen.minimum_master_nodes`>>
|
2017-09-22 14:22:03 -04:00
|
|
|
* <<heap-dump-path,JVM heap dump path>>
|
2016-04-03 10:09:24 -04:00
|
|
|
|
|
|
|
[float]
|
|
|
|
[[path-settings]]
|
|
|
|
=== `path.data` and `path.logs`
|
|
|
|
|
|
|
|
If you are using the `.zip` or `.tar.gz` archives, the `data` and `logs`
|
|
|
|
directories are sub-folders of `$ES_HOME`. If these important folders are
|
|
|
|
left in their default locations, there is a high risk of them being deleted
|
|
|
|
while upgrading Elasticsearch to a new version.
|
|
|
|
|
|
|
|
In production use, you will almost certainly want to change the locations of
|
|
|
|
the data and log folder:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
path:
|
|
|
|
logs: /var/log/elasticsearch
|
|
|
|
data: /var/data/elasticsearch
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
The RPM and Debian distributions already use custom paths for `data` and
|
|
|
|
`logs`.
|
|
|
|
|
|
|
|
The `path.data` settings can be set to multiple paths, in which case all paths
|
|
|
|
will be used to store data (although the files belonging to a single shard
|
|
|
|
will all be stored on the same data path):
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
path:
|
|
|
|
data:
|
|
|
|
- /mnt/elasticsearch_1
|
|
|
|
- /mnt/elasticsearch_2
|
|
|
|
- /mnt/elasticsearch_3
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
[float]
|
|
|
|
[[cluster.name]]
|
|
|
|
=== `cluster.name`
|
|
|
|
|
|
|
|
A node can only join a cluster when it shares its `cluster.name` with all the
|
|
|
|
other nodes in the cluster. The default name is `elasticsearch`, but you
|
|
|
|
should change it to an appropriate name which describes the purpose of the
|
|
|
|
cluster.
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
cluster.name: logging-prod
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
Make sure that you don't reuse the same cluster names in different
|
|
|
|
environments, otherwise you might end up with nodes joining the wrong cluster.
|
|
|
|
|
|
|
|
[float]
|
|
|
|
[[node.name]]
|
|
|
|
=== `node.name`
|
|
|
|
|
2016-10-10 16:51:47 -04:00
|
|
|
By default, Elasticsearch will take the 7 first character of the randomly generated uuid used as the node id.
|
Persistent Node Names (#19456)
With #19140 we started persisting the node ID across node restarts. Now that we have a "stable" anchor, we can use it to generate a stable default node name and make it easier to track nodes over a restarts. Sadly, this means we will not have those random fun Marvel characters but we feel this is the right tradeoff.
On the implementation side, this requires a bit of juggling because we now need to read the node id from disk before we can log as the node node is part of each log message. The PR move the initialization of NodeEnvironment as high up in the starting sequence as possible, with only one logging message before it to indicate we are initializing. Things look now like this:
```
[2016-07-15 19:38:39,742][INFO ][node ] [_unset_] initializing ...
[2016-07-15 19:38:39,826][INFO ][node ] [aAmiW40] node name set to [aAmiW40] by default. set the [node.name] settings to change it
[2016-07-15 19:38:39,829][INFO ][env ] [aAmiW40] using [1] data paths, mounts [[ /(/dev/disk1)]], net usable_space [5.5gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2016-07-15 19:38:39,830][INFO ][env ] [aAmiW40] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-07-15 19:38:39,837][INFO ][node ] [aAmiW40] version[5.0.0-alpha5-SNAPSHOT], pid[46048], build[473d3c0/2016-07-15T17:38:06.771Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_51/25.51-b03]
[2016-07-15 19:38:40,980][INFO ][plugins ] [aAmiW40] modules [percolator, lang-mustache, lang-painless, reindex, aggs-matrix-stats, lang-expression, ingest-common, lang-groovy, transport-netty], plugins []
[2016-07-15 19:38:43,218][INFO ][node ] [aAmiW40] initialized
```
Needless to say, settings `node.name` explicitly still works as before.
The commit also contains some clean ups to the relationship between Environment, Settings and Plugins. The previous code suggested the path related settings could be changed after the initial Environment was changed. This did not have any effect as the security manager already locked things down.
2016-07-23 16:46:48 -04:00
|
|
|
Note that the node id is persisted and does not change when a node restarts and therefore the default node name
|
|
|
|
will also not change.
|
2016-04-03 10:09:24 -04:00
|
|
|
|
|
|
|
It is worth configuring a more meaningful name which will also have the
|
|
|
|
advantage of persisting after restarting the node:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
node.name: prod-data-2
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
The `node.name` can also be set to the server's HOSTNAME as follows:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
node.name: ${HOSTNAME}
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
[float]
|
2016-06-01 16:25:51 -04:00
|
|
|
[[bootstrap.memory_lock]]
|
|
|
|
=== `bootstrap.memory_lock`
|
2016-04-03 10:09:24 -04:00
|
|
|
|
|
|
|
It is vitally important to the health of your node that none of the JVM is
|
|
|
|
ever swapped out to disk. One way of achieving that is set the
|
2016-06-01 16:25:51 -04:00
|
|
|
`bootstrap.memory_lock` setting to `true`.
|
2016-04-03 10:09:24 -04:00
|
|
|
|
|
|
|
For this setting to have effect, other system settings need to be configured
|
|
|
|
first. See <<mlockall>> for more details about how to set up memory locking
|
|
|
|
correctly.
|
|
|
|
|
|
|
|
[float]
|
|
|
|
[[network.host]]
|
|
|
|
=== `network.host`
|
|
|
|
|
|
|
|
By default, Elasticsearch binds to loopback addresses only -- e.g. `127.0.0.1`
|
|
|
|
and `[::1]`. This is sufficient to run a single development node on a server.
|
|
|
|
|
|
|
|
TIP: In fact, more than one node can be started from the same `$ES_HOME` location
|
|
|
|
on a single node. This can be useful for testing Elasticsearch's ability to
|
|
|
|
form clusters, but it is not a configuration recommended for production.
|
|
|
|
|
|
|
|
In order to communicate and to form a cluster with nodes on other servers,
|
|
|
|
your node will need to bind to a non-loopback address. While there are many
|
|
|
|
<<modules-network,network settings>>, usually all you need to configure is
|
|
|
|
`network.host`:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
network.host: 192.168.1.10
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
The `network.host` setting also understands some special values such as
|
|
|
|
`_local_`, `_site_`, `_global_` and modifiers like `:ip4` and `:ip6`, details
|
|
|
|
of which can be found in <<network-interface-values>>.
|
|
|
|
|
|
|
|
IMPORTANT: As soon you provide a custom setting for `network.host`,
|
|
|
|
Elasticsearch assumes that you are moving from development mode to production
|
|
|
|
mode, and upgrades a number of system startup checks from warnings to
|
|
|
|
exceptions. See <<dev-vs-prod>> for more information.
|
|
|
|
|
|
|
|
[float]
|
|
|
|
[[unicast.hosts]]
|
|
|
|
=== `discovery.zen.ping.unicast.hosts`
|
|
|
|
|
|
|
|
Out of the box, without any network configuration, Elasticsearch will bind to
|
|
|
|
the available loopback addresses and will scan ports 9300 to 9305 to try to
|
|
|
|
connect to other nodes running on the same server. This provides an auto-
|
|
|
|
clustering experience without having to do any configuration.
|
|
|
|
|
|
|
|
When the moment comes to form a cluster with nodes on other servers, you have
|
|
|
|
to provide a seed list of other nodes in the cluster that are likely to be
|
|
|
|
live and contactable. This can be specified as follows:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
discovery.zen.ping.unicast.hosts:
|
|
|
|
- 192.168.1.10:9300
|
|
|
|
- 192.168.1.11 <1>
|
|
|
|
- seeds.mydomain.com <2>
|
|
|
|
--------------------------------------------------
|
2017-01-11 17:10:56 -05:00
|
|
|
<1> The port will default to `transport.profiles.default.port` and fallback to `transport.tcp.port` if not specified.
|
2016-04-03 10:09:24 -04:00
|
|
|
<2> A hostname that resolves to multiple IP addresses will try all resolved addresses.
|
|
|
|
|
|
|
|
[float]
|
|
|
|
[[minimum_master_nodes]]
|
|
|
|
=== `discovery.zen.minimum_master_nodes`
|
|
|
|
|
|
|
|
To prevent data loss, it is vital to configure the
|
2016-11-18 06:56:23 -05:00
|
|
|
`discovery.zen.minimum_master_nodes` setting so that each master-eligible node
|
2016-04-03 10:09:24 -04:00
|
|
|
knows the _minimum number of master-eligible nodes_ that must be visible in
|
|
|
|
order to form a cluster.
|
|
|
|
|
|
|
|
Without this setting, a cluster that suffers a network failure is at risk of
|
|
|
|
having the cluster split into two independent clusters -- a split brain --
|
2016-11-18 06:56:23 -05:00
|
|
|
which will lead to data loss. A more detailed explanation is provided
|
2016-04-03 10:09:24 -04:00
|
|
|
in <<split-brain>>.
|
|
|
|
|
|
|
|
To avoid a split brain, this setting should be set to a _quorum_ of master-
|
|
|
|
eligible nodes:
|
|
|
|
|
|
|
|
(master_eligible_nodes / 2) + 1
|
|
|
|
|
|
|
|
In other words, if there are three master-eligible nodes, then minimum master
|
|
|
|
nodes should be set to `(3 / 2) + 1` or `2`:
|
|
|
|
|
|
|
|
[source,yaml]
|
|
|
|
--------------------------------------------------
|
|
|
|
discovery.zen.minimum_master_nodes: 2
|
|
|
|
--------------------------------------------------
|
|
|
|
|
2017-09-22 14:22:03 -04:00
|
|
|
[float]
|
|
|
|
[[heap-dump-path]]
|
|
|
|
=== JVM heap dump path
|
|
|
|
|
|
|
|
The <<rpm,RPM>> and <<deb,Debian>> package distributions default to configuring
|
|
|
|
the JVM to dump the heap on out of memory exceptions to
|
|
|
|
`/var/lib/elasticsearch`. If this path is not suitable for storing heap dumps,
|
|
|
|
you should modify the entry `-XX:HeapDumpPath=/var/lib/elasticsearch` in
|
|
|
|
<<jvm-options,`jvm.options`>> to an alternate path. If you specify a filename
|
|
|
|
instead of a directory, the JVM will repeatedly use the same file; this is one
|
|
|
|
mechanism for preventing heap dumps from accumulating in the heap dump path.
|
|
|
|
Alternatively, you can configure a scheduled task via your OS to remove heap
|
|
|
|
dumps that are older than a configured age.
|
|
|
|
|
|
|
|
Note that the archive distributions do not configure the heap dump path by
|
|
|
|
default. Instead, the JVM will default to dumping to the working directory for
|
|
|
|
the Elasticsearch process. If you wish to configure a heap dump path, you should
|
|
|
|
modify the entry `#-XX:HeapDumpPath=/heap/dump/path` in
|
|
|
|
<<jvm-options,`jvm.options`>> to remove the comment marker `#` and to specify an
|
|
|
|
actual path.
|