[[bootstrap-checks]]
== Bootstrap Checks
Collectively, we have a lot of experience with users suffering
unexpected issues because they have not configured
<<important-settings,important settings>>. In previous versions of
Elasticsearch, misconfiguration of some of these settings was logged
as a warning. Understandably, users sometimes miss these log messages.
To ensure that these settings receive the attention that they deserve,
Elasticsearch has bootstrap checks upon startup.
These bootstrap checks inspect a variety of Elasticsearch and system
settings and compare them to values that are safe for the operation of
Elasticsearch. If Elasticsearch is in development mode, any bootstrap
checks that fail appear as warnings in the Elasticsearch log. If
Elasticsearch is in production mode, any bootstrap checks that fail will
cause Elasticsearch to refuse to start.
There are some bootstrap checks that are always enforced to prevent
Elasticsearch from running with incompatible settings. These checks are
documented individually.
[float]
[[dev-vs-prod-mode]]
=== Development vs. production mode
By default, Elasticsearch binds to loopback addresses for <<modules-http,HTTP>>
and <<modules-transport,transport (internal)>> communication. This is fine for
downloading and playing with Elasticsearch as well as everyday development, but
it's useless for production systems. To join a cluster, an Elasticsearch node
must be reachable via transport communication. To join a cluster via a
non-loopback address, a node must bind transport to a non-loopback address and
not be using <<single-node-discovery,single-node discovery>>. Thus, we consider
an Elasticsearch node to be in development mode if it cannot form a cluster
with another machine via a non-loopback address, and in production mode
otherwise.
Note that HTTP and transport can be configured independently via
<<modules-http,`http.host`>> and <<modules-transport,`transport.host`>>; this
can be useful for configuring a single node to be reachable via HTTP for testing
purposes without triggering production mode.
[[single-node-discovery]]
[float]
=== Single-node discovery
We recognize that some users need to bind transport to an external interface for
testing their usage of the transport client. For this situation, we provide the
discovery type `single-node` (configure it by setting `discovery.type` to
`single-node`); in this situation, a node will elect itself master and will not
join a cluster with any other node.
[float]
=== Forcing the bootstrap checks
If you are running a single node in production, it is possible to evade the
bootstrap checks (either by not binding transport to an external interface, or
by binding transport to an external interface and setting the discovery type to
`single-node`). For this situation, you can force execution of the bootstrap
checks by setting the system property `es.enforce.bootstrap.checks` to `true`
(set this in <<jvm-options>>, or by adding `-Des.enforce.bootstrap.checks=true`
to the environment variable `ES_JAVA_OPTS`). We strongly encourage you to do
this if you are in this specific situation. This system property can be used to
force execution of the bootstrap checks independent of the node configuration.
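
The rules from the preceding sections can be summarized in a short sketch. This is an illustration only, not Elasticsearch's implementation: the function names and parameters are hypothetical, and the bind hosts are assumed to be IP address literals rather than hostnames.

```python
import ipaddress


def is_production_mode(transport_bind_hosts, discovery_type=None):
    """Return True if the node could join a cluster via a non-loopback address.

    Hypothetical helper: `transport_bind_hosts` is a list of IP literals that
    transport binds to, and `discovery_type` mirrors `discovery.type`.
    """
    binds_non_loopback = any(
        not ipaddress.ip_address(host).is_loopback for host in transport_bind_hosts
    )
    return binds_non_loopback and discovery_type != "single-node"


def bootstrap_checks_enforced(transport_bind_hosts, discovery_type=None, enforce=False):
    """Checks are enforced in production mode, or when explicitly forced.

    `enforce` stands in for the `es.enforce.bootstrap.checks` system property.
    """
    return enforce or is_production_mode(transport_bind_hosts, discovery_type)
```

For example, `bootstrap_checks_enforced(["192.168.1.10"], "single-node")` is false under these assumptions, while passing `enforce=True` makes it true regardless of the binding.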
=== Heap size check
If a JVM is started with unequal initial and max heap size, it can be
prone to pauses as the JVM heap is resized during system usage. To avoid
these resize pauses, it's best to start the JVM with the initial heap
size equal to the maximum heap size. Additionally, if
<<bootstrap-memory_lock,`bootstrap.memory_lock`>> is enabled, the JVM
will lock the initial size of the heap on startup. If the initial heap
size is not equal to the maximum heap size, after a resize it will not
be the case that all of the JVM heap is locked in memory. To pass the
heap size check, you must configure the <<heap-size,heap size>>.
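
As a rough illustration of the condition being checked, the sketch below compares the `-Xms` and `-Xmx` flags in a list of JVM arguments. It is an assumption that inspecting the argument list is adequate; the actual check may obtain the heap sizes from the running JVM instead. The helper names are hypothetical.

```python
import re

# Multipliers for the size suffixes accepted by this sketch.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap_flag(jvm_args, flag):
    """Return the size in bytes of the last `flag` occurrence, or None."""
    size = None
    for arg in jvm_args:
        match = re.fullmatch(rf"{re.escape(flag)}(\d+)([kKmMgG]?)", arg)
        if match:
            size = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return size


def heap_size_check_passes(jvm_args):
    """Pass only when both sizes are set explicitly and are equal."""
    initial = parse_heap_flag(jvm_args, "-Xms")
    maximum = parse_heap_flag(jvm_args, "-Xmx")
    return initial is not None and initial == maximum
```

With this sketch, `["-Xms4g", "-Xmx4g"]` passes and `["-Xms1g", "-Xmx4g"]` does not. An empty argument list is treated as failing, on the assumption that JVM defaults for the two sizes differ.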
=== File descriptor check
File descriptors are a Unix construct for tracking open "files". In Unix
though, https://en.wikipedia.org/wiki/Everything_is_a_file[everything is
a file]. For example, "files" could be a physical file, a virtual file
(e.g., `/proc/loadavg`), or network sockets. Elasticsearch requires
lots of file descriptors (e.g., every shard is composed of multiple
segments and other files, plus connections to other nodes, etc.). This
bootstrap check is enforced on OS X and Linux. To pass the file
descriptor check, you might have to configure <<file-descriptors,file
descriptors>>.
=== Memory lock check
When the JVM does a major garbage collection it touches every page of
the heap. If any of those pages are swapped out to disk they will have
to be swapped back into memory. That causes a lot of disk thrashing, wasting
I/O that Elasticsearch would much rather use to service requests. There are
several ways to configure a system to disallow swapping. One way is by
requesting the JVM to lock the heap in memory through `mlockall` (Unix)
or virtual lock (Windows). This is done via the Elasticsearch setting
<<bootstrap-memory_lock,`bootstrap.memory_lock`>>. However, there are
cases where this setting can be passed to Elasticsearch but
Elasticsearch is not able to lock the heap (e.g., if the `elasticsearch`
user does not have `memlock unlimited`). The memory lock check verifies
that *if* the `bootstrap.memory_lock` setting is enabled, the JVM
was able to lock the heap. To pass the memory lock check,
you might have to configure <<bootstrap-memory_lock,`bootstrap.memory_lock`>>.
[[max-number-threads-check]]
=== Maximum number of threads check
Elasticsearch executes requests by breaking the request down into stages
and handing those stages off to different thread pool executors. There
are different <<modules-threadpool,thread pool executors>> for a variety
of tasks within Elasticsearch. Thus, Elasticsearch needs the ability to
create a lot of threads. The maximum number of threads check ensures
that the Elasticsearch process has the rights to create enough threads
under normal use. This check is enforced only on Linux. If you are on
Linux, to pass the maximum number of threads check, you must configure
your system to allow the Elasticsearch process the ability to create at
least 4096 threads. This can be done via `/etc/security/limits.conf`
using the `nproc` setting (note that you might have to increase the
limits for the `root` user too).
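
To see what limit currently applies to a process on Linux, a sketch along the following lines can help. It uses Python's standard `resource` module and assumes the soft limit is the one that matters; the helper names are hypothetical, and this is not how Elasticsearch itself performs the check.

```python
import resource

MIN_THREADS = 4096  # threshold stated in the text above


def limit_at_least(limit, minimum):
    """True if an rlimit value is unlimited or at least `minimum`."""
    return limit == resource.RLIM_INFINITY or limit >= minimum


def max_threads_check_passes():
    """Compare the soft `nproc` limit of the current process to MIN_THREADS."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return limit_at_least(soft, MIN_THREADS)
```

Run it as the same user that starts Elasticsearch, since limits set in `/etc/security/limits.conf` are applied per user.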
=== Max file size check
The segment files that are the components of individual shards and the translog
generations that are components of the translog can get large (exceeding
multiple gigabytes). On systems where the max size of files that can be created
by the Elasticsearch process is limited, this can lead to failed
writes. Therefore, the safest option is for the max file size to be unlimited,
and that is what the max file size bootstrap check enforces. To pass the max
file size check, you must configure your system to allow the Elasticsearch process
the ability to write files of unlimited size. This can be done via
`/etc/security/limits.conf` using the `fsize` setting to `unlimited` (note that
you might have to increase the limits for the `root` user too).
[[max-size-virtual-memory-check]]
=== Maximum size virtual memory check
Elasticsearch and Lucene use `mmap` to great effect to map portions of
an index into the Elasticsearch address space. This keeps certain index
data off the JVM heap but in memory for blazing fast access. For this to
be effective, the Elasticsearch process should have unlimited address space. The
maximum size virtual memory check enforces that the Elasticsearch
process has unlimited address space and is enforced only on Linux. To
pass the maximum size virtual memory check, you must configure your
system to allow the Elasticsearch process the ability to have unlimited
address space. This can be done via adding `<user> - as unlimited`
to `/etc/security/limits.conf`. This may require you to increase the limits
for the `root` user too.
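
Both this check and the max file size check require a limit to be unlimited. The following sketch, which assumes a Linux host and uses hypothetical helper names, reports whether the soft `fsize` and `as` limits of the current process are unlimited. It is meant for diagnosing a failing check, not as a description of Elasticsearch internals.

```python
import resource


def is_unlimited(limit):
    """True if an rlimit value means "no limit"."""
    return limit == resource.RLIM_INFINITY


def unlimited_checks():
    """Map each limit name to whether its soft limit is unlimited."""
    results = {}
    for name, res in (("fsize", resource.RLIMIT_FSIZE), ("as", resource.RLIMIT_AS)):
        soft, _hard = resource.getrlimit(res)
        results[name] = is_unlimited(soft)
    return results
```

A result such as `{"fsize": True, "as": False}` would indicate that only the address space limit still needs to be raised.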
=== Maximum map count check
Continuing from the previous <<max-size-virtual-memory-check,point>>, to
use `mmap` effectively, Elasticsearch also requires the ability to
create many memory-mapped areas. The maximum map count check checks that
the kernel allows a process to have at least 262,144 memory-mapped areas
and is enforced on Linux only. To pass the maximum map count check, you
must configure `vm.max_map_count` via `sysctl` to be at least `262144`.
Alternatively, the maximum map count check is only needed if you are using
`mmapfs` or `hybridfs` as the <<index-modules-store,store type>> for your
indices. If you <<allow-mmap,do not allow>> the use of `mmap` then this
bootstrap check will not be enforced.
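
The condition can be sketched as follows. The `allow_mmap` parameter is a hypothetical stand-in for whether the use of `mmap` is allowed, and reading `/proc/sys/vm/max_map_count` assumes a Linux host with procfs mounted.

```python
from pathlib import Path

MIN_MAP_COUNT = 262144  # threshold stated in the text above


def max_map_count_check_passes(max_map_count, allow_mmap=True):
    """Apply the threshold, unless mmap is disallowed and the check is skipped."""
    if not allow_mmap:
        return True  # check is not enforced when mmap is disallowed
    return max_map_count >= MIN_MAP_COUNT


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current kernel value of vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())
```

On a Linux host, `max_map_count_check_passes(read_max_map_count())` shows whether the current kernel setting would satisfy the check.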
=== Client JVM check
OpenJDK-derived distributions provide two different JVMs: the
client JVM and the server JVM. These JVMs use different compilers for
producing executable machine code from Java bytecode. The client JVM is
tuned for startup time and memory footprint while the server JVM is
tuned for maximizing performance. The difference in performance between
the two VMs can be substantial. The client JVM check ensures that
Elasticsearch is not running inside the client JVM. To pass the client
JVM check, you must start Elasticsearch with the server VM. On modern
systems and operating systems, the server VM is the
default.
=== Use serial collector check
There are various garbage collectors for the OpenJDK-derived JVMs
targeting different workloads. The serial collector in particular is
best suited for single logical CPU machines or extremely small heaps,
neither of which is suitable for running Elasticsearch. Using the
serial collector with Elasticsearch can be devastating for performance.
The serial collector check ensures that Elasticsearch is not configured
to run with the serial collector. To pass the serial collector check,
you must not start Elasticsearch with the serial collector (whether it's
from the defaults for the JVM that you're using, or you've explicitly
specified it with `-XX:+UseSerialGC`). Note that the default JVM
configuration that ships with Elasticsearch configures Elasticsearch to
use the CMS collector.
=== System call filter check
Elasticsearch installs system call filters of various flavors depending
on the operating system (e.g., seccomp on Linux). These system call
filters are installed to prevent the ability to execute system calls
related to forking as a defense mechanism against arbitrary code
execution attacks on Elasticsearch. The system call filter check ensures
that if system call filters are enabled, then they were successfully
installed. To pass the system call filter check you must either fix any
configuration errors on your system that prevented system call filters
from installing (check your logs), or *at your own risk* disable system
call filters by setting `bootstrap.system_call_filter` to `false`.
=== OnError and OnOutOfMemoryError checks
The JVM options `OnError` and `OnOutOfMemoryError` enable executing
arbitrary commands if the JVM encounters a fatal error (`OnError`) or an
`OutOfMemoryError` (`OnOutOfMemoryError`). However, by default,
Elasticsearch system call filters (seccomp) are enabled and these
filters prevent forking. Thus, `OnError` and `OnOutOfMemoryError` are
incompatible with system call filters. The `OnError` and
`OnOutOfMemoryError` checks prevent Elasticsearch from starting if
either of these JVM options is used and system call filters are
enabled. This check is always enforced. To pass this check, do not enable
`OnError` or `OnOutOfMemoryError`; instead, upgrade to Java 8u92 or later and
use the JVM flag `ExitOnOutOfMemoryError`. While this flag does not have the
full capabilities of `OnError` or `OnOutOfMemoryError`, arbitrary
forking will not be supported with seccomp enabled.
=== Early-access check
The OpenJDK project provides early-access snapshots of upcoming releases. These
releases are not suitable for production. The early-access check detects these
early-access snapshots. To pass this check, you must start Elasticsearch on a
release build of the JVM.
=== G1GC check
Early versions of the HotSpot JVM that shipped with JDK 8 are known to
have issues that can lead to index corruption when the G1GC collector is
enabled. The versions impacted are those earlier than the version of
HotSpot that shipped with JDK 8u40. The G1GC check detects these early
versions of the HotSpot JVM.
=== All permission check
The all permission check ensures that the security policy used during bootstrap
does not grant the `java.security.AllPermission` to Elasticsearch. Running with
the all permission granted is equivalent to disabling the security manager.
=== Discovery configuration check
By default, when Elasticsearch first starts up it will try to discover other
nodes running on the same host. If no elected master can be discovered within a
few seconds then Elasticsearch will form a cluster that includes any other
nodes that were discovered. It is useful to be able to form this cluster
without any extra configuration in development mode, but this is unsuitable for
production because it's possible to form multiple clusters and lose data as a
result.
This bootstrap check ensures that discovery is not running with the default
configuration. It can be satisfied by setting at least one of the following
properties:
- `discovery.seed_hosts`
- `discovery.seed_providers`
- `cluster.initial_master_nodes`
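
A minimal sketch of this condition, assuming the node's explicitly configured settings are available as a flat dictionary keyed by setting name (the function name and the dictionary shape are hypothetical):

```python
# The three settings listed above; configuring any one of them satisfies the check.
DISCOVERY_SETTINGS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
)


def discovery_configuration_check_passes(settings):
    """True if at least one discovery setting has been explicitly configured."""
    return any(name in settings for name in DISCOVERY_SETTINGS)
```

For instance, a settings dictionary containing only `cluster.name` fails the check, while adding `discovery.seed_hosts` makes it pass.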