[[setup-configuration]]
== Configuration

[float]
=== Environment Variables

Within the scripts, Elasticsearch comes with a built-in `JAVA_OPTS` variable
that is passed to the JVM it starts. The most important settings here are
`-Xmx`, to control the maximum allowed memory for the process, and `-Xms`,
to control the minimum allocated memory for the process (_in general, the
more memory allocated to the process, the better_).

In most cases it is better to leave the default `JAVA_OPTS` as it is, and
use the `ES_JAVA_OPTS` environment variable to set or change JVM settings
or arguments.

The `ES_HEAP_SIZE` environment variable allows you to set the heap memory
that will be allocated to the Elasticsearch Java process. It assigns the
same value to both the minimum and maximum heap, though these can be set
explicitly (not recommended) with `ES_MIN_MEM` (defaults to `256m`) and
`ES_MAX_MEM` (defaults to `1g`).

It is recommended to set the min and max memory to the same value, and to
enable <<setup-configuration-memory,`mlockall`>>.
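
A hedged example of how these variables might be combined when starting the
tarball distribution (the heap size and the extra JVM flag below are
illustrative values, not recommendations):

[source,sh]
--------------------------------------------------
# Example only: give the JVM a fixed heap (applied to both -Xms and -Xmx)
# and pass an additional JVM argument via ES_JAVA_OPTS.
export ES_HEAP_SIZE=4g
export ES_JAVA_OPTS="-XX:+PrintGCDetails"
./bin/elasticsearch
--------------------------------------------------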
[float]
[[system]]
=== System Configuration

[float]
[[file-descriptors]]
==== File Descriptors

Make sure to increase the number of open file descriptors on the
machine (or for the user running Elasticsearch). Setting it to 32k or
even 64k is recommended.
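
One hedged way to raise this limit for a dedicated user on Linux (the exact
mechanism depends on your distribution, and the `elasticsearch` user name
below is illustrative) is an entry in `/etc/security/limits.conf`:

[source,sh]
--------------------------------------------------
# Example /etc/security/limits.conf entries; log the user in again
# afterwards so the new limits take effect.
elasticsearch  soft  nofile  65536
elasticsearch  hard  nofile  65536

# Check the effective limit in the shell that will start Elasticsearch:
ulimit -n
--------------------------------------------------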
You can retrieve the `max_file_descriptors` for each node
using the <<cluster-nodes-info>> API, with:

[source,js]
--------------------------------------------------
curl localhost:9200/_nodes/stats/process?pretty
--------------------------------------------------

[float]
[[max-number-of-threads]]
==== Number of threads

Make sure that the number of threads that the Elasticsearch user can
create is at least 2048.
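
A quick, hedged way to check and temporarily raise the limit on Linux; a
persistent limit would normally be configured via an `nproc` entry in
`/etc/security/limits.conf`:

[source,sh]
--------------------------------------------------
# Show the current limit on the number of processes/threads the user may create
ulimit -u
# Raise it for the current shell session if it is below 2048
ulimit -u 2048
--------------------------------------------------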
[float]
[[vm-max-map-count]]
==== Virtual memory

Elasticsearch uses a <<default_fs,`hybrid mmapfs / niofs`>> directory by
default to store its indices. The default operating system limit on mmap
counts is likely to be too low, which may result in out of memory
exceptions. On Linux, you can increase the limit by running the following
command as `root`:

[source,sh]
-------------------------------------
sysctl -w vm.max_map_count=262144
-------------------------------------

To set this value permanently, update the `vm.max_map_count` setting in
`/etc/sysctl.conf`.
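
For example (a hedged sketch of the persistent change):

[source,sh]
--------------------------------------------------
# Illustrative /etc/sysctl.conf entry; load it without rebooting with `sysctl -p`
vm.max_map_count=262144
--------------------------------------------------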
NOTE: If you installed Elasticsearch using a package (.deb, .rpm) this setting will be changed automatically. To verify, run `sysctl vm.max_map_count`.

[float]
[[setup-configuration-memory]]
==== Memory Settings

Most operating systems try to use as much memory as possible for file system
caches and eagerly swap out unused application memory, possibly resulting
in the Elasticsearch process being swapped. Swapping is very bad for
performance and for node stability, so it should be avoided at all costs.

There are three options:

* **Disable swap**
+
--

The simplest option is to completely disable swap. Usually Elasticsearch
is the only service running on a box, and its memory usage is controlled
by the `ES_HEAP_SIZE` environment variable. There should be no need
to have swap enabled.

On Linux systems, you can disable swap temporarily
by running `sudo swapoff -a`. To disable it permanently, you will need
to edit the `/etc/fstab` file and comment out any lines that contain the
word `swap`.
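
A hedged sketch of both steps (back up `/etc/fstab` first; the `sed`
expression simply comments out any line containing the word `swap`):

[source,sh]
--------------------------------------------------
# Turn swap off for the running system
sudo swapoff -a
# Comment out swap entries so they are not re-enabled after a reboot
# (writes a backup to /etc/fstab.bak)
sudo sed -i.bak '/swap/ s/^/#/' /etc/fstab
--------------------------------------------------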
On Windows, the equivalent can be achieved by disabling the paging file entirely
via `System Properties → Advanced → Performance → Advanced → Virtual memory`.

--

* **Configure `swappiness`**
+
--
The second option is to ensure that the sysctl value `vm.swappiness` is set
to `0`. This reduces the kernel's tendency to swap and should not lead to
swapping under normal circumstances, while still allowing the whole system
to swap in emergency conditions.

NOTE: From kernel version 3.5-rc1 and above, a `swappiness` of `0` will
cause the OOM killer to kill the process instead of allowing swapping.
You will need to set `swappiness` to `1` to still allow swapping in
emergencies.
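
A hedged sketch of both the immediate and the persistent change (using `1`
rather than `0`, per the note above; adapt to your kernel version):

[source,sh]
--------------------------------------------------
# Apply to the running kernel
sudo sysctl -w vm.swappiness=1
# Persist across reboots
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
--------------------------------------------------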
--

* **`mlockall`**
+
--
The third option is to use
http://opengroup.org/onlinepubs/007908799/xsh/mlockall.html[mlockall] on Linux/Unix systems, or https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895%28v=vs.85%29.aspx[VirtualLock] on Windows, to
try to lock the process address space into RAM, preventing any Elasticsearch
memory from being swapped out. This can be done by adding this line
to the `config/elasticsearch.yml` file:

[source,yaml]
--------------
bootstrap.mlockall: true
--------------

After starting Elasticsearch, you can see whether this setting was applied
successfully by checking the value of `mlockall` in the output from this
request:

[source,sh]
--------------
curl http://localhost:9200/_nodes/process?pretty
--------------

If you see that `mlockall` is `false`, then it means that the `mlockall`
request has failed. The most probable reason, on Linux/Unix systems, is that
the user running Elasticsearch doesn't have permission to lock memory. This can
be granted by running `ulimit -l unlimited` as `root` before starting Elasticsearch.
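
A minimal, hedged sketch of checking and removing the limit for the user that
will run Elasticsearch (`unlimited` is what `mlockall` needs):

[source,sh]
--------------
# Show the current memory-lock limit for this user
ulimit -l
# As root, remove the limit before starting Elasticsearch from this shell
ulimit -l unlimited
--------------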
Another possible reason why `mlockall` can fail is that the temporary directory
(usually `/tmp`) is mounted with the `noexec` option. This can be solved by
specifying a new temp directory when starting Elasticsearch:

[source,sh]
--------------
./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir
--------------

WARNING: `mlockall` might cause the JVM or shell session to exit if it tries
to allocate more memory than is available!
--
[float]
[[settings]]
=== Elasticsearch Settings

*elasticsearch* configuration files can be found under the `ES_HOME/config`
folder. The folder comes with two files: `elasticsearch.yml`, for
configuring the different Elasticsearch <<modules,modules>>, and
`logging.yml`, for configuring Elasticsearch logging.

The configuration format is http://www.yaml.org/[YAML]. Here is an
example of changing the address that all network-based modules will use to
bind and publish to:

[source,yaml]
--------------------------------------------------
network :
    host : 10.0.0.4
--------------------------------------------------
[float]
[[paths]]
==== Paths

In production use, you will almost certainly want to change paths for
data and log files:

[source,yaml]
--------------------------------------------------
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
--------------------------------------------------
[float]
[[cluster-name]]
==== Cluster name

Also, don't forget to give your production cluster a name, which is used
to discover and auto-join other nodes:

[source,yaml]
--------------------------------------------------
cluster:
  name: <NAME OF YOUR CLUSTER>
--------------------------------------------------

Make sure that you don't reuse the same cluster names in different
environments, otherwise you might end up with nodes joining the wrong cluster.
For instance, you could use `logging-dev`, `logging-stage`, and `logging-prod`
for the development, staging, and production clusters.

[float]
[[node-name]]
==== Node name

You may also want to change the default node name for each node to
something like the display hostname. By default Elasticsearch will
randomly pick a Marvel character name from a list of around 3000 names
when your node starts up.

[source,yaml]
--------------------------------------------------
node:
  name: <NAME OF YOUR NODE>
--------------------------------------------------
The hostname of the machine is provided in the environment
variable `HOSTNAME`. If you only run a single Elasticsearch node
for a given cluster on a machine, you can set
the node name to the hostname using the `${...}` notation:

[source,yaml]
--------------------------------------------------
node:
  name: ${HOSTNAME}
--------------------------------------------------
[float]
[[styles]]
==== Configuration styles

Internally, all settings are collapsed into "namespaced" settings. For
example, the above gets collapsed into `node.name`. This means that
it's easy to support other configuration formats, for example,
http://www.json.org[JSON]. If JSON is a preferred configuration format,
simply rename the `elasticsearch.yml` file to `elasticsearch.json` and
add:

[source,js]
--------------------------------------------------
{
    "network" : {
        "host" : "10.0.0.4"
    }
}
--------------------------------------------------
It also means that it's easy to provide settings externally, either
via `ES_JAVA_OPTS` or as parameters to the `elasticsearch`
command, for example:

[source,sh]
--------------------------------------------------
$ elasticsearch -Ees.network.host=10.0.0.4
--------------------------------------------------

Another option is to use the `es.default.` prefix instead of the `es.`
prefix, in which case the value is used as a default only, i.e. it applies
only if the setting is not explicitly set in the configuration file.
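
For example (a hedged sketch that follows the `-E` examples shown above; the
setting name is illustrative):

[source,sh]
--------------------------------------------------
# Used only if network.host is not set in the configuration file
$ elasticsearch -Ees.default.network.host=10.0.0.4
--------------------------------------------------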
Another option is to use the `${...}` notation within the configuration
file, which will resolve to an environment variable, for example:

[source,js]
--------------------------------------------------
{
    "network" : {
        "host" : "${ES_NET_HOST}"
    }
}
--------------------------------------------------
Additionally, for settings that you do not wish to store in the configuration
file, you can use the value `${prompt.text}` or `${prompt.secret}` and start
Elasticsearch in the foreground. `${prompt.secret}` has echoing disabled so
that the value entered will not be shown in your terminal; `${prompt.text}`
will allow you to see the value as you type it in. For example:

[source,yaml]
--------------------------------------------------
node:
  name: ${prompt.text}
--------------------------------------------------

On execution of the `elasticsearch` command, you will be prompted to enter
the actual value like so:

[source,sh]
--------------------------------------------------
Enter value for [node.name]:
--------------------------------------------------
NOTE: Elasticsearch will not start if `${prompt.text}` or `${prompt.secret}`
is used in the settings and the process is run as a service or in the background.

[float]
[[configuration-index-settings]]
=== Index Settings

Indices created within the cluster can provide their own settings. For
example, the following creates an index with a refresh interval of 5
seconds instead of the default refresh interval (the format can be either
YAML or JSON):

[source,sh]
--------------------------------------------------
$ curl -XPUT http://localhost:9200/kimchy/ -d \
'
index:
    refresh_interval: 5s
'
--------------------------------------------------
Index level settings can also be set at the node level, for example by
adding the following to the `elasticsearch.yml` file:

[source,yaml]
--------------------------------------------------
index :
    refresh_interval: 5s
--------------------------------------------------
This means that every index created on a node started with this
configuration will use a refresh interval of 5 seconds *unless the index
explicitly sets it*. In other words, index level settings override what is
set in the node configuration. Of course, the above can also be set as a
"collapsed" setting, for example:

[source,sh]
--------------------------------------------------
$ elasticsearch -Ees.index.refresh_interval=5s
--------------------------------------------------
All of the index level configuration can be found within each
<<index-modules,index module>>.

[float]
[[logging]]
=== Logging

Elasticsearch uses an internal logging abstraction and comes, out of the
box, with http://logging.apache.org/log4j/1.2/[log4j]. It tries to simplify
log4j configuration by using http://www.yaml.org/[YAML], and the logging
configuration file is `config/logging.yml`. The
http://en.wikipedia.org/wiki/JSON[JSON] and
http://en.wikipedia.org/wiki/.properties[properties] formats are also
supported. Multiple configuration files can be loaded, in which case they will
get merged, as long as they start with the `logging.` prefix and end with one
of the supported suffixes (either `.yml`, `.yaml`, `.json` or `.properties`).
The logger section contains the Java packages and their corresponding log
level, where it is possible to omit the `org.elasticsearch` prefix. The
appender section contains the destinations for the logs. Extensive information
on how to customize logging and all the supported appenders can be found in
the http://logging.apache.org/log4j/1.2/manual.html[log4j documentation].
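
As a hedged illustration (not the complete default file), the logger and
appender sections of `config/logging.yml` might look like this; the entries
shown are examples only:

[source,yaml]
--------------------------------------------------
logger:
  # effectively org.elasticsearch.action (the org.elasticsearch prefix may be omitted)
  action: DEBUG

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
--------------------------------------------------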
Additional Appenders and other logging classes provided by
http://logging.apache.org/log4j/extras/[log4j-extras] are also available
out of the box.

[float]
[[deprecation-logging]]
==== Deprecation logging

In addition to regular logging, Elasticsearch allows you to enable logging
of deprecated actions. For example, this allows you to determine early
whether you need to migrate certain functionality in the future. By default,
deprecation logging is disabled. You can enable it in the `config/logging.yml`
file by setting the deprecation log level to `DEBUG`.

[source,yaml]
--------------------------------------------------
deprecation: DEBUG, deprecation_log_file
--------------------------------------------------
This will create a daily rolling deprecation log file in your log directory.
Check this file regularly, especially when you intend to upgrade to a new
major version.