[[setup-configuration]]
== Configuration
[float]
=== Environment Variables
Within its startup scripts, Elasticsearch passes a built-in set of
`JAVA_OPTS` to the JVM it starts. The most important of these are `-Xmx`,
which controls the maximum memory the process may use, and `-Xms`, which
controls the minimum memory allocated to it (_in general, the more memory
allocated to the process, the better_).

In most cases it is better to leave the default `JAVA_OPTS` as they are
and use the `ES_JAVA_OPTS` environment variable to set or change JVM
settings or arguments.

The `ES_HEAP_SIZE` environment variable sets the heap memory allocated to
the Elasticsearch Java process. It assigns the same value to both the
minimum and the maximum, though the two can also be set explicitly (not
recommended) with `ES_MIN_MEM` (defaults to `256m`) and `ES_MAX_MEM`
(defaults to `1g`).

It is recommended to set the minimum and maximum memory to the same value
and to enable <<setup-configuration-memory,`mlockall`>>.
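
The rules above can be summarized in a small shell function. This is a sketch of the documented behavior, not the actual startup script: the name `es_heap_flags` is hypothetical, and the sketch assumes that `ES_HEAP_SIZE`, when set, takes precedence over `ES_MIN_MEM` and `ES_MAX_MEM`.

[source,sh]
--------------------------------------------------
# Hypothetical helper: prints the -Xms/-Xmx flags implied by the variables
# described above. It mirrors the documented rules only and is not the real
# Elasticsearch startup script.
es_heap_flags() {
  # Explicit minimum/maximum, falling back to the documented defaults.
  min_mem=${ES_MIN_MEM:-256m}
  max_mem=${ES_MAX_MEM:-1g}
  # Assumption: ES_HEAP_SIZE, when set, overrides both with the same value.
  if [ -n "${ES_HEAP_SIZE:-}" ]; then
    min_mem=$ES_HEAP_SIZE
    max_mem=$ES_HEAP_SIZE
  fi
  printf '%s\n' "-Xms$min_mem -Xmx$max_mem"
}
--------------------------------------------------

With none of the variables set it prints `-Xms256m -Xmx1g`; with `ES_HEAP_SIZE=2g` it prints `-Xms2g -Xmx2g`. In practice you would simply run `export ES_HEAP_SIZE=2g` before `./bin/elasticsearch`.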
[float]
[[system]]
=== System Configuration
[float]
[[file-descriptors]]
==== File Descriptors
Make sure to increase the number of open file descriptors on the
machine (or for the user running Elasticsearch). Setting it to 32k or
even 64k is recommended.

To test how many files the process can open, start it with
`-Des.max-open-files` set to `true`. This will print, on startup, the
number of files the process can open.

Alternatively, you can retrieve the `max_file_descriptors` for each node
using the <<cluster-nodes-info>> API, with:
[source,sh]
--------------------------------------------------
curl 'localhost:9200/_nodes/process?pretty'
--------------------------------------------------
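
Both checks can be scripted, as in the following sketch. The function names are hypothetical. `check_open_files` compares the current shell's `ulimit -n` with a threshold (32768 by default, after the 32k suggestion above), and `lowest_max_fds` extracts every `max_file_descriptors` value from a nodes info response and prints the smallest one. The latter matches text with `grep -o` (supported by GNU and BSD grep, though not required by POSIX) instead of parsing JSON.

[source,sh]
--------------------------------------------------
# Hypothetical helper: succeeds if this shell's open-file limit is at least
# $1 (default 32768). Processes started from this shell, including an
# Elasticsearch node, inherit the same limit.
check_open_files() {
  required=${1:-32768}
  current=$(ulimit -n)
  if [ "$current" = unlimited ] || [ "$current" -ge "$required" ]; then
    return 0
  fi
  echo "open file limit is $current; at least $required is recommended" >&2
  return 1
}

# Hypothetical helper: reads a _nodes/process response on stdin and prints
# the lowest max_file_descriptors value reported by any node.
lowest_max_fds() {
  grep -o '"max_file_descriptors"[[:space:]]*:[[:space:]]*[0-9][0-9]*' |
    grep -o '[0-9][0-9]*$' |
    sort -n |
    head -n 1
}
--------------------------------------------------

For example: `curl -s 'localhost:9200/_nodes/process?pretty' | lowest_max_fds`. Run `check_open_files` as the user that starts Elasticsearch, since the limit is inherited from the launching shell.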
[float]
[[vm-max-map-count]]
==== Virtual memory
Elasticsearch uses a <<default_fs,`hybrid mmapfs / niofs`>> directory by
default to store its indices. The default operating system limits on mmap
counts are likely to be too low, which may result in out-of-memory
exceptions. On Linux, you can increase the limits by running the following
command as `root`:
[source,bash]
-------------------------------------
sysctl -w vm.max_map_count=262144
-------------------------------------
To set this value permanently, update the `vm.max_map_count` setting in
`/etc/sysctl.conf`.

NOTE: If you installed Elasticsearch using a package (.deb, .rpm), this
setting is changed automatically. To verify, run `sysctl vm.max_map_count`.
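
A pre-flight check for this limit might look like the following sketch. The function name is hypothetical; it reads the live value from `/proc/sys/vm/max_map_count` (Linux only) unless a value is passed as its first argument, and compares it with the 262144 used in the command above.

[source,sh]
--------------------------------------------------
# Hypothetical helper: succeeds if vm.max_map_count is at least 262144.
# Uses $1 if given, otherwise the live value from /proc (Linux only);
# a missing /proc entry is treated as 0.
check_max_map_count() {
  required=262144
  current=${1:-$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)}
  if [ "$current" -ge "$required" ]; then
    return 0
  fi
  echo "vm.max_map_count is $current; run: sysctl -w vm.max_map_count=$required" >&2
  return 1
}
--------------------------------------------------

To keep the value across reboots, add the line `vm.max_map_count=262144` to `/etc/sysctl.conf`; running `sysctl -p` as `root` reloads that file immediately.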
[float]
[[setup-configuration-memory]]
==== Memory Settings
The Linux kernel tries to use as much memory as possible for file system
caches and eagerly swaps out unused application memory, possibly resulting
in the Elasticsearch process being swapped out. Swapping is very bad for
performance and for node stability, so it should be avoided at all costs.

There are three options:

* **Disable swap**
+
--
The simplest option is to completely disable swap. Usually Elasticsearch
is the only service running on a box, and its memory usage is controlled
by the `ES_HEAP_SIZE` environment variable. There should be no need
to have swap enabled. On Linux systems, you can disable swap temporarily
by running: `sudo swapoff -a`. To disable it permanently, you will need
to edit the `/etc/fstab` file and comment out any lines that contain the
word `swap`.
--
* **Configure `swappiness`**
+
--
The second option is to ensure that the sysctl value `vm.swappiness` is set
to `0`. This reduces the kernel's tendency to swap and should not lead to
swapping under normal circumstances, while still allowing the whole system
to swap in emergency conditions.

NOTE: Starting with kernel version 3.5-rc1, a `swappiness` of `0` will
cause the OOM killer to kill the process instead of allowing swapping.
You will need to set `swappiness` to `1` to still allow swapping in
emergencies.
--
* **`mlockall`**
+
--
The third option, available on Linux/Unix systems only, is to use
http://opengroup.org/onlinepubs/007908799/xsh/mlockall.html[mlockall] to
try to lock the process address space into RAM, preventing any Elasticsearch
memory from being swapped out. This can be done by adding this line
to the `config/elasticsearch.yml` file:
[source,yaml]
--------------
bootstrap.mlockall: true
--------------
After starting Elasticsearch, you can see whether this setting was applied
successfully by checking the value of `mlockall` in the output from this
request:
[source,sh]
--------------
curl 'http://localhost:9200/_nodes/process?pretty'
--------------
If you see that `mlockall` is `false`, it means that the `mlockall`
request has failed. The most probable reason is that the user running
Elasticsearch doesn't have permission to lock memory. This can be granted
by running `ulimit -l unlimited` as `root` before starting Elasticsearch.

Another possible reason why `mlockall` can fail is that the temporary directory
(usually `/tmp`) is mounted with the `noexec` option. This can be solved by
specifying a new temporary directory when starting Elasticsearch:
[source,sh]
--------------
./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir
--------------
WARNING: `mlockall` might cause the JVM or shell session to exit if it tries
to allocate more memory than is available!
--
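
The first two options can be partly scripted. The sketch below is illustrative only and its function names are hypothetical: `comment_out_swap` applies the `/etc/fstab` edit described under *Disable swap* to a file of your choice (keeping a `.bak` copy), and `recommended_swappiness` picks `0` or `1` for a kernel release string according to the 3.5 cut-off in the note above. It assumes the release string starts with `major.minor`, as `uname -r` output does on Linux.

[source,sh]
--------------------------------------------------
# Hypothetical helper for option 1: comments out every uncommented line that
# contains "swap" in an fstab-style file ($1, default /etc/fstab).
# sed keeps the original as "$1.bak"; GNU and BSD sed both accept -i.bak.
comment_out_swap() {
  fstab=${1:-/etc/fstab}
  sed -i.bak -e '/swap/s/^[^#]/#&/' "$fstab"
}

# Hypothetical helper for option 2: prints the swappiness value suggested
# above for a kernel release string ($1, default: the running kernel).
# Kernels from 3.5 on get 1 so that emergency swapping is still possible.
recommended_swappiness() {
  release=${1:-$(uname -r)}
  major=${release%%.*}      # "5.15.0-91-generic" -> "5"
  rest=${release#*.}        # -> "15.0-91-generic"
  minor=${rest%%[!0-9]*}    # -> "15"
  if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 5 ]; }; then
    echo 1
  else
    echo 0
  fi
}
--------------------------------------------------

The chosen value is applied with `sysctl -w vm.swappiness=N` as `root`, and the same `vm.swappiness=N` line can be added to `/etc/sysctl.conf` to keep it across reboots. Review the edited fstab against its `.bak` copy before rebooting.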
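
For the third option, the two failure causes described above can be checked from a script. Both functions are hypothetical sketches: `mlockall_enabled` reads the `_nodes/process` response on standard input and succeeds only if at least one node reports `"mlockall" : true` and none reports `false` (it matches text with `grep` rather than parsing JSON), and `dir_allows_exec` tests whether a directory is on a `noexec` mount by trying to run a tiny script placed in it.

[source,sh]
--------------------------------------------------
# Hypothetical helper: reads a _nodes/process JSON response on stdin and
# succeeds only if some node reports "mlockall" : true and none reports false.
# This is a plain text match, not a JSON parser.
mlockall_enabled() {
  response=$(cat)
  printf '%s\n' "$response" |
    grep -q '"mlockall"[[:space:]]*:[[:space:]]*true' || return 1
  if printf '%s\n' "$response" |
    grep -q '"mlockall"[[:space:]]*:[[:space:]]*false'; then
    return 1
  fi
  return 0
}

# Hypothetical helper: succeeds if a script created in directory $1
# (default /tmp) can be executed, i.e. the directory is not mounted noexec.
dir_allows_exec() {
  dir=${1:-/tmp}
  probe=$(mktemp "$dir/exec-probe.XXXXXX" 2>/dev/null) || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod u+x "$probe"
  if "$probe" 2>/dev/null; then rc=0; else rc=1; fi
  rm -f "$probe"
  return "$rc"
}
--------------------------------------------------

For example, `curl -s 'http://localhost:9200/_nodes/process?pretty' | mlockall_enabled` checks a running cluster. If `dir_allows_exec /tmp` fails, start Elasticsearch with `-Djna.tmpdir` pointing at a directory that allows execution.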
[float]
[[settings]]
=== Elasticsearch Settings
Elasticsearch configuration files can be found in the `ES_HOME/config`
folder. The folder comes with two files: `elasticsearch.yml` for
configuring the different Elasticsearch <<modules,modules>>, and
`logging.yml` for configuring Elasticsearch logging.

The configuration format is http://www.yaml.org/[YAML]. Here is an
example of changing the address that all network-based modules will use
to bind and publish to:
[source,yaml]
--------------------------------------------------
network :
    host : 10.0.0.4
--------------------------------------------------
[float]
[[paths]]
==== Paths
In production use, you will almost certainly want to change paths for
data and log files:
[source,yaml]
--------------------------------------------------
path:
    logs: /var/log/elasticsearch
    data: /var/data/elasticsearch
--------------------------------------------------
[float]
[[cluster-name]]
==== Cluster name
Also, don't forget to give your production cluster a name, which is used
to discover and auto-join other nodes:
[source,yaml]
--------------------------------------------------
cluster:
    name: <NAME OF YOUR CLUSTER>
--------------------------------------------------
[float]
[[node-name]]
==== Node name
You may also want to change the default node name for each node to
something like the display hostname. By default, Elasticsearch
randomly picks a Marvel character name from a list of around 3000 names
when your node starts up.
[source,yaml]
--------------------------------------------------
node:
    name: <NAME OF YOUR NODE>
--------------------------------------------------
The hostname of the machine is provided in the environment
variable `HOSTNAME`. If you run only a single Elasticsearch node
for that cluster on the machine, you can set the node name to
the hostname using the `${...}` notation:
[source,yaml]
--------------------------------------------------
node:
    name: ${HOSTNAME}
--------------------------------------------------
[float]
[[styles]]
==== Configuration styles

Internally, all settings are collapsed into "namespaced" settings. For
example, the node name setting above gets collapsed into `node.name`. This
means that it's easy to support other configuration formats, for example,
http://www.json.org[JSON]. If JSON is your preferred configuration format,
simply rename the `elasticsearch.yml` file to `elasticsearch.json` and
write the settings as JSON:

[source,js]
--------------------------------------------------
{
    "network" : {
        "host" : "10.0.0.4"
    }
}
--------------------------------------------------
It also means that it's easy to provide the settings externally, either
through `ES_JAVA_OPTS` or as parameters to the `elasticsearch`
command, for example:
[source,sh]
--------------------------------------------------
$ elasticsearch -Des.network.host=10.0.0.4
--------------------------------------------------
Another option is to use the `es.default.` prefix instead of the `es.`
prefix, in which case the value is used only if the setting is not
explicitly set in the configuration file.

You can also use the `${...}` notation within the configuration file to
resolve a value from an environment variable, for example:
[source,js]
--------------------------------------------------
{
    "network" : {
        "host" : "${ES_NET_HOST}"
    }
}
--------------------------------------------------
The location of the configuration file can be set externally using a
system property:
[source,sh]
--------------------------------------------------
$ elasticsearch -Des.config=/path/to/config/file
--------------------------------------------------
[float]
[[configuration-index-settings]]
=== Index Settings
Indices created within the cluster can provide their own settings. For
example, the following creates an index with memory-based storage
instead of the default file-system-based storage (the format can be
either YAML or JSON):
[source,sh]
--------------------------------------------------
$ curl -XPUT http://localhost:9200/kimchy/ -d \
'
index :
    store:
        type: memory
'
--------------------------------------------------
Index-level settings can be set at the node level as well; for example,
the following can be set within the `elasticsearch.yml` file:
[source,yaml]
--------------------------------------------------
index :
    store:
        type: memory
--------------------------------------------------
This means that every index created on a node started with this
configuration will store its data in memory
*unless the index explicitly sets its own store type*. In other words,
index-level settings override what is set in the node configuration. Of
course, the above can also be set as a "collapsed" setting, for example:
[source,sh]
--------------------------------------------------
$ elasticsearch -Des.index.store.type=memory
--------------------------------------------------
All of the index-level configuration can be found within each
<<index-modules,index module>>.
[float]
[[logging]]
=== Logging
Elasticsearch uses an internal logging abstraction and comes, out of the
box, with http://logging.apache.org/log4j/1.2/[log4j]. It tries to simplify
log4j configuration by using http://www.yaml.org/[YAML] to configure it,
and the logging configuration file is `config/logging.yml`. The
http://en.wikipedia.org/wiki/JSON[JSON] and
http://en.wikipedia.org/wiki/.properties[properties] formats are also
supported. Multiple configuration files can be loaded, in which case they
are merged, as long as their names start with the `logging.` prefix and end
with one of the supported suffixes (`.yml`, `.yaml`, `.json` or
`.properties`).

The logger section contains the Java packages and their corresponding log
levels, where it is possible to omit the `org.elasticsearch` prefix. The
appender section contains the destinations for the logs. Extensive information
on how to customize logging and all the supported appenders can be found in
the http://logging.apache.org/log4j/1.2/manual.html[log4j documentation].

Additional appenders and other logging classes provided by
http://logging.apache.org/log4j/extras/[log4j-extras] are also available
out of the box.