Docs: Refactored modules and index modules sections

This commit is contained in:
Clinton Gormley 2015-06-22 23:49:45 +02:00
parent 1df2d3015e
commit f123a53d72
37 changed files with 1136 additions and 1102 deletions


@ -7,7 +7,7 @@ can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation -- you will never get stale
results.
See <<index-modules-shard-query-cache>> for more details.
See <<shard-query-cache>> for more details.
[[returning-only-agg-results]]
== Returning only aggregation results


@ -10,8 +10,8 @@ survive a full cluster restart). Here is an example:
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"discovery.zen.minimum_master_nodes" : 2
}
}'
}
}'
--------------------------------------------------
Or:
@ -21,8 +21,8 @@ Or:
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"discovery.zen.minimum_master_nodes" : 2
}
}'
}
}'
--------------------------------------------------
The cluster responds with the settings updated. So the response for the
@ -34,8 +34,8 @@ last example will be:
"persistent" : {},
"transient" : {
"discovery.zen.minimum_master_nodes" : "2"
}
}'
}
}'
--------------------------------------------------
Cluster wide settings can be returned using:
@ -45,157 +45,7 @@ Cluster wide settings can be returned using:
curl -XGET localhost:9200/_cluster/settings
--------------------------------------------------
There is a specific list of settings that can be updated; these include:
[float]
[[cluster-settings]]
=== Cluster settings
[float]
==== Routing allocation
[float]
===== Awareness
`cluster.routing.allocation.awareness.attributes`::
See <<modules-cluster>>.
`cluster.routing.allocation.awareness.force.*`::
See <<modules-cluster>>.
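For example, an awareness attribute could be enabled dynamically with the cluster update settings API (the `rack_id` attribute name here is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "cluster.routing.allocation.awareness.attributes" : "rack_id"
    }
}'
--------------------------------------------------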
[float]
===== Balanced Shards
All these values are relative to one another. The first three are used to
compose three separate weighting functions into one. The cluster is balanced
when no allowed action can bring the weights of each node closer together by
more than the fourth setting. Actions might not be allowed, for instance,
due to forced awareness or allocation filtering.
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`. Raising this raises the tendency to
equalize the number of shards across all nodes in the cluster.
`cluster.routing.allocation.balance.index`::
Defines a factor to the number of shards per index allocated
on a specific node (float). Defaults to `0.55f`. Raising this raises the
tendency to equalize the number of shards per index across all nodes in
the cluster.
`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non
negative float). Defaults to `1.0f`. Raising this will cause the cluster
to be less aggressive about optimizing the shard balance.
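As a sketch, these balance factors could be adjusted dynamically via the cluster update settings API (the values shown are illustrative, not recommendations):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.balance.shard" : 0.50,
        "cluster.routing.allocation.balance.index" : 0.50
    }
}'
--------------------------------------------------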
[float]
===== Concurrent Rebalance
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Allow to control how many concurrent rebalancing of shards are
allowed cluster wide, and default it to `2` (integer). `-1` for
unlimited. See also <<modules-cluster>>.
[float]
===== Enable allocation
`cluster.routing.allocation.enable`::
See <<modules-cluster>>.
[float]
===== Throttling allocation
`cluster.routing.allocation.node_initial_primaries_recoveries`::
See <<modules-cluster>>.
`cluster.routing.allocation.node_concurrent_recoveries`::
See <<modules-cluster>>.
[float]
===== Filter allocation
`cluster.routing.allocation.include.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.exclude.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.require.*`::
See <<modules-cluster>>.
[float]
==== Metadata
`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write operations) and disallow metadata modifications (such as creating or deleting indices).
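For instance, the whole cluster could be switched to read only (and back again) with the cluster update settings API; a sketch:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.blocks.read_only" : true
    }
}'
--------------------------------------------------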
[float]
==== Discovery
`discovery.zen.minimum_master_nodes`::
See <<modules-discovery-zen>>
`discovery.zen.publish_timeout`::
See <<modules-discovery-zen>>
[float]
==== Threadpools
`threadpool.*`::
See <<modules-threadpool>>
[float]
[[cluster-index-settings]]
=== Index settings
[float]
==== Index filter cache
`indices.cache.filter.size`::
See <<index-modules-cache>>
[float]
==== TTL interval
`indices.ttl.interval` (time)::
See <<mapping-ttl-field>>
[float]
==== Recovery
`indices.recovery.concurrent_streams`::
See <<modules-indices>>
`indices.recovery.concurrent_small_file_streams`::
See <<modules-indices>>
`indices.recovery.file_chunk_size`::
See <<modules-indices>>
`indices.recovery.translog_ops`::
See <<modules-indices>>
`indices.recovery.translog_size`::
See <<modules-indices>>
`indices.recovery.compress`::
See <<modules-indices>>
`indices.recovery.max_bytes_per_sec`::
See <<modules-indices>>
[float]
[[logger]]
=== Logger
Logger levels can also be updated dynamically by using the `logger.` prefix.
Over time, more settings will be allowed to be updated dynamically.
[float]
=== Field data circuit breaker
`indices.breaker.fielddata.limit`::
See <<index-modules-fielddata>>
`indices.breaker.fielddata.overhead`::
See <<index-modules-fielddata>>
A list of dynamically updatable settings can be found in the
<<modules,Modules>> documentation.


@ -1,49 +1,177 @@
[[index-modules]]
= Index Modules
[partintro]
--
Index Modules are modules created per index and control all aspects
related to an index. Since those modules' lifecycle is tied to an index,
all the relevant module settings can be provided when creating an index
(and that is actually the recommended way to configure an index).
Index Modules are modules created per index and control all aspects related to
an index.
[float]
[[index-modules-settings]]
== Index Settings
There are specific index level settings that are not associated with any
specific module. These include:
Index level settings are set on a per-index basis. Settings may be:
_static_::
They can only be set at index creation time or on a
<<indices-open-close,closed index>>.
_dynamic_::
They can be changed on a live index using the
<<indices-update-settings,update-index-settings>> API.
WARNING: Changing static or dynamic index settings on a closed index could
result in incorrect settings that are impossible to rectify without deleting
and recreating the index.
[float]
=== Static index settings
Below is a list of all _static_ index settings that are not associated with any
specific index module:
`index.number_of_shards`::
The number of primary shards that an index should have. Defaults to 5.
This setting can only be set at index creation time. It cannot be
changed on a closed index.
`index.shard.check_on_startup`::
+
--
experimental[] Whether or not shards should be checked for corruption before opening. When
corruption is detected, it will prevent the shard from being opened. Accepts:
`false`::
(default) Don't check for corruption when opening a shard.
`checksum`::
Check for physical corruption.
`true`::
Check for both physical and logical corruption. This is much more
expensive in terms of CPU and memory usage.
`fix`::
Check for both physical and logical corruption. Segments that were reported
as corrupted will be automatically removed. This option *may result in data loss*.
Use with extreme caution!
Checking shards may take a lot of time on large indices.
--
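Static settings such as these must be supplied at index creation time. A minimal sketch (the index name `my_index` and the values are illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index -d '{
    "settings": {
        "index.number_of_shards": 3,
        "index.shard.check_on_startup": "checksum"
    }
}'
--------------------------------------------------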
[float]
[[dynamic-index-settings]]
=== Dynamic index settings
Below is a list of all _dynamic_ index settings that are not associated with any
specific index module:
`index.number_of_replicas`::
The number of replicas each primary shard has. Defaults to 1.
`index.auto_expand_replicas`::
Auto-expand the number of replicas based on the number of available nodes.
Set to a dash delimited lower and upper bound (e.g. `0-5`) or use `all`
for the upper bound (e.g. `0-all`). Defaults to `false` (i.e. disabled).
`index.refresh_interval`::
A time setting controlling how often the
refresh operation will be executed. Defaults to `1s`. Can be set to `-1`
in order to disable it.
How often to perform a refresh operation, which makes recent changes to the
index visible to search. Defaults to `1s`. Can be set to `-1` to disable
refresh.
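Dynamic settings like these can be changed on a live index with the update-index-settings API; for example (values and index name illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index/_settings -d '{
    "index.number_of_replicas": 2,
    "index.refresh_interval": "30s"
}'
--------------------------------------------------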
`index.codec`::
experimental[]
The `default` value compresses stored data with LZ4 compression, but
this can be set to `best_compression` for a higher compression ratio,
at the expense of slower stored fields performance.
experimental[] The `default` value compresses stored data with LZ4
compression, but this can be set to `best_compression` for a higher
compression ratio, at the expense of slower stored fields performance.
`index.shard.check_on_startup`::
`index.blocks.read_only`::
experimental[]
Should shard consistency be checked upon opening. When corruption is detected,
it will prevent the shard from being opened.
+
When `checksum`, check for physical corruption.
When `true`, check for both physical and logical corruption. This is much
more expensive in terms of CPU and memory usage.
When `fix`, check for both physical and logical corruption, and segments
that were reported as corrupted will be automatically removed.
Default value is `false`, which performs no checks.
Set to `true` to make the index and index metadata read only, `false` to
allow writes and metadata changes.
NOTE: Checking shards may take a lot of time on large indices.
`index.blocks.read`::
WARNING: Setting `index.shard.check_on_startup` to `fix` may result in data loss,
use with extreme caution.
Set to `true` to disable read operations against the index.
`index.blocks.write`::
Set to `true` to disable write operations against the index.
`index.blocks.metadata`::
Set to `true` to disable index metadata reads and writes.
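For example, writes to an index could be blocked temporarily while still allowing reads (a sketch; `my_index` is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index/_settings -d '{
    "index.blocks.write": true
}'
--------------------------------------------------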
`index.ttl.disable_purge`::
experimental[] Disables the purge of <<mapping-ttl-field,expired docs>> on
the current index.
[[index.recovery.initial_shards]]`index.recovery.initial_shards`::
+
--
A primary shard is only recovered if there are enough nodes available to
allocate sufficient replicas to form a quorum. It can be set to:
    * `quorum` (default)
    * `quorum-1` (or `half`)
    * `full`
    * `full-1`
    * a number value, e.g. `1`
--
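As a sketch, the default quorum behaviour could be overridden per index as follows (`full` is shown purely for illustration):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index/_settings -d '{
    "index.recovery.initial_shards": "full"
}'
--------------------------------------------------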
[float]
=== Settings in other index modules
Other index settings are available in index modules:
<<analysis,Analysis>>::
Settings to define analyzers, tokenizers, token filters and character
filters.
<<index-modules-allocation,Index shard allocation>>::
Control over where, when, and how shards are allocated to nodes.
<<index-modules-mapper,Mapping>>::
Enable or disable dynamic mapping for an index.
<<index-modules-merge,Merging>>::
Control over how shards are merged by the background merge process.
<<index-modules-similarity,Similarities>>::
Configure custom similarity settings to customize how search results are
scored.
<<index-modules-slowlog,Slowlog>>::
Control over how slow queries and fetch requests are logged.
<<index-modules-store,Store>>::
Configure the type of filesystem used to access shard data.
<<index-modules-translog,Translog>>::
Control over the transaction log and background flush operations.
--
@ -51,22 +179,16 @@ include::index-modules/analysis.asciidoc[]
include::index-modules/allocation.asciidoc[]
include::index-modules/slowlog.asciidoc[]
include::index-modules/mapper.asciidoc[]
include::index-modules/merge.asciidoc[]
include::index-modules/store.asciidoc[]
include::index-modules/similarity.asciidoc[]
include::index-modules/mapper.asciidoc[]
include::index-modules/slowlog.asciidoc[]
include::index-modules/store.asciidoc[]
include::index-modules/translog.asciidoc[]
include::index-modules/cache.asciidoc[]
include::index-modules/query-cache.asciidoc[]
include::index-modules/fielddata.asciidoc[]
include::index-modules/similarity.asciidoc[]


@ -1,168 +1,131 @@
[[index-modules-allocation]]
== Index Shard Allocation
This module provides per-index settings to control the allocation of shards to
nodes.
[float]
[[shard-allocation-filtering]]
=== Shard Allocation Filtering
Allows you to control the allocation of indices on nodes based on
include/exclude filters. The filters can be set both at the index level and at
the cluster level. Let's start with an example of setting it at the cluster
level:
Shard allocation filtering allows you to specify which nodes are allowed
to host the shards of a particular index.
Let's say we have 4 nodes, each with a specific attribute called `tag`
associated with it (the name of the attribute can be any name). Each
node has a specific value associated with `tag`: Node 1 has the setting
`node.tag: value1`, Node 2 a setting of `node.tag: value2`, and so on.
NOTE: The per-index shard allocation filters explained below work in
conjunction with the cluster-wide allocation filters explained in
<<shards-allocation>>.
We can create an index that will only deploy on nodes that have `tag`
set to `value1` and `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:
It is possible to assign arbitrary metadata attributes to each node at
startup. For instance, nodes could be assigned a `rack` and a `group`
attribute as follows:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------
[source,sh]
------------------------
bin/elasticsearch --node.rack rack1 --node.size big <1>
------------------------
<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.
On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` of value `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:
These metadata attributes can be used with the
`index.routing.allocation.*` settings to allocate an index to a particular
group of nodes. For instance, we can move the index `test` to either `big` or
`medium` nodes as follows:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------
[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include.size": "big,medium"
}
------------------------
// AUTOSENSE
`index.routing.allocation.require.*` can be used to
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include` which will
include a node if ANY rule matches.
Alternatively, we can move the index `test` away from the `small` nodes with
an `exclude` rule:
The `include`, `exclude` and `require` values can have generic simple
matching wildcards, for example, `value1*`. Additionally, special attribute
names called `_ip`, `_name`, `_id` and `_host` can be used to match by node
ip address, name, id or host name, respectively.
[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.exclude.size": "small"
}
------------------------
// AUTOSENSE
Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:
Multiple rules can be specified, in which case all conditions must be
satisfied. For instance, we could move the index `test` to `big` nodes in
`rack1` with the following:
[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------
[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include.size": "big",
"index.routing.allocation.include.rack": "rack1"
}
------------------------
// AUTOSENSE
In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:
NOTE: If some conditions cannot be satisfied then shards will not be moved.
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------
The following settings are _dynamic_, allowing live indices to be moved from
one set of nodes to another:
The provided settings can also be updated in real time using the update
settings API, allowing you to "move" indices (shards) around in real time.
`index.routing.allocation.include.{attribute}`::
Cluster wide filtering can also be defined, and be updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on `_ip`
address:
Assign the index to a node whose `{attribute}` has at least one of the
comma-separated values.
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}'
--------------------------------------------------
`index.routing.allocation.require.{attribute}`::
Assign the index to a node whose `{attribute}` has _all_ of the
comma-separated values.
`index.routing.allocation.exclude.{attribute}`::
Assign the index to a node whose `{attribute}` has _none_ of the
comma-separated values.
These special attributes are also supported:
[horizontal]
`_name`:: Match nodes by node name
`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname
All attribute values can be specified with wildcards, eg:
[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include._ip": "192.168.2.*"
}
------------------------
// AUTOSENSE
[float]
=== Total Shards Per Node
The `index.routing.allocation.total_shards_per_node` setting allows you to
control how many shards in total (replicas and primaries) of an index will be allocated per node.
It can be dynamically set on a live index using the update index
settings API.
The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and
indices you have, and how big they are, it may not always be possible to spread
shards evenly.
[float]
[[disk]]
=== Disk-based Shard Allocation
The following _dynamic_ setting allows you to specify a hard limit on the total
number of shards from a single index allowed per node:
Disk-based shard allocation is enabled from version 1.3.0 onward.
`index.routing.allocation.total_shards_per_node`::
Elasticsearch can be configured to prevent shard
allocation on nodes depending on disk usage for the node. This
functionality is enabled by default, and can be changed either in the
configuration file, or dynamically using:
The maximum number of shards (replicas and primaries) that will be
allocated to a single node. Defaults to unbounded.
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.disk.threshold_enabled" : false
}
}'
--------------------------------------------------
[WARNING]
=======================================
This setting imposes a hard limit which can result in some shards not
being allocated.
Once enabled, Elasticsearch uses two watermarks to decide whether
shards should be allocated or can remain on the node.
Use with caution.
=======================================
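For example, an index could be limited to at most two of its shards per node (an illustrative value):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.total_shards_per_node": 2
}'
--------------------------------------------------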
`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 85%, meaning ES will not
allocate new shards to nodes once they have more than 85% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.
`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 90%, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of
high and low. For example, it makes sense to set the low watermark to 10gb
and the high watermark to 5gb, but not the other way around.
Both watermark settings can be changed dynamically using the cluster
settings API. By default, Elasticsearch will retrieve information
about the disk usage of the nodes every 30 seconds. This can also be
changed by setting the `cluster.info.update.interval` setting.
An example of updating the low watermark to no more than 80% of the disk size, a
high watermark of at least 50 gigabytes free, and updating the information about
the cluster every minute:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.disk.watermark.low" : "80%",
"cluster.routing.allocation.disk.watermark.high" : "50gb",
"cluster.info.update.interval" : "1m"
}
}'
--------------------------------------------------
By default, Elasticsearch will take into account shards that are currently being
relocated to the target node when computing a node's disk usage. This can be
changed by setting the `cluster.routing.allocation.disk.include_relocations`
setting to `false` (defaults to `true`). Taking relocating shards' sizes into
account may, however, mean that the disk usage for a node is incorrectly
estimated on the high side, since the relocation could be 90% complete and a
recently retrieved disk usage would include the total size of the relocating
shard as well as the space already used by the running relocation.
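As a sketch, relocating shards could be excluded from the disk usage calculation as follows:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.include_relocations" : false
    }
}'
--------------------------------------------------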


@ -1,18 +1,12 @@
[[index-modules-analysis]]
== Analysis
The index analysis module acts as a configurable registry of Analyzers
that can be used in order to break down indexed (analyzed) fields when a
document is indexed as well as to process query strings. It maps to the Lucene
`Analyzer`.
The index analysis module acts as a configurable registry of _analyzers_
that can be used in order to convert a string field into individual terms
which are:
Analyzers are (generally) composed of a single `Tokenizer` and zero or
more `TokenFilters`. A set of `CharFilters` can be associated with an
analyzer to process the characters prior to other analysis steps. The
analysis module allows one to register `TokenFilters`, `Tokenizers` and
`Analyzers` under logical names that can then be referenced either in
mapping definitions or in certain APIs. The Analysis module
automatically registers (*if not explicitly defined*) built in
analyzers, token filters, and tokenizers.
* added to the inverted index in order to make the document searchable
* used by high level queries such as the <<query-dsl-match-query,`match` query>>
to generate search terms.
See <<analysis>> for configuration details.


@ -1,33 +0,0 @@
[[index-modules-cache]]
== Cache
There are different caching inner modules associated with an index. They
include `filter` and others.
[float]
[[filter]]
=== Filter Cache
The filter cache is responsible for caching the results of filters (used
in the query). The default implementation of a filter cache (and the one
recommended to use in almost all cases) is the `node` filter cache type.
[float]
[[node-filter]]
==== Node Filter Cache
The `node` filter cache may be configured to use either a percentage of
the total memory allocated to the process or a specific amount of
memory. All shards present on a node share a single node cache (that's
why it's called `node`). The cache implements an LRU eviction policy:
when a cache becomes full, the least recently used data is evicted to
make way for new data.
The setting that allows one to control the memory size for the filter
cache is `indices.cache.filter.size`, which defaults to `10%`. *Note*,
this is *not* an index level setting but a node level setting (can be
configured in the node configuration).
`indices.cache.filter.size` can accept either a percentage value, like
`30%`, or an exact value, like `512mb`.
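For instance, the node filter cache size could be set in `config/elasticsearch.yml` (the values shown are illustrative):

[source,yaml]
--------------------------------------------------
indices.cache.filter.size: 30%
# or an absolute value instead of a percentage:
# indices.cache.filter.size: 512mb
--------------------------------------------------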


@ -49,5 +49,10 @@ automatically.
The default mapping can be overridden by specifying the `_default_` type when
creating a new index.
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.
[float]
=== Mapper settings
`index.mapper.dynamic` (_static_)::
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.
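Being a _static_ setting, `index.mapper.dynamic` must be supplied at index creation time; a sketch (the index name is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index -d '{
    "settings": {
        "index.mapper.dynamic": false
    }
}'
--------------------------------------------------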


@ -14,6 +14,11 @@ number of segments per tier. The merge policy is able to merge
non-adjacent segments, and separates how many segments are merged at once from how many
segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
[float]
[[merge-settings]]
=== Merge policy settings
All merge policy settings are _dynamic_ and can be updated on a live index.
The merge policy has the following settings:
`index.merge.policy.expunge_deletes_allowed`::
@ -80,30 +85,29 @@ possibly either increase the `max_merged_segment` or issue an optimize
call for the index (try to issue it at a low traffic time).
[float]
[[scheduling]]
=== Scheduling
[[merge-scheduling]]
=== Merge scheduling
The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy). Merges
run in separate threads, and when the maximum number of threads is reached,
further merges will wait until a merge thread becomes available. The merge
scheduler supports this setting:
further merges will wait until a merge thread becomes available.
The merge scheduler supports the following _dynamic_ settings:
`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`
which works well for a good solid-state-disk (SSD). If your index is on
spinning platter drives instead, decrease this to 1.
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`
which works well for a good solid-state-disk (SSD). If your index is on
spinning platter drives instead, decrease this to 1.
`index.merge.scheduler.auto_throttle`::
If this is true (the default), then the merge scheduler will
rate-limit IO (writes) for merges to an adaptive value depending on
how many merges are requested over time. An application with a low
indexing rate that unluckily suddenly requires a large merge will see
that merge aggressively throttled, while an application doing heavy
indexing will see the throttle move higher to allow merges to keep up
with ongoing indexing. This is a dynamic setting (you can <<indices-update-settings,change it
at any time on a running index>>).
If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
unluckily suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
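For example, on spinning disks the merge thread count could be reduced dynamically on a live index (an illustrative sketch):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/my_index/_settings -d '{
    "index.merge.scheduler.max_thread_count": 1
}'
--------------------------------------------------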


@ -1,29 +1,31 @@
[[index-modules-slowlog]]
== Index Slow Log
== Slow Log
[float]
[[search-slow-log]]
=== Search Slow Log
Shard level slow search log allows logging slow searches (query and fetch
executions) into a dedicated log file.
phases) into a dedicated log file.
Thresholds can be set for both the query phase and the fetch phase of
execution. Here is a sample:
[source,js]
[source,yaml]
--------------------------------------------------
#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
--------------------------------------------------
All of the above settings are _dynamic_ and can be set per-index.
By default, none are enabled (set to `-1`). Levels (`warn`, `info`,
`debug`, `trace`) allow you to control under which logging level the log
will be logged. Not all are required to be configured (for example, only
@ -37,14 +39,10 @@ execute. One of the benefits of shard level logging is the association
of the actual execution with the specific machine, compared with request
level logging.
All settings are index level settings (and each index can have different
values for them), and can be changed at runtime using the index update
settings API.
The logging file is configured by default using the following
configuration (found in `logging.yml`):
[source,js]
[source,yaml]
--------------------------------------------------
index_search_slow_log_file:
type: dailyRollingFile
@ -64,18 +62,20 @@ log. The log file name ends with `_index_indexing_slowlog.log`. The log and
the thresholds are configured in the `elasticsearch.yml` file in the same
way as the search slowlog. Index slowlog sample:
[source,js]
[source,yaml]
--------------------------------------------------
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
--------------------------------------------------
All of the above settings are _dynamic_ and can be set per-index.
The index slow log file is configured by default in the `logging.yml`
file:
[source,js]
[source,yaml]
--------------------------------------------------
index_indexing_slow_log_file:
type: dailyRollingFile


@ -1,34 +1,16 @@
[[index-modules-store]]
== Store
The store module allows you to control how index data is stored.
The index can either be stored in-memory (no persistence) or on-disk
(the default). In-memory indices provide better performance at the cost
of limiting the index size to the amount of available physical memory.
When using a local gateway (the default), file system storage with *no*
in memory storage is required to maintain index consistency. This is
required since the local gateway constructs its state from the local
index state of each node.
Another important aspect of memory based storage is that Elasticsearch
supports storing the index in memory *outside of the JVM heap space*,
using the "Memory" (see below) storage type. This means that there is no
need for extra large JVM heaps (with their own consequences) for storing
the index in memory.
experimental[All of the settings exposed in the `store` module are expert only and may be removed in the future]
The store module allows you to control how index data is stored and accessed on disk.
[float]
[[file-system]]
=== File system storage types
There are different file system implementations or _storage types_. The best
one for the operating environment will be automatically chosen: `mmapfs` on
Windows 64bit, `simplefs` on Windows 32bit, and `default` (hybrid `niofs` and
`mmapfs`) for the rest.
This can be overridden for all indices by adding this to the
`config/elasticsearch.yml` file:
index.store.type: niofs
---------------------------------
It is a _static_ setting that can be set on a per-index basis at index
creation time:
[source,json]
---------------------------------
PUT /my_index
{
"settings": {
"index.store.type": "niofs"
}
}
---------------------------------
experimental[This is an expert-only setting and may be removed in the future]
The following sections list all the different storage types supported.
[[simplefs]]`simplefs`::
The Simple FS type is a straightforward implementation of file system
storage (maps to Lucene `SimpleFsDirectory`) using a random access file.
This implementation has poor concurrent performance (multiple threads
will bottleneck). It is usually better to use the `niofs` when you need
index persistence.
[[niofs]]`niofs`::
The NIO FS type stores the shard index on the file system (maps to
Lucene `NIOFSDirectory`) using NIO. It allows multiple threads to read
from the same file concurrently. It is not recommended on Windows
because of a bug in the SUN Java implementation.
[[mmapfs]]`mmapfs`::
The MMap FS type stores the shard index on the file system (maps to
Lucene `MMapDirectory`) by mapping a file into memory (mmap). Memory
mapping uses up a portion of the virtual memory address space in your
process equal to the size of the file being mapped. Before using this
class, be sure you have allowed plenty of
<<vm-max-map-count,virtual address space>>.
[[default_fs]]`default_fs`::
The `default` type is a hybrid of NIO FS and MMapFS, which chooses the best
file system for each type of file. Currently only the Lucene term dictionary
and doc values files are memory mapped to reduce the impact on the operating
system. All other files are opened using Lucene `NIOFSDirectory`. Address
space settings (<<vm-max-map-count>>) might also apply if your term
dictionaries are large.
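On Linux, the `mmapfs` and hybrid `default` types are constrained by the
kernel's `vm.max_map_count` limit, which can be inspected and raised with
`sysctl` (262144 is a commonly suggested value, not a requirement):

[source,sh]
--------------------------------------------------
sysctl vm.max_map_count                  # inspect the current limit
sudo sysctl -w vm.max_map_count=262144   # raise it for this boot
--------------------------------------------------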
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`merge`:: Merge statistics.
`query_cache`:: <<shard-query-cache,Shard query cache>> statistics.
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.
`warmer`:: Warmer statistics.
[source,sh]
--------------------------------------------------
curl 'localhost:9200/_stats/search?groups=group1,group2'
--------------------------------------------------
The stats returned are aggregated on the index level, with
`primaries` and `total` aggregations, where `primaries` are the values for only the
primary shards, and `total` are the cumulated values for both primary and replica shards.
In order to get back shard level stats, set the `level` parameter to `shards`.
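For example:

[source,sh]
--------------------------------------------------
curl 'localhost:9200/_stats?level=shards'
--------------------------------------------------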
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 4
    }
}'
--------------------------------------------------
[WARNING]
========================
When changing the number of replicas the index needs to be open. Changing
the number of replicas on a closed index might prevent the index to be opened correctly again.
========================
The list of per-index settings which can be updated dynamically on live
indices can be found in <<index-modules>>.
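As one example, the refresh interval is a dynamic per-index setting and can
be changed on a live index (a sketch; `my_index` is a placeholder):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : { "refresh_interval" : "30s" }
}'
--------------------------------------------------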
[float]
[[bulk]]
The `index.mapping.coerce` global setting can be set on the
index level to coerce numeric content globally across all
mapping types (The default setting is true and coercions attempted are
to convert strings with numbers into numeric types and also numeric values
with fractions to any integer/short/long values minus the fraction part).
When the permitted conversions fail in their attempts, the value is considered
malformed and the ignore_malformed setting dictates what will happen next.
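As a sketch, coercion could be disabled for a whole index at creation time
(`my_index` is a placeholder):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index' -d '{
  "settings": { "index.mapping.coerce": false }
}'
--------------------------------------------------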
--
include::mapping/types.asciidoc[]
include::mapping/date-format.asciidoc[]
include::mapping/fielddata_formats.asciidoc[]
include::mapping/dynamic-mapping.asciidoc[]
include::mapping/meta.asciidoc[]
[[fielddata-formats]]
== Fielddata formats
The field data format controls how field data should be stored.
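The format is chosen per field in the mapping. As a sketch (`my_index`,
`my_type` and `tag` are placeholder names):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index/_mapping/my_type' -d '{
  "my_type": {
    "properties": {
      "tag": {
        "type":      "string",
        "index":     "not_analyzed",
        "fielddata": { "format": "doc_values" }
      }
    }
  }
}'
--------------------------------------------------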
It is possible to change the field data format (and the field data settings
in general) on a live index by using the update mapping API.
[float]
=== String field data types
`paged_bytes` (default on analyzed string fields)::
Stores unique terms sequentially in a large buffer and maps documents to
the indices of the terms they contain in this large buffer.
[float]
=== Numeric field data types
`array`::
Stores field values in memory using arrays.
`doc_values`::

Computes and stores field data data-structures on disk at indexing time.
[float]
=== Geo point field data types
`array`::
Stores latitudes and longitudes in arrays.
[float]
[[global-ordinals]]
=== Global ordinals
Global ordinals is a data-structure on top of field data, that maintains an
incremental numbering for all the terms in field data in a lexicographic order.
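Building global ordinals for a large shard can add latency to the first
search after a refresh. As a sketch, they could instead be built eagerly at
refresh time (the `tag` field is a placeholder):

[source,js]
--------------------------------------------------
"tag": {
  "type":      "string",
  "fielddata": { "loading": "eager_global_ordinals" }
}
--------------------------------------------------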
Please however note that norms won't be removed instantly, but will be removed
as old segments are merged into new segments as you continue indexing new documents.
Any score computation on a field that has had
norms removed might return inconsistent results since some documents won't have
norms anymore while other documents might still have norms.
It is possible to control which field values are loaded into memory,
which is particularly useful for aggregations on string fields, using
fielddata filters, which are explained in detail in the
<<modules-fielddata,Fielddata>> section.
Fielddata filters can exclude terms which do not match a regex, or which
don't fall between a `min` and `max` frequency range:
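A sketch of a frequency filter (the field name and thresholds are
illustrative):

[source,js]
--------------------------------------------------
"tag": {
  "type":      "string",
  "fielddata": {
    "filter": {
      "frequency": {
        "min":              0.001,
        "max":              0.1,
        "min_segment_size": 500
      }
    }
  }
}
--------------------------------------------------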
[[modules]]
= Modules
[partintro]
--
This section contains modules responsible for various aspects of the functionality in Elasticsearch. Each module has settings which may be:
_static_::
These settings must be set at the node level, either in the
`elasticsearch.yml` file, or as an environment variable or on the command line
when starting a node. They must be set on every relevant node in the cluster.
_dynamic_::
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
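As a sketch of the difference (`node-1` is a placeholder, and
`cluster.routing.allocation.enable` is just one example of a dynamic
setting):

[source,sh]
--------------------------------------------------
# static: set before startup, e.g. on the command line
./bin/elasticsearch --node.name node-1

# dynamic: changed on a live cluster via the API
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "none" }
}'
--------------------------------------------------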
The modules in this section are:
<<modules-cluster,Cluster-level routing and shard allocation>>::
Settings to control where, when, and how shards are allocated to nodes.
<<modules-discovery,Discovery>>::
How nodes discover each other to form a cluster.
<<modules-gateway,Gateway>>::
How many nodes need to join the cluster before recovery can start.
<<modules-http,HTTP>>::
Settings to control the HTTP REST interface.
<<modules-indices,Indices>>::
Global index-related settings.
<<modules-network,Network>>::
Controls default network settings.
<<modules-node,Node client>>::
A Java node client joins the cluster, but doesn't hold data or act as a master node.
<<modules-plugins,Plugins>>::
Using plugins to extend Elasticsearch.
<<modules-scripting,Scripting>>::
Custom scripting available in Lucene Expressions, Groovy, Python, and
JavaScript.
<<modules-snapshots,Snapshot/Restore>>::
Backup your data with snapshot/restore.
<<modules-threadpool,Thread pools>>::
Information about the dedicated thread pools used in Elasticsearch.
<<modules-transport,Transport>>::
Configure the transport networking layer, used internally by Elasticsearch
to communicate between nodes.
--
include::modules/cluster.asciidoc[]
include::modules/discovery.asciidoc[]
include::modules/network.asciidoc[]
include::modules/node.asciidoc[]
include::modules/plugins.asciidoc[]
include::modules/scripting.asciidoc[]
include::modules/advanced-scripting.asciidoc[]
include::modules/snapshots.asciidoc[]
include::modules/threadpool.asciidoc[]
include::modules/transport.asciidoc[]
include::modules/tribe.asciidoc[]
[[modules-advanced-scripting]]
=== Text scoring in scripts
Text features, such as term or document frequency for a specific term can be accessed in scripts (see <<modules-scripting, scripting documentation>> ) with the `_index` variable. This can be useful if, for example, you want to implement your own scoring model using for example a script inside a <<query-dsl-function-score-query,function score query>>.
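As a sketch, term frequency could feed a custom score inside a function
score query (the `body` field and `elasticsearch` term are placeholders):

[source,js]
--------------------------------------------------
"function_score": {
  "script_score": {
    "script": "_index['body']['elasticsearch'].tf()"
  }
}
--------------------------------------------------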
Statistics over the document collection are computed *per shard*, not per
index.
[float]
==== Nomenclature:
[horizontal]
[float]
==== Shard statistics:
`_index.numDocs()`::
[float]
==== Field statistics:
Field statistics can be accessed with a subscript operator like this:
`_index['FIELD']`.
The number of terms in a field cannot be accessed using the `_index` variable. See <<mapping-core-types, word count mapping type>> on how to do that.
[float]
==== Term statistics:
Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
[float]
==== Term positions, offsets and payloads:
If you need information on the positions of terms in a field, call
`_index['FIELD'].get('TERM', flag)` where flag can be
[float]
==== Term vectors:
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call
`_index.termVectors()` to get a
[[modules-cluster]]
== Cluster
One of the main roles of the master is to decide which shards to allocate to
which nodes, and when to move shards between nodes in order to rebalance the
cluster.
There are a number of settings available to control the shard allocation process:
* <<shards-allocation>> lists the settings to control the allocation and
rebalancing operations.
* <<disk-allocator>> explains how Elasticsearch takes available disk space
into account, and the related settings.
* <<allocation-awareness>> and <<forced-awareness>> control how shards can
be distributed across different racks or availability zones.
* <<allocation-filtering>> allows certain nodes or groups of nodes to be
excluded from allocation so that they can be decommissioned.
Besides these, there are a few other <<misc-cluster,miscellaneous cluster-level settings>>.
All of the settings in this section are _dynamic_ settings which can be
updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
include::cluster/shards_allocation.asciidoc[]
include::cluster/disk_allocator.asciidoc[]
include::cluster/allocation_awareness.asciidoc[]
include::cluster/allocation_filtering.asciidoc[]
include::cluster/misc.asciidoc[]
[[allocation-awareness]]
=== Shard Allocation Awareness
When running nodes on multiple VMs on the same physical server, on multiple
racks, or across multiple awareness zones, it is more likely that two nodes on
the same physical server, in the same rack, or in the same awareness zone will
crash at the same time, rather than two unrelated nodes crashing
simultaneously.
If Elasticsearch is _aware_ of the physical configuration of your hardware, it
can ensure that the primary shard and its replica shards are spread across
different physical servers, racks, or zones, to minimise the risk of losing
all shard copies at the same time.
The shard allocation awareness settings allow you to tell Elasticsearch about
your hardware configuration.
As an example, let's assume we have several racks. When we start a node, we
can tell it which rack it is in by assigning it an arbitrary metadata
attribute called `rack_id` -- we could use any attribute name. For example:
[source,sh]
----------------------
./bin/elasticsearch --node.rack_id rack_one <1>
----------------------
<1> This setting could also be specified in the `elasticsearch.yml` config file.
Now, we need to set up _shard allocation awareness_ by telling Elasticsearch
which attributes to use. This can be configured in the `elasticsearch.yml`
file on *all* master-eligible nodes, or it can be set (and changed) with the
<<cluster-update-settings,cluster-update-settings>> API.
For our example, we'll set the value in the config file:
[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------
With this config in place, let's say we start two nodes with `node.rack_id`
set to `rack_one`, and we create an index with 5 primary shards and 1 replica
of each primary. All primaries and replicas are allocated across the two
nodes.
Now, if we start two more nodes with `node.rack_id` set to `rack_two`,
Elasticsearch will move shards across to the new nodes, ensuring (if possible)
that the primary and replica shards are never in the same rack.
.Prefer local shards
*********************************************
When shard allocation awareness is enabled, Elasticsearch will prefer using
local shards -- shards in the same awareness group -- to execute search or
GET requests. This is usually faster than crossing racks or awareness zones.
*********************************************
Multiple awareness attributes can be specified, in which case the combination
of values from each attribute is considered to be a separate value.
[source,yaml]
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
NOTE: When using awareness attributes, shards will not be allocated to
nodes that don't have values set for those attributes.
[float]
[[forced-awareness]]
=== Forced Awareness
Imagine that you have two awareness zones and enough hardware across the two
zones to host all of your primary and replica shards. But perhaps the
hardware in a single zone, while sufficient to host half the shards, would be
unable to host *ALL* the shards.
With ordinary awareness, if one zone lost contact with the other zone,
Elasticsearch would assign all of the missing replica shards to a single zone.
But in this example, this sudden extra load would cause the hardware in the
remaining zone to be overloaded.
Forced awareness solves this problem by *NEVER* allowing copies of the same
shard to be allocated to the same zone.
For example, let's say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness on a node:
[source,yaml]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
<1> We must list all possible values that the `zone` attribute can have.
Now, if we start 2 nodes with `node.zone` set to `zone1` and create an index
with 5 shards and 1 replica, the index will be created, but only the 5 primary
shards will be allocated (with no replicas). Only when we start more nodes
with `node.zone` set to `zone2` will the replicas be allocated.
The `cluster.routing.allocation.awareness.*` settings can all be updated
dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.

View File

@ -0,0 +1,70 @@
[[allocation-filtering]]
=== Shard Allocation Filtering
While <<index-modules-allocation>> provides *per-index* settings to control the
allocation of shards to nodes, cluster-level shard allocation filtering allows
you to allow or disallow the allocation of shards from *any* index to
particular nodes.
The typical use case for cluster-wide shard allocation filtering is when you
want to decommission a node, and you would like to move the shards from that
node to other nodes in the cluster before shutting it down.
For instance, we could decommission a node using its IP address as follows:
[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}
--------------------------------------------------
// AUTOSENSE
NOTE: Shards will only be relocated if it is possible to do so without
breaking another routing constraint, such as never allocating a primary and
replica shard to the same node.
Cluster-wide shard allocation filtering works in the same way as index-level
shard allocation filtering (see <<index-modules-allocation>> for details).
The available _dynamic_ cluster settings are as follows, where `{attribute}`
refers to an arbitrary node attribute:
`cluster.routing.allocation.include.{attribute}`::

    Allocate shards to a node whose `{attribute}` has at least one of the
    comma-separated values.

`cluster.routing.allocation.require.{attribute}`::

    Only allocate shards to a node whose `{attribute}` has _all_ of the
    comma-separated values.

`cluster.routing.allocation.exclude.{attribute}`::

    Do not allocate shards to a node whose `{attribute}` has _any_ of the
    comma-separated values.
These special attributes are also supported:
[horizontal]
`_name`:: Match nodes by node name
`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname
All attribute values can be specified with wildcards, eg:
[source,json]
------------------------
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.include._ip": "192.168.2.*"
}
}
------------------------
// AUTOSENSE

View File

@ -0,0 +1,69 @@
[[disk-allocator]]
=== Disk-based Shard Allocation
Elasticsearch factors in the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards
away from that node.
Below are the settings that can be configured in the `elasticsearch.yml` config
file or updated dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API:
`cluster.routing.allocation.disk.threshold_enabled`::
Defaults to `true`. Set to `false` to disable the disk allocation decider.
`cluster.routing.allocation.disk.watermark.low`::
Controls the low watermark for disk usage. It defaults to 85%, meaning ES will
not allocate new shards to nodes once they have more than 85% disk used. It
can also be set to an absolute byte value (like 500mb) to prevent ES from
allocating shards if less than the configured amount of space is available.
`cluster.routing.allocation.disk.watermark.high`::
Controls the high watermark. It defaults to 90%, meaning ES will attempt to
relocate shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark) to
relocate shards once less than the configured amount of space is available on
the node.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.
`cluster.info.update.interval`::
How often Elasticsearch should check on disk usage for each node in the
cluster. Defaults to `30s`.
`cluster.routing.allocation.disk.include_relocations`::
Defaults to +true+, which means that Elasticsearch will take into account
shards that are currently being relocated to the target node when computing a
node's disk usage. Taking relocating shards' sizes into account may, however,
mean that the disk usage for a node is incorrectly estimated on the high side,
since the relocation could be 90% complete and a recently retrieved disk usage
would include the total size of the relocating shard as well as the space
already used by the running relocation.
An example of updating the low watermark to no more than 80% of the disk size, a
high watermark of at least 50 gigabytes free, and updating the information about
the cluster every minute:
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "80%",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------
// AUTOSENSE

View File

@ -0,0 +1,36 @@
[[misc-cluster]]
=== Miscellaneous cluster settings
[[cluster-read-only]]
==== Metadata
An entire cluster may be set to read-only with the following _dynamic_ setting:
`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write
operations), metadata is not allowed to be modified (create or delete
indices).
WARNING: Don't rely on this setting to prevent changes to your cluster. Any
user with access to the <<cluster-update-settings,cluster-update-settings>>
API can make the cluster read-write again.
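As a sketch, the whole cluster could be made read-only (and writable again by
setting the value back to `false`) with the cluster update settings API:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.blocks.read_only": true
  }
}
-------------------------------
// AUTOSENSE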
[[cluster-logger]]
==== Logger
The settings which control logging can be updated dynamically with the
`logger.` prefix. For instance, to increase the logging level of the
`indices.recovery` module to `DEBUG`, issue this request:
[source,json]
-------------------------------
PUT /_cluster/settings
{
"transient": {
"logger.indices.recovery": "DEBUG"
}
}
-------------------------------

View File

@ -0,0 +1,124 @@
[[shards-allocation]]
=== Cluster Level Shard Allocation
Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
when nodes are added or removed.
[float]
=== Shard Allocation Settings
The following _dynamic_ settings may be used to control shard allocation and recovery:
`cluster.routing.allocation.enable`::
+
--
Enable or disable allocation for specific kinds of shards:
* `all` - (default) Allows shard allocation for all kinds of shards.
* `primaries` - Allows shard allocation only for primary shards.
* `new_primaries` - Allows shard allocation only for primary shards for new indices.
* `none` - No shard allocations of any kind are allowed for any indices.
This setting does not affect the recovery of local primary shards when
restarting a node. A restarted node that has a copy of an unassigned primary
shard will recover that primary immediately, assuming that the
<<index.recovery.initial_shards,`index.recovery.initial_shards`>> setting is
satisfied.
--
`cluster.routing.allocation.node_concurrent_recoveries`::
How many concurrent shard recoveries are allowed to happen on a node.
Defaults to `2`.
`cluster.routing.allocation.node_initial_primaries_recoveries`::
While the recovery of replicas happens over the network, the recovery of
an unassigned primary after node restart uses data from the local disk.
These should be fast so more initial primary recoveries can happen in
parallel on the same node. Defaults to `4`.
`cluster.routing.allocation.same_shard.host`::

    Performs a check to prevent allocation of multiple instances of the same
    shard on a single host, based on host name and host address. Defaults to
    `false`, meaning that no check is performed by default. This setting only
    applies if multiple nodes are started on the same machine.
`indices.recovery.concurrent_streams`::
The number of network streams to open per node to recover a shard from
a peer shard. Defaults to `3`.
`indices.recovery.concurrent_small_file_streams`::
The number of streams to open per node for small files (under 5mb) to
recover a shard from a peer shard. Defaults to `2`.
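As an illustrative sketch, shard allocation could be disabled before node
maintenance and re-enabled afterwards (by setting the value back to `all`)
with the cluster update settings API:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
-------------------------------
// AUTOSENSE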
[float]
=== Shard Rebalancing Settings
The following _dynamic_ settings may be used to control the rebalancing of
shards across the cluster:
`cluster.routing.rebalance.enable`::
+
--
Enable or disable rebalancing for specific kinds of shards:
* `all` - (default) Allows shard balancing for all kinds of shards.
* `primaries` - Allows shard balancing only for primary shards.
* `replicas` - Allows shard balancing only for replica shards.
* `none` - No shard balancing of any kind is allowed for any indices.
--
`cluster.routing.allocation.allow_rebalance`::
+
--
Specify when shard rebalancing is allowed:
* `always` - (default) Always allow rebalancing.
* `indices_primaries_active` - Only when all primaries in the cluster are allocated.
* `indices_all_active` - Only when all shards (primaries and replicas) in the cluster are allocated.
--
`cluster.routing.allocation.cluster_concurrent_rebalance`::

    Controls how many concurrent shard rebalances are allowed cluster-wide.
    Defaults to `2`.
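For example, the number of concurrent rebalances could be reduced on a
cluster with limited network bandwidth -- the value `1` below is illustrative,
not a recommendation:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1
  }
}
-------------------------------
// AUTOSENSE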
[float]
=== Shard Balancing Heuristics
The following settings are used together to determine where to place each
shard. The cluster is balanced when no allowed action can bring the weights
of each node closer together by more than the `balance.threshold`.
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`. Raising this raises the tendency to
equalize the number of shards across all nodes in the cluster.
`cluster.routing.allocation.balance.index`::

    Defines the weight factor for the number of shards per index allocated
    on a specific node (float). Defaults to `0.55f`. Raising this raises the
    tendency to equalize the number of shards per index across all nodes in
    the cluster.
`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non
negative float). Defaults to `1.0f`. Raising this will cause the cluster
to be less aggressive about optimizing the shard balance.
NOTE: Regardless of the result of the balancing algorithm, rebalancing might
not be allowed due to forced awareness or allocation filtering.
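As a sketch -- assuming these heuristics can be updated via the cluster update
settings API like the other allocation settings, and with purely illustrative
values -- a cluster could be made less aggressive about rebalancing by raising
the threshold:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.balance.threshold": 2.0
  }
}
-------------------------------
// AUTOSENSE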

View File

@ -1,69 +1,51 @@
[[modules-gateway]]
== Local Gateway

The local gateway module stores the cluster state and shard data across full
cluster restarts.

The following _static_ settings, which must be set on every data node in the
cluster, control how long nodes should wait before they try to recover any
shards which are stored locally:

`gateway.expected_nodes`::

    The number of (data or master) nodes that are expected to be in the cluster.
    Recovery of local shards will start as soon as the expected number of
    nodes have joined the cluster. Defaults to `0`.

`gateway.expected_master_nodes`::

    The number of master nodes that are expected to be in the cluster.
    Recovery of local shards will start as soon as the expected number of
    master nodes have joined the cluster. Defaults to `0`.

`gateway.expected_data_nodes`::

    The number of data nodes that are expected to be in the cluster.
    Recovery of local shards will start as soon as the expected number of
    data nodes have joined the cluster. Defaults to `0`.

`gateway.recover_after_time`::

    If the expected number of nodes is not achieved, the recovery process waits
    for the configured amount of time before trying to recover regardless.
    Defaults to `5m` if one of the `expected_nodes` settings is configured.

Once the `recover_after_time` duration has timed out, recovery will start
as long as the following conditions are met:

`gateway.recover_after_nodes`::

    Recover as long as this many data or master nodes have joined the cluster.

`gateway.recover_after_master_nodes`::

    Recover as long as this many master nodes have joined the cluster.

`gateway.recover_after_data_nodes`::

    Recover as long as this many data nodes have joined the cluster.

NOTE: These settings only take effect on a full cluster restart.
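For example, the following `elasticsearch.yml` sketch (with illustrative
values) would start recovery as soon as 3 nodes have joined the cluster, or
after 5 minutes as long as at least 2 nodes have joined:

[source,yaml]
--------------------------------------------------
gateway.expected_nodes: 3
gateway.recover_after_time: 5m
gateway.recover_after_nodes: 2
--------------------------------------------------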

View File

@ -1,56 +0,0 @@
[[modules-gateway-local]]
=== Local Gateway
The local gateway allows for recovery of the full cluster state and
indices from the local storage of each node, and does not require a
common node level shared storage.
Note that, unlike shared gateway types, persistence to the local gateway is
*not* done asynchronously. Once an operation is performed, the data is there
for the local gateway to recover in case of full cluster failure.
It is important to configure the `gateway.recover_after_nodes` setting
to include most of the nodes expected to be started after a full cluster
restart. This will ensure that the latest cluster state is recovered.
For example:
[source,js]
--------------------------------------------------
gateway:
recover_after_nodes: 3
expected_nodes: 5
--------------------------------------------------
[float]
==== Dangling indices
When a node joins the cluster, any shards/indices stored in its local `data/`
directory which do not already exist in the cluster will be imported into the
cluster by default. This functionality has two purposes:
1. If a new master node is started which is unaware of the other indices in
the cluster, adding the old nodes will cause the old indices to be
imported, instead of being deleted.
2. An old index can be added to an existing cluster by copying it to the
`data/` directory of a new node, starting the node and letting it join
the cluster. Once the index has been replicated to other nodes in the
cluster, the new node can be shut down and removed.
The import of dangling indices can be controlled with the
`gateway.auto_import_dangled` setting, which accepts:
[horizontal]
`yes`::
Import dangling indices into the cluster (default).
`close`::
Import dangling indices into the cluster state, but leave them closed.
`no`::
Delete dangling indices after `gateway.dangling_timeout`, which
defaults to 2 hours.

View File

@ -1,66 +1,50 @@
[[modules-indices]]
== Indices

The indices module controls index-related settings that are globally managed
for all indices, rather than being configurable at a per-index level.

Available settings include:

<<circuit-breaker,Circuit breaker>>::

    Circuit breakers set limits on memory usage to avoid out of memory exceptions.

<<modules-fielddata,Fielddata cache>>::

    Set limits on the amount of heap used by the in-memory fielddata cache.

<<filter-cache,Node filter cache>>::

    Configure the amount of heap used to cache filter results.

<<indexing-buffer,Indexing buffer>>::

    Control the size of the buffer allocated to the indexing process.

<<shard-query-cache,Shard query cache>>::

    Control the behaviour of the shard-level query cache.

<<recovery,Recovery>>::

    Control the resource limits on the shard recovery process.

<<indices-ttl,TTL interval>>::

    Control how expired documents are removed.

include::indices/circuit_breaker.asciidoc[]

include::indices/fielddata.asciidoc[]

include::indices/filter_cache.asciidoc[]

include::indices/indexing_buffer.asciidoc[]

include::indices/query-cache.asciidoc[]

include::indices/recovery.asciidoc[]

include::indices/ttl_interval.asciidoc[]

View File

@ -0,0 +1,56 @@
[[circuit-breaker]]
=== Circuit Breaker
Elasticsearch contains multiple circuit breakers used to prevent operations from
causing an OutOfMemoryError. Each breaker specifies a limit for how much memory
it can use. Additionally, there is a parent-level breaker that specifies the
total amount of memory that can be used across all breakers.
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
[[parent-circuit-breaker]]
[float]
==== Parent circuit breaker
The parent-level breaker can be configured with the following setting:
`indices.breaker.total.limit`::
Starting limit for overall parent breaker, defaults to 70% of JVM heap.
[[fielddata-circuit-breaker]]
[float]
==== Field data circuit breaker
The field data circuit breaker allows Elasticsearch to estimate the amount of
memory a field will require to be loaded into memory. It can then prevent the
field data loading by raising an exception. By default the limit is configured
to 60% of the maximum JVM heap. It can be configured with the following
parameters:
`indices.breaker.fielddata.limit`::
Limit for fielddata breaker, defaults to 60% of JVM heap
`indices.breaker.fielddata.overhead`::
A constant that all field data estimations are multiplied with to determine a
final estimation. Defaults to 1.03
[[request-circuit-breaker]]
[float]
==== Request circuit breaker
The request circuit breaker allows Elasticsearch to prevent per-request data
structures (for example, memory used for calculating aggregations during a
request) from exceeding a certain amount of memory.
`indices.breaker.request.limit`::
Limit for request breaker, defaults to 40% of JVM heap
`indices.breaker.request.overhead`::
A constant that all request estimations are multiplied with to determine a
final estimation. Defaults to 1
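For instance, the fielddata breaker limit could be lowered dynamically -- the
`40%` value here is illustrative:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "40%"
  }
}
-------------------------------
// AUTOSENSE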

View File

@ -0,0 +1,37 @@
[[modules-fielddata]]
=== Fielddata
The field data cache is used mainly when sorting on or computing aggregations
on a field. It loads all the field values into memory in order to provide fast
document-based access to those values. The field data cache can be
expensive to build for a field, so it's recommended to have enough memory
to allocate it, and to keep it loaded.

The amount of memory used for the field data cache can be controlled using
`indices.fielddata.cache.size`. Note: reloading field data which does not fit
into your cache will be expensive and perform poorly.
`indices.fielddata.cache.size`::
The max size of the field data cache, eg `30%` of node heap space, or an
absolute value, eg `12GB`. Defaults to unbounded. Also see
<<fielddata-circuit-breaker>>.
`indices.fielddata.cache.expire`::
experimental[] A time based setting that expires field data after a
certain time of inactivity. Defaults to `-1`. For example, can be set to
`5m` for a 5 minute expiry.
NOTE: These are static settings which must be configured on every data node in
the cluster.
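For example, the cache could be capped in `elasticsearch.yml` -- the `30%`
value below is illustrative:

[source,yaml]
--------------------------------------------------
indices.fielddata.cache.size: 30%
--------------------------------------------------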
[float]
[[fielddata-monitoring]]
==== Monitoring field data
You can monitor memory usage for field data as well as the field data circuit
breaker using the <<cluster-nodes-stats,Nodes Stats API>>.

View File

@ -0,0 +1,16 @@
[[filter-cache]]
=== Node Filter Cache
The filter cache is responsible for caching the results of filters (used in
the query). There is one filter cache per node that is shared by all shards.
The cache implements an LRU eviction policy: when a cache becomes full, the
least recently used data is evicted to make way for new data.
The following setting is _static_ and must be configured on every data node in
the cluster:
`indices.cache.filter.size`::
Controls the memory size for the filter cache. Defaults to `10%`. Accepts
either a percentage value, like `30%`, or an exact value, like `512mb`.
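For example, in `elasticsearch.yml` (the value shown is illustrative):

[source,yaml]
--------------------------------------------------
indices.cache.filter.size: 30%
--------------------------------------------------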

View File

@ -0,0 +1,32 @@
[[indexing-buffer]]
=== Indexing Buffer
The indexing buffer is used to store newly indexed documents. When it fills
up, the documents in the buffer are written to a segment on disk. It is divided
between all shards on the node.
The following settings are _static_ and must be configured on every data node
in the cluster:
`indices.memory.index_buffer_size`::
Accepts either a percentage or a byte size value. It defaults to `10%`,
meaning that `10%` of the total heap allocated to a node will be used as the
indexing buffer size.
`indices.memory.min_index_buffer_size`::
If the `index_buffer_size` is specified as a percentage, then this
setting can be used to specify an absolute minimum. Defaults to `48mb`.
`indices.memory.max_index_buffer_size`::
If the `index_buffer_size` is specified as a percentage, then this
setting can be used to specify an absolute maximum. Defaults to unbounded.
`indices.memory.min_shard_index_buffer_size`::
Sets a hard lower limit for the memory allocated per shard for its own
indexing buffer. Defaults to `4mb`.
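As a sketch, an indexing-heavy node could devote more heap to the buffer in
`elasticsearch.yml` -- the values below are illustrative, not recommendations:

[source,yaml]
--------------------------------------------------
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
--------------------------------------------------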

View File

@ -1,5 +1,5 @@
[[index-modules-shard-query-cache]]
== Shard query cache
[[shard-query-cache]]
=== Shard query cache
When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to
@ -13,7 +13,7 @@ use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache.
[IMPORTANT]
==================================
===================================
For now, the query cache will only cache the results of search requests
where `size=0`, so it will not cache `hits`,
@ -21,10 +21,10 @@ but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>.
Queries that use `now` (see <<date-math>>) cannot be cached.
==================================
===================================
[float]
=== Cache invalidation
==== Cache invalidation
The cache is smart -- it keeps the same _near real-time_ promise as uncached
search.
@ -46,7 +46,7 @@ curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'
------------------------
[float]
=== Enabling caching by default
==== Enabling caching by default
The cache is not enabled by default, but can be enabled when creating a new
index as follows:
@ -73,7 +73,7 @@ curl -XPUT localhost:9200/my_index/_settings -d'
-----------------------------
[float]
=== Enabling caching per request
==== Enabling caching per request
The `query_cache` query-string parameter can be used to enable or disable
caching on a *per-query* basis. If set, it overrides the index-level setting:
@ -99,7 +99,7 @@ it uses a random function or references the current time) you should set the
`query_cache` flag to `false` to disable caching for that request.
[float]
=== Cache key
==== Cache key
The whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the
@ -110,7 +110,7 @@ keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way.
[float]
=== Cache settings
==== Cache settings
The cache is managed at the node level, and has a default maximum size of `1%`
of the heap. This can be changed in the `config/elasticsearch.yml` file with:
@ -126,7 +126,7 @@ stale results are automatically invalidated when the index is refreshed. This
setting is provided for completeness' sake only.
[float]
=== Monitoring cache usage
==== Monitoring cache usage
The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API:

View File

@ -0,0 +1,28 @@
[[recovery]]
=== Indices Recovery
The following _expert_ settings can be set to manage the recovery policy.
`indices.recovery.concurrent_streams`::
Defaults to `3`.
`indices.recovery.concurrent_small_file_streams`::
Defaults to `2`.
`indices.recovery.file_chunk_size`::
Defaults to `512kb`.
`indices.recovery.translog_ops`::
Defaults to `1000`.
`indices.recovery.translog_size`::
Defaults to `512kb`.
`indices.recovery.compress`::
Defaults to `true`.
`indices.recovery.max_bytes_per_sec`::
Defaults to `40mb`.
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
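For example, with illustrative values:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 5
  }
}
-------------------------------
// AUTOSENSE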

View File

@ -0,0 +1,16 @@
[[indices-ttl]]
=== TTL interval
Documents that have a <<mapping-ttl-field,`ttl`>> value set need to be deleted
once they have expired. How and how often they are deleted is controlled by
the following dynamic cluster settings:
`indices.ttl.interval`::
How often the deletion process runs. Defaults to `60s`.
`indices.ttl.bulk_size`::

    The deletions are processed with a <<docs-bulk,bulk request>>.
    The number of deletions processed can be configured with
    this setting. Defaults to `10000`.
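As these are dynamic cluster settings, they could be adjusted on a live
cluster -- the values below are illustrative:

[source,js]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "indices.ttl.interval": "120s",
    "indices.ttl.bulk_size": 5000
  }
}
-------------------------------
// AUTOSENSE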

View File

@ -22,7 +22,7 @@ Installing plugins typically take the following form:
[source,shell]
-----------------------------------
plugin --install <org>/<user/component>/<version>
bin/plugin --install <org>/<user/component>/<version>
-----------------------------------
The plugins will be

View File

@ -9,7 +9,6 @@ of discarded.
There are several thread pools, but the important ones include:
[horizontal]
`index`::
For index/delete operations. Defaults to `fixed`
with a size of `# of available processors`,

View File

@ -73,7 +73,7 @@ And here is a sample response:
Set to `true` or `false` to enable or disable the caching
of search results for requests where `size` is 0, ie
aggregations and suggestions (no top hits returned).
See <<index-modules-shard-query-cache>>.
See <<shard-query-cache>>.
`terminate_after`::

View File

@ -416,7 +416,7 @@ The Snapshot/Restore API supports a number of different repository types for sto
[float]
=== Circuit Breaker: Fielddata (STATUS: DONE, v1.0.0)
Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0.
Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0.
[float]
=== Use of Paginated Data Structures to Ease Garbage Collection (STATUS: DONE, v1.0.0 & v1.2.0)