Docs: Refactored modules and index modules sections

This commit is contained in:
Clinton Gormley 2015-06-22 23:49:45 +02:00
parent 1df2d3015e
commit f123a53d72
37 changed files with 1136 additions and 1102 deletions

View File

@ -7,7 +7,7 @@ can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation -- you will never get stale
results.

See <<shard-query-cache>> for more details.

[[returning-only-agg-results]]
== Returning only aggregation results

View File

@ -10,8 +10,8 @@ survive a full cluster restart). Here is an example:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}'
--------------------------------------------------

Or:
@ -21,8 +21,8 @@ Or:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}'
--------------------------------------------------

The cluster responds with the settings updated. So the response for the
@ -34,8 +34,8 @@ last example will be:
"persistent" : {}, "persistent" : {},
"transient" : { "transient" : {
"discovery.zen.minimum_master_nodes" : "2" "discovery.zen.minimum_master_nodes" : "2"
} }
}' }'
-------------------------------------------------- --------------------------------------------------
Cluster wide settings can be returned using: Cluster wide settings can be returned using:
@ -45,157 +45,7 @@ Cluster wide settings can be returned using:
curl -XGET localhost:9200/_cluster/settings
--------------------------------------------------
A list of dynamically updatable settings can be found in the
<<modules,Modules>> documentation.

There is a specific list of settings that can be updated; these include:

[float]
[[cluster-settings]]
=== Cluster settings
[float]
==== Routing allocation
[float]
===== Awareness
`cluster.routing.allocation.awareness.attributes`::
See <<modules-cluster>>.
`cluster.routing.allocation.awareness.force.*`::
See <<modules-cluster>>.
[float]
===== Balanced Shards
All these values are relative to one another. The first three are used to
compose three separate weighting functions into one. The cluster is balanced
when no allowed action can bring the weights of each node closer together by
more than the fourth setting. Actions might not be allowed, for instance,
due to forced awareness or allocation filtering.
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`. Raising this raises the tendency to
equalize the number of shards across all nodes in the cluster.
`cluster.routing.allocation.balance.index`::
Defines a factor to the number of shards per index allocated
on a specific node (float). Defaults to `0.55f`. Raising this raises the
tendency to equalize the number of shards per index across all nodes in
the cluster.
`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non
negative float). Defaults to `1.0f`. Raising this will cause the cluster
to be less aggressive about optimizing the shard balance.
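These balance factors appear in this list of dynamically updatable settings, so adjusting one on a running cluster might look like the following sketch (the value shown is only an example):

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.balance.shard" : 0.5
    }
}
--------------------------------------------------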
[float]
===== Concurrent Rebalance
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are allowed cluster wide.
Defaults to `2` (integer). `-1` for unlimited. See also <<modules-cluster>>.
[float]
===== Enable allocation
`cluster.routing.allocation.enable`::
See <<modules-cluster>>.
[float]
===== Throttling allocation
`cluster.routing.allocation.node_initial_primaries_recoveries`::
See <<modules-cluster>>.
`cluster.routing.allocation.node_concurrent_recoveries`::
See <<modules-cluster>>.
[float]
===== Filter allocation
`cluster.routing.allocation.include.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.exclude.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.require.*`::
See <<modules-cluster>>.
[float]
==== Metadata
`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write operations); metadata is not allowed to be modified (create or delete indices).
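A sketch of toggling this setting via the cluster-update-settings API (the value is illustrative):

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "cluster.blocks.read_only" : true
    }
}
--------------------------------------------------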
[float]
==== Discovery
`discovery.zen.minimum_master_nodes`::
See <<modules-discovery-zen>>
`discovery.zen.publish_timeout`::
See <<modules-discovery-zen>>
[float]
==== Threadpools
`threadpool.*`::
See <<modules-threadpool>>
[float]
[[cluster-index-settings]]
=== Index settings
[float]
==== Index filter cache
`indices.cache.filter.size`::
See <<index-modules-cache>>
[float]
==== TTL interval
`indices.ttl.interval` (time)::
See <<mapping-ttl-field>>
[float]
==== Recovery
`indices.recovery.concurrent_streams`::
See <<modules-indices>>
`indices.recovery.concurrent_small_file_streams`::
See <<modules-indices>>
`indices.recovery.file_chunk_size`::
See <<modules-indices>>
`indices.recovery.translog_ops`::
See <<modules-indices>>
`indices.recovery.translog_size`::
See <<modules-indices>>
`indices.recovery.compress`::
See <<modules-indices>>
`indices.recovery.max_bytes_per_sec`::
See <<modules-indices>>
[float]
[[logger]]
=== Logger
Logger values can also be updated dynamically by using the `logger.` prefix. More
settings will be allowed to be updated.
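For example, a sketch of raising the log level of a single logger on a running cluster; the logger name `indices.recovery` is only an illustration:

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "logger.indices.recovery" : "DEBUG"
    }
}
--------------------------------------------------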
[float]
=== Field data circuit breaker
`indices.breaker.fielddata.limit`::
See <<index-modules-fielddata>>
`indices.breaker.fielddata.overhead`::
See <<index-modules-fielddata>>

View File

@ -1,49 +1,177 @@
[[index-modules]]
= Index Modules

[partintro]
--
Index Modules are modules created per index and control all aspects related to
an index.

[float]
[[index-modules-settings]]
== Index Settings

Index level settings can be set per-index. Settings may be:
_static_::
They can only be set at index creation time or on a
<<indices-open-close,closed index>>.
_dynamic_::
They can be changed on a live index using the
<<indices-update-settings,update-index-settings>> API.
WARNING: Changing static or dynamic index settings on a closed index could
result in incorrect settings that are impossible to rectify without deleting
and recreating the index.
[float]
=== Static index settings
Below is a list of all _static_ index settings that are not associated with any
specific index module:
`index.number_of_shards`::
The number of primary shards that an index should have. Defaults to 5.
This setting can only be set at index creation time. It cannot be
changed on a closed index.
`index.shard.check_on_startup`::
+
--
experimental[] Whether or not shards should be checked for corruption before opening. When
corruption is detected, it will prevent the shard from being opened. Accepts:
`false`::
(default) Don't check for corruption when opening a shard.
`checksum`::
Check for physical corruption.
`true`::
Check for both physical and logical corruption. This is much more
expensive in terms of CPU and memory usage.
`fix`::
Check for both physical and logical corruption. Segments that were reported
as corrupted will be automatically removed. This option *may result in data loss*.
Use with extreme caution!
Checking shards may take a lot of time on large indices.
--
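As a sketch, static settings such as these can be supplied in the body of the create-index request (the index name `my_index` and the values are only examples):

[source,json]
--------------------------------------------------
PUT /my_index
{
    "settings": {
        "index.number_of_shards": 3,
        "index.shard.check_on_startup": "checksum"
    }
}
--------------------------------------------------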
[float]
[[dynamic-index-settings]]
=== Dynamic index settings
Below is a list of all _dynamic_ index settings that are not associated with any
specific index module:
`index.number_of_replicas`::
The number of replicas each primary shard has. Defaults to 1.
`index.auto_expand_replicas`::
Auto-expand the number of replicas based on the number of available nodes.
Set to a dash delimited lower and upper bound (e.g. `0-5`) or use `all`
for the upper bound (e.g. `0-all`). Defaults to `false` (i.e. disabled).
`index.refresh_interval`::
How often to perform a refresh operation, which makes recent changes to the
index visible to search. Defaults to `1s`. Can be set to `-1` to disable
refresh.

`index.codec`::
experimental[] The `default` value compresses stored data with LZ4
compression, but this can be set to `best_compression` for a higher
compression ratio, at the expense of slower stored fields performance.

`index.blocks.read_only`::
Set to `true` to make the index and index metadata read only, `false` to
allow writes and metadata changes.

`index.blocks.read`::
Set to `true` to disable read operations against the index.
`index.blocks.write`::
Set to `true` to disable write operations against the index.
`index.blocks.metadata`::
Set to `true` to disable index metadata reads and writes.
`index.ttl.disable_purge`::
experimental[] Disables the purge of <<mapping-ttl-field,expired docs>> on
the current index.
[[index.recovery.initial_shards]]`index.recovery.initial_shards`::
+
--
A primary shard is only recovered if there are enough nodes available to
allocate sufficient replicas to form a quorum. It can be set to:
* `quorum` (default)
* `quorum-1` (or `half`)
* `full`
* `full-1`.
* Number values are also supported, e.g. `1`.
--
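A sketch of changing one of these dynamic settings on a live index with the update-index-settings API (the index name and value are examples):

[source,json]
--------------------------------------------------
PUT /my_index/_settings
{
    "index.number_of_replicas": 2
}
--------------------------------------------------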
[float]
=== Settings in other index modules
Other index settings are available in index modules:
<<analysis,Analysis>>::
Settings to define analyzers, tokenizers, token filters and character
filters.
<<index-modules-allocation,Index shard allocation>>::
Control over where, when, and how shards are allocated to nodes.
<<index-modules-mapper,Mapping>>::
Enable or disable dynamic mapping for an index.
<<index-modules-merge,Merging>>::
Control over how shards are merged by the background merge process.
<<index-modules-similarity,Similarities>>::
Configure custom similarity settings to customize how search results are
scored.
<<index-modules-slowlog,Slowlog>>::
Control over how slow queries and fetch requests are logged.
<<index-modules-store,Store>>::
Configure the type of filesystem used to access shard data.
<<index-modules-translog,Translog>>::
Control over the transaction log and background flush operations.
-- --
@ -51,22 +179,16 @@ include::index-modules/analysis.asciidoc[]
include::index-modules/allocation.asciidoc[] include::index-modules/allocation.asciidoc[]
include::index-modules/slowlog.asciidoc[] include::index-modules/mapper.asciidoc[]
include::index-modules/merge.asciidoc[] include::index-modules/merge.asciidoc[]
include::index-modules/store.asciidoc[] include::index-modules/similarity.asciidoc[]
include::index-modules/mapper.asciidoc[] include::index-modules/slowlog.asciidoc[]
include::index-modules/store.asciidoc[]
include::index-modules/translog.asciidoc[] include::index-modules/translog.asciidoc[]
include::index-modules/cache.asciidoc[]
include::index-modules/query-cache.asciidoc[]
include::index-modules/fielddata.asciidoc[]
include::index-modules/similarity.asciidoc[]

View File

@ -1,168 +1,131 @@
[[index-modules-allocation]] [[index-modules-allocation]]
== Index Shard Allocation == Index Shard Allocation
This module provides per-index settings to control the allocation of shards to
nodes.
[float] [float]
[[shard-allocation-filtering]] [[shard-allocation-filtering]]
=== Shard Allocation Filtering === Shard Allocation Filtering
Shard allocation filtering allows you to specify which nodes are allowed
to host the shards of a particular index.

NOTE: The per-index shard allocation filters explained below work in
conjunction with the cluster-wide allocation filters explained in
<<shards-allocation>>.

It is possible to assign arbitrary metadata attributes to each node at
startup. For instance, nodes could be assigned a `rack` and a `size`
attribute as follows:

[source,sh]
------------------------
bin/elasticsearch --node.rack rack1 --node.size big  <1>
------------------------
<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.

These metadata attributes can be used with the
`index.routing.allocation.*` settings to allocate an index to a particular
group of nodes. For instance, we can move the index `test` to either `big` or
`medium` nodes as follows:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big,medium"
}
------------------------
// AUTOSENSE

Alternatively, we can move the index `test` away from the `small` nodes with
an `exclude` rule:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.exclude.size": "small"
}
------------------------
// AUTOSENSE

Multiple rules can be specified, in which case all conditions must be
satisfied. For instance, we could move the index `test` to `big` nodes in
`rack1` with the following:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include.size": "big",
  "index.routing.allocation.include.rack": "rack1"
}
------------------------
// AUTOSENSE

NOTE: If some conditions cannot be satisfied then shards will not be moved.

The following settings are _dynamic_, allowing live indices to be moved from
one set of nodes to another:

`index.routing.allocation.include.{attribute}`::
Assign the index to a node whose `{attribute}` has at least one of the
comma-separated values.

`index.routing.allocation.require.{attribute}`::
Assign the index to a node whose `{attribute}` has _all_ of the
comma-separated values.

`index.routing.allocation.exclude.{attribute}`::
Assign the index to a node whose `{attribute}` has _none_ of the
comma-separated values.

These special attributes are also supported:

[horizontal]
`_name`:: Match nodes by node name
`_ip`::   Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname

All attribute values can be specified with wildcards, eg:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
------------------------
// AUTOSENSE
[float]
=== Total Shards Per Node

The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and
indices you have, and how big they are, it may not always be possible to spread
shards evenly.

The following _dynamic_ setting allows you to specify a hard limit on the total
number of shards from a single index allowed per node:

`index.routing.allocation.total_shards_per_node`::
The maximum number of shards (replicas and primaries) that will be
allocated to a single node. Defaults to unbounded.

[WARNING]
=======================================
This setting imposes a hard limit which can result in some shards not
being allocated.

Use with caution.
=======================================

[float]
[[disk]]
=== Disk-based Shard Allocation

Disk based shard allocation is enabled from version 1.3.0 onward.
Elasticsearch can be configured to prevent shard
allocation on nodes depending on disk usage for the node. This
functionality is enabled by default, and can be changed either in the
configuration file, or dynamically using:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : false
    }
}'
--------------------------------------------------

Once enabled, Elasticsearch uses two watermarks to decide whether
shards should be allocated or can remain on the node.
`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 85%, meaning ES will not
allocate new shards to nodes once they have more than 85% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.
`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 90%, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of
high and low. For example, it makes sense to set the low watermark to 10gb
and the high watermark to 5gb, but not the other way around.
Both watermark settings can be changed dynamically using the cluster
settings API. By default, Elasticsearch will retrieve information
about the disk usage of the nodes every 30 seconds. This can also be
changed by setting the `cluster.info.update.interval` setting.
An example of updating the low watermark to no more than 80% of the disk size, a
high watermark of at least 50 gigabytes free, and updating the information about
the cluster every minute:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.disk.watermark.low" : "80%",
"cluster.routing.allocation.disk.watermark.high" : "50gb",
"cluster.info.update.interval" : "1m"
}
}'
--------------------------------------------------
By default, Elasticsearch will take into account shards that are currently being
relocated to the target node when computing a node's disk usage. This can be
changed by setting the `cluster.routing.allocation.disk.include_relocations`
setting to `false` (defaults to `true`). Taking relocating shards' sizes into
account may, however, mean that the disk usage for a node is incorrectly
estimated on the high side, since the relocation could be 90% complete and a
recently retrieved disk usage would include the total size of the relocating
shard as well as the space already used by the running relocation.
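Assuming this setting is accepted dynamically like the other disk-allocation settings above, a sketch of disabling it:

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.disk.include_relocations" : false
    }
}
--------------------------------------------------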

View File

@ -1,18 +1,12 @@
[[index-modules-analysis]]
== Analysis

The index analysis module acts as a configurable registry of _analyzers_
that can be used in order to convert a string field into individual terms
which are:

* added to the inverted index in order to make the document searchable
* used by high level queries such as the <<query-dsl-match-query,`match` query>>
  to generate search terms.

See <<analysis>> for configuration details.

View File

@ -1,33 +0,0 @@
[[index-modules-cache]]
== Cache
There are different caching modules associated with an index, including
the `filter` cache and others.
[float]
[[filter]]
=== Filter Cache
The filter cache is responsible for caching the results of filters (used
in the query). The default implementation of a filter cache (and the one
recommended to use in almost all cases) is the `node` filter cache type.
[float]
[[node-filter]]
==== Node Filter Cache
The `node` filter cache may be configured to use either a percentage of
the total memory allocated to the process or a specific amount of
memory. All shards present on a node share a single node cache (that's
why it's called `node`). The cache implements an LRU eviction policy:
when a cache becomes full, the least recently used data is evicted to
make way for new data.
The setting that allows one to control the memory size for the filter
cache is `indices.cache.filter.size`, which defaults to `10%`. *Note*,
this is *not* an index level setting but a node level setting (can be
configured in the node configuration).
`indices.cache.filter.size` can accept either a percentage value, like
`30%`, or an exact value, like `512mb`.
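The cluster-update-settings section of this commit lists `indices.cache.filter.size` among the dynamically updatable settings, so a sketch of adjusting it cluster wide (the value is only an example):

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "persistent" : {
        "indices.cache.filter.size" : "20%"
    }
}
--------------------------------------------------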

View File

@ -49,5 +49,10 @@ automatically.
The default mapping can be overridden by specifying the `_default_` type when
creating a new index.

[float]
=== Mapper settings

`index.mapper.dynamic` (_static_)::
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.
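Since this is a _static_ setting, a sketch of disabling it at index creation time (the index name is only an example):

[source,json]
--------------------------------------------------
PUT /my_index
{
    "settings": {
        "index.mapper.dynamic": false
    }
}
--------------------------------------------------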

View File

@ -14,6 +14,11 @@ number of segments per tier. The merge policy is able to merge
non-adjacent segments, and separates how many segments are merged at once from how many
segments are allowed per tier. It also does not over-merge (i.e., cascade merges).

[float]
[[merge-settings]]
=== Merge policy settings

All merge policy settings are _dynamic_ and can be updated on a live index.
The merge policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::
@ -80,30 +85,29 @@ possibly either increase the `max_merged_segment` or issue an optimize
call for the index (try and aim to issue it on a low traffic time).

[float]
[[merge-scheduling]]
=== Merge scheduling

The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy). Merges
run in separate threads, and when the maximum number of threads is reached,
further merges will wait until a merge thread becomes available.

The merge scheduler supports the following _dynamic_ settings:

`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`
which works well for a good solid-state-disk (SSD). If your index is on
spinning platter drives instead, decrease this to 1.

`index.merge.scheduler.auto_throttle`::
If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
unluckily suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
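For example, on spinning disks the thread count could be lowered on a live index; a sketch using the update-index-settings API (the index name is an example):

[source,json]
--------------------------------------------------
PUT /my_index/_settings
{
    "index.merge.scheduler.max_thread_count": 1
}
--------------------------------------------------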

View File

@ -1,29 +1,31 @@
[[index-modules-slowlog]]
== Slow Log

[float]
[[search-slow-log]]
=== Search Slow Log

The shard level slow search log allows slow searches (query and fetch
phases) to be logged into a dedicated log file.

Thresholds can be set for both the query phase of the execution, and
fetch phase, here is a sample:

[source,yaml]
--------------------------------------------------
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
--------------------------------------------------

All of the above settings are _dynamic_ and can be set per-index.

By default, none are enabled (set to `-1`). Levels (`warn`, `info`,
`debug`, `trace`) allow you to control under which logging level the log
will be logged. Not all are required to be configured (for example, only
@ -37,14 +39,10 @@ execute. Some of the benefits of shard level logging is the association
of the actual execution on the specific machine, compared with request
level.

All settings are index level settings (and each index can have different
values for it), and can be changed at runtime using the index update
settings API.

The logging file is configured by default using the following
configuration (found in `logging.yml`):

[source,yaml]
--------------------------------------------------
index_search_slow_log_file:
    type: dailyRollingFile
@ -64,18 +62,20 @@ log. The log file is ends with `_index_indexing_slowlog.log`. Log and
the thresholds are configured in the elasticsearch.yml file in the same
way as the search slowlog. Index slowlog sample:

[source,yaml]
--------------------------------------------------
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
--------------------------------------------------

All of the above settings are _dynamic_ and can be set per-index.

The index slow log file is configured by default in the `logging.yml`
file:

[source,yaml]
--------------------------------------------------
index_indexing_slow_log_file:
    type: dailyRollingFile

View File

@ -1,34 +1,16 @@
[[index-modules-store]]
== Store

The store module allows you to control how index data is stored and accessed on disk.
The index can either be stored in-memory (no persistence) or on-disk
(the default). In-memory indices provide better performance at the cost
of limiting the index size to the amount of available physical memory.
When using a local gateway (the default), file system storage with *no*
in memory storage is required to maintain index consistency. This is
required since the local gateway constructs its state from the local
index state of each node.
Another important aspect of memory based storage is the fact that
Elasticsearch supports storing the index in memory *outside of the JVM
heap space* using the "Memory" (see below) storage type. It translates
to the fact that there is no need for extra large JVM heaps (with their
own consequences) for storing the index in memory.
experimental[All of the settings exposed in the `store` module are expert only and may be removed in the future]
[float]
[[file-system]]
=== File system storage types

There are different file system implementations or _storage types_. The best
one for the operating environment will be automatically chosen: `mmapfs` on
Windows 64bit, `simplefs` on Windows 32bit, and `default` (hybrid `niofs` and
`mmapfs`) for the rest.

This can be overridden for all indices by adding this to the
`config/elasticsearch.yml` file:
@ -38,57 +20,53 @@ This can be overridden for all indices by adding this to the
index.store.type: niofs
---------------------------------

It is a _static_ setting that can be set on a per-index basis at index
creation time:

[source,json]
---------------------------------
PUT /my_index
{
  "settings": {
    "index.store.type": "niofs"
  }
}
---------------------------------

experimental[This is an expert-only setting and may be removed in the future]

The following section lists all the different storage types supported.

[[simplefs]]`simplefs`::
The Simple FS type is a straightforward implementation of file system
storage (maps to Lucene `SimpleFsDirectory`) using a random access file.
This implementation has poor concurrent performance (multiple threads
will bottleneck). It is usually better to use the `niofs` when you need
index persistence.

[[niofs]]`niofs`::
The NIO FS type stores the shard index on the file system (maps to
Lucene `NIOFSDirectory`) using NIO. It allows multiple threads to read
from the same file concurrently. It is not recommended on Windows
because of a bug in the SUN Java implementation.

[[mmapfs]]`mmapfs`::
The MMap FS type stores the shard index on the file system (maps to
Lucene `MMapDirectory`) by mapping a file into memory (mmap). Memory
mapping uses up a portion of the virtual memory address space in your
process equal to the size of the file being mapped. Before using this
class, be sure you have allowed plenty of
<<vm-max-map-count,virtual address space>>.

[[default_fs]]`default_fs`::
The `default` type is a hybrid of NIO FS and MMapFS, which chooses the best
file system for each type of file. Currently only the Lucene term dictionary
and doc values files are memory mapped to reduce the impact on the operating
system. All other files are opened using Lucene `NIOFSDirectory`. Address
space settings (<<vm-max-map-count>>) might also apply if your term
dictionaries are large.

View File

@ -43,7 +43,7 @@ specified as well in the URI. Those stats can be any of:
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`merge`:: Merge statistics.
`query_cache`:: <<shard-query-cache,Shard query cache>> statistics.
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.
`warmer`:: Warmer statistics.
@ -80,7 +80,7 @@ curl 'localhost:9200/_stats/search?groups=group1,group2
--------------------------------------------------

The stats returned are aggregated on the index level, with
`primaries` and `total` aggregations, where `primaries` are the values for only the
primary shards, and `total` are the cumulated values for both primary and replica shards.

In order to get back shard level stats, set the `level` parameter to `shards`.

View File

@ -29,130 +29,8 @@ curl -XPUT 'localhost:9200/my_index/_settings' -d '
}'
--------------------------------------------------

The list of per-index settings which can be updated dynamically on live
indices can be found in <<index-modules>>.

[WARNING]
========================
When changing the number of replicas the index needs to be open. Changing
the number of replicas on a closed index might prevent the index from being opened correctly again.
========================

Below is the list of settings that can be changed using the update
settings API:
`index.number_of_replicas`::
The number of replicas each shard has.
`index.auto_expand_replicas` (string)::
Set to a dash delimited lower and upper bound (e.g. `0-5`)
or one may use `all` as the upper bound (e.g. `0-all`), or `false` to disable it.
`index.blocks.read_only`::
Set to `true` to have the index read only, `false` to allow writes
and metadata changes.
`index.blocks.read`::
Set to `true` to disable read operations against the index.
`index.blocks.write`::
Set to `true` to disable write operations against the index.
`index.blocks.metadata`::
Set to `true` to disable metadata operations against the index.
`index.refresh_interval`::
The async refresh interval of a shard.
`index.translog.flush_threshold_ops`::
When to flush based on operations.
`index.translog.flush_threshold_size`::
When to flush based on translog (bytes) size.
`index.translog.flush_threshold_period`::
When to flush based on a period of not flushing.
`index.translog.disable_flush`::
Disables flushing. Note, should be set for a short
interval and then enabled.
`index.cache.filter.max_size`::
The maximum size of filter cache (per segment in shard).
Set to `-1` to disable.
`index.cache.filter.expire`::
experimental[] The expire after access time for filter cache.
Set to `-1` to disable.
`index.gateway.snapshot_interval`::
experimental[] The gateway snapshot interval (only applies to shared
gateways). Defaults to 10s.
<<index-modules-merge,merge policy>>::
All the settings for the merge policy currently configured.
A different merge policy can't be set.
`index.merge.scheduler.*`::
experimental[] All the settings for the merge scheduler.
`index.routing.allocation.include.*`::
A node matching any rule will be allowed to host shards from the index.
`index.routing.allocation.exclude.*`::
A node matching any rule will NOT be allowed to host shards from the index.
`index.routing.allocation.require.*`::
Only nodes matching all rules will be allowed to host shards from the index.
`index.routing.allocation.disable_allocation`::
Disable allocation. Defaults to `false`. Deprecated in favour of `index.routing.allocation.enable`.

`index.routing.allocation.disable_new_allocation`::
Disable new allocation. Defaults to `false`. Deprecated in favour of `index.routing.allocation.enable`.

`index.routing.allocation.disable_replica_allocation`::
Disable replica allocation. Defaults to `false`. Deprecated in favour of `index.routing.allocation.enable`.
`index.routing.allocation.enable`::
Enables shard allocation for a specific index. It can be set to:
* `all` (default) - Allows shard allocation for all shards.
* `primaries` - Allows shard allocation only for primary shards.
* `new_primaries` - Allows shard allocation only for primary shards for new indices.
* `none` - No shard allocation is allowed.
`index.routing.rebalance.enable`::
Enables shard rebalancing for a specific index. It can be set to:
* `all` (default) - Allows shard rebalancing for all shards.
* `primaries` - Allows shard rebalancing only for primary shards.
* `replicas` - Allows shard rebalancing only for replica shards.
* `none` - No shard rebalancing is allowed.
`index.routing.allocation.total_shards_per_node`::
Controls the total number of shards (replicas and primaries) allowed to be allocated on a single node. Defaults to unbounded (`-1`).
`index.recovery.initial_shards`::
When using the local gateway, a particular shard is recovered only if a quorum of shards can be allocated in the cluster. It can be set to:
* `quorum` (default)
* `quorum-1` (or `half`)
* `full`
* `full-1`.
* Number values are also supported, e.g. `1`.
`index.gc_deletes`::
experimental[]
`index.ttl.disable_purge`::
experimental[] Temporarily disables the purge of expired docs.
<<index-modules-store,store level throttling>>::
All the settings for the store level throttling policy currently configured.
`index.translog.fs.type`::
experimental[] Either `simple` or `buffered` (default).
<<index-modules-slowlog>>::
All the settings for slow log.
`index.warmer.enabled`::
See <<indices-warmers>>. Defaults to `true`.
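For instance, a sketch of making an index read only using one of the settings above (the index name is only an example):

[source,json]
--------------------------------------------------
PUT /my_index/_settings
{
    "index.blocks.read_only": true
}
--------------------------------------------------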
[float]
[[bulk]]

View File

@ -56,10 +56,10 @@ value as a numeric type).
The `index.mapping.coerce` global setting can be set on the
index level to coerce numeric content globally across all
mapping types (the default setting is `true`, and coercions attempted are
to convert strings with numbers into numeric types and also numeric values
with fractions to any integer/short/long values minus the fraction part).
When the permitted conversions fail in their attempts, the value is considered
malformed and the `ignore_malformed` setting dictates what will happen next.
--
@ -69,6 +69,8 @@ include::mapping/types.asciidoc[]
include::mapping/date-format.asciidoc[]

include::mapping/fielddata_formats.asciidoc[]

include::mapping/dynamic-mapping.asciidoc[]

include::mapping/meta.asciidoc[]

View File

@ -1,87 +1,5 @@
[[index-modules-fielddata]]
== Field data
The field data cache is used mainly when sorting on or computing aggregations
on a field. It loads all the field values to memory in order to provide fast
document based access to those values. The field data cache can be
expensive to build for a field, so it's recommended to have enough memory
to allocate it, and to keep it loaded.
The amount of memory used for the field
data cache can be controlled using `indices.fielddata.cache.size`. Note:
reloading the field data which does not fit into your cache will be expensive
and perform poorly.
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The max size of the field data cache,
eg `30%` of node heap space, or an absolute value, eg `12GB`. Defaults
to unbounded.
|`indices.fielddata.cache.expire` |experimental[] A time based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, can be set to `5m` for a 5 minute expiry.
|=======================================================================
[float]
[[circuit-breaker]]
=== Circuit Breaker
Elasticsearch contains multiple circuit breakers used to prevent operations from
causing an OutOfMemoryError. Each breaker specifies a limit for how much memory
it can use. Additionally, there is a parent-level breaker that specifies the
total amount of memory that can be used across all breakers.
The parent-level breaker can be configured with the following setting:
`indices.breaker.total.limit`::
Starting limit for overall parent breaker, defaults to 70% of JVM heap
All circuit breaker settings can be changed dynamically using the cluster update
settings API.
[float]
[[fielddata-circuit-breaker]]
==== Field data circuit breaker
The field data circuit breaker allows Elasticsearch to estimate the amount of
memory a field will require to be loaded into memory. It can then prevent the
field data loading by raising an exception. By default the limit is configured
to 60% of the maximum JVM heap. It can be configured with the following
parameters:
`indices.breaker.fielddata.limit`::
Limit for fielddata breaker, defaults to 60% of JVM heap
`indices.breaker.fielddata.overhead`::
A constant that all field data estimations are multiplied with to determine a
final estimation. Defaults to 1.03
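As noted above, circuit breaker settings can be changed dynamically with the cluster-update-settings API; a sketch (the percentage is only an example):

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "persistent" : {
        "indices.breaker.fielddata.limit" : "70%"
    }
}
--------------------------------------------------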
[float]
[[request-circuit-breaker]]
==== Request circuit breaker
The request circuit breaker allows Elasticsearch to prevent per-request data
structures (for example, memory used for calculating aggregations during a
request) from exceeding a certain amount of memory.
`indices.breaker.request.limit`::
Limit for request breaker, defaults to 40% of JVM heap
`indices.breaker.request.overhead`::
A constant that all request estimations are multiplied with to determine a
final estimation. Defaults to 1
[float]
[[fielddata-monitoring]]
=== Monitoring field data
You can monitor memory usage for field data as well as the field data circuit
breaker using the
<<cluster-nodes-stats,Nodes Stats API>>.
[[fielddata-formats]]
== Fielddata formats

The field data format controls how field data should be stored.
@ -111,7 +29,7 @@ It is possible to change the field data format (and the field data settings
in general) on a live index by using the update mapping API.

[float]
=== String field data types

`paged_bytes` (default on analyzed string fields)::
Stores unique terms sequentially in a large buffer and maps documents to
@ -123,7 +41,7 @@ in general) on a live index by using the update mapping API.
`not_analyzed`).

[float]
=== Numeric field data types

`array`::
Stores field values in memory using arrays.
@ -132,7 +50,7 @@ in general) on a live index by using the update mapping API.
Computes and stores field data data-structures on disk at indexing time.

[float]
=== Geo point field data types

`array`::
Stores latitudes and longitudes in arrays.
@ -142,7 +60,7 @@ in general) on a live index by using the update mapping API.
[float]
[[global-ordinals]]
=== Global ordinals

Global ordinals is a data-structure on top of field data, that maintains an
incremental numbering for all the terms in field data in a lexicographic order.

View File

@ -200,7 +200,7 @@ PUT my_index/_mapping/my_type
Please however note that norms won't be removed instantly, but will be removed
as old segments are merged into new segments as you continue indexing new documents.

Any score computation on a field that has had
norms removed might return inconsistent results since some documents won't have
norms anymore while other documents might still have norms.
@ -484,7 +484,7 @@ binary type:
It is possible to control which field values are loaded into memory,
which is particularly useful for aggregations on string fields, using
fielddata filters, which are explained in detail in the
<<modules-fielddata,Fielddata>> section.

Fielddata filters can exclude terms which do not match a regex, or which
don't fall between a `min` and `max` frequency range:

View File

@ -1,6 +1,75 @@
[[modules]]
= Modules
[partintro]
--
This section contains modules responsible for various aspects of the functionality in Elasticsearch. Each module has settings which may be:
_static_::
These settings must be set at the node level, either in the
`elasticsearch.yml` file, or as an environment variable or on the command line
when starting a node. They must be set on every relevant node in the cluster.
_dynamic_::
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
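As a sketch, a dynamic setting such as `discovery.zen.minimum_master_nodes` (used in the cluster-update-settings examples earlier in this commit) can be changed like this:

[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}
--------------------------------------------------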
The modules in this section are:
<<modules-cluster,Cluster-level routing and shard allocation>>::
Settings to control where, when, and how shards are allocated to nodes.
<<modules-discovery,Discovery>>::
How nodes discover each other to form a cluster.
<<modules-gateway,Gateway>>::
How many nodes need to join the cluster before recovery can start.
<<modules-http,HTTP>>::
Settings to control the HTTP REST interface.
<<modules-indices,Indices>>::
Global index-related settings.
<<modules-network,Network>>::
Controls default network settings.
<<modules-node,Node client>>::
A Java node client joins the cluster, but doesn't hold data or act as a master node.
<<modules-plugins,Plugins>>::
Using plugins to extend Elasticsearch.
<<modules-scripting,Scripting>>::
Custom scripting available in Lucene Expressions, Groovy, Python, and
Javascript.
<<modules-snapshots,Snapshot/Restore>>::
Backup your data with snapshot/restore.
<<modules-threadpool,Thread pools>>::
Information about the dedicated thread pools used in Elasticsearch.
<<modules-transport,Transport>>::
Configure the transport networking layer, used internally by Elasticsearch
to communicate between nodes.
--
include::modules/cluster.asciidoc[]

include::modules/discovery.asciidoc[]
@ -15,19 +84,20 @@ include::modules/network.asciidoc[]
include::modules/node.asciidoc[]

include::modules/plugins.asciidoc[]

include::modules/scripting.asciidoc[]

include::modules/advanced-scripting.asciidoc[]

include::modules/snapshots.asciidoc[]

include::modules/threadpool.asciidoc[]

include::modules/transport.asciidoc[]

include::modules/tribe.asciidoc[]

View File

@ -1,5 +1,5 @@
[[modules-advanced-scripting]]
=== Text scoring in scripts

Text features, such as term or document frequency for a specific term, can be accessed in scripts (see <<modules-scripting, scripting documentation>>) with the `_index` variable. This can be useful if, for example, you want to implement your own scoring model using a script inside a <<query-dsl-function-score-query,function score query>>.
@ -7,7 +7,7 @@ Statistics over the document collection are computed *per shard*, not per
index. index.
[float] [float]
=== Nomenclature: ==== Nomenclature:
[horizontal] [horizontal]
@ -33,7 +33,7 @@ depending on the shard the current document resides in.
[float] [float]
=== Shard statistics: ==== Shard statistics:
`_index.numDocs()`:: `_index.numDocs()`::
@ -49,7 +49,7 @@ depending on the shard the current document resides in.
[float] [float]
=== Field statistics: ==== Field statistics:
Field statistics can be accessed with a subscript operator like this: Field statistics can be accessed with a subscript operator like this:
`_index['FIELD']`. `_index['FIELD']`.
@ -74,7 +74,7 @@ depending on the shard the current document resides in.
The number of terms in a field cannot be accessed using the `_index` variable. See <<mapping-core-types, word count mapping type>> on how to do that. The number of terms in a field cannot be accessed using the `_index` variable. See <<mapping-core-types, word count mapping type>> on how to do that.
[float] [float]
=== Term statistics: ==== Term statistics:
Term statistics for a field can be accessed with a subscript operator like Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist. this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
@ -101,7 +101,7 @@ affect is your set the `index_options` to `docs` (see <<mapping-core-types, mapp
[float] [float]
=== Term positions, offsets and payloads: ==== Term positions, offsets and payloads:
If you need information on the positions of terms in a field, call If you need information on the positions of terms in a field, call
`_index['FIELD'].get('TERM', flag)` where flag can be `_index['FIELD'].get('TERM', flag)` where flag can be
@ -174,7 +174,7 @@ return score;
[float] [float]
=== Term vectors: ==== Term vectors:
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call
`_index.termVectors()` to get a `_index.termVectors()` to get a

View File

@ -1,253 +1,36 @@
[[modules-cluster]] [[modules-cluster]]
== Cluster == Cluster
[float] One of the main roles of the master is to decide which shards to allocate to
[[shards-allocation]] which nodes, and when to move shards between nodes in order to rebalance the
=== Shards Allocation cluster.
Shards allocation is the process of allocating shards to nodes. This can There are a number of settings available to control the shard allocation process:
happen during initial recovery, replica allocation, rebalancing, or
handling nodes being added or removed.
The following settings may be used: * <<shards-allocation>> lists the settings to control the allocation and
rebalancing operations.
`cluster.routing.allocation.allow_rebalance`:: * <<disk-allocator>> explains how Elasticsearch takes available disk space
Allow to control when rebalancing will happen based on the total into account, and the related settings.
state of all the indices shards in the cluster. `always`,
`indices_primaries_active`, and `indices_all_active` are allowed,
defaulting to `indices_all_active` to reduce chatter during
initial recovery.
* <<allocation-awareness>> and <<forced-awareness>> control how shards can
be distributed across different racks or availability zones.
`cluster.routing.allocation.cluster_concurrent_rebalance`:: * <<allocation-filtering>> allows certain nodes or groups of nodes to be excluded
Allow to control how many concurrent rebalancing of shards are from allocation so that they can be decommissioned.
allowed cluster wide, and default it to `2`.
Besides these, there are a few other <<misc-cluster,miscellaneous cluster-level settings>>.
`cluster.routing.allocation.node_initial_primaries_recoveries`:: All of the settings in this section are _dynamic_ settings which can be
Allow to control specifically the number of initial recoveries updated on a live cluster with the
of primaries that are allowed per node. Since most times local <<cluster-update-settings,cluster-update-settings>> API.
gateway is used, those should be fast and we can handle more of
those per node without creating load. Defaults to `4`.
include::cluster/shards_allocation.asciidoc[]
`cluster.routing.allocation.node_concurrent_recoveries`:: include::cluster/disk_allocator.asciidoc[]
How many concurrent recoveries are allowed to happen on a node.
Defaults to `2`.
`cluster.routing.allocation.enable`:: include::cluster/allocation_awareness.asciidoc[]
Controls shard allocation for all indices, by allowing specific include::cluster/allocation_filtering.asciidoc[]
kinds of shard to be allocated.
+
--
Can be set to:
* `all` - (default) Allows shard allocation for all kinds of shards. include::cluster/misc.asciidoc[]
* `primaries` - Allows shard allocation only for primary shards.
* `new_primaries` - Allows shard allocation only for primary shards for new indices.
* `none` - No shard allocations of any kind are allowed for all indices.
--
`cluster.routing.rebalance.enable`::
Controls shard rebalance for all indices, by allowing specific
kinds of shard to be rebalanced.
+
--
Can be set to:
* `all` - (default) Allows shard balancing for all kinds of shards.
* `primaries` - Allows shard balancing only for primary shards.
* `replicas` - Allows shard balancing only for replica shards.
* `none` - No shard balancing of any kind are allowed for all indices.
--
`cluster.routing.allocation.same_shard.host`::
Allows to perform a check to prevent allocation of multiple instances
of the same shard on a single host, based on host name and host address.
Defaults to `false`, meaning that no check is performed by default. This
setting only applies if multiple nodes are started on the same machine.
`indices.recovery.concurrent_streams`::
The number of streams to open (on a *node* level) to recover a
shard from a peer shard. Defaults to `3`.
`indices.recovery.concurrent_small_file_streams`::
The number of streams to open (on a *node* level) for small files (under
5mb) to recover a shard from a peer shard. Defaults to `2`.
[float]
[[allocation-awareness]]
=== Shard Allocation Awareness
Cluster allocation awareness allows to configure shard and replicas
allocation across generic attributes associated the nodes. Lets explain
it through an example:
Assume we have several racks. When we start a node, we can configure an
attribute called `rack_id` (any attribute name works), for example, here
is a sample config:
----------------------
node.rack_id: rack_one
----------------------
The above sets an attribute called `rack_id` for the relevant node with
a value of `rack_one`. Now, we need to configure the `rack_id` attribute
as one of the awareness allocation attributes (set it on *all* (master
eligible) nodes config):
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------
The above will mean that the `rack_id` attribute will be used to do
awareness based allocation of shard and its replicas. For example, lets
say we start 2 nodes with `node.rack_id` set to `rack_one`, and deploy a
single index with 5 shards and 1 replica. The index will be fully
deployed on the current nodes (5 shards and 1 replica each, total of 10
shards).
Now, if we start two more nodes, with `node.rack_id` set to `rack_two`,
shards will relocate to even the number of shards across the nodes, but,
a shard and its replica will not be allocated in the same `rack_id`
value.
The awareness attributes can hold several values, for example:
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
*NOTE*: When using awareness attributes, shards will not be allocated to
nodes that don't have values set for those attributes.
[float]
[[forced-awareness]]
=== Forced Awareness
Sometimes, we know in advance the number of values an awareness
attribute can have, and more over, we would like never to have more
replicas than needed allocated on a specific group of nodes with the
same awareness attribute value. For that, we can force awareness on
specific attributes.
For example, lets say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness on a node:
[source,js]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
Now, lets say we start 2 nodes with `node.zone` set to `zone1` and
create an index with 5 shards and 1 replica. The index will be created,
but only 5 shards will be allocated (with no replicas). Only when we
start more shards with `node.zone` set to `zone2` will the replicas be
allocated.
[float]
==== Automatic Preference When Searching / GETing
When executing a search, or doing a get, the node receiving the request
will prefer to execute the request on shards that exists on nodes that
have the same attribute values as the executing node. This only happens
when the `cluster.routing.allocation.awareness.attributes` setting has
been set to a value.
[float]
==== Realtime Settings Update
The settings can be updated using the <<cluster-update-settings,cluster update settings API>> on a live cluster.
[float]
[[allocation-filtering]]
=== Shard Allocation Filtering
Allow to control allocation of indices on nodes based on include/exclude
filters. The filters can be set both on the index level and on the
cluster level. Lets start with an example of setting it on the cluster
level:
Lets say we have 4 nodes, each has specific attribute called `tag`
associated with it (the name of the attribute can be any name). Each
node has a specific value associated with `tag`. Node 1 has a setting
`node.tag: value1`, Node 2 a setting of `node.tag: value2`, and so on.
We can create an index that will only deploy on nodes that have `tag`
set to `value1` and `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------
On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` of value `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------
`index.routing.allocation.require.*` can be used to
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include` which will
include a node if ANY rule matches.
The `include`, `exclude` and `require` values can have generic simple
matching wildcards, for example, `value1*`. A special attribute name
called `_ip` can be used to match on node ip values. In addition `_host`
attribute can be used to match on either the node's hostname or its ip
address. Similarly `_name` and `_id` attributes can be used to match on
node name and node id accordingly.
Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:
[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------
In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.include.group1" : "xxx",
"index.routing.allocation.include.group2" : "yyy",
"index.routing.allocation.exclude.group3" : "zzz",
"index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------
The provided settings can also be updated in real time using the update
settings API, allowing to "move" indices (shards) around in realtime.
Cluster wide filtering can also be defined, and be updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on `_ip`
address:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}'
--------------------------------------------------

View File

@ -0,0 +1,107 @@
[[allocation-awareness]]
=== Shard Allocation Awareness
When running nodes on multiple VMs on the same physical server, on multiple
racks, or across multiple awareness zones, it is more likely that two nodes on
the same physical server, in the same rack, or in the same awareness zone will
crash at the same time, rather than two unrelated nodes crashing
simultaneously.
If Elasticsearch is _aware_ of the physical configuration of your hardware, it
can ensure that the primary shard and its replica shards are spread across
different physical servers, racks, or zones, to minimise the risk of losing
all shard copies at the same time.
The shard allocation awareness settings allow you to tell Elasticsearch about
your hardware configuration.
As an example, let's assume we have several racks. When we start a node, we
can tell it which rack it is in by assigning it an arbitrary metadata
attribute called `rack_id` -- we could use any attribute name. For example:
[source,sh]
----------------------
./bin/elasticsearch --node.rack_id rack_one <1>
----------------------
<1> This setting could also be specified in the `elasticsearch.yml` config file.
Now, we need to set up _shard allocation awareness_ by telling Elasticsearch
which attributes to use. This can be configured in the `elasticsearch.yml`
file on *all* master-eligible nodes, or it can be set (and changed) with the
<<cluster-update-settings,cluster-update-settings>> API.
For our example, we'll set the value in the config file:
[source,yaml]
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------
With this config in place, let's say we start two nodes with `node.rack_id`
set to `rack_one`, and we create an index with 5 primary shards and 1 replica
of each primary. All primaries and replicas are allocated across the two
nodes.
Now, if we start two more nodes with `node.rack_id` set to `rack_two`,
Elasticsearch will move shards across to the new nodes, ensuring (if possible)
that the primary and replica shards are never in the same rack.
.Prefer local shards
*********************************************
When executing search or GET requests, with shard awareness enabled,
Elasticsearch will prefer using local shards -- shards in the same awareness
group -- to execute the request. This is usually faster than crossing racks or
awareness zones.
*********************************************
Multiple awareness attributes can be specified, in which case the combination
of values from each attribute is considered to be a separate value.
[source,yaml]
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
NOTE: When using awareness attributes, shards will not be allocated to
nodes that don't have values set for those attributes.
[float]
[[forced-awareness]]
=== Forced Awareness
Imagine that you have two awareness zones and enough hardware across the two
zones to host all of your primary and replica shards. But perhaps the
hardware in a single zone, while sufficient to host half the shards, would be
unable to host *ALL* the shards.
With ordinary awareness, if one zone lost contact with the other zone,
Elasticsearch would assign all of the missing replica shards to a single zone.
But in this example, this sudden extra load would cause the hardware in the
remaining zone to be overloaded.
Forced awareness solves this problem by *NEVER* allowing copies of the same
shard to be allocated to the same zone.
For example, let's say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness on a node:
[source,yaml]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1>
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
<1> We must list all possible values that the `zone` attribute can have.
Now, if we start 2 nodes with `node.zone` set to `zone1` and create an index
with 5 shards and 1 replica, the index will be created, but only the 5 primary
shards will be allocated (with no replicas). Only when we start more nodes
with `node.zone` set to `zone2` will the replicas be allocated.
The `cluster.routing.allocation.awareness.*` settings can all be updated
dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
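For example, a sketch of such a request, reusing the `rack_id` attribute from the example above, might look like this:
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
--------------------------------------------------
// AUTOSENSE
Using `persistent` rather than `transient` keeps the setting across a full cluster restart.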

View File

@ -0,0 +1,70 @@
[[allocation-filtering]]
=== Shard Allocation Filtering
While <<index-modules-allocation>> provides *per-index* settings to control the
allocation of shards to nodes, cluster-level shard allocation filtering lets
you allow or disallow the allocation of shards from *any* index to
particular nodes.
The typical use case for cluster-wide shard allocation filtering is when you
want to decommission a node, and you would like to move the shards from that
node to other nodes in the cluster before shutting it down.
For instance, we could decommission a node using its IP address as follows:
[source,json]
--------------------------------------------------
PUT /_cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}
--------------------------------------------------
// AUTOSENSE
NOTE: Shards will only be relocated if it is possible to do so without
breaking another routing constraint, such as never allocating a primary and
replica shard to the same node.
Cluster-wide shard allocation filtering works in the same way as index-level
shard allocation filtering (see <<index-modules-allocation>> for details).
The available _dynamic_ cluster settings are as follows, where `{attribute}`
refers to an arbitrary node attribute:
`cluster.routing.allocation.include.{attribute}`::
Allocate shards to a node whose `{attribute}` has at least one of the
comma-separated values.
`cluster.routing.allocation.require.{attribute}`::
Allocate shards to a node whose `{attribute}` has _all_ of the
comma-separated values.
`cluster.routing.allocation.exclude.{attribute}`::
Allocate shards to a node whose `{attribute}` has _none_ of the
comma-separated values.
These special attributes are also supported:
[horizontal]
`_name`:: Match nodes by node name
`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname
All attribute values can be specified with wildcards, eg:
[source,json]
------------------------
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.include._ip": "192.168.2.*"
}
}
------------------------
// AUTOSENSE
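Several filters can be combined in a single request. The sketch below assumes a hypothetical `rack` node attribute and a node named `node-to-drain`:
[source,json]
------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.require.rack": "rack_one",
    "cluster.routing.allocation.exclude._name": "node-to-drain"
  }
}
------------------------
// AUTOSENSE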

View File

@ -0,0 +1,69 @@
[[disk-allocator]]
=== Disk-based Shard Allocation
Elasticsearch factors in the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards
away from that node.
Below are the settings that can be configured in the `elasticsearch.yml` config
file or updated dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API:
`cluster.routing.allocation.disk.threshold_enabled`::
Defaults to `true`. Set to `false` to disable the disk allocation decider.
`cluster.routing.allocation.disk.watermark.low`::
Controls the low watermark for disk usage. It defaults to 85%, meaning ES will
not allocate new shards to nodes once they have more than 85% disk used. It
can also be set to an absolute byte value (like 500mb) to prevent ES from
allocating shards if less than the configured amount of space is available.
`cluster.routing.allocation.disk.watermark.high`::
Controls the high watermark. It defaults to 90%, meaning ES will attempt to
relocate shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark) to
relocate shards once less than the configured amount of space is available on
the node.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.
`cluster.info.update.interval`::
How often Elasticsearch should check on disk usage for each node in the
cluster. Defaults to `30s`.
`cluster.routing.allocation.disk.include_relocations`::
Defaults to +true+, which means that Elasticsearch will take into account
shards that are currently being relocated to the target node when computing a
node's disk usage. Taking relocating shards' sizes into account may, however,
mean that the disk usage for a node is incorrectly estimated on the high side,
since the relocation could be 90% complete and a recently retrieved disk usage
would include the total size of the relocating shard as well as the space
already used by the running relocation.
An example of setting the low watermark to 80% disk used, the high watermark
to 50 gigabytes of free space, and updating the cluster info every
minute:
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "80%",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------
// AUTOSENSE

View File

@ -0,0 +1,36 @@
[[misc-cluster]]
=== Miscellaneous cluster settings
[[cluster-read-only]]
==== Metadata
An entire cluster may be set to read-only with the following _dynamic_ setting:
`cluster.blocks.read_only`::
Make the whole cluster read only: indices do not accept write
operations, and metadata may not be modified (indices cannot be created or
deleted).
WARNING: Don't rely on this setting to prevent changes to your cluster. Any
user with access to the <<cluster-update-settings,cluster-update-settings>>
API can make the cluster read-write again.
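For example, a request along these lines would make the cluster read only until the setting is changed back to `false`:
[source,json]
-------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.blocks.read_only": true
  }
}
-------------------------------
// AUTOSENSE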
[[cluster-logger]]
==== Logger
The settings which control logging can be updated dynamically with the
`logger.` prefix. For instance, to increase the logging level of the
`indices.recovery` module to `DEBUG`, issue this request:
[source,json]
-------------------------------
PUT /_cluster/settings
{
"transient": {
"logger.indices.recovery": "DEBUG"
}
}
-------------------------------

View File

@ -0,0 +1,124 @@
[[shards-allocation]]
=== Cluster Level Shard Allocation
Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
when nodes are added or removed.
[float]
=== Shard Allocation Settings
The following _dynamic_ settings may be used to control shard allocation and recovery:
`cluster.routing.allocation.enable`::
+
--
Enable or disable allocation for specific kinds of shards:
* `all` - (default) Allows shard allocation for all kinds of shards.
* `primaries` - Allows shard allocation only for primary shards.
* `new_primaries` - Allows shard allocation only for primary shards for new indices.
* `none` - No shard allocations of any kind are allowed for any indices.
This setting does not affect the recovery of local primary shards when
restarting a node. A restarted node that has a copy of an unassigned primary
shard will recover that primary immediately, assuming that the
<<index.recovery.initial_shards,`index.recovery.initial_shards`>> setting is
satisfied.
--
`cluster.routing.allocation.node_concurrent_recoveries`::
How many concurrent shard recoveries are allowed to happen on a node.
Defaults to `2`.
`cluster.routing.allocation.node_initial_primaries_recoveries`::
While the recovery of replicas happens over the network, the recovery of
an unassigned primary after node restart uses data from the local disk.
These should be fast so more initial primary recoveries can happen in
parallel on the same node. Defaults to `4`.
`cluster.routing.allocation.same_shard.host`::
Enables a check to prevent allocation of multiple instances of
the same shard on a single host, based on host name and host address.
Defaults to `false`, meaning that no check is performed by default. This
setting only applies if multiple nodes are started on the same machine.
`indices.recovery.concurrent_streams`::
The number of network streams to open per node to recover a shard from
a peer shard. Defaults to `3`.
`indices.recovery.concurrent_small_file_streams`::
The number of streams to open per node for small files (under 5mb) to
recover a shard from a peer shard. Defaults to `2`.
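As an illustration (not a prescribed procedure), allocation is sometimes disabled temporarily during maintenance and re-enabled afterwards by setting it back to `all`:
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
--------------------------------------------------
// AUTOSENSE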
[float]
=== Shard Rebalancing Settings
The following _dynamic_ settings may be used to control the rebalancing of
shards across the cluster:
`cluster.routing.rebalance.enable`::
+
--
Enable or disable rebalancing for specific kinds of shards:
* `all` - (default) Allows shard balancing for all kinds of shards.
* `primaries` - Allows shard balancing only for primary shards.
* `replicas` - Allows shard balancing only for replica shards.
* `none` - No shard balancing of any kind is allowed for any indices.
--
`cluster.routing.allocation.allow_rebalance`::
+
--
Specify when shard rebalancing is allowed:
* `always` - (default) Always allow rebalancing.
* `indices_primaries_active` - Only when all primaries in the cluster are allocated.
* `indices_all_active` - Only when all shards (primaries and replicas) in the cluster are allocated.
--
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are allowed
cluster wide. Defaults to `2`.
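For instance, the number of concurrent relocations could be raised on a live cluster (the value `4` is purely illustrative):
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 4
  }
}
--------------------------------------------------
// AUTOSENSE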
[float]
=== Shard Balancing Heuristics
The following settings are used together to determine where to place each
shard. The cluster is balanced when no allowed action can bring the weights
of each node closer together by more than the `balance.threshold`.
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`. Raising this raises the tendency to
equalize the number of shards across all nodes in the cluster.
`cluster.routing.allocation.balance.index`::
Defines a factor to the number of shards per index allocated
on a specific node (float). Defaults to `0.55f`. Raising this raises the
tendency to equalize the number of shards per index across all nodes in
the cluster.
`cluster.routing.allocation.balance.threshold`::
The minimal improvement in weight that a rebalancing operation must achieve
before it is performed (non-negative float). Defaults to `1.0f`. Raising this will cause the cluster
to be less aggressive about optimizing the shard balance.
NOTE: Regardless of the result of the balancing algorithm, rebalancing might
not be allowed due to forced awareness or allocation filtering.

View File

@ -1,69 +1,51 @@
[[modules-gateway]] [[modules-gateway]]
== Gateway == Local Gateway
The gateway module allows one to store the state of the cluster meta The local gateway module stores the cluster state and shard data across full
data across full cluster restarts. The cluster meta data mainly holds cluster restarts.
all the indices created with their respective (index level) settings and
explicit type mappings.
Each time the cluster meta data changes (for example, when an index is The following _static_ settings, which must be set on every data node in the
added or deleted), those changes will be persisted using the gateway. cluster, control how long nodes should wait before they try to recover any
When the cluster first starts up, the state will be read from the shards which are stored locally:
gateway and applied.
The gateway set on the node level will automatically control the index gateway `gateway.expected_nodes`::
that will be used. For example, if the `local` gateway is used (the default),
then each index created on the node will automatically use its own respective
index level `local` gateway.
The default gateway used is the The number of (data or master) nodes that are expected to be in the cluster.
<<modules-gateway-local,local>> gateway. Recovery of local shards will start as soon as the expected number of
nodes have joined the cluster. Defaults to `0`
The `none` gateway option was removed in Elasticsearch 2.0. `gateway.expected_master_nodes`::
[float] The number of master nodes that are expected to be in the cluster.
[[recover-after]] Recovery of local shards will start as soon as the expected number of
=== Recovery After Nodes / Time master nodes have joined the cluster. Defaults to `0`
In many cases, the actual cluster meta data should only be recovered `gateway.expected_data_nodes`::
after specific nodes have started in the cluster, or a timeout has
passed. This is handy when restarting the cluster, and each node local
index storage still exists to be reused and not recovered from the
gateway (which reduces the time it takes to recover from the gateway).
The `gateway.recover_after_nodes` setting (which accepts a number) The number of data nodes that are expected to be in the cluster.
controls after how many data and master eligible nodes within the Recovery of local shards will start as soon as the expected number of
cluster recovery will start. The `gateway.recover_after_data_nodes` and data nodes have joined the cluster. Defaults to `0`
`gateway.recover_after_master_nodes` setting work in a similar fashion,
except they consider only the number of data nodes and only the number
of master nodes respectively. The `gateway.recover_after_time` setting
(which accepts a time value) sets the time to wait till recovery happens
once all `gateway.recover_after...nodes` conditions are met.
The `gateway.expected_nodes` allows to set how many data and master `gateway.recover_after_time`::
eligible nodes are expected to be in the cluster, and once met, the
`gateway.recover_after_time` is ignored and recovery starts.
Setting `gateway.expected_nodes` also defaults `gateway.recover_after_time` to `5m` The `gateway.expected_data_nodes` and `gateway.expected_master_nodes`
settings are also supported. For example setting:
[source,js] If the expected number of nodes is not achieved, the recovery process waits
-------------------------------------------------- for the configured amount of time before trying to recover regardless.
gateway: Defaults to `5m` if one of the `expected_nodes` settings is configured.
recover_after_time: 5m
expected_nodes: 2
--------------------------------------------------
In an expected 2 nodes cluster will cause recovery to start 5 minutes Once the `recover_after_time` duration has timed out, recovery will start
after the first node is up, but once there are 2 nodes in the cluster, as long as the following conditions are met:
recovery will begin immediately (without waiting).
Note, once the meta data has been recovered from the gateway (which `gateway.recover_after_nodes`::
indices to create, mappings and so on), then this setting is no longer
effective until the next full restart of the cluster.
Operations are blocked while the cluster meta data has not been Recover as long as this many data or master nodes have joined the cluster.
recovered in order not to mix with the actual cluster meta data that
will be recovered once the settings has been reached.
include::gateway/local.asciidoc[] `gateway.recover_after_master_nodes`::
Recover as long as this many master nodes have joined the cluster.
`gateway.recover_after_data_nodes`::
Recover as long as this many data nodes have joined the cluster.
NOTE: These settings only take effect on a full cluster restart.
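Putting these together, a sketch of an `elasticsearch.yml` configuration for a cluster that is expected to have 3 nodes (the values are illustrative):
[source,yaml]
--------------------------------------------------
gateway.expected_nodes: 3
gateway.recover_after_time: 5m
gateway.recover_after_nodes: 2
--------------------------------------------------
With these settings, recovery starts as soon as 3 nodes have joined the cluster, or 5 minutes after at least 2 nodes are present.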

View File

@ -1,56 +0,0 @@
[[modules-gateway-local]]
=== Local Gateway
The local gateway allows for recovery of the full cluster state and
indices from the local storage of each node, and does not require a
common node level shared storage.
Note, different from shared gateway types, the persistency to the local
gateway is *not* done in an async manner. Once an operation is
performed, the data is there for the local gateway to recover it in case
of full cluster failure.
It is important to configure the `gateway.recover_after_nodes` setting
to include most of the expected nodes to be started after a full cluster
restart. This will insure that the latest cluster state is recovered.
For example:
[source,js]
--------------------------------------------------
gateway:
recover_after_nodes: 3
expected_nodes: 5
--------------------------------------------------
[float]
==== Dangling indices
When a node joins the cluster, any shards/indices stored in its local `data/`
directory which do not already exist in the cluster will be imported into the
cluster by default. This functionality has two purposes:
1. If a new master node is started which is unaware of the other indices in
the cluster, adding the old nodes will cause the old indices to be
imported, instead of being deleted.
2. An old index can be added to an existing cluster by copying it to the
`data/` directory of a new node, starting the node and letting it join
the cluster. Once the index has been replicated to other nodes in the
cluster, the new node can be shut down and removed.
The import of dangling indices can be controlled with the
`gateway.auto_import_dangled` which accepts:
[horizontal]
`yes`::
Import dangling indices into the cluster (default).
`close`::
Import dangling indices into the cluster state, but leave them closed.
`no`::
Delete dangling indices after `gateway.dangling_timeout`, which
defaults to 2 hours.

View File

@ -1,66 +1,50 @@
[[modules-indices]] [[modules-indices]]
== Indices == Indices
The indices module allow to control settings that are globally managed The indices module controls index-related settings that are globally managed
for all indices. for all indices, rather than being configurable at a per-index level.
[float] Available settings include:
[[buffer]]
=== Indexing Buffer
The indexing buffer setting allows to control how much memory will be <<circuit-breaker,Circuit breaker>>::
allocated for the indexing process. It is a global setting that bubbles
down to all the different shards allocated on a specific node.
The `indices.memory.index_buffer_size` accepts either a percentage or a Circuit breakers set limits on memory usage to avoid out of memory exceptions.
byte size value. It defaults to `10%`, meaning that `10%` of the total
memory allocated to a node will be used as the indexing buffer size.
This amount is then divided between all the different shards. Also, if
percentage is used, it is possible to set `min_index_buffer_size` (defaults to
`48mb`) and `max_index_buffer_size` (defaults to unbounded).
The `indices.memory.min_shard_index_buffer_size` allows to set a hard <<modules-fielddata,Fielddata cache>>::
lower limit for the memory allocated per shard for its own indexing
buffer. It defaults to `4mb`.
[float] Set limits on the amount of heap used by the in-memory fielddata cache.
[[indices-ttl]]
=== TTL interval
You can dynamically set the `indices.ttl.interval`, which allows to set how <<filter-cache,Node filter cache>>::
often expired documents will be automatically deleted. The default value
is 60s.
The deletion orders are processed by bulk. You can set Configure the amount of heap used to cache filter results.
`indices.ttl.bulk_size` to fit your needs. The default value is 10000.
See also <<mapping-ttl-field>>. <<indexing-buffer,Indexing buffer>>::
[float] Control the size of the buffer allocated to the indexing process.
[[recovery]]
=== Recovery
The following settings can be set to manage the recovery policy: <<shard-query-cache,Shard query cache>>::
[horizontal] Control the behaviour of the shard-level query cache.
`indices.recovery.concurrent_streams`::
defaults to `3`.
`indices.recovery.concurrent_small_file_streams`:: <<recovery,Recovery>>::
defaults to `2`.
`indices.recovery.file_chunk_size`:: Control the resource limits on the shard recovery process.
defaults to `512kb`.
`indices.recovery.translog_ops`:: <<indices-ttl,TTL interval>>::
defaults to `1000`.
`indices.recovery.translog_size`:: Control how expired documents are removed.
defaults to `512kb`.
`indices.recovery.compress`:: include::indices/circuit_breaker.asciidoc[]
defaults to `true`.
`indices.recovery.max_bytes_per_sec`:: include::indices/fielddata.asciidoc[]
defaults to `40mb`.
include::indices/filter_cache.asciidoc[]
include::indices/indexing_buffer.asciidoc[]
include::indices/query-cache.asciidoc[]
include::indices/recovery.asciidoc[]
include::indices/ttl_interval.asciidoc[]

View File

@ -0,0 +1,56 @@
[[circuit-breaker]]
=== Circuit Breaker
Elasticsearch contains multiple circuit breakers used to prevent operations from
causing an OutOfMemoryError. Each breaker specifies a limit for how much memory
it can use. Additionally, there is a parent-level breaker that specifies the
total amount of memory that can be used across all breakers.
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
[[parent-circuit-breaker]]
[float]
==== Parent circuit breaker
The parent-level breaker can be configured with the following setting:
`indices.breaker.total.limit`::
Starting limit for overall parent breaker, defaults to 70% of JVM heap.
[[fielddata-circuit-breaker]]
[float]
==== Field data circuit breaker
The field data circuit breaker allows Elasticsearch to estimate the amount of
memory a field will require to be loaded into memory. It can then prevent the
field data from being loaded by raising an exception. By default the limit is configured
to 60% of the maximum JVM heap. It can be configured with the following
parameters:
`indices.breaker.fielddata.limit`::
Limit for fielddata breaker, defaults to 60% of JVM heap
`indices.breaker.fielddata.overhead`::
A constant that all field data estimations are multiplied with to determine a
final estimation. Defaults to 1.03
[[request-circuit-breaker]]
[float]
==== Request circuit breaker
The request circuit breaker allows Elasticsearch to prevent per-request data
structures (for example, memory used for calculating aggregations during a
request) from exceeding a certain amount of memory.
`indices.breaker.request.limit`::
Limit for request breaker, defaults to 40% of JVM heap
`indices.breaker.request.overhead`::
A constant that all request estimations are multiplied with to determine a
final estimation. Defaults to 1
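For example, the fielddata breaker limit could be lowered on a live cluster (the value shown is illustrative):
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}
--------------------------------------------------
// AUTOSENSE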

View File

@ -0,0 +1,37 @@
[[modules-fielddata]]
=== Fielddata
The field data cache is used mainly when sorting on or computing aggregations
on a field. It loads all the field values to memory in order to provide fast
document based access to those values. The field data cache can be
expensive to build for a field, so it's recommended to have enough memory
to allocate it, and to keep it loaded.
The amount of memory used for the field
data cache can be controlled using `indices.fielddata.cache.size`. Note:
reloading the field data which does not fit into your cache will be expensive
and perform poorly.
`indices.fielddata.cache.size`::
The max size of the field data cache, eg `30%` of node heap space, or an
absolute value, eg `12GB`. Defaults to unbounded. Also see
<<fielddata-circuit-breaker>>.
`indices.fielddata.cache.expire`::
experimental[] A time based setting that expires field data after a
certain time of inactivity. Defaults to `-1`. For example, can be set to
`5m` for a 5 minute expiry.
NOTE: These are static settings which must be configured on every data node in
the cluster.
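For example, the cache could be bounded by adding a line like the following (an illustrative size) to `elasticsearch.yml` on each data node:
[source,yaml]
--------------------------------------------------
indices.fielddata.cache.size: 30%
--------------------------------------------------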
[float]
[[fielddata-monitoring]]
==== Monitoring field data
You can monitor memory usage for field data as well as the field data circuit
breaker using
the <<cluster-nodes-stats,Nodes Stats API>>.
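For instance, per-field fielddata memory usage might be retrieved with a request along these lines (the `fields=*` parameter is shown as an example; adjust it to the fields you care about):
[source,sh]
--------------------------------------------------
curl -XGET 'localhost:9200/_nodes/stats/indices/fielddata?fields=*'
--------------------------------------------------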

View File

@ -0,0 +1,16 @@
[[filter-cache]]
=== Node Filter Cache
The filter cache is responsible for caching the results of filters (used in
the query). There is one filter cache per node that is shared by all shards.
The cache implements an LRU eviction policy: when a cache becomes full, the
least recently used data is evicted to make way for new data.
The following setting is _static_ and must be configured on every data node in
the cluster:
`indices.cache.filter.size`::
Controls the memory size for the filter cache, defaults to `10%`. Accepts
either a percentage value, like `30%`, or an exact value, like `512mb`.
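For example, to cap the filter cache at an absolute size (the value is illustrative), the following could be added to `elasticsearch.yml` on each data node:
[source,yaml]
--------------------------------------------------
indices.cache.filter.size: 512mb
--------------------------------------------------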

View File

@ -0,0 +1,32 @@
[[indexing-buffer]]
=== Indexing Buffer
The indexing buffer is used to store newly indexed documents. When it fills
up, the documents in the buffer are written to a segment on disk. It is divided
between all shards on the node.
The following settings are _static_ and must be configured on every data node
in the cluster:
`indices.memory.index_buffer_size`::
Accepts either a percentage or a byte size value. It defaults to `10%`,
meaning that `10%` of the total heap allocated to a node will be used as the
indexing buffer size.
`indices.memory.min_index_buffer_size`::
If the `index_buffer_size` is specified as a percentage, then this
setting can be used to specify an absolute minimum. Defaults to `48mb`.
`indices.memory.max_index_buffer_size`::
If the `index_buffer_size` is specified as a percentage, then this
setting can be used to specify an absolute maximum. Defaults to unbounded.
`indices.memory.min_shard_index_buffer_size`::
Sets a hard lower limit for the memory allocated per shard for its own
indexing buffer. Defaults to `4mb`.
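As a sketch, a node dedicated to heavy indexing might raise the buffer in `elasticsearch.yml` (illustrative values):
[source,yaml]
--------------------------------------------------
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
--------------------------------------------------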

View File

@ -1,5 +1,5 @@
[[index-modules-shard-query-cache]] [[shard-query-cache]]
== Shard query cache === Shard query cache
When a search request is run against an index or against many indices, each When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to involved shard executes the search locally and returns its local results to
@ -13,7 +13,7 @@ use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache. results from older indices will be served directly from the cache.
[IMPORTANT] [IMPORTANT]
================================== ===================================
For now, the query cache will only cache the results of search requests For now, the query cache will only cache the results of search requests
where `size=0`, so it will not cache `hits`, where `size=0`, so it will not cache `hits`,
@ -21,10 +21,10 @@ but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>. <<search-suggesters,suggestions>>.
Queries that use `now` (see <<date-math>>) cannot be cached. Queries that use `now` (see <<date-math>>) cannot be cached.
================================== ===================================
[float] [float]
=== Cache invalidation ==== Cache invalidation
The cache is smart -- it keeps the same _near real-time_ promise as uncached The cache is smart -- it keeps the same _near real-time_ promise as uncached
search. search.
@ -46,7 +46,7 @@ curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'
------------------------ ------------------------
[float] [float]
=== Enabling caching by default ==== Enabling caching by default
The cache is not enabled by default, but can be enabled when creating a new The cache is not enabled by default, but can be enabled when creating a new
index as follows: index as follows:
@ -73,7 +73,7 @@ curl -XPUT localhost:9200/my_index/_settings -d'
----------------------------- -----------------------------
[float] [float]
=== Enabling caching per request ==== Enabling caching per request
The `query_cache` query-string parameter can be used to enable or disable The `query_cache` query-string parameter can be used to enable or disable
caching on a *per-query* basis. If set, it overrides the index-level setting: caching on a *per-query* basis. If set, it overrides the index-level setting:
@ -99,7 +99,7 @@ it uses a random function or references the current time) you should set the
`query_cache` flag to `false` to disable caching for that request. `query_cache` flag to `false` to disable caching for that request.
[float] [float]
=== Cache key ==== Cache key
The whole JSON body is used as the cache key. This means that if the JSON The whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the changes -- for instance if keys are output in a different order -- then the
@ -110,7 +110,7 @@ keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way. the application to ensure that a request is always serialized in the same way.
[float] [float]
=== Cache settings ==== Cache settings
The cache is managed at the node level, and has a default maximum size of `1%` The cache is managed at the node level, and has a default maximum size of `1%`
of the heap. This can be changed in the `config/elasticsearch.yml` file with: of the heap. This can be changed in the `config/elasticsearch.yml` file with:
@ -126,7 +126,7 @@ stale results are automatically invalidated when the index is refreshed. This
setting is provided for completeness' sake only. setting is provided for completeness' sake only.
[float] [float]
=== Monitoring cache usage ==== Monitoring cache usage
The size of the cache (in bytes) and the number of evictions can be viewed The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API: by index, with the <<indices-stats,`indices-stats`>> API:

View File

@ -0,0 +1,28 @@
[[recovery]]
=== Indices Recovery
The following _expert_ settings can be set to manage the recovery policy.
`indices.recovery.concurrent_streams`::
Defaults to `3`.
`indices.recovery.concurrent_small_file_streams`::
Defaults to `2`.
`indices.recovery.file_chunk_size`::
Defaults to `512kb`.
`indices.recovery.translog_ops`::
Defaults to `1000`.
`indices.recovery.translog_size`::
Defaults to `512kb`.
`indices.recovery.compress`::
Defaults to `true`.
`indices.recovery.max_bytes_per_sec`::
Defaults to `40mb`.
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API.
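For example, the recovery throughput limit could be raised on a live cluster (the value is illustrative):
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}
--------------------------------------------------
// AUTOSENSE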

View File

@ -0,0 +1,16 @@
[[indices-ttl]]
=== TTL interval
Documents that have a <<mapping-ttl-field,`ttl`>> value set need to be deleted
once they have expired. How and how often they are deleted is controlled by
the following dynamic cluster settings:
`indices.ttl.interval`::
How often the deletion process runs. Defaults to `60s`.
`indices.ttl.bulk_size`::
The deletions are processed with a <<docs-bulk,bulk request>>.
The number of deletions processed can be configured with
this setting. Defaults to `10000`.
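For example, the deletion process could be made to run less often and in smaller batches (illustrative values):
[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "indices.ttl.interval": "5m",
    "indices.ttl.bulk_size": 5000
  }
}
--------------------------------------------------
// AUTOSENSE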

View File

@ -22,7 +22,7 @@ Installing plugins typically take the following form:
[source,shell] [source,shell]
----------------------------------- -----------------------------------
plugin --install <org>/<user/component>/<version> bin/plugin --install <org>/<user/component>/<version>
----------------------------------- -----------------------------------
The plugins will be The plugins will be

View File

@ -9,7 +9,6 @@ of discarded.
There are several thread pools, but the important ones include: There are several thread pools, but the important ones include:
[horizontal]
`index`:: `index`::
For index/delete operations. Defaults to `fixed` For index/delete operations. Defaults to `fixed`
with a size of `# of available processors`, with a size of `# of available processors`,

View File

@ -73,7 +73,7 @@ And here is a sample response:
Set to `true` or `false` to enable or disable the caching Set to `true` or `false` to enable or disable the caching
of search results for requests where `size` is 0, ie of search results for requests where `size` is 0, ie
aggregations and suggestions (no top hits returned). aggregations and suggestions (no top hits returned).
See <<index-modules-shard-query-cache>>. See <<shard-query-cache>>.
`terminate_after`:: `terminate_after`::

View File

@ -416,7 +416,7 @@ The Snapshot/Restore API supports a number of different repository types for sto
[float] [float]
=== Circuit Breaker: Fielddata (STATUS: DONE, v1.0.0) === Circuit Breaker: Fielddata (STATUS: DONE, v1.0.0)
Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0. Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0.
[float] [float]
=== Use of Paginated Data Structures to Ease Garbage Collection (STATUS: DONE, v1.0.0 & v1.2.0) === Use of Paginated Data Structures to Ease Garbage Collection (STATUS: DONE, v1.0.0 & v1.2.0)