OpenSearch/docs/reference/indices/shrink-index.asciidoc

[[indices-shrink-index]]
== Shrink Index

The shrink index API allows you to shrink an existing index into a new index with a single shard.
In order to shrink an index, all its shards must be allocated on a single node in the cluster.
This is required since the shrink command will copy all of the shards' index files into the target index
data folder when the primary of the target index is initially allocated.

When an index is shrunk, no write operations should happen to the source index. Elasticsearch will
enforce the `read-only` property when the shrink command is executed. All operations necessary to shrink the
source index are executed during the initial primary recovery. Once the target index primary shard is started, the
shrink operation has successfully finished. To monitor status and progress, use <<cat-recovery>>.
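
For example, ongoing recoveries, including the file copy performed by a shrink, can be listed with the
cat recovery API. This is only a minimal sketch; the output can be filtered by index name if desired:

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cat/recovery?v'
--------------------------------------------------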

To shrink an index, all shards of that index must be allocated on a single node:

[source,js]
--------------------------------------------------
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name", <1>
"index.blocks.write" : true <2>
}
}'
--------------------------------------------------
<1> Forces the relocation of all of the index's shards to the node `shrink_node_name`.
<2> Prevents write operations to this index while still allowing metadata changes like deleting the index.

The example above shows how an index called `logs` can be forced to allocate at least one copy of each
of its shards on a specific node in the cluster.
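
Before calling `_shrink`, it is worth confirming that a copy of every shard has actually been started on
`shrink_node_name`. A minimal sketch of such a check, using the cat shards API and the index name from the
example above:

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cat/shards/logs?v'
--------------------------------------------------

Every shard should have at least one row whose `state` column shows `STARTED` and whose `node` column shows
`shrink_node_name`.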

The `_shrink` API is similar to <<indices-create-index>> and accepts `settings` and `aliases` for the target index:

[source,js]
--------------------------------------------------
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression", <1>
}
}'
--------------------------------------------------
<1> Enables the `best_compression` codec on the target index.

The API call above returns immediately once the target index is created but doesn't wait
for the shrink operation to start. Once the target index's primary shard moves to the `initializing` state,
the shrink operation has started.
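
To block until the shrink has completed, the cluster health API can be used to wait for the target index to
reach at least `yellow` status. This is only a sketch; the 60 second timeout is an arbitrary choice:

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health/logs_single_shard?wait_for_status=yellow&timeout=60s'
--------------------------------------------------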

[float]
[[shrink-index-limitations]]
=== Limitations

Indices can only be shrunk into a single shard if they satisfy the following requirements:

* an instance of all of the index's shards must be allocated on a single node
* the index must not contain more than about 2.14 billion documents (`2147483519`) in total (sum of all of its
  shards), as this is the maximum number of documents a single Elasticsearch shard can hold
* the index must have more than one shard
* the index must be `read-only`, i.e. the index block `index.blocks.write: true` must be set
  (a quick check is sketched after this list)
* the target index must not exist
* all `index.analysis.*` and `index.similarity.*` settings passed to the `_shrink` call will be overwritten with the
  source index's settings
* if the target index can't be allocated on the shrink node, due to throttling or other allocation deciders,
its primary shard will stay `unassigned` until it can be allocated on that node
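
As a quick sanity check for the `read-only` requirement above, the write block on the source index can be
inspected before calling `_shrink` (a sketch using the `logs` index from the earlier examples):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/logs/_settings/index.blocks.write?pretty'
--------------------------------------------------

If the block is in place, the response contains the `index.blocks.write` setting with the value `true`.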