Today file-chunks are sent sequentially, one by one, in peer-recovery. This is a reasonable choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encryption and decryption are CPU-bound rather than network-bound.

With this commit, a source node can send multiple (defaults to 2) file-chunks without waiting for the acknowledgments from the target.

Below are the benchmark results for PMC and NYC_Taxis.

- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 321s     | 249s     | 191s     | *        | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844
[[recovery]]
=== Indices Recovery

<<cat-recovery,Peer recovery>> is the process used to build a new copy of a
shard on a node by copying data from the primary. {es} uses this peer recovery
process to rebuild shard copies that were lost when a node failed, and uses
the same process when migrating a shard copy between nodes to rebalance the
cluster or to honor any changes to the <<modules-cluster,shard allocation
settings>>.

The following _expert_ settings can be set to manage the resources consumed by
peer recoveries:

`indices.recovery.max_bytes_per_sec`::
Limits the total inbound and outbound peer recovery traffic on each node.
Since this limit applies on each node, but there may be many nodes
performing peer recoveries concurrently, the total amount of peer recovery
traffic within a cluster may be much higher than this limit. If you set
this limit too high then there is a risk that ongoing peer recoveries will
consume an excess of bandwidth (or other resources) which could destabilize
the cluster. Defaults to `40mb`.

`indices.recovery.max_concurrent_file_chunks`::
Controls the number of file chunk requests that can be sent in parallel per recovery.
Because multiple recoveries already run in parallel (controlled by
`cluster.routing.allocation.node_concurrent_recoveries`), increasing this expert-level
setting is likely to help only when the peer recovery of a single shard does not
reach the inbound and outbound peer recovery traffic limit configured by
`indices.recovery.max_bytes_per_sec` and is CPU-bound instead, typically when using
transport-level security or compression. Defaults to `2`.

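The effect of `indices.recovery.max_concurrent_file_chunks` can be pictured as a bounded window of unacknowledged chunk requests. The following is a minimal Python sketch of that windowing idea only, not the recovery implementation itself; the function names, the `ack_delay` parameter and the use of `asyncio` are assumptions made for illustration.

```python
import asyncio


async def send_chunks(num_chunks, max_concurrent_file_chunks=2, ack_delay=0.001):
    """Send chunks while keeping at most `max_concurrent_file_chunks` unacknowledged.

    Hypothetical model: `ack_delay` stands in for the network round trip plus
    the time the target needs to process a chunk and acknowledge it.
    """
    window = asyncio.Semaphore(max_concurrent_file_chunks)
    in_flight = 0
    peak_in_flight = 0
    acked = []

    async def send_one(chunk_id):
        nonlocal in_flight, peak_in_flight
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
        try:
            await asyncio.sleep(ack_delay)  # wait for the (simulated) acknowledgment
            acked.append(chunk_id)
        finally:
            in_flight -= 1
            window.release()  # free a slot so the next chunk can be sent

    tasks = []
    for chunk_id in range(num_chunks):
        await window.acquire()  # blocks once the window is full
        tasks.append(asyncio.create_task(send_one(chunk_id)))
    await asyncio.gather(*tasks)
    return peak_in_flight, sorted(acked)


def simulate(num_chunks, max_concurrent_file_chunks=2):
    """Run the simulation and return (peak chunks in flight, acknowledged chunk ids)."""
    return asyncio.run(send_chunks(num_chunks, max_concurrent_file_chunks))
```

With a window of `1` the sender waits for every acknowledgment before sending the next chunk, which corresponds to the sequential behavior; a larger window lets CPU-bound work such as encryption overlap with waiting for acknowledgments.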
These settings can be dynamically updated on a live cluster with the
<<cluster-update-settings,cluster update settings>> API.
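As an illustration of a dynamic update, the Python sketch below builds (but does not send) a `PUT /_cluster/settings` request that changes `indices.recovery.max_concurrent_file_chunks`. The helper name `build_cluster_settings_request`, the base URL `http://localhost:9200` and the value `4` are assumptions chosen for the example, and authentication or TLS configuration is left out.

```python
import json
import urllib.request


def build_cluster_settings_request(base_url, settings, persistent=False):
    """Build a cluster update settings request for the given settings.

    Hypothetical helper: `settings` maps setting names to values, and
    `persistent` selects the "persistent" scope instead of "transient".
    """
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local node address; adjust for a real cluster.
request = build_cluster_settings_request(
    "http://localhost:9200",
    {"indices.recovery.max_concurrent_file_chunks": 4},
)
# urllib.request.urlopen(request) would send it; it is omitted here so the
# sketch runs without a cluster.
```

A transient update is used here so that the change does not survive a full cluster restart, which suits experimenting with an expert setting; pass `persistent=True` to keep it.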