Suggest reducing tcp_retries2 (#59222)
Adds documentation suggesting reducing `tcp_retries2` on Linux to detect network partitions more quickly. Relates #34405
This commit is contained in:
parent
89466eefa5
commit
d8fdb82efb
|
@ -14,6 +14,7 @@ The following settings *must* be considered before going to production:
|
|||
* <<max-number-of-threads,Ensure sufficient threads>>
|
||||
* <<networkaddress-cache-ttl,JVM DNS cache settings>>
|
||||
* <<executable-jna-tmpdir,Temporary directory not mounted with `noexec`>>
|
||||
* <<system-config-tcpretries,TCP retransmission timeout>>
|
||||
|
||||
[[dev-vs-prod]]
|
||||
[discrete]
|
||||
|
@ -43,3 +44,5 @@ include::sysconfig/threads.asciidoc[]
|
|||
include::sysconfig/dns-cache.asciidoc[]
|
||||
|
||||
include::sysconfig/executable-jna-tmpdir.asciidoc[]
|
||||
|
||||
include::sysconfig/tcpretries.asciidoc[]
|
||||
|
|
|
@ -0,0 +1,49 @@
|
|||
[[system-config-tcpretries]]
|
||||
=== TCP retransmission timeout
|
||||
|
||||
Each pair of nodes in a cluster communicates via a number of TCP connections
|
||||
which remain open until one of the nodes shuts down or communication between
|
||||
the nodes is disrupted by a failure in the underlying infrastructure.
|
||||
|
||||
TCP provides reliable communication over occasionally-unreliable networks by
|
||||
hiding temporary network disruptions from the communicating applications. Your
|
||||
operating system will retransmit any lost messages a number of times before
|
||||
informing the sender of any problem. Most Linux distributions default to
|
||||
retransmitting any lost packets 15 times. Retransmissions back off
|
||||
exponentially, so these 15 retransmissions take over 900 seconds to complete.
|
||||
This means it takes Linux many minutes to detect a network partition or a
|
||||
failed node with this method. Windows defaults to just 5 retransmissions which
|
||||
corresponds with a timeout of around 6 seconds.
|
||||
|
||||
The Linux default allows for communication over networks that may experience
|
||||
very long periods of packet loss, but this default is excessive for production
|
||||
networks within a single data centre as is the case for most {es} clusters.
|
||||
Highly-available clusters must be able to detect node failures quickly so that
|
||||
they can react promptly by reallocating lost shards, rerouting searches and
|
||||
perhaps electing a new master node. Linux users should therefore reduce the
|
||||
maximum number of TCP retransmissions.
|
||||
|
||||
You can decrease the maximum number of TCP retransmissions to `5` by running
|
||||
the following command as `root`. Five retransmissions corresponds with a
|
||||
timeout of around 6 seconds.
|
||||
|
||||
[source,sh]
|
||||
-------------------------------------
|
||||
sysctl -w net.ipv4.tcp_retries2=5
|
||||
-------------------------------------
|
||||
|
||||
To set this value permanently, update the `net.ipv4.tcp_retries2` setting in
|
||||
`/etc/sysctl.conf`. To verify after rebooting, run `sysctl
|
||||
net.ipv4.tcp_retries2`.
|
||||
|
||||
{es} also implements its own health checks with timeouts that are much shorter
|
||||
than the default retransmission timeout on Linux. However these health checks
|
||||
must allow for application-level effects such as garbage collection pauses. We
|
||||
do not recommend reducing any timeouts related to these application-level
|
||||
health checks.
|
||||
|
||||
IMPORTANT: This setting applies to all TCP connections and will affect the
|
||||
reliability of communication with systems outside your cluster too. If your
|
||||
cluster communicates with external systems over an unreliable network then you
|
||||
may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason,
|
||||
{es} does not adjust this setting automatically.
|
Loading…
Reference in New Issue