diff --git a/docs/reference/setup/sysconfig.asciidoc b/docs/reference/setup/sysconfig.asciidoc
index 1862c2c74a6..dc9072d6906 100644
--- a/docs/reference/setup/sysconfig.asciidoc
+++ b/docs/reference/setup/sysconfig.asciidoc
@@ -14,6 +14,7 @@ The following settings *must* be considered before going to production:
 * <>
 * <>
 * <>
+* <>
 
 [[dev-vs-prod]]
 [discrete]
@@ -43,3 +44,5 @@ include::sysconfig/threads.asciidoc[]
 include::sysconfig/dns-cache.asciidoc[]
 
 include::sysconfig/executable-jna-tmpdir.asciidoc[]
+
+include::sysconfig/tcpretries.asciidoc[]
diff --git a/docs/reference/setup/sysconfig/tcpretries.asciidoc b/docs/reference/setup/sysconfig/tcpretries.asciidoc
new file mode 100644
index 00000000000..82962f1ed6e
--- /dev/null
+++ b/docs/reference/setup/sysconfig/tcpretries.asciidoc
@@ -0,0 +1,49 @@
+[[system-config-tcpretries]]
+=== TCP retransmission timeout
+
+Each pair of nodes in a cluster communicates via a number of TCP connections
+which remain open until one of the nodes shuts down or communication between
+the nodes is disrupted by a failure in the underlying infrastructure.
+
+TCP provides reliable communication over occasionally unreliable networks by
+hiding temporary network disruptions from the communicating applications. Your
+operating system will retransmit any lost messages a number of times before
+informing the sender of any problem. Most Linux distributions default to
+retransmitting any lost packets 15 times. Retransmissions back off
+exponentially, so these 15 retransmissions take over 900 seconds to complete.
+This means it takes Linux many minutes to detect a network partition or a
+failed node with this method. Windows defaults to just 5 retransmissions,
+which corresponds to a timeout of around 13 seconds.
+
+The Linux default allows for communication over networks that may experience
+very long periods of packet loss, but this default is excessive for production
+networks within a single data centre, as is the case for most {es} clusters.
+Highly available clusters must be able to detect node failures quickly so that
+they can react promptly by reallocating lost shards, rerouting searches and
+perhaps electing a new master node. Linux users should therefore reduce the
+maximum number of TCP retransmissions.
+
+You can decrease the maximum number of TCP retransmissions to `5` by running
+the following command as `root`. Five retransmissions correspond to a
+timeout of around 13 seconds.
+
+[source,sh]
+-------------------------------------
+sysctl -w net.ipv4.tcp_retries2=5
+-------------------------------------
+
+To set this value permanently, update the `net.ipv4.tcp_retries2` setting in
+`/etc/sysctl.conf`. To verify after rebooting, run `sysctl
+net.ipv4.tcp_retries2`.
+
+{es} also implements its own health checks with timeouts that are much shorter
+than the default retransmission timeout on Linux. However, these health checks
+must allow for application-level effects such as garbage collection pauses. We
+do not recommend reducing any timeouts related to these application-level
+health checks.
+
+IMPORTANT: This setting applies to all TCP connections and will affect the
+reliability of communication with systems outside your cluster too. If your
+cluster communicates with external systems over an unreliable network, then you
+may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason,
+{es} does not adjust this setting automatically.
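The retransmission timeouts quoted in the new page can be sanity-checked with a short calculation. The sketch below is an approximation, not kernel or Elasticsearch code. It follows the model the Linux `ip-sysctl` documentation describes for `tcp_retries2`: with a value of N, a connection retransmits N times and is dropped at the (N+1)th retransmission timeout (RTO). It assumes the initial RTO equals `TCP_RTO_MIN` (200 ms) and doubles on each retry, capped at `TCP_RTO_MAX` (120 s). The function name `tcp_retries2_timeout` is hypothetical. Real connections usually start with a larger RTO, so treat the result as a lower bound.

```shell
#!/bin/sh
# Hypothetical helper: approximate the timeout implied by a given
# net.ipv4.tcp_retries2 value, assuming an initial RTO of 200 ms that
# doubles on every retransmission and is capped at 120 s.
tcp_retries2_timeout() {
    awk -v n="$1" 'BEGIN {
        rto = 0.2      # assumed initial RTO in seconds (TCP_RTO_MIN)
        cap = 120      # assumed maximum RTO in seconds (TCP_RTO_MAX)
        total = 0
        # N retransmissions plus the final wait before giving up = N+1 intervals
        for (i = 0; i <= n; i++) {
            total += (rto < cap ? rto : cap)
            rto *= 2
        }
        printf "%.1f\n", total
    }'
}

# Compare the value suggested in the docs with the common Linux default
for n in 5 15; do
    echo "tcp_retries2=$n -> $(tcp_retries2_timeout "$n") seconds"
done
```

Under these assumptions, a value of `5` gives roughly 12.6 seconds and the default of `15` gives roughly 924.6 seconds, which matches the "around 13 seconds" and "over 900 seconds" figures. To estimate the timeout for a host's current setting, pass the live value, for example `tcp_retries2_timeout "$(sysctl -n net.ipv4.tcp_retries2)"`.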