OpenSearch/test
David Turner 049970af3e Only connect to new nodes on new cluster state (#39629)
Today, when applying new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves #29025.
2019-03-12 19:26:13 +00:00
..
fixtures Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311) 2018-12-19 08:25:20 +02:00
framework Only connect to new nodes on new cluster state (#39629) 2019-03-12 19:26:13 +00:00
logger-usage Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
build.gradle MINOR: Remove some Deadcode in Gradle (#37160) 2019-01-07 09:21:25 +01:00