296ab3ec16
This commit fixes an issue in the handling of TransportExceptions in ShardStateAction. Two cases were not being handled correctly:

- when the local node is shutting down, handlers are notified with a TransportException whose message starts with "transport stopped"
- when the remote node disconnects, handlers are notified with a NodeDisconnectedException

In both of these cases the cause of the exception is null, and this was being handled incorrectly. The first case can be passed to the listener like any other critical non-channel failure, and the second case can be handled by modifying the logic for detecting master channel exceptions. There was also a third case, NodeNotConnectedException, that was not being treated as a master channel exception but should be.

This commit adds an integration test that simulates the handling of a shard failure request during a network partition. By isolating the master from the cluster while a shard failed request is in flight, the test checks that we wait until a new master is elected and then retry sending that shard failed request to the newly elected master.

This commit also adds methods to CapturingTransport to separate local and remote transport exceptions. The motivation is that local transport exceptions are delivered to listeners (usually, but not always) wrapped in SendRequestTransportException, while remote transport exceptions are delivered to listeners wrapped in RemoteTransportException. Making this distinction clear in CapturingTransport makes it less likely that tests will make incorrect assumptions about the exceptions coming out of the transport layer to listeners.

Closes #16057
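The null-cause problem described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: the exception classes below are hypothetical stand-ins declared locally so the example compiles on its own, and `MasterChannelSketch` and `isMasterChannelException` are illustrative names. The assumption is that a check which only inspects `getCause()` misses exceptions delivered without a cause, so the sketch inspects the exception itself before walking its cause chain.

```java
class MasterChannelSketch {
    // Hypothetical stand-ins for the exception types named in the commit message.
    static class TransportException extends RuntimeException {
        TransportException(String message, Throwable cause) { super(message, cause); }
    }
    static class NodeDisconnectedException extends TransportException {
        NodeDisconnectedException(String message) { super(message, null); }
    }
    static class NodeNotConnectedException extends TransportException {
        NodeNotConnectedException(String message) { super(message, null); }
    }
    static class NotMasterException extends RuntimeException {
        NotMasterException(String message) { super(message); }
    }

    // Returns true if the exception, or anything in its cause chain, signals
    // a problem with the channel to the master. Starting at the exception
    // itself means a null cause is handled naturally.
    static boolean isMasterChannelException(Throwable exp) {
        int depth = 0; // bound the walk in case of an unusual cyclic cause chain
        for (Throwable t = exp; t != null && depth < 10; t = t.getCause(), depth++) {
            if (t instanceof NotMasterException
                    || t instanceof NodeDisconnectedException
                    || t instanceof NodeNotConnectedException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Remote node disconnected: no cause, but still a master channel failure.
        System.out.println(isMasterChannelException(
                new NodeDisconnectedException("disconnected")));
        // Local node shutting down: no cause, and not a master channel failure,
        // so it would go to the listener like any other critical failure.
        System.out.println(isMasterChannelException(
                new TransportException("transport stopped, action: example", null)));
    }
}
```

In this sketch, a caller would retry against the newly elected master when the method returns true and otherwise pass the failure to the listener.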
||
---|---|---|
src | ||
build.gradle |