Catch Throwable instead of Exception in TransportClient
and TransportClientNodesService, and restore the interrupted flag
if an InterruptedException is caught and ignored
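
A minimal sketch of the interrupt-flag part, assuming a hypothetical helper (`InterruptFlagSketch.awaitQuietly` is illustrative and is not code from TransportClient or TransportClientNodesService). It shows that swallowing an InterruptedException clears the flag unless the catch block sets it again:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not taken from the Elasticsearch code base.
final class InterruptFlagSketch {

    /**
     * Waits for the latch and swallows an InterruptedException, but sets the
     * thread's interrupted flag again so that callers can still observe it.
     * Returns true only if the latch reached zero within the timeout.
     */
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Throwing the exception cleared the flag; restore it before moving on.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread.currentThread().interrupt();             // simulate an interrupt arriving
        boolean completed = awaitQuietly(latch, 1000);  // await throws immediately
        System.out.println("completed=" + completed);   // prints completed=false
        // Thread.interrupted() reads the flag and clears it again.
        System.out.println("interrupted=" + Thread.interrupted()); // prints interrupted=true
    }
}
```
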
TransportClient doesn't add the initial nodes to the nodes list
if it doesn't retrieve any nodes from the listeners, which can cause
the transport client to throw a 'NoNodeAvailableException' if the
'sniff' response didn't return any nodes. This situation can occur
if the client tries to get the listener node's cluster state while that
node is not yet connected to any other nodes.
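
One way to avoid ending up with an empty node list is to fall back to the initially configured nodes when a sniff round finds nothing. The sketch below is an assumption about the shape of such a fallback, not the actual change: node addresses are plain strings and `SniffFallbackSketch.resolveNodes` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from TransportClientNodesService.
final class SniffFallbackSketch {

    /**
     * Picks the node list to use after a sniff round. If sniffing found no
     * nodes, keep the initially configured nodes instead of an empty list.
     */
    static List<String> resolveNodes(List<String> initialNodes, List<String> sniffedNodes) {
        List<String> chosen = sniffedNodes.isEmpty() ? initialNodes : sniffedNodes;
        // Defensive copy so later changes to the inputs do not leak into the result.
        return Collections.unmodifiableList(new ArrayList<>(chosen));
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("10.0.0.1:9300");
        List<String> empty = Collections.emptyList();
        // The sniffed node is not connected to anyone yet, so it reports nothing.
        System.out.println(resolveNodes(initial, empty)); // prints [10.0.0.1:9300]
    }
}
```
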
TestCluster can currently only be used in a globally shared scope.
This commit adds the ability to use the TestCluster in 3 different
scopes per test-suite. The scopes are 'Global', 'Suite' and 'Test'
where the cluster is shared across all tests, across all test methods of a suite, or
not at all, respectively.
Subclasses of AbstractIntegrationTest (formerly AbstractSharedClusterTest)
can add an annotation if they need a different scope than Global (default):
```
@ClusterScope(scope=Scope.Suite, numNodes=1)
```
This also allows specifying the number of shared nodes in that TestCluster
that are available when a test starts.
The cleanups in this commit include:
- s/Elasticsearch/ElasticSearch/g on test classes
- Move test classes in org.elasticsearch.test
This assertion module also injects an AssertingIndexSearcher that
checks if our queries are all compliant with the Lucene specification,
which is important for future updates and changes in the upstream project.
There was a small window of time in which the transport response handler's handleException method was invoked twice. As far as I can tell, this happened when a node disconnect event was processed just after the request was registered and before a "Node not connected" error was thrown. The TransportService#sendRequest method would invoke the transport response handler's handleException method regardless of whether it had already been invoked. As a result, two retries were executed for a single request failure.
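
A minimal sketch of one way to guarantee a single failure notification per request, assuming pending handlers are kept in a concurrent map keyed by request id. All names here (`SingleFailureNotificationSketch`, `ResponseHandler`, `register`, `fail`) are hypothetical and simplified. Whichever code path removes the handler from the map is the only one allowed to notify it:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a transport service; illustration only.
final class SingleFailureNotificationSketch {

    interface ResponseHandler {
        void handleException(Exception cause);
    }

    private final ConcurrentMap<Long, ResponseHandler> pending = new ConcurrentHashMap<>();

    void register(long requestId, ResponseHandler handler) {
        pending.put(requestId, handler);
    }

    /**
     * Notifies the handler of a failure at most once per request.
     * Returns true if this call delivered the failure.
     */
    boolean fail(long requestId, Exception cause) {
        // remove() is atomic: only one caller can get the non-null handler back.
        ResponseHandler handler = pending.remove(requestId);
        if (handler == null) {
            return false; // another path already reported the failure
        }
        handler.handleException(cause);
        return true;
    }

    public static void main(String[] args) {
        SingleFailureNotificationSketch service = new SingleFailureNotificationSketch();
        AtomicInteger retries = new AtomicInteger();
        service.register(1L, cause -> retries.incrementAndGet());

        // Both the disconnect path and the send-error path report the same request.
        service.fail(1L, new IllegalStateException("node disconnected"));
        service.fail(1L, new IllegalStateException("node not connected"));

        System.out.println("retries=" + retries.get()); // prints retries=1
    }
}
```
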
The mpercolate API has an assert that tripped when more than the expected number of shard-level responses were returned. This was caused by the issue described above. For a single shard-level request we had multiple responses, and this broke the total of expected responses. The reduce could also be started prematurely, which resulted in an incorrect final response (e.g. an incorrect total count). For example: with two shards in total, shard 0 gets reduced twice. The second shard 0 response arrives just before the shard 1 response. The reduce then starts without the shard 1 response.
The master node processes changes to the cluster state, and part of that processing is publishing the cluster state to the other nodes. It does not wait for the cluster state to be processed on the other nodes before it moves on to the next cluster state processing job.
This is fine: we support out-of-order cluster state events using versioning, and nodes can handle those cases. It does, though, lead to suboptimal API semantics. For example, when issuing a cluster health request and waiting for green state, the master node will report back once the cluster is green based on its own cluster state, but that "green" state might not have been received by all other nodes yet.
Add a discovery.zen.publish_timeout setting, and default it to 5s. This makes a best effort to ensure that all nodes process a cluster state within a window of time.
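
The best-effort wait can be sketched with a latch that is counted down once per node and awaited with a timeout. This is an illustration under assumptions, not the actual discovery code: `PublishTimeoutSketch`, `publish` and `nodeTakingMillis` are hypothetical names, and each node is modelled as a Runnable that returns once the node has processed the state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a publish step bounded by a timeout; illustration only.
final class PublishTimeoutSketch {

    /**
     * Sends the state to every node in parallel and waits up to timeoutMillis
     * for all of them to finish. Returns true if every node finished in time.
     * The caller moves on either way, which is what makes this best effort.
     */
    static boolean publish(List<Runnable> nodeDeliveries, long timeoutMillis) {
        CountDownLatch acks = new CountDownLatch(nodeDeliveries.size());
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (Runnable delivery : nodeDeliveries) {
                executor.execute(() -> {
                    delivery.run();
                    acks.countDown(); // counted only after the node has processed the state
                });
            }
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        } finally {
            executor.shutdownNow(); // demo cleanup: stop waiting on slow nodes
        }
    }

    /** Simulates a node that needs the given time to process a cluster state. */
    static Runnable nodeTakingMillis(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        boolean fast = publish(Arrays.asList(nodeTakingMillis(0), nodeTakingMillis(10)), 5000);
        boolean slow = publish(Arrays.asList(nodeTakingMillis(0), nodeTakingMillis(2000)), 50);
        System.out.println("all nodes in time (fast cluster): " + fast);
        System.out.println("all nodes in time (one slow node): " + slow);
    }
}
```

In the second call the slow node misses the 50 ms window, so `publish` returns false and the caller continues with the next job instead of blocking.
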
Closes #3736