There are 3 tightly related bug fixes in these changes:

1) ConcurrentModificationExceptions were being thrown by some SimClusterStateProvider methods when creating collections/replicas, due to the use of ArrayLists in nodeReplicaMap. These ArrayLists were changed to use synchronizedList wrappers.

2) The exceptions from #1 were being swallowed/hidden by code that called SimCloudManager.submit() without checking the resulting Future object. (As a result, tests waiting for a particular ClusterShape would time out regardless of how long they waited.) To protect against "silent" failures like this, SimCloudManager.submit() has been updated to wrap all input Callables so that any uncaught errors are logged and "counted". SimSolrCloudTestCase will ensure a suite-level failure if any such failures are counted.

3) The changes in #2 exposed additional concurrency problems with the Callables involved in leader election. These would frequently throw IllegalStateExceptions due to assumptions about the state/existence of replicas when the Callables were created vs. when they were later run. Notably, a Callable may have been created that held a reference to a Slice, but by the time that Callable was run, the collection (or a node, etc.) referred to by that Slice may have been deleted. While fixing this, the leader election logic was also cleaned up so that adding a replica only triggers leader election for that shard, not every shard in the collection.

While auditing this code, all usage of SimClusterStateProvider.lock was also cleaned up to remove the risky points where an exception could be thrown after acquiring the lock but before entering the try/finally that ensures it is unlocked.

(cherry picked from commit 76babf876a49f82959cc36a1d7ef922a9c2dddff)
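
For illustration only, the idea behind fix #2 can be sketched roughly as follows. This is not the actual SimCloudManager code: the class name CountingSubmitter, the failureCount() accessor, and the fixed-size pool are hypothetical and exist only for this example. The sketch wraps every submitted Callable so that an uncaught Throwable is logged and counted even when the caller never inspects the returned Future.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the wrapping idea described in #2; not the Solr implementation. */
public class CountingSubmitter {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final AtomicInteger failures = new AtomicInteger();

  /** Wraps the task so that any uncaught Throwable is logged and counted, then rethrown. */
  public <T> Future<T> submit(Callable<T> task) {
    Callable<T> wrapped = () -> {
      try {
        return task.call();
      } catch (Throwable t) {
        failures.incrementAndGet();
        System.err.println("Uncaught failure in submitted task: " + t);
        throw t; // still surfaces through Future.get() for callers that do check
      }
    };
    return pool.submit(wrapped);
  }

  /** Number of submitted tasks that ended with an uncaught Throwable. */
  public int failureCount() {
    return failures.get();
  }

  /** Stops accepting tasks and waits briefly for the queued ones to finish. */
  public void shutdown() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    CountingSubmitter submitter = new CountingSubmitter();
    submitter.submit(() -> "ok");
    // The Future is deliberately ignored here, mimicking the "silent" failure.
    submitter.submit(() -> {
      throw new IllegalStateException("replica no longer exists");
    });
    submitter.shutdown();
    System.out.println("failures=" + submitter.failureCount());
  }
}
```

A test harness could check failureCount() at suite teardown and fail if it is non-zero, which is roughly the role the message assigns to SimSolrCloudTestCase.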