OpenSearch/server
David Turner 389f7779e7 Report more details of unobtainable ShardLock (#61255)
Today a common cause of a `ShardLockObtainFailedException` is that a
shard is removed from a node and then assigned straight back to it
before the node has had a chance to shut the previous shard instance
down. For instance, this can happen if a node briefly leaves the cluster
while holding a primary with no in-sync replicas.

The message in this case is typically as follows:

    obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]

This is pretty hard to interpret, and doesn't raise the important
question: "why didn't the shard shut down sooner?"

With this change we reword the message a bit, report the age of the
shard lock, and adjust the details to report that the lock is held by a
closing shard:

    obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms]
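
As a rough illustration of the idea, here is a minimal sketch of a lock
that remembers who holds it and since when, so that a timed-out attempt
can report both. It is not the code from this change: the class
`ShardLockSketch`, its methods, and the use of `IllegalStateException`
in place of `ShardLockObtainFailedException` are all assumptions made
to keep the example self-contained.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual server code: all names here are illustrative.
final class ShardLockSketch {

    // Immutable snapshot of what the lock is held for and since when.
    private static final class Details {
        final String description;
        final long sinceNanos;

        Details(String description) {
            this.description = description;
            this.sinceNanos = System.nanoTime();
        }
    }

    private final Semaphore mutex = new Semaphore(1);
    private volatile Details details = new Details("no lock held");

    // Try to take the lock; on timeout, report the current holder and the lock's age.
    void acquire(String reason, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = mutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while obtaining shard lock for [" + reason + "]", e);
        }
        if (acquired == false) {
            Details held = details; // read once so description and age belong together
            long ageMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - held.sinceNanos);
            // Stand-in for ShardLockObtainFailedException (assumption for self-containment).
            throw new IllegalStateException("obtaining shard lock for [" + reason + "] timed out after ["
                + timeoutMillis + "ms], lock already held for [" + held.description
                + "] with age [" + ageMillis + "ms]");
        }
        details = new Details(reason);
    }

    // Called by the holder when its state changes, e.g. when the shard starts closing.
    // Assumption of this sketch: the reported age restarts whenever the details change.
    void setDetails(String description) {
        details = new Details(description);
    }

    void release() {
        mutex.release();
    }
}
```

In this sketch, a holder that calls `setDetails("closing shard")` before
releasing the lock causes a concurrent `acquire("starting shard", 5000)`
to fail with a message of the shape shown above, which points directly
at the slow shutdown rather than at the original shard creation.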

Relates #38807
2020-08-19 06:36:28 +01:00
licenses upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
src Report more details of unobtainable ShardLock (#61255) 2020-08-19 06:36:28 +01:00
build.gradle Replace immediate task creations by using task avoidance api (#60071) (#60504) 2020-07-31 13:09:04 +02:00