16fae5d66d
In #61906 we added the ability for the master node to fetch the size of a shard snapshot before allocating the shard to a data node with enough disk space to host it. When merging this change we agreed that a failure while fetching the size should not prevent the shard from being allocated. Sadly, it does not work as expected: the service triggers a reroute only when fetching the size succeeds, never when it fails. As a result, a shard might stay unassigned until another cluster state update triggers a new allocation (as in #64372). Worse, the test I wrote was wrong because it explicitly triggered a reroute, which masked the problem. This commit changes the InternalSnapshotsInfoService so that it also triggers a reroute when fetching the snapshot shard size fails, ensuring that the allocation can move forward by using an UNAVAILABLE_EXPECTED_SHARD_SIZE shard size. This unknown shard size is kept in the snapshot info service until no corresponding unassigned shard needs the information. Backport of #65436 |
||
---|---|---|
.. | ||
licenses | ||
src | ||
build.gradle |
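
The behaviour described in the commit message can be illustrated with a minimal sketch. It is not the actual InternalSnapshotsInfoService code: the class `SnapshotSizeSketch`, its methods, the string shard identifiers, and the sentinel value `-1` are all hypothetical names and assumptions chosen for illustration. The sketch shows only the idea that a reroute is triggered after both a successful and a failed size fetch, and that a failed fetch is recorded as an unavailable size until no unassigned shard needs it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the idea; not the real Elasticsearch service.
class SnapshotSizeSketch {
    // Sentinel meaning "size could not be determined".
    // The concrete value -1 is an assumption of this sketch.
    static final long UNAVAILABLE_EXPECTED_SHARD_SIZE = -1L;

    private final Map<String, Long> knownSizes = new HashMap<>();
    private final Runnable reroute;

    SnapshotSizeSketch(Runnable reroute) {
        this.reroute = reroute;
    }

    // Fetch the size of one shard snapshot. Whether the fetch succeeds or
    // fails, a result is recorded and a reroute is triggered, so allocation
    // never waits for an unrelated cluster state update.
    void fetchSize(String shardId, LongSupplier fetcher) {
        long size;
        try {
            size = fetcher.getAsLong();
        } catch (RuntimeException e) {
            // Failure: remember that the size is unknown instead of dropping it.
            size = UNAVAILABLE_EXPECTED_SHARD_SIZE;
        }
        knownSizes.put(shardId, size);
        reroute.run();
    }

    // Returns the recorded size, the sentinel, or null if nothing is recorded.
    Long getSize(String shardId) {
        return knownSizes.get(shardId);
    }

    // Forget entries (including unknown sizes) for shards that are no longer unassigned.
    void onClusterStateChanged(Set<String> unassignedShards) {
        knownSizes.keySet().retainAll(unassignedShards);
    }
}
```

In this sketch an allocator that reads `UNAVAILABLE_EXPECTED_SHARD_SIZE` would proceed without a size estimate rather than leave the shard unassigned, which is the outcome the commit message describes.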