OpenSearch/server
Armin Braun 0a604e3b24
Fix Two Races that Lead to Stuck Snapshots (#37686)
* Fixes two broken spots:
    1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting
    2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state
* Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk
* Significantly extended test infrastructure to reproduce the above two issues
  * Two new test runs:
      1. Reproducing the effects of node disconnects/restarts in isolation
      2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes
* Relates #32265 
* Closes #32348
2019-02-01 05:45:40 +01:00
..
licenses Upgrade to lucene-8.0.0-snapshot-83f9835. (#37668) 2019-01-22 11:44:29 +01:00
src Fix Two Races that Lead to Stuck Snapshots (#37686) 2019-02-01 05:45:40 +01:00
build.gradle Add an alias for :server:integTest so it runs as part of internalClusterTest (#37910) 2019-01-28 14:26:22 +02:00