OpenSearch/server
amitai stern ca9fb6d302
Fix snapshot deletion task getting stuck in the event of exceptions (#629)
Changes the behavior of the recursive deletion function `executeOneStaleIndexDelete()` stop
condition to be when the queue of `staleIndicesToDelete` is empty -- also in the error flow.
Otherwise the GroupedActionListener never responds and in the event of a few exceptions the
deletion task gets stuck.
Alters the test case to fail to delete in bulk many snapshots at the first attempt, and then
the next successful deletion also takes care of the previously failed attempt as the test
originally intended.
SNAPSHOT threadpool is at most 5. So in the event we get more than 5 exceptions there are no
more threads to handle the deletion task and there is still one more snapshot to delete in the
queue. Thus, in the test I made the number of extra snapshots be one more than the max in the
SNAPSHOT threadpool.

Signed-off-by: AmiStrn <amitai.stern@logz.io>
2021-05-04 08:44:15 -05:00
..
licenses Update lucene version to 8.8.2 (#557) 2021-04-23 09:48:41 -05:00
src Fix snapshot deletion task getting stuck in the event of exceptions (#629) 2021-05-04 08:44:15 -05:00
build.gradle Speed ups to test suite and precommit tasks. (#580) 2021-04-20 09:02:45 -05:00