OpenSearch/core
Jason Tedor 070963658b Block global checkpoint advances when recovering
After a replica shard finishes recovery, it will be marked as active and
its local checkpoint will be considered in the calculation of the global
checkpoint on the primary. If there were operations in flight during
recovery, when the replica is activated its local checkpoint could be
lagging behind the global checkpoint on the primary. This means that
when the replica shard is activated, we can end up in a situation where
a global checkpoint update would want to move the global checkpoint
backwards, violating an invariant of the system. This only arises if a
background global checkpoint sync executes, which today is only a
scheduled operation and might be delayed until the in-flight operations
complete and the replica catches up to the primary. However, we are going
to move to inlining global checkpoints, which will make this issue more
likely to manifest. Additionally, the global checkpoint on the
replica, which is the local knowledge on the replica updated under the
mandate of the primary, could be higher than the local checkpoint on the
replica, again violating an invariant of the system. This commit
addresses these issues by blocking global checkpoint advancement on the
primary when a replica shard is finalizing recovery. While we have blocked global
checkpoint advancement, recovery on the replica shard will not be
considered complete until its local checkpoint advances to the blocked
global checkpoint.
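
The blocking rule described above can be illustrated with a minimal Java
sketch. This is not the code from this commit: the class
GlobalCheckpointSketch and all of its method names are hypothetical, and
the model is deliberately simplified (one map of in-sync shards, one map
of shards finalizing recovery). It only shows the idea that the global
checkpoint may not advance while a shard is finalizing recovery, and that
such a shard is not treated as recovered until its local checkpoint
reaches the blocked global checkpoint.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified tracker; not the actual class changed by this commit.
class GlobalCheckpointSketch {
    // Local checkpoints of shards that count toward the global checkpoint.
    private final Map<String, Long> inSync = new HashMap<>();
    // Local checkpoints of shards that are finalizing recovery (assumed model).
    private final Map<String, Long> finalizing = new HashMap<>();
    private long globalCheckpoint = -1L;

    GlobalCheckpointSketch(String primaryId, long primaryLocalCheckpoint) {
        inSync.put(primaryId, primaryLocalCheckpoint);
    }

    // A recovering shard enters the finalizing state; this blocks advancement.
    synchronized void beginFinalizingRecovery(String shardId, long localCheckpoint) {
        finalizing.put(shardId, localCheckpoint);
        activateCaughtUpShards();
    }

    // Record a local checkpoint update; checkpoints never move backwards.
    synchronized void updateLocalCheckpoint(String shardId, long localCheckpoint) {
        if (inSync.containsKey(shardId)) {
            inSync.merge(shardId, localCheckpoint, Long::max);
        } else if (finalizing.containsKey(shardId)) {
            finalizing.merge(shardId, localCheckpoint, Long::max);
            activateCaughtUpShards();
        }
    }

    // Returns true only if the global checkpoint actually advanced.
    synchronized boolean tryAdvanceGlobalCheckpoint() {
        if (!finalizing.isEmpty()) {
            return false; // blocked while any shard is finalizing recovery
        }
        long candidate = Collections.min(inSync.values());
        if (candidate > globalCheckpoint) {
            globalCheckpoint = candidate;
            return true;
        }
        return false;
    }

    synchronized boolean isRecoveryComplete(String shardId) {
        return inSync.containsKey(shardId);
    }

    synchronized long globalCheckpoint() {
        return globalCheckpoint;
    }

    // A finalizing shard becomes in-sync once it reaches the blocked global checkpoint.
    private void activateCaughtUpShards() {
        finalizing.entrySet().removeIf(entry -> {
            if (entry.getValue() >= globalCheckpoint) {
                inSync.put(entry.getKey(), entry.getValue());
                return true;
            }
            return false;
        });
    }
}
```

In this sketch, a replica that starts finalizing with a local checkpoint
below the global checkpoint keeps tryAdvanceGlobalCheckpoint() returning
false until updateLocalCheckpoint() brings it up to the blocked value, so
the minimum over in-sync shards can never fall below the current global
checkpoint.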

Relates #24404
2017-05-03 06:48:09 -04:00
licenses Upgrade to a Lucene 7 snapshot (#24089) 2017-04-18 15:17:21 +02:00
src Block global checkpoint advances when recovering 2017-05-03 06:48:09 -04:00
build.gradle Build: Switch jna dependency to an elastic version (#24081) 2017-04-13 00:17:50 -07:00