mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-09 14:34:43 +00:00
Replicated operations consist of a routing action (the original), which is in charge of sending the operation to the primary shard, a primary action, which executes the operation on the resolved primary, and replica actions, each of which performs the operation on a specific replica. This commit adds the targeted shard's allocation id to the primary and replica actions and makes sure that it matches the shard the actions end up executing on. This helps prevent an extremely rare failure mode where a shard moves off a node and back to it, all between the time an action is sent and the time it is processed. For example:

1) A primary action is sent to a relocating primary on node A.
2) The primary finishes relocating to node B and starts relocating back.
3) The relocation back reaches the phase that opens up the target engine on the original node, node A.
4) The primary action is executed on the target engine before the relocation finishes, at which point the shard copy on node B is still the official primary. That is, the action is executed on the wrong primary.
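
The check described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: every name in it (`AllocationIdCheck`, `ShardCopy`, `ShardRequest`, `Node`, `StaleShardException`) is hypothetical, and it assumes that each new copy of a shard on a node receives a fresh allocation id, so a shard that moves off a node and back no longer carries the id the in-flight action was stamped with.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are OpenSearch APIs.
public class AllocationIdCheck {

    /** A shard copy on a node, identified by an allocation id unique to that copy. */
    static final class ShardCopy {
        final int shardId;
        final String allocationId;

        ShardCopy(int shardId, String allocationId) {
            this.shardId = shardId;
            this.allocationId = allocationId;
        }
    }

    /** A primary or replica action, stamped with the allocation id it was routed to. */
    static final class ShardRequest {
        final int shardId;
        final String targetAllocationId;

        ShardRequest(int shardId, String targetAllocationId) {
            this.shardId = shardId;
            this.targetAllocationId = targetAllocationId;
        }
    }

    /** Thrown when the local shard copy is not the one the action was sent to. */
    static final class StaleShardException extends RuntimeException {
        StaleShardException(String message) {
            super(message);
        }
    }

    /** Node-local view of the shard copies currently assigned to this node. */
    static final class Node {
        private final Map<Integer, ShardCopy> shards = new HashMap<>();

        void assign(ShardCopy copy) {
            shards.put(copy.shardId, copy);
        }

        void remove(int shardId) {
            shards.remove(shardId);
        }

        /** Executes the action only if the local copy matches the targeted allocation id. */
        String execute(ShardRequest request) {
            ShardCopy local = shards.get(request.shardId);
            if (local == null) {
                throw new StaleShardException("shard " + request.shardId + " is not on this node");
            }
            if (!local.allocationId.equals(request.targetAllocationId)) {
                throw new StaleShardException("expected allocation id " + request.targetAllocationId
                    + " but found " + local.allocationId);
            }
            return "executed on " + local.allocationId;
        }
    }

    public static void main(String[] args) {
        Node nodeA = new Node();
        nodeA.assign(new ShardCopy(0, "alloc-1"));

        // The action is routed while the copy with id "alloc-1" lives on node A.
        ShardRequest inFlight = new ShardRequest(0, "alloc-1");

        // The shard moves off node A and back before the action is processed.
        nodeA.remove(0);
        nodeA.assign(new ShardCopy(0, "alloc-2"));

        try {
            nodeA.execute(inFlight);
            System.out.println("executed (unexpected)");
        } catch (StaleShardException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing ids rather than only checking that the shard exists on the node is what distinguishes "the same shard number is here again" from "the copy this action was meant for is here". In this sketch the rejected action would be left to the caller to retry against the current primary.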