Today we carry over hard deletes based on the SegmentReader's liveDocs.
This is not correct if soft deletes are used, especially with retention
policies. If a soft delete is added while a segment is merged, the document
might end up hard-deleted in the target segment. This isn't necessarily a
correctness issue, but it causes unnecessary writes of hard deletes. The biggest
issue here is that we assert that previously deleted documents are still deleted
in the live docs we apply, and that might be violated by the retention policy.
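
To picture the difference, here is a minimal toy model rather than Lucene code. It assumes live docs are plain bit sets (bit set = document is live) and that hard deletes can be tracked separately from the combined view a reader exposes. The class and method names are hypothetical.

```java
import java.util.BitSet;

/** Toy model only; none of these names exist in Lucene. */
final class MergeDeleteCarryOverSketch {

  /** Docs that were live when the merge started but are no longer live now. */
  static BitSet newlyDeleted(BitSet liveAtMergeStart, BitSet liveNow, int maxDoc) {
    BitSet deleted = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (liveAtMergeStart.get(doc) && !liveNow.get(doc)) {
        deleted.set(doc);
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    int maxDoc = 4;
    BitSet liveAtStart = new BitSet(maxDoc);
    liveAtStart.set(0, maxDoc); // all four docs are live when the merge starts

    // While the merge runs: doc 1 is hard-deleted, doc 2 is soft-deleted.
    BitSet hardLiveNow = (BitSet) liveAtStart.clone();
    hardLiveNow.clear(1);
    BitSet readerLiveNow = (BitSet) hardLiveNow.clone();
    readerLiveNow.clear(2); // assumption: the reader's view also hides soft deletes

    // Based on the reader's combined view, the soft delete becomes a hard delete.
    System.out.println(newlyDeleted(liveAtStart, readerLiveNow, maxDoc)); // prints {1, 2}
    // Based on hard deletes only, just doc 1 is carried over.
    System.out.println(newlyDeleted(liveAtStart, hardLiveNow, maxDoc)); // prints {1}
  }
}
```

In this model the soft-deleted document stays visible to a retention policy only when the carry-over is computed from the hard deletes alone.
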
Today we pass on the doc values update to the PendingDeletes
when it's applied. This might cause issues with a merge policy that
enforces a retention policy, which will see a deleted document but not
its value on disk.
This change moves the PendingDeletes callback back to flush time
in order to be consistent with what is actually updated on disk.
This change also makes sure that on flush we write values to disk that
are in the reader pool, and it adds extra best-effort checks to drop
fully deleted segments on flush, commit and getReader.

NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
a significant amount of logic that can all be pushed into the base class.
This change moves all the logic that is independent of the type to the base
class.

DV updates used the boxed type Long to keep the API generic. Yet, the lack of
concrete types caused a lot of code duplication, boxing and unnecessary object
creation. This change cuts over to type-safe APIs using BytesRef and long (the
primitive). In this change most of the code that is almost identical between
binary and numeric is now shared, reducing the maintenance overhead and the
likelihood of introducing bugs.
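
The shape of such a refactoring can be sketched as follows. This is an illustration under assumptions, not the actual Lucene classes: the class names are hypothetical, and byte[] stands in for BytesRef to keep the example free of dependencies. The base class owns the type-independent bookkeeping, and each subclass only adds a type-safe add method and its value storage.

```java
import java.util.Arrays;

/** Hypothetical base class: holds everything that does not depend on the value type. */
abstract class FieldUpdatesSketch {
  private int[] docs = new int[4];
  private int size;

  /** Records the doc ID, grows storage if needed, and returns the slot for the value. */
  protected final int addDoc(int doc) {
    if (size == docs.length) {
      docs = Arrays.copyOf(docs, size * 2);
      growValues(docs.length);
    }
    docs[size] = doc;
    return size++;
  }

  /** Subclasses resize their value storage to the new capacity. */
  protected abstract void growValues(int newCapacity);

  final int size() { return size; }

  final int docAt(int slot) { return docs[slot]; }
}

/** Numeric variant: a primitive long, so no boxing on the add path. */
final class NumericUpdatesSketch extends FieldUpdatesSketch {
  private long[] values = new long[4];

  void add(int doc, long value) {
    int slot = addDoc(doc); // may replace the values array, so resolve the slot first
    values[slot] = value;
  }

  long valueAt(int slot) { return values[slot]; }

  @Override
  protected void growValues(int newCapacity) {
    values = Arrays.copyOf(values, newCapacity);
  }
}

/** Binary variant: byte[] stands in for BytesRef in this sketch. */
final class BinaryUpdatesSketch extends FieldUpdatesSketch {
  private byte[][] values = new byte[4][];

  void add(int doc, byte[] value) {
    int slot = addDoc(doc);
    values[slot] = value;
  }

  byte[] valueAt(int slot) { return values[slot]; }

  @Override
  protected void growValues(int newCapacity) {
    values = Arrays.copyOf(values, newCapacity);
  }
}
```

Calling `add(7, 42L)` on the numeric variant stores the primitive directly, whereas a generic `add(int, Object)` signature would force a `Long` allocation for every update.
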
The particular test here is #testStressLocks, which has several protections against
WindowsFS and special logic in the catch clause that steps out on fatal exceptions with
pending deletes. Since we now check this consistently in the IW ctor, we also need to
skip this entire test if we are on Windows and have pending deletes.

IndexWriter checks in its ctor whether the incoming directory is an
FSDirectory. If that is the case, it ensures that the directory retries
deleting its pending deletes, and if there are still pending deletes it will
fail creating the writer. Yet, this check didn't unwrap filter directories
or subclasses like FileSwitchDirectory, such that in the case of MDW we
never checked for pending deletes.
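
Why the check was skipped for wrapped directories can be shown with a small stand-alone model. The types below are hypothetical stand-ins, not the Lucene classes, and the FileSwitchDirectory case is not modelled. A plain instanceof test on the outermost directory never sees the file-system directory underneath a wrapper, so the wrappers have to be peeled off first.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-ins for a directory hierarchy; not the Lucene classes. */
class DirSketch {}

class FsDirSketch extends DirSketch {
  final Set<String> pendingDeletes = new HashSet<>();
}

class FilterDirSketch extends DirSketch {
  final DirSketch in;

  FilterDirSketch(DirSketch in) {
    this.in = in;
  }
}

final class PendingDeletesCheckSketch {

  /** Peels off any number of filter wrappers and returns the innermost directory. */
  static DirSketch unwrap(DirSketch dir) {
    while (dir instanceof FilterDirSketch) {
      dir = ((FilterDirSketch) dir).in;
    }
    return dir;
  }

  /** The check without unwrapping: only sees a bare file-system directory. */
  static boolean naiveHasPendingDeletes(DirSketch dir) {
    return dir instanceof FsDirSketch && !((FsDirSketch) dir).pendingDeletes.isEmpty();
  }

  /** The check with unwrapping: also sees through wrappers. */
  static boolean hasPendingDeletes(DirSketch dir) {
    return naiveHasPendingDeletes(unwrap(dir));
  }
}
```

With a file-system directory that has one pending delete and is wrapped twice, `naiveHasPendingDeletes` returns false while `hasPendingDeletes` returns true.
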
There are also two places in FSDirectory that first removed the file
that was supposed to be created / renamed to from the pending deletes set
and then tried to clean up pending deletes, which excluded the file. These
places now remove the file from the set after the pending deletes are checked.
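
The effect of the ordering can be illustrated with a toy model. It is a sketch under stated assumptions, not the FSDirectory code: files on disk and pending deletes are plain sets, every retried delete succeeds, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a directory with pending deletes; not the FSDirectory implementation. */
final class PendingDeleteOrderSketch {
  final Set<String> filesOnDisk = new HashSet<>();
  final Set<String> pendingDeletes = new HashSet<>();

  /** Retries all pending deletes. Assumption: every retry succeeds in this model. */
  void deletePendingFiles() {
    filesOnDisk.removeAll(pendingDeletes);
    pendingDeletes.clear();
  }

  /** Old order: the target is dropped from the set first, so the cleanup skips it. */
  boolean staleCopyLeftOldOrder(String name) {
    pendingDeletes.remove(name);
    deletePendingFiles();
    return filesOnDisk.contains(name);
  }

  /** New order: clean up first, then drop the target from the set. */
  boolean staleCopyLeftNewOrder(String name) {
    deletePendingFiles();
    pendingDeletes.remove(name);
    return filesOnDisk.contains(name);
  }
}
```

If `_0.cfs` is on disk and pending deletion, the old order leaves the stale copy in place before the file is created again, while the new order removes it first.
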