Today we pass the doc values update on to the PendingDeletes
when it's applied. This might cause issues with a retention
merge policy that will see a deleted document but not its value on
disk.
This change moves back the PendingDeletes callback to flush time
in order to be consistent with what is actually updated on disk.
This change also makes sure that on flush we write values to disk that
are in the reader pool, and it adds extra best-effort checks to drop
fully deleted segments on flush, commit and getReader.
NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
a significant amount of logic that can all be pushed into the base class.
This change moves all the logic that is independent of the type to the base
class.
DV updates used the boxed type Long to keep the API generic. Yet, the missing
type information caused a lot of code duplication, boxing and unnecessary object
creation. This change cuts over to type-safe APIs using BytesRef and long (the
primitive). In this change most of the code that is almost identical between binary
and numeric is now shared, reducing the maintenance overhead and the likelihood of
introducing bugs.
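
A minimal sketch of the idea, not the actual Lucene code: the type-independent
bookkeeping (doc ID storage, growth, size) sits in a base class, and the two
subclasses only add a primitive long or a byte sequence. All class and method
names below are hypothetical, and byte[] stands in for BytesRef so the example
has no Lucene dependency.

```java
import java.util.Arrays;

// Hypothetical stand-in for the shared base class: everything that does not
// depend on the value type lives here.
abstract class FieldUpdatesSketch {
  private int[] docs = new int[4];
  private int size;

  // Records the doc ID and returns the slot the subclass should fill.
  protected final int addDoc(int doc) {
    if (size == docs.length) {
      docs = Arrays.copyOf(docs, size * 2);
      grow(docs.length); // let the subclass resize its value storage
    }
    docs[size] = doc;
    return size++;
  }

  final int size() { return size; }

  final int docAt(int slot) { return docs[slot]; }

  // Subclasses resize their value arrays to the new capacity.
  protected abstract void grow(int newCapacity);
}

// Numeric variant: takes a primitive long, so no boxing is needed.
final class NumericUpdatesSketch extends FieldUpdatesSketch {
  private long[] values = new long[4];

  void add(int doc, long value) {
    int slot = addDoc(doc); // may replace `values` via grow(), so call it first
    values[slot] = value;
  }

  long valueAt(int slot) { return values[slot]; }

  @Override
  protected void grow(int newCapacity) {
    values = Arrays.copyOf(values, newCapacity);
  }
}

// Binary variant: stores a private copy of the caller's bytes.
final class BinaryUpdatesSketch extends FieldUpdatesSketch {
  private byte[][] values = new byte[4][];

  void add(int doc, byte[] value) {
    int slot = addDoc(doc);
    values[slot] = value.clone();
  }

  byte[] valueAt(int slot) { return values[slot]; }

  @Override
  protected void grow(int newCapacity) {
    values = Arrays.copyOf(values, newCapacity);
  }
}
```

Only the add and valueAt methods differ between the two subclasses in this
sketch; the growth and doc ID handling is written once.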
The particular test here is #testStressLocks, which has several protections against
WindowsFS and special logic in the catch clause that steps out on fatal exceptions with
pending deletes. Since we now check this consistently in the IW ctor we need to also
skip this entire test if we are on Windows and have pending deletes.
IndexWriter checks in its ctor if the incoming directory is an
FSDirectory. If that is the case it ensures that the directory retries
deleting its pending deletes, and if there are still pending deletes it
will fail creating the writer. Yet, this check didn't unwrap filter
directories or subclasses like FileSwitchDirectory, such that in the case
of MDW we never checked for pending deletes.
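
The difference between the two checks can be sketched as follows. This is an
illustration under assumptions, not the IndexWriter code: Dir, FsDir and
FilterDir are hypothetical stand-ins for Directory, FSDirectory and
FilterDirectory, and the FileSwitchDirectory case is not covered.

```java
// Hypothetical stand-in for a directory abstraction.
class Dir {}

// Hypothetical stand-in for a file-system directory that may have pending deletes.
final class FsDir extends Dir {
  private final boolean pendingDeletes;

  FsDir(boolean pendingDeletes) { this.pendingDeletes = pendingDeletes; }

  boolean hasPendingDeletes() { return pendingDeletes; }
}

// Hypothetical stand-in for a wrapper that delegates to another directory.
final class FilterDir extends Dir {
  final Dir in;

  FilterDir(Dir in) { this.in = in; }
}

final class PendingDeletesCheck {
  // Peels off all wrappers until a non-filter directory is reached.
  static Dir unwrap(Dir dir) {
    while (dir instanceof FilterDir) {
      dir = ((FilterDir) dir).in;
    }
    return dir;
  }

  // Only looks at the outermost directory, so a wrapped FsDir is never checked.
  static boolean naiveCheck(Dir dir) {
    return dir instanceof FsDir && ((FsDir) dir).hasPendingDeletes();
  }

  // Unwraps first, so wrapped file-system directories are checked as well.
  static boolean unwrappingCheck(Dir dir) {
    Dir leaf = unwrap(dir);
    return leaf instanceof FsDir && ((FsDir) leaf).hasPendingDeletes();
  }
}
```

With a directory wrapped twice in FilterDir, naiveCheck reports no pending
deletes while unwrappingCheck finds them.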
There are also two places in FSDirectory that first removed the file
that was supposed to be created / renamed to from the pending deletes set
and then tried to clean up pending deletes which excluded the file. These
places now remove the file from the set after the pending deletes are checked.
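
The ordering problem can be sketched like this. It is a simplified model, not
the FSDirectory code: the class and method names are hypothetical, and the
retry only records which names it would try to delete instead of touching the
file system.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a directory that tracks files whose deletion failed earlier.
final class PendingDeletesSketch {
  private final Set<String> pending = new LinkedHashSet<>();

  // Names the retry attempted to delete, in order. Stand-in for real file deletion.
  final List<String> retried = new ArrayList<>();

  void markPending(String name) { pending.add(name); }

  // Retries every pending delete. In this sketch deletes always succeed.
  private void retryPendingDeletes() {
    retried.addAll(pending);
    pending.clear();
  }

  // Old order: the target is dropped from the set first, so the retry skips it.
  void prepareTargetOldOrder(String name) {
    pending.remove(name);
    retryPendingDeletes();
  }

  // New order: the retry sees the target; the name is removed from the set afterwards.
  // Here the remove is a no-op because the retry always succeeds; with a failing
  // delete the name would still be in the set at this point.
  void prepareTargetNewOrder(String name) {
    retryPendingDeletes();
    pending.remove(name);
  }
}
```

In the old order a stale copy of the target file is never retried before the
new file is created or renamed into place; in the new order it is.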
Today we duplicate a fair portion of the internal logic to
apply updates of binary and numeric doc values. This change refactors
this non-trivial code to share the same code path and only differ in
whether we provide a binary or numeric instance. This also allows us to
iterate over the updates only once rather than twice, once for numeric
and once for binary fields.
This change also makes DocValuesFieldUpdates.Iterator a subclass of DocValuesIterator,
which allows easier consumption down the road since it now shares most of its
interface with DocIdSetIterator, which is the main interface for this in Lucene.
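
To illustrate why a shared iteration interface helps consumers, here is a small
sketch. The classes are hypothetical stand-ins rather than the Lucene types;
the only convention borrowed from DocIdSetIterator is that docID() is -1 before
iteration starts and that nextDoc() returns NO_MORE_DOCS (Integer.MAX_VALUE)
once the iterator is exhausted.

```java
// Hypothetical stand-in for the general doc ID iteration contract.
abstract class DocIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // -1 before the first nextDoc() call, NO_MORE_DOCS once exhausted.
  abstract int docID();

  // Moves to the next document and returns its ID, or NO_MORE_DOCS.
  abstract int nextDoc();
}

// Hypothetical update iterator: same contract plus access to the current value.
abstract class UpdateIteratorSketch extends DocIteratorSketch {
  abstract long longValue();
}

// Simple array-backed implementation; docs must be sorted in increasing order.
final class ArrayUpdateIterator extends UpdateIteratorSketch {
  private final int[] docs;
  private final long[] values;
  private int idx = -1;

  ArrayUpdateIterator(int[] docs, long[] values) {
    this.docs = docs;
    this.values = values;
  }

  @Override
  int docID() {
    if (idx < 0) return -1;
    return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
  }

  @Override
  int nextDoc() {
    if (idx < docs.length) idx++;
    return docID();
  }

  @Override
  long longValue() { return values[idx]; }
}

final class UpdateConsumer {
  // The usual iteration loop works unchanged on the update iterator.
  static long sum(UpdateIteratorSketch it) {
    long total = 0;
    for (int doc = it.nextDoc(); doc != DocIteratorSketch.NO_MORE_DOCS; doc = it.nextDoc()) {
      total += it.longValue();
    }
    return total;
  }
}
```

Any code written against the general iteration contract can consume the update
iterator without a special case.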