IndexWriter checks in it's ctor if the incoming directory is an
FSDirectory. If that is the case it ensures that the directory retries
deleting it's pending deletes and if there are pending deletes it will
fail creating the writer. Yet, this check didn't unwrap filter directories
or subclasses like FileSwitchDirectory such that in the case of MDW we
never checked for pending deletes.
There are also two places in FSDirectory that first removed the file
that was supposed to be created / renamed to from the pending deletes set
and then tried to clean up pending deletes which excluded the file. These
places now remove the file from the set after the pending deletes are checked.
Today we duplicate a fair portion of the internal logic to
apply updates of binary and numeric doc values. This change refactors
this non-trivial code to share the same code path and only differ in
if we provide a binary or numeric instance. This also allows us to
iterator over the updates only once rather than twice once for numeric
and once for binary fields.
This change also subclass DocValuesIterator from DocValuesFieldUpdates.Iterator
which allows easier consumption down the road since it now shares most of it's
interface with DocIdSetIterator which is the main interface for this in Lucene.
This simplifies DocumentsWriterFlushQueue by moving all IW related
code out of it. The DWFQ now only contains logic for taking tickets
off the queue and applying it to a given consumer. The logic now
entirely resides in IW and has private visibility. Locking
also is more contained since IW knows exactly what is called and when.
IndexWriter today is shared with many classes like BufferedUpdateStream,
DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
on the writer instance or assert that the current thread doesn't hold a lock.
This makes it very difficult to have a manageable threading model.
This change separates out the IndexWriter from those classes and makes them all
independent of IW. IW now implements a new interface for DocumentsWriter to communicate
on failed or successful flushes and tragic events. This allows IW to make it's critical
methods private and execute all lock critical actions on it's private queue that ensures
that the IW lock is not held. Follow-up changes will try to detach more code like
publishing flushed segments to ensure we never call back into IW in an uncontrolled way.
ReaderPool plays a central role in the IndexWriter pooling NRT readers
and making sure we write buffered deletes and updates to disk. This class
used to be a non-static inner class accessing many aspects including locks
from the IndexWriter itself. This change moves the class outside of IW and
defines it's responsibility in a clear way with respect to locks etc. Now
IndexWriter doesn't need to share ReaderPool anymore and reacts on writes done
inside the pool by checkpointing internally. This also removes acquiring the IW
lock inside the reader pool which makes reasoning about concurrency difficult.
This change also add javadocs and dedicated tests for the ReaderPool class.
* SetAliasPropCmd now calls AliasesManager.update() first.
* SetAliasPropCmd now more efficiently updates multiple values.
* Tests: Commented out BadApple annotations on alias related stuff.
IndexWriter#numDeletesToMerge was creating a ReadersAndUpdates
for all incoming SegmentCommitInfo even if that info wasn't private
to the IndexWriter. This is an illegal use of this API but since it's
transitively public via MergePolicy#findMerges we have to be conservative
with regestiering ReadersAndUpdates. In IndexWriter#numDeletesToMerge we
can only use existing ones. This means for soft-deletes we need to react
earlier in order to produce accurate numbers.
This change partially rolls back the changes in LUCENE-8253. Instead of
registering the readers once they are pulled via IndexWriter#numDeletesToMerge
we now check if segments are fully deleted on flush which is very unlikely and
can be done in a lazy fashion ie. it's only paying the extra cost of opening a
reader and checking all soft-deletes if soft deletes are used and present
in the flushed segment.
This has the side-effect that flushed segments that are 100% hard deleted are also
cleaned up right after they are flushed, previously these segments were sticking
around for a while until they got picked for a merge or received another delete.
This also closes LUCENE-8256