DV updates used the boxed type Long to keep the API generic. Yet, the missing
type information caused a lot of code duplication, boxing, and unnecessary object
creation. This change cuts over to type-safe APIs using BytesRef and long (the
primitive). With this change, most of the code that is almost identical between
binary and numeric is now shared, reducing the maintenance overhead and the
likelihood of introducing bugs.
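
As a rough illustration of the idea (not the actual Lucene classes), the sketch below shows a typed update hierarchy in which the numeric variant carries a primitive long and the binary variant carries raw bytes. The names `TypedUpdate`, `NumericUpdate`, and `BinaryUpdate` are hypothetical, and `byte[]` stands in for BytesRef so the example has no Lucene dependency.

```java
import java.util.Arrays;

// Hypothetical sketch: a typed doc-values update that avoids boxing.
// byte[] stands in for Lucene's BytesRef to keep the example dependency-free.
abstract class TypedUpdate {
    final String field;

    TypedUpdate(String field) {
        this.field = field;
    }

    // Only the value access differs between the two kinds of update.
    abstract long valueSizeInBytes();

    // Shared logic lives once in the base class.
    String describe() {
        return field + ":" + valueSizeInBytes() + "B";
    }
}

final class NumericUpdate extends TypedUpdate {
    final long value; // primitive, so no Long is allocated per update

    NumericUpdate(String field, long value) {
        super(field);
        this.value = value;
    }

    @Override
    long valueSizeInBytes() {
        return Long.BYTES;
    }
}

final class BinaryUpdate extends TypedUpdate {
    final byte[] value;

    BinaryUpdate(String field, byte[] value) {
        super(field);
        this.value = Arrays.copyOf(value, value.length); // defensive copy
    }

    @Override
    long valueSizeInBytes() {
        return value.length;
    }
}
```

Callers that only need the shared behavior work against `TypedUpdate`; callers that need the value use the concrete type and never see a boxed `Long`.
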
The particular test here is #testStressLocks, which has several protections against
WindowsFS and special logic in the catch clause that steps out on fatal exceptions with
pending deletes. Since we now check this consistently in the IW ctor, we also need to
skip this entire test if we are on Windows and have pending deletes.
IndexWriter checks in its ctor if the incoming directory is an
FSDirectory. If that is the case, it ensures that the directory retries
deleting its pending deletes, and if there are still pending deletes it will
fail creating the writer. Yet, this check didn't unwrap filter directories
or subclasses like FileSwitchDirectory, such that in the case of MDW we
never checked for pending deletes.
There are also two places in FSDirectory that first removed the file
that was supposed to be created / renamed to from the pending deletes set
and only then tried to clean up pending deletes, so the cleanup excluded
that file. These places now remove the file from the set after the pending
deletes are checked.
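
The two fixes above can be sketched roughly as follows. This is a simplified model, not Lucene's code: `Dir`, `FsDir`, `FilterDir`, and `WriterCheck` are hypothetical stand-ins for Directory, FSDirectory, FilterDirectory, and the check in the IW ctor, and the `stillOpen` set merely simulates files the OS refuses to delete.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Directory.
interface Dir {}

// Hypothetical stand-in for FSDirectory.
class FsDir implements Dir {
    final Set<String> pendingDeletes = new HashSet<>();
    final Set<String> stillOpen = new HashSet<>(); // simulates undeletable files

    // Retry every pending delete; keep only the ones that still fail.
    void retryPendingDeletes() {
        pendingDeletes.removeIf(name -> !stillOpen.contains(name));
    }

    boolean hasPendingDeletes() {
        retryPendingDeletes();
        return !pendingDeletes.isEmpty();
    }

    // Simplified ordering fix: check pending deletes while `name` is still
    // in the set, and only afterwards drop the file that is being created.
    void createOutput(String name) {
        retryPendingDeletes();
        pendingDeletes.remove(name);
    }
}

// Hypothetical stand-in for FilterDirectory (MDW would be one such wrapper).
class FilterDir implements Dir {
    final Dir in;

    FilterDir(Dir in) {
        this.in = in;
    }
}

final class WriterCheck {
    // Peel off every wrapper so the check reaches the real directory.
    static Dir unwrap(Dir dir) {
        while (dir instanceof FilterDir) {
            dir = ((FilterDir) dir).in;
        }
        return dir;
    }

    static void ensureNoPendingDeletes(Dir dir) {
        Dir raw = unwrap(dir);
        if (raw instanceof FsDir && ((FsDir) raw).hasPendingDeletes()) {
            throw new IllegalArgumentException("directory still has pending deleted files");
        }
    }
}
```

Without the `unwrap` step, a wrapped `FsDir` would fail the `instanceof` test and the pending-delete check would be skipped silently.
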
Today we duplicate a fair portion of the internal logic to
apply updates of binary and numeric doc values. This change refactors
this non-trivial code to share the same code path, differing only in
whether we provide a binary or numeric instance. This also allows us to
iterate over the updates only once rather than twice, once for numeric
and once for binary fields.
This change also makes DocValuesFieldUpdates.Iterator a subclass of DocValuesIterator,
which allows easier consumption down the road since it now shares most of its
interface with DocIdSetIterator, which is the main interface for this in Lucene.
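
A minimal sketch of the shared code path, under the assumption that both kinds of update can be exposed through one iterator type: the doc iteration lives in a common base class and only the value accessor differs. `UpdateIterator`, `NumericUpdates`, `BinaryUpdates`, and `UpdateApplier` are hypothetical names, and `byte[]` again stands in for BytesRef.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical iterator over the updated documents of one field.
// Doc iteration is shared; subclasses only supply the value accessor.
abstract class UpdateIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted IDs of documents that have an update
    protected int idx = -1;

    UpdateIterator(int[] docs) {
        this.docs = docs;
    }

    final int nextDoc() {
        if (idx < docs.length) {
            idx++;
        }
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    long longValue() {
        throw new UnsupportedOperationException("not a numeric update");
    }

    byte[] binaryValue() {
        throw new UnsupportedOperationException("not a binary update");
    }
}

final class NumericUpdates extends UpdateIterator {
    private final long[] values;

    NumericUpdates(int[] docs, long[] values) {
        super(docs);
        this.values = values;
    }

    @Override
    long longValue() {
        return values[idx];
    }
}

final class BinaryUpdates extends UpdateIterator {
    private final byte[][] values;

    BinaryUpdates(int[] docs, byte[][] values) {
        super(docs);
        this.values = values;
    }

    @Override
    byte[] binaryValue() {
        return values[idx];
    }
}

final class UpdateApplier {
    // One pass over a mixed list; the type only selects which accessor is read.
    static List<String> applyAll(List<UpdateIterator> updates) {
        List<String> applied = new ArrayList<>();
        for (UpdateIterator it : updates) {
            boolean numeric = it instanceof NumericUpdates;
            for (int doc = it.nextDoc(); doc != UpdateIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                String value = numeric
                    ? Long.toString(it.longValue())
                    : new String(it.binaryValue(), StandardCharsets.UTF_8);
                applied.add(doc + "=" + value);
            }
        }
        return applied;
    }
}
```
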
This simplifies DocumentsWriterFlushQueue by moving all IW related
code out of it. The DWFQ now only contains logic for taking tickets
off the queue and applying them to a given consumer. The IW-related logic now
resides entirely in IW and has private visibility. Locking
is also more contained since IW knows exactly what is called and when.
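
The resulting division of labor might look roughly like the sketch below: the queue only knows how to hand finished tickets, in order, to whatever consumer the caller passes in. `SketchTicket` and `SketchTicketQueue` are hypothetical names, and the model is deliberately simplified (for example, it does not serialize concurrent purges the way a real implementation would need to).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical ticket: created when a flush starts, completed when it ends.
final class SketchTicket {
    private volatile String segment; // null until the flush has finished

    void setSegment(String segment) {
        this.segment = segment;
    }

    String segment() {
        return segment;
    }

    boolean canPublish() {
        return segment != null;
    }
}

// Hypothetical queue: no writer-specific logic, only ordering and hand-off.
final class SketchTicketQueue {
    private final Queue<SketchTicket> queue = new ArrayDeque<>();

    synchronized SketchTicket addTicket() {
        SketchTicket ticket = new SketchTicket();
        queue.add(ticket);
        return ticket;
    }

    // Hands finished tickets to the consumer in queue order and stops at the
    // first unfinished one, so later flushes never overtake earlier ones.
    void purge(Consumer<SketchTicket> consumer) {
        while (true) {
            SketchTicket head;
            synchronized (this) {
                head = queue.peek();
                if (head == null || !head.canPublish()) {
                    return;
                }
                queue.poll();
            }
            // The queue lock is released here; the caller decides what
            // "publishing" means and which locks it needs.
            consumer.accept(head);
        }
    }
}
```

Because the consumer runs outside the queue's monitor, the code that owns the publishing logic controls its own locking.
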
IndexWriter today is shared with many classes like BufferedUpdateStream,
DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
on the writer instance or assert that the current thread doesn't hold a lock.
This makes it very difficult to have a manageable threading model.
This change separates out the IndexWriter from those classes and makes them all
independent of IW. IW now implements a new interface that DocumentsWriter uses to communicate
failed or successful flushes and tragic events. This allows IW to make its critical
methods private and execute all lock-critical actions on its private queue, which ensures
that the IW lock is not held. Follow-up changes will try to detach more code like
publishing flushed segments to ensure we never call back into IW in an uncontrolled way.
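
One way to picture this decoupling, as a sketch only: the writer implements a narrow callback interface, each callback merely enqueues work, and the writer later drains the queue itself while not holding its own lock. `FlushEvents` and `SketchWriter` are hypothetical names, and the method set is an assumption chosen for illustration rather than the interface this change introduces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical callback interface the flushing side talks to.
interface FlushEvents {
    void afterSegmentsFlushed();

    void flushFailed(String segmentName);

    void onTragicEvent(Throwable cause, String message);
}

final class SketchWriter implements FlushEvents {
    private final Queue<Runnable> eventQueue = new ConcurrentLinkedQueue<>();
    final List<String> log = new ArrayList<>(); // records what ran, for illustration

    // Callbacks only enqueue; they never take the writer lock themselves.
    @Override
    public void afterSegmentsFlushed() {
        eventQueue.add(this::publishFlushedSegments);
    }

    @Override
    public void flushFailed(String segmentName) {
        eventQueue.add(() -> cleanUpFailedFlush(segmentName));
    }

    @Override
    public void onTragicEvent(Throwable cause, String message) {
        eventQueue.add(() -> tragicEvent(cause, message));
    }

    // Drained by the writer at points where it knows its lock is not held.
    void processEvents() {
        if (Thread.holdsLock(this)) {
            throw new IllegalStateException("must not hold the writer lock");
        }
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    // Critical methods are private; each one takes the lock only as needed.
    private synchronized void publishFlushedSegments() {
        log.add("publish");
    }

    private synchronized void cleanUpFailedFlush(String segmentName) {
        log.add("cleanup:" + segmentName);
    }

    private synchronized void tragicEvent(Throwable cause, String message) {
        log.add("tragic:" + message);
    }
}
```

The flushing side holds only a `FlushEvents` reference, so it cannot synchronize on the writer's internals or call its critical methods directly.
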
ReaderPool plays a central role in the IndexWriter pooling NRT readers
and making sure we write buffered deletes and updates to disk. This class
used to be a non-static inner class accessing many aspects including locks
from the IndexWriter itself. This change moves the class outside of IW and
defines its responsibility in a clear way with respect to locks etc. Now
IndexWriter doesn't need to share ReaderPool anymore and reacts to writes done
inside the pool by checkpointing internally. This also removes acquiring the IW
lock inside the reader pool, which made reasoning about concurrency difficult.
This change also adds javadocs and dedicated tests for the ReaderPool class.