Inside the IndexWriter, buffers are written to disk only when it is needed
or "worth it", which does not guarantee that soft deletes are accounted for
in time. This is not necessarily a problem, since they are eventually
collected and segments that have soft deletes will be merged eventually,
but for tests and for parity with hard deletes this behavior
is tricky.
This change cuts over to accounting in-place just like hard-deletes. This
results in accurate delete numbers for soft deletes at any given point in time
once the reader is loaded or a pending soft delete occurs.
This change also fixes an issue where all updates to a DV field were allowed
even if the field was unknown. Now an update to an unknown field only works
if the field is equal to the soft deletes field. This behavior was never released.
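
As a rough illustration of the two points above, here is a minimal sketch
that counts a soft delete at the moment the update is applied and rejects
updates to unknown DV fields unless they target the soft deletes field. The
class SoftDeleteAccountingSketch and all of its members are hypothetical
stand-ins written for this example; they are not Lucene's PendingSoftDeletes
and do not reproduce its API.

```java
import java.util.BitSet;
import java.util.Set;

/** Hypothetical stand-in for per-segment delete tracking; not Lucene code. */
final class SoftDeleteAccountingSketch {
    private final String softDeletesField;
    private final Set<String> knownDocValuesFields;
    private final BitSet liveDocs;
    private int pendingDeleteCount;

    SoftDeleteAccountingSketch(String softDeletesField,
                               Set<String> knownDocValuesFields,
                               int maxDoc) {
        this.softDeletesField = softDeletesField;
        this.knownDocValuesFields = knownDocValuesFields;
        this.liveDocs = new BitSet(maxDoc);
        this.liveDocs.set(0, maxDoc); // every document starts out live
    }

    /** Applies a DV update to one document and accounts for soft deletes right away. */
    void updateDocValue(String field, int docId) {
        boolean isSoftDelete = softDeletesField.equals(field);
        // Assumption: an unknown field is only acceptable if it is the soft deletes field.
        if (!isSoftDelete && !knownDocValuesFields.contains(field)) {
            throw new IllegalArgumentException("unknown doc-values field: " + field);
        }
        // Count each document at most once, at the time the update arrives.
        if (isSoftDelete && liveDocs.get(docId)) {
            liveDocs.clear(docId);
            pendingDeleteCount++;
        }
    }

    int numPendingDeletes() {
        return pendingDeleteCount;
    }

    public static void main(String[] args) {
        SoftDeleteAccountingSketch segment =
            new SoftDeleteAccountingSketch("__soft_deletes", Set.of("price"), 4);
        segment.updateDocValue("__soft_deletes", 1); // counted immediately
        segment.updateDocValue("price", 2);          // known field, not a delete
        System.out.println(segment.numPendingDeletes());
    }
}
```

The count is available without waiting for a flush, which is the parity
with hard deletes described above.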
This change adds a Korean analyzer in a new analysis module named nori. It is similar
to Kuromoji but uses the mecab-ko-dic dictionary to perform morphological analysis of Korean
text.
We drop changes after we finish a merge. This also reset
the DV generation that the PendingSoftDeletes were initialized on, causing
assertions to trip if releasing the reader wrote DV to disk.
This change stops resetting the DV generation so that the assertions
hold, which required keeping the pending change count on PendingSoftDeletes.
This test was flagged as BadApple and referred to SOLR-12028.
The test stated clearly that the usage of newSearcher(reader) is
dangerous since it might add concurrency to the test. This commit
respects this comment and removes all subsequent usage of
newSearcher(...)
This change adds a missing call to PendingDeletes#onNewReader and
hardens the assertion on when PendingDeletes can actually be modified, i.e.
receive deletes and updates. Now PendingDeletes are also initialized
when no reader is provided but the SegmentCommitInfo has evidence that there
are no deletes.
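
The initialization rule can be pictured with a small sketch. It assumes
that "initialized" simply means the live docs are known, either because a
reader supplied them or because the segment has no committed deletes. The
class PendingDeletesInitSketch and its methods are hypothetical names made
up for this example and do not mirror the real PendingDeletes class.

```java
/** Hypothetical model of when per-segment pending deletes may be modified. */
final class PendingDeletesInitSketch {
    private boolean liveDocsInitialized;
    private int pendingDeleteCount;

    /**
     * @param hasReader         whether a segment reader (and its live docs) was supplied
     * @param committedDelCount deletes already recorded for the segment
     */
    PendingDeletesInitSketch(boolean hasReader, int committedDelCount) {
        // With no committed deletes every document is known to be live,
        // so no reader is needed to initialize the state.
        this.liveDocsInitialized = hasReader || committedDelCount == 0;
    }

    /** Called once a reader for the segment becomes available. */
    void onNewReader() {
        liveDocsInitialized = true;
    }

    boolean canBeModified() {
        return liveDocsInitialized;
    }

    /** Records one delete; refuses to run before initialization. */
    void delete() {
        if (!liveDocsInitialized) {
            throw new IllegalStateException("live docs are not initialized yet");
        }
        pendingDeleteCount++;
    }

    int numPendingDeletes() {
        return pendingDeleteCount;
    }

    public static void main(String[] args) {
        // No reader, but no deletes either: modifications are allowed.
        PendingDeletesInitSketch clean = new PendingDeletesInitSketch(false, 0);
        clean.delete();
        System.out.println(clean.canBeModified() + " " + clean.numPendingDeletes());
    }
}
```

A segment that already has deletes and no reader stays unmodifiable in
this sketch until onNewReader() is called.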
With the introduction of soft deletes, not every merge claims all documents
that are marked as deleted in the segment readers. MergePolicies still
need to do accurate accounting in order to select segments for merging
and need to decide if segments are merged. This change allows the
merge policy to customize the number of deletes a merge of a segment
claims.
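
To illustrate why the claimed delete count matters for segment selection,
here is a minimal sketch. The interface DeleteAccountingSketch, its
two-argument method, and the "retained" count are assumptions made for this
example; Lucene's real MergePolicy signature is not reproduced here.

```java
/** Hypothetical merge-policy hook; names and signature are illustrative only. */
interface DeleteAccountingSketch {
    /**
     * @param delCount documents marked as deleted in the segment
     * @param retained soft-deleted documents that a merge would still keep
     * @return the number of deletes a merge of this segment would claim
     */
    int numDeletesToMerge(int delCount, int retained);
}

final class MergeDeleteAccountingDemo {
    /** Treats every deleted document as reclaimable. */
    static final DeleteAccountingSketch CLAIM_ALL =
        (delCount, retained) -> delCount;

    /** Excludes soft-deleted documents that must survive the merge. */
    static final DeleteAccountingSketch KEEP_RETAINED =
        (delCount, retained) -> delCount - retained;

    /** Fraction of the segment a merge would reclaim under the given policy. */
    static double reclaimRatio(DeleteAccountingSketch policy,
                               int maxDoc, int delCount, int retained) {
        return (double) policy.numDeletesToMerge(delCount, retained) / maxDoc;
    }

    public static void main(String[] args) {
        // Same segment, two different answers depending on the accounting.
        System.out.println(reclaimRatio(CLAIM_ALL, 100, 40, 30));
        System.out.println(reclaimRatio(KEEP_RETAINED, 100, 40, 30));
    }
}
```

Under the second policy the segment looks far less attractive to merge,
because most of its deleted documents would be carried over anyway.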