Today once a document has a value in a certain DV field this values
can only be changed but not removed. While resetting / removing a value
from a field is certainly a corner case it can be used to undelete a
soft-deleted document unless it's merged away.
This allows to rollback changes without rolling back to another commitpoint
or trashing all uncommitted changes. In certain cenarios it can be used to
"repair" history of documents in distributed systems.
* MiniSolrCloudCluster.deleteAllCollections will now first delete aliases
* Minor refactorings to AliasesManager, AliasIntegrationTest, CreateRoutedAliasTest
IndexWriter can update doc values for a specific term but this might
affect all documents containing the term. With tryUpdateDocValues
users can update doc-values fields for individual documents. This allows
for instance to soft-delete individual documents.
The new method shares most of it's code with tryDeleteDocuments.
Today we carry over hard deletes based on the SegmentReaders liveDocs.
This is not correct if soft-deletes are used especially with rentention
policies. If a soft delete is added while a segment is merged the document
might end up hard deleted in the target segment. This isn't necessarily a
correctness issue but causes unnecessary writes of hard-deletes. The biggest
issue here is that we assert that previously deleted documents are still deleted
in the live-docs we apply and that might be violated by the retention policy.
Today we pass on the doc values update to the PendingDeletes
when it's applied. This might cause issues with a rentention policy
merge policy that will see a deleted document but not it's value on
disk.
This change moves back the PendingDeletes callback to flush time
in order to be consistent with what is actually updated on disk.
This change also makes sure we write values to disk on flush that
are in the reader pool as well as extra best effort checks to drop
fully deleted segments on flush, commit and getReader.
NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
a significant amount of logic that can all be pushed into the base class.
This change moves all the logic that is independent of the type to the base
class.
DV updates used the boxed type Long to keep API generic. Yet, the missing
type caused a lot of code duplication, boxing and unnecessary object creation.
This change cuts over to type safe APIs using BytesRef and long (the primitive)
In this change most of the code that is almost identical between binary and numeric
is not shared reducing the maintenance overhead and likelihood of introducing bugs.