The merge sort of sorted segments can produce an invalid
sort if the sort field is an Integer/Long field that uses reverse order and
contains values equal to Integer/Long#MIN_VALUE. Due to this bug, these values
are always sorted first during a merge instead of last, as the reverse order
would require.
Indices affected by the bug can be detected by running the CheckIndex command on a
distribution that contains the fix (7.6+).
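For context, the affected setup is an index sorted in reverse order on a
numeric doc-values field. A minimal sketch of such a setup, assuming the
standard index-sorting APIs (field name and index path are illustrative):

    import java.nio.file.Paths;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ReverseSortedIndex {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig();
        // Reverse-order index sort on a long field: merges of segments sorted
        // this way were affected when documents carried Long.MIN_VALUE.
        config.setIndexSort(new Sort(new SortField("timestamp", SortField.Type.LONG, true)));
        try (Directory dir = FSDirectory.open(Paths.get("index"));
             IndexWriter writer = new IndexWriter(dir, config)) {
          Document doc = new Document();
          // Long.MIN_VALUE ended up sorted first instead of last on merge.
          doc.add(new NumericDocValuesField("timestamp", Long.MIN_VALUE));
          writer.addDocument(doc);
        }
      }
    }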
In Lucene 8, searches can skip non-competitive hits if the total hit count is
not requested. It is also possible to track the number of hits up to a certain
threshold. This is a trade-off that speeds up searches while still providing a
lower bound on the total hit count. This change adds the ability to set this
threshold directly in the track_total_hits search option. A boolean value
(true, false) indicates whether the total hit count should be tracked in the
response. When set to an integer, this option computes a lower bound on the
total hits while preserving the ability to skip non-competitive hits once
enough matches have been collected.
Relates #33028
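The integer form of the option corresponds to the threshold in Lucene 8's
collector API, where the response carries a TotalHits relation alongside the
count. A minimal sketch of that underlying API, assuming an existing index
(path and query are illustrative):

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.search.TotalHits;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TrackTotalHitsDemo {
      public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(reader);
          // Collect the top 10 hits but count total hits only up to 1000;
          // beyond that, non-competitive hits may be skipped.
          TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
          searcher.search(new MatchAllDocsQuery(), collector);
          TopDocs topDocs = collector.topDocs();
          // The relation tells whether the count is exact or a lower bound.
          String qualifier =
              topDocs.totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
                  ? "at least " : "exactly ";
          System.out.println(qualifier + topDocs.totalHits.value + " hits");
        }
      }
    }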
This change fixes a bug where an interleaved update and value reset
of the same doc-values field on the same doc in the same updates package
loses the update if the reset comes before the update, and loses the reset
if the update comes first.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
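A minimal sketch of the interleaving in question, assuming a reset is
expressed by passing a null value through the doc-values update API (field
and term are illustrative):

    import java.io.IOException;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class InterleavedDvUpdates {
      static void updateThenReset(IndexWriter writer) throws IOException {
        Term doc = new Term("id", "1");
        writer.updateDocValues(doc, new NumericDocValuesField("rank", 1L));   // update
        writer.updateDocValues(doc, new NumericDocValuesField("rank", null)); // reset
        // Before the fix, when both calls ended up in the same updates
        // package, the later reset could be lost; the symmetric order
        // (reset first, then update) lost the update instead.
      }
    }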
A segment written with Lucene70Codec fails if it tries to update
a DV field that didn't exist in the index before it was upgraded to
Lucene80Codec. We bake the DV format into the FieldInfo when it's used
for the first time and therefore never go to the codec when we need to update.
Yet on a field that didn't exist before and was added during an indexing
operation, we have to consult the codec and get an exception.
This change fixes this issue and adds the relevant bwc tests.
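The failing sequence can be sketched as follows, assuming "index" was
originally written with Lucene70Codec and is now opened by a writer that
defaults to Lucene80Codec (field names are illustrative):

    import java.nio.file.Paths;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class UpgradedIndexDvUpdate {
      public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
          // Add the DV field for the first time after the upgrade ...
          Document doc = new Document();
          doc.add(new StringField("id", "1", Field.Store.NO));
          doc.add(new NumericDocValuesField("new_dv", 7L));
          writer.addDocument(doc);
          // ... then update it: resolving the DV format has to consult the
          // codec and used to throw for indices that predate the field.
          writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("new_dv", 8L));
          writer.commit();
        }
      }
    }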
The case where all values are the same in a numeric field update
is common for soft_deletes. With the new infrastructure for buffering
DV updates we can gain an easy win by specializing the applied updates
when all values are the same.
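As an illustration of this kind of specialization, a buffer can record the
value once and only fall back to per-document storage on the first differing
value. A hypothetical sketch, not Lucene's actual implementation:

    import java.util.Arrays;

    // Buffer numeric DV updates; when all buffered values are identical
    // (the common case for soft_deletes), the value is stored exactly once.
    final class NumericUpdateBuffer {
      private int[] docs = new int[8];
      private long[] values = null; // allocated only on the first differing value
      private long sharedValue;
      private int size = 0;

      void add(int doc, long value) {
        if (size == 0) {
          sharedValue = value;
        } else if (values == null && value != sharedValue) {
          // First differing value: switch to one value per document.
          values = new long[docs.length];
          Arrays.fill(values, 0, size, sharedValue);
        }
        if (size == docs.length) {
          docs = Arrays.copyOf(docs, size * 2);
          if (values != null) {
            values = Arrays.copyOf(values, size * 2);
          }
        }
        docs[size] = doc;
        if (values != null) {
          values[size] = value;
        }
        size++;
      }

      long value(int index) {
        return values == null ? sharedValue : values[index];
      }
    }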
Today we are using a LinkedHashMap to buffer doc-values updates in
BufferedUpdates. This on the one hand relies on an object-based data structure
and on the other requires re-encoding the data into a more compact
representation once the BufferedUpdates are frozen. This change uses a more
compact representation for the updates already in BufferedUpdates, in a
parallel-array-like data structure that can be reused in FrozenBufferedUpdates.
It also adds a much simpler API to consume the updates and allows for internal
memory optimizations for common-case updates.
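As an illustration of the parallel-array idea, the i-th update is described by
the i-th slot of several plain arrays instead of one map entry object per
update, and consumption is a straight scan in insertion order. A hypothetical
sketch, not the actual Lucene data structure:

    import java.util.Arrays;

    final class ParallelUpdateBuffer {
      private String[] termFields = new String[8]; // field of the term selecting docs
      private String[] termValues = new String[8]; // text of that term
      private long[] newValues = new long[8];      // value each update applies
      private int[] docsUpTo = new int[8];         // update applies to docs up to this id
      private int size = 0;

      void add(String termField, String termValue, long value, int docUpTo) {
        if (size == termFields.length) {
          grow(size * 2);
        }
        termFields[size] = termField;
        termValues[size] = termValue;
        newValues[size] = value;
        docsUpTo[size] = docUpTo;
        size++;
      }

      private void grow(int newLength) {
        termFields = Arrays.copyOf(termFields, newLength);
        termValues = Arrays.copyOf(termValues, newLength);
        newValues = Arrays.copyOf(newValues, newLength);
        docsUpTo = Arrays.copyOf(docsUpTo, newLength);
      }

      interface UpdateConsumer {
        void accept(String termField, String termValue, long value, int docUpTo);
      }

      // A simple, linear way to consume the buffered updates.
      void forEach(UpdateConsumer consumer) {
        for (int i = 0; i < size; i++) {
          consumer.accept(termFields[i], termValues[i], newValues[i], docsUpTo[i]);
        }
      }
    }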