`RandomAccessVectorValues` is internally used in our HNSW implementation to
provide random access to vectors, both at index and search time. In order to
better reflect this, this change does the following:
- `RandomAccessVectorValues` moves to `org.apache.lucene.util.hnsw`.
- `BufferingKnnVectorsWriter` no longer has a dependency on
`RandomAccessVectorValues` and moves to `org.apache.lucene.codecs` since
it's more of a utility class for KNN vector file formats than an index API.
Maybe we should think of moving it near each file format that uses it
instead.
- `SortingCodecReader` no longer has a dependency on
`RandomAccessVectorValues`.
Closes#10623
This generalizes #687 to indexes that are sorted in descending order. The main
challenge with descending sorts is that they require being able to compute the
last doc ID that matches a value, which would ideally require walking the BKD
tree in reverse order, but the API only support moving forward. This is worked
around by maintaining a stack of `PointTree` clones to perform the search.
These are easy/obvious ones to disable since we don't use the
functionality at all: the checks are literally useless.
This gives some performance boost to the error-prone, although it is
still pretty slow.
triage most of the previously disabled checks into TODO, noisy, etc
PassageScorer uses a priority queue of size maxPassages to keep track of
which highlighted passages are worth returning to the user. Once all
passages have been collected, we go through and merge overlapping
passages together, but this reduction in the number of passages is not
compensated for by re-adding the highest-scoring passages that were pushed
out of the queue by passages which have been merged away.
This commit increases the size of the priority queue to try and account for
overlapping passages that will subsequently be merged together.
ExitableTerms should not iterate through the terms to retrieve min and max when the wrapped implementation has the values cached (e.g. FieldsReader, OrdsFieldReader)
OffsetsFromMatchIterator and OffsetsFromPositions both have package-
private constructors, which makes them difficult to use as components in a
separate highlighter implementation.
This does not change the semantics or performance of our setup.
Instead, it explicitly enables checks that we want vs disabling checks
that we don't want.
Also reordered checks to match the error-prone website list of checks
for easier maintenance.
It is now clear that many useless checks are enabled, we can disable
some of them and try to get the performance reasonable.
`VectorValues` have a `cost()` method that reports an approximate number of
documents that have a vector, but also a `size()` method that reports the
accurate number of vectors in the field. Since KNN vectors only support
single-valued fields we should enforce that `cost()` returns the `size()`.