Fix failing BaseVectorSimilarityQueryTestCase#testApproximate (#12922)

Discovered in #12921, and introduced in #12679 

The first issue is that we weren't advancing the `VectorScorer` [here](cf13a92950/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L257-L262)) -- so it was still un-positioned while trying to compute the similarity score

Earlier in the PR, the underlying delegate of the `FilteredDocIdSetIterator` was `scorer.iterator()` (see [here](cad565439b/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L107))) -- so we didn't need to explicitly advance it

Later, we decided to maintain parity with `AbstractKnnVectorQuery` and introduce filtering in `AbstractVectorSimilarityQuery` (see [this commit](5096790f28)) to determine the `visitLimit` of approximate search -- after which the underlying iterator changed to the accepted docs (see [here](5096790f28/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L255))), and I missed advancing the `VectorScorer` explicitly.
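To illustrate why the explicit advance matters, here is a minimal, self-contained sketch. It does not use Lucene: `ToyScorer`, `matches`, and the threshold parameter are hypothetical stand-ins, and the only behavior assumed is the one described above, namely that a scorer must be positioned on a document before `score()` is meaningful.

```java
import java.util.ArrayList;
import java.util.List;

final class AdvanceBeforeScore {
  /** Toy stand-in for a positioned scorer (hypothetical, not Lucene's VectorScorer). */
  static final class ToyScorer {
    private final float[][] vectors; // vectors[doc] == null means the doc has no vector
    private final float[] query;
    private int doc = -1; // -1 means "not positioned yet"

    ToyScorer(float[][] vectors, float[] query) {
      this.vectors = vectors;
      this.query = query;
    }

    /** Positions the scorer on target; returns false if target has no vector. */
    boolean advanceExact(int target) {
      doc = target;
      return vectors[target] != null;
    }

    /** Dot product of the query with the current doc's vector. */
    float score() {
      if (doc == -1) {
        throw new IllegalStateException("scorer is not positioned");
      }
      float dot = 0f;
      for (int i = 0; i < query.length; i++) {
        dot += query[i] * vectors[doc][i];
      }
      return dot;
    }
  }

  /** Iterates over accepted docs (the filter) and advances the scorer before scoring. */
  static List<Integer> matches(int[] acceptedDocs, ToyScorer scorer, float threshold) {
    List<Integer> result = new ArrayList<>();
    for (int doc : acceptedDocs) {
      if (!scorer.advanceExact(doc)) {
        continue; // no vector for this doc, so it cannot match
      }
      if (scorer.score() >= threshold) {
        result.add(doc);
      }
    }
    return result;
  }
}
```

When the loop is driven by the scorer's own iterator, the scorer is already positioned on each doc. Once the loop is driven by a separate list of accepted docs, as in this sketch, the scorer stays on its initial position unless it is advanced explicitly.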

After doing so, we no longer get the original `java.lang.ArrayIndexOutOfBoundsException` -- but `BaseVectorSimilarityQueryTestCase#testApproximate` starts failing because it falls back to exact search, as the limit of the prefilter is met during graph search

Relaxed the parameters of the test to fix this (making the filter less restrictive, and trying to visit fewer nodes so that approximate search completes without hitting its limit)
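As a rough worked example of the relaxed parameters, the sketch below computes the inclusive ranges the old and new expressions in the test can produce. It assumes `random().nextInt(origin, bound)` treats `bound` as exclusive, as `java.util.Random#nextInt(int, int)` does; the class, the helper names, and the value of `numDocs` used in the note after the code are hypothetical and not taken from the test.

```java
final class TestParameterRanges {
  /** Inclusive {min, max} of a draw from nextInt(origin, bound), with bound exclusive. */
  static int[] range(int origin, int bound) {
    return new int[] {origin, bound - 1};
  }

  // Old parameters: random().nextInt((numDocs * 4) / 5, numDocs)
  static int[] oldNumFiltered(int numDocs) {
    return range((numDocs * 4) / 5, numDocs);
  }

  // Old parameters: random().nextInt(numFiltered / 10, numFiltered / 8)
  static int[] oldTargetVisited(int numFiltered) {
    return range(numFiltered / 10, numFiltered / 8);
  }

  // New parameters: numDocs - 1
  static int newNumFiltered(int numDocs) {
    return numDocs - 1;
  }

  // New parameters: random().nextInt(1, numFiltered / 10)
  static int[] newTargetVisited(int numFiltered) {
    return range(1, numFiltered / 10);
  }
}
```

For a hypothetical `numDocs` of 1000, the new filter always accepts 999 docs (previously anywhere from 800 to 999), and for `numFiltered = 999` the new `targetVisited` range of 1 to 98 lies entirely below the old range of 99 to 123.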

Sorry for missing this earlier!
Kaival Parikh 2023-12-13 20:41:45 +05:30 committed by GitHub
parent 98d2df17d5
commit 6c5dcc1795
3 changed files with 9 additions and 2 deletions

@@ -255,6 +255,11 @@ abstract class AbstractVectorSimilarityQuery extends Query {
   new FilteredDocIdSetIterator(acceptDocs) {
     @Override
     protected boolean match(int doc) throws IOException {
+      // Advance the scorer
+      if (!scorer.advanceExact(doc)) {
+        return false;
+      }
+
       // Compute the dot product
       float score = scorer.score();
       cachedScore[0] = score * boost;

@@ -87,6 +87,7 @@ abstract class VectorScorer {
     @Override
     public float score() throws IOException {
+      assert values.docID() != -1 : getClass().getSimpleName() + " is not positioned";
       return similarity.compare(query, values.vectorValue());
     }
   }
@@ -117,6 +118,7 @@ abstract class VectorScorer {
     @Override
     public float score() throws IOException {
+      assert values.docID() != -1 : getClass().getSimpleName() + " is not positioned";
       return similarity.compare(query, values.vectorValue());
     }
   }

@@ -433,8 +433,8 @@ abstract class BaseVectorSimilarityQueryTestCase<
   public void testApproximate() throws IOException {
     // Non-restrictive filter, along with similarity to visit a small number of nodes
-    int numFiltered = random().nextInt((numDocs * 4) / 5, numDocs);
-    int targetVisited = random().nextInt(numFiltered / 10, numFiltered / 8);
+    int numFiltered = numDocs - 1;
+    int targetVisited = random().nextInt(1, numFiltered / 10);
     V[] vectors = getRandomVectors(numDocs, dim);
     V queryVector = getRandomVector(dim);