Simplify HnswGraph#search. (#627)

Currently the contract on `bound` is that it holds the score of the top of the
`results` priority queue. This means that a candidate is only considered if its
score is better than the bound *or* if fewer than `topK` results have been
accumulated so far. I think it would be simpler if `bound` always held the
minimum score that is required for a candidate to be considered. This would
also be more consistent with how our WAND support works, which trusts
`setMinCompetitiveScore` alone instead of also having to check whether the
priority queue is full.
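
The proposed contract can be illustrated with a minimal standalone sketch. This is not the Lucene code: the class `MinCompetitiveSketch`, the method `topK`, and the variable `minCompetitive` are hypothetical names, and the sketch assumes that higher scores are better. The bound starts at negative infinity and is only raised once `topK` results have been collected, so a single comparison decides whether a candidate is considered.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; names and structure are assumptions, not Lucene APIs.
final class MinCompetitiveSketch {

  /** Returns the topK highest scores in ascending order, assuming higher is better. */
  static float[] topK(float[] scores, int topK) {
    // Min-heap: the head is the worst score currently kept.
    PriorityQueue<Float> results = new PriorityQueue<>();
    // Minimum score a candidate must reach; nothing is rejected until the queue is full.
    float minCompetitive = Float.NEGATIVE_INFINITY;

    for (float score : scores) {
      // Single check: no need to also test results.size() < topK.
      if (score < minCompetitive) {
        continue;
      }
      results.add(score);
      if (results.size() > topK) {
        results.poll(); // drop the worst result on overflow
      }
      // Only raise the bound once the queue actually holds topK results.
      if (results.size() >= topK) {
        minCompetitive = results.peek();
      }
    }

    float[] out = new float[results.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = results.poll();
    }
    return out;
  }
}
```

For example, `MinCompetitiveSketch.topK(new float[] {0.1f, 0.9f, 0.4f, 0.7f, 0.2f}, 2)` keeps `0.7f` and `0.9f`; the final `0.2f` is rejected by the bound check alone.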
Adrien Grand 2022-01-27 18:08:06 +01:00 committed by GitHub
parent 4323848469
commit 09ddac1fe5
1 changed file with 11 additions and 10 deletions

@@ -167,17 +167,17 @@ public final class HnswGraph extends KnnGraphValues {
       }
     }
-    // Set the bound to the worst current result and below reject any newly-generated candidates
-    // failing to exceed this bound
+    // A bound that holds the minimum similarity to the query vector that a candidate vector must
+    // have to be considered.
     BoundsChecker bound = BoundsChecker.create(similarityFunction.reversed);
-    bound.set(results.topScore());
+    if (results.size() >= topK) {
+      bound.set(results.topScore());
+    }
     while (candidates.size() > 0) {
       // get the best candidate (closest or best scoring)
       float topCandidateScore = candidates.topScore();
-      if (results.size() >= topK) {
-        if (bound.check(topCandidateScore)) {
-          break;
-        }
+      if (bound.check(topCandidateScore)) {
+        break;
       }
       int topCandidateNode = candidates.pop();
       graphValues.seek(level, topCandidateNode);
@@ -189,11 +189,12 @@ public final class HnswGraph extends KnnGraphValues {
         }
         float score = similarityFunction.compare(query, vectors.vectorValue(friendOrd));
-        if (results.size() < topK || bound.check(score) == false) {
+        if (bound.check(score) == false) {
           candidates.add(friendOrd, score);
           if (acceptOrds == null || acceptOrds.get(friendOrd)) {
-            results.insertWithOverflow(friendOrd, score);
-            bound.set(results.topScore());
+            if (results.insertWithOverflow(friendOrd, score) && results.size() >= topK) {
+              bound.set(results.topScore());
+            }
           }
         }
       }