LUCENE-10069: Document that kNN queries might not return all results (#434)

Performing a kNN search with very large k may return fewer than k documents.
This happens because the HNSW graph is not guaranteed to be connected, so the search may be unable to reach some documents.
This commit documents the behavior as part of a general warning that the results
of a kNN search may be approximate.
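The commit message attributes the missing results to the HNSW graph not being guaranteed to be connected. The toy sketch below shows why that matters. It is not Lucene's HNSW code: it assumes a single-layer graph, 1-D values instead of vectors, and an exhaustive breadth-first traversal instead of a layered greedy beam search. The class and method names (`DisconnectedGraphSketch`, `search`) are hypothetical. Even with no pruning at all, nodes in a component the entry point cannot reach are never visited, so asking for k equal to the total number of documents returns fewer hits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy, not Lucene's HNSW implementation: a graph search can only return nodes that are
 * reachable from its entry point.
 */
final class DisconnectedGraphSketch {

  /** Returns up to k node ids reachable from entryPoint, nearest to target first. */
  static List<Integer> search(
      Map<Integer, List<Integer>> adjacency, float[] values, int entryPoint, float target, int k) {
    Set<Integer> visited = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(entryPoint);
    visited.add(entryPoint);
    // Exhaustive breadth-first traversal: no pruning, yet unreachable nodes are still never seen.
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int neighbor : adjacency.getOrDefault(node, List.of())) {
        if (visited.add(neighbor)) {
          queue.add(neighbor);
        }
      }
    }
    List<Integer> result = new ArrayList<>(visited);
    // Rank the reachable nodes by distance to the target value.
    result.sort(Comparator.comparingDouble(n -> Math.abs(values[n] - target)));
    return result.subList(0, Math.min(k, result.size()));
  }

  public static void main(String[] args) {
    // Nodes 0-2 form one component and nodes 3-4 another; no edge joins the two.
    Map<Integer, List<Integer>> adjacency =
        Map.of(0, List.of(1), 1, List.of(0, 2), 2, List.of(1), 3, List.of(4), 4, List.of(3));
    float[] values = {0.1f, 0.2f, 0.3f, 0.9f, 1.0f};
    // Ask for k = 5 out of 5 documents, starting from node 0.
    List<Integer> hits = search(adjacency, values, 0, 0.95f, 5);
    System.out.println(hits); // prints [2, 1, 0]
  }
}
```

Note that the truly nearest nodes (3 and 4) are missing as well, which is why the updated Javadoc warns both that results may be approximate and that fewer than k may come back. Callers should size their handling by the number of hits actually returned rather than by k.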
Julie Tibshirani 2021-11-12 14:19:20 -08:00 committed by GitHub
parent 2a9adb81df
commit 3b914a4d73
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 8 additions and 4 deletions

@@ -45,9 +45,13 @@ public abstract class KnnVectorsReader implements Closeable, Accountable {
 /**
 * Return the k nearest neighbor documents as determined by comparison of their vector values for
- * this field, to the given vector, by the field's search strategy. If the search strategy is
- * reversed, lower values indicate nearer vectors, otherwise higher scores indicate nearer
- * vectors. Unlike relevance scores, vector scores may be negative.
+ * this field, to the given vector, by the field's similarity function. The score of each document
+ * is derived from the vector similarity in a way that ensures scores are positive and that a
+ * larger score corresponds to a higher ranking.
 *
+ * <p>The search is allowed to be approximate, meaning the results are not guaranteed to be the
+ * true k closest neighbors. For large values of k (for example when k is close to the total
+ * number of documents), the search may also retrieve fewer than k documents.
+ *
 * @param field the vector field to search
 * @param target the vector-valued query
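The new Javadoc says scores are derived from the vector similarity so that they are positive and larger means a higher ranking. A minimal sketch of transforms with those two properties follows. The formulas are assumptions chosen to satisfy the stated properties; they are not part of this commit, and Lucene's own `VectorSimilarityFunction` may compute scores differently. The class and method names are hypothetical. Euclidean distance is inverted because smaller distances are closer, and cosine similarity is shifted from [-1, 1] into [0, 1], so the score is non-negative (exactly 0 only for opposite vectors).

```java
/**
 * Hypothetical sketch: maps raw similarities to scores that are non-negative and where a larger
 * score always means a nearer vector. Not taken from Lucene.
 */
final class ScoreTransformSketch {

  /** Euclidean: invert the squared distance so nearer vectors score higher; result is in (0, 1]. */
  static float euclideanScore(float[] a, float[] b) {
    float squareDistance = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      squareDistance += diff * diff;
    }
    return 1f / (1f + squareDistance);
  }

  /** Cosine (assumes non-zero vectors): map [-1, 1] to [0, 1] while preserving the order. */
  static float cosineScore(float[] a, float[] b) {
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    double cosine = dot / Math.sqrt((double) normA * normB);
    return (float) ((1 + cosine) / 2);
  }

  public static void main(String[] args) {
    float[] query = {1f, 0f};
    float[] near = {0.9f, 0.1f};
    float[] opposite = {-1f, 0f};
    // The raw cosine for "opposite" is -1, but its score is 0 rather than negative.
    System.out.println(cosineScore(query, near) + " > " + cosineScore(query, opposite));
    System.out.println(euclideanScore(query, near) + " > " + euclideanScore(query, opposite));
  }
}
```

Because both transforms are monotonic in the underlying similarity, they change the score values without changing which documents rank first.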

@@ -28,7 +28,7 @@ import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.lucene.util.Bits;
-/** Uses {@link KnnVectorsReader#search} to perform nearest Neighbour search. */
+/** Uses {@link KnnVectorsReader#search} to perform nearest neighbour search. */
 public class KnnVectorQuery extends Query {
 private static final TopDocs NO_RESULTS =