mirror of
https://github.com/iSharkFly-Docs/opensearch-docs-cn
synced 2025-02-18 13:44:49 +00:00
Minor KNN tweaks
This commit is contained in:
parent
2f8ef915a7
commit
2c32397202
@ -7,6 +7,7 @@ has_children: false
---
# JNI library
To integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface library, which lets the k-NN plugin leverage nmslib's functionality. To see how we build the JNI library binary and learn how to get the most out of it in your production environment, see [JNI Library Artifacts](https://github.com/opensearch-project/k-NN#jni-library-artifacts).
For more information about JNI, see [Java Native Interface](https://en.wikipedia.org/wiki/Java_Native_Interface) on Wikipedia.
@ -8,7 +8,8 @@ has_children: false
# k-NN Index
## `knn_vector` datatype
## knn_vector data type
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
into an OpenSearch index.
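A minimal index mapping using this data type might look like the following sketch (the index name, field name, and dimension are illustrative, not taken from the docs above):

```json
PUT my-knn-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      }
    }
  }
}
```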
@ -8,6 +8,7 @@ has_math: true
---
# Exact k-NN with scoring script
The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions.
Because the score script approach executes a brute force search, it doesn't scale as well as the [approximate approach](../approximate-knn). In some cases, it might be better to think about refactoring your workflow or index structure to use the approximate approach instead of the score script approach.
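The scaling concern can be seen in a small sketch: a brute-force search must score every vector in the index against the query, so cost grows linearly with index size. This is an illustrative standalone model, not the plugin's implementation:

```python
def l2_squared(query, vector):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((q - v) ** 2 for q, v in zip(query, vector))

def exact_knn(query, vectors, k):
    """Brute-force exact k-NN: score every stored vector against the
    query and keep the k closest. The full scan over `vectors` is why
    this approach scales worse than an approximate index."""
    scored = sorted(vectors.items(), key=lambda item: l2_squared(query, item[1]))
    return [doc_id for doc_id, _ in scored[:k]]

docs = {"a": [1.0, 1.0], "b": [2.0, 2.0], "c": [10.0, 10.0]}
print(exact_knn([0.0, 0.0], docs, 2))  # the two vectors nearest the origin: ['a', 'b']
```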
@ -54,10 +54,15 @@ l1Norm | `float l1Norm (float[] queryVector, doc['vector field'])` | This functi
cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document:<br /> `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)` <br />In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Therefore, the k-NN plugin adds 1.0 in order to always yield a positive cosine similarity score.
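The +1.0 shift described above can be sketched in a few lines. This is a standalone illustration of the scoring math, not the plugin's Java implementation:

```python
import math

def adjusted_cosine(query, vector):
    """Cosine similarity shifted by +1.0, so the score is always
    positive: the usual [-1, 1] range becomes [0, 2]."""
    dot = sum(q * v for q, v in zip(query, vector))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_v = math.sqrt(sum(v * v for v in vector))
    return 1.0 + dot / (norm_q * norm_v)

print(adjusted_cosine([1.0, 0.0], [1.0, 0.0]))   # identical direction -> 2.0
print(adjusted_cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> 0.0
```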
## Constraints
1. If a document’s `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
2. If a vector field doesn't have a value, the function throws an `IllegalStateException`.
You can avoid this situation by first checking if a document has a value in its field:
```
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
Because scores can only be positive, this script ranks documents with vector fields higher than those without.
@ -102,7 +102,8 @@ As an example, assume you have a million vectors with a dimension of 256 and M o
1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.26 GB
```
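The estimate above is just arithmetic on the formula, so it can be checked directly. A small helper, assuming the per-vector cost of 4 bytes per float dimension plus 8 bytes per graph link (M links) and a 1.1 overhead factor, as in the formula above:

```python
def estimate_hnsw_memory_bytes(num_vectors, dimension, m):
    """Native-memory estimate from the formula above:
    1.1 * (4 * dimension + 8 * M) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

bytes_needed = estimate_hnsw_memory_bytes(1_000_000, 256, 16)
print(f"{bytes_needed / 1e9:.2f} GB")  # prints "1.27 GB", matching the ~1.26 GB figure above
```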
**Note**: Remember that having a replica doubles the total number of vectors.
Having a replica doubles the total number of vectors.
{: .note }
## Approximate nearest neighbor versus score script