Merge pull request #10 from opensearch-project/liz-knn-edits

k-nn chapter fixes
Andrew Etter 2021-05-11 15:06:41 -07:00 committed by GitHub
commit ab49b685e2
8 changed files with 131 additions and 141 deletions

View File

@ -6,14 +6,14 @@ parent: k-NN
has_children: false
---
# API
# k-NN plugin API
The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality.
The k-NN plugin adds two API operations to help you better manage the plugin's functionality.
## Stats
The k-NN `stats` API provides information about the current status of the k-NN Plugin. The plugin keeps track of both cluster level and node level stats. Cluster level stats have a single value for the entire cluster. Node level stats have a single value for each node in the cluster. You can filter their query by nodeID and statName in the following way:
The k-NN `stats` API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level statistics. Cluster-level statistics have a single value for the entire cluster. Node-level statistics have a single value for each node in the cluster. You can filter the query by `nodeId` and `statName`:
```
GET /_opensearch/_knn/nodeId1,nodeId2/stats/statName1,statName2
```
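
As a rough illustration of how the filtered path is assembled, the following Python sketch joins node IDs and statistic names with commas. The helper name `build_stats_path` is hypothetical, and the behavior when a filter is omitted (dropping that path segment) is an assumption rather than something this page states.

```python
from urllib.parse import quote


def build_stats_path(node_ids=(), stat_names=()):
    """Build a k-NN stats path, optionally filtered by node IDs and stat names.

    Hypothetical helper: only the fully filtered form appears in the docs above.
    """
    parts = ["_opensearch", "_knn"]
    if node_ids:
        # Node IDs are comma-separated, as in the documented request.
        parts.append(",".join(quote(n, safe="") for n in node_ids))
    parts.append("stats")
    if stat_names:
        # Statistic names are comma-separated in the same way.
        parts.append(",".join(quote(s, safe="") for s in stat_names))
    return "/" + "/".join(parts)


print(build_stats_path(["nodeId1", "nodeId2"], ["statName1", "statName2"]))
# prints /_opensearch/_knn/nodeId1,nodeId2/stats/statName1,statName2
```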
@ -21,22 +21,22 @@ GET /_opensearch/_knn/nodeId1,nodeId2/stats/statName1,statName2
Statistic | Description
:--- | :---
`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that k-NN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. This statistic is only relevant to approximate k-NN search. <br /> **Note**: Explicit evictions that occur because of index deletion aren't counted.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph that's already loaded into memory. This statistic is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph that isn't loaded into memory yet. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
`graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph.
`graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error.
`graph_index_requests` | The number of requests to add the `knn_vector` field of a document into a graph.
`graph_index_errors` | The number of requests to add the `knn_vector` field of a document into a graph that have produced an error.
`graph_query_requests` | The number of graph queries that have been made.
`graph_query_errors` | The number of graph queries that have produced an error.
`knn_query_requests` | The number of KNN query requests received.
`knn_query_requests` | The number of k-NN query requests received.
`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
`load_success_count` | The number of times k-NN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total graph_memory_usage that index is using in Kilobytes.
`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search.
`indices_in_cache` | For each index that has graphs in the cache, this statistic provides the number of graphs that index has and the total `graph_memory_usage` that index is using, in kilobytes.
`script_compilations` | The number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN score script search.
`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.
@ -107,9 +107,9 @@ GET /_opensearch/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra
## Warmup operation
The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory.
The Hierarchical Navigable Small World (HNSW) graphs used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, the plugin needs to load these files into native memory.
If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
If the plugin hasn't loaded the graphs into native memory, it loads them when it receives a search request. The loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory.
@ -133,7 +133,7 @@ GET /_opensearch/_knn/warmup/index1,index2,index3?pretty
`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up.
The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the OpenSearch `_tasks` API:
The call doesn't return results until the warmup operation finishes or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the OpenSearch `_tasks` API:
```json
GET /_tasks
@ -144,10 +144,10 @@ After the operation has finished, use the [k-NN `_stats` API operation](#Stats)
### Best practices
For the warmup operation to function properly, follow these best practices.
For the warmup operation to function properly, follow these best practices:
First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
* Don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are sometimes deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again.
* Confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit setting](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again.
Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes.
* Don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes.

View File

@ -1,27 +1,27 @@
---
layout: default
title: Approximate Search
title: Approximate search
nav_order: 1
parent: k-NN
has_children: false
has_math: true
---
# Approximate k-NN Search
# Approximate k-NN search
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred.
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the Hierarchical Navigable Small World (HNSW) algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
This plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API](../api#warmup). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section](../api#stats).
The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API](../api#warmup). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section](../api#stats).
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search.
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
## Get started with approximate k-NN
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. To see what spaces we support, please refer to the [spaces section](#spaces). By default, `index.knn.space_type` is `l2`. For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](../settings#index-settings).
Additionally, if you're using the approximate k-nearest neighbor method, set `index.knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../settings#index-settings).
Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity:
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
```json
PUT my-knn-index-1
@ -94,7 +94,7 @@ GET my-knn-index-1/_search
}
```
`k` is the number of neighbors the search of each graph will return. You must also include the `size` option. This option indicates how many results the query actually returns. The plugin returns `k` amount of results for each shard (and each segment) and `size` amount of results for the entire query. The plugin supports a maximum `k` value of 10,000.
`k` is the number of neighbors the search of each graph will return. You must also include the `size` option, which indicates how many results the query actually returns. The plugin returns `k` results for each shard (and each segment) and `size` results for the entire query. The plugin supports a maximum `k` value of 10,000.
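
The relationship between `k` and `size` can be pictured with a toy Python model. This is only a sketch: an exact brute-force scan stands in for the HNSW graph search, and the function names and the plain-dictionary "segments" are assumptions made for illustration, not the plugin's implementation.

```python
import heapq


def search_segment(segment, query, k):
    """Return the k (distance, doc_id) pairs closest to the query in one segment."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), doc_id)
        for doc_id, vec in segment.items()
    )
    return heapq.nsmallest(k, scored)


def knn_search(segments, query, k, size):
    """Collect up to k candidates per segment, then keep the best `size` overall."""
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return [doc_id for _, doc_id in heapq.nsmallest(size, candidates)]


# Two toy segments (hypothetical data).
segments = [
    {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]},
    {"d": [0.5, 0.5], "e": [9.0, 9.0]},
]
print(knn_search(segments, [0.0, 0.0], k=2, size=3))
# prints ['a', 'd', 'b']
```

With `k=2`, each segment contributes two candidates (four in total), and `size=3` trims the merged list to the three closest documents.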
### Using approximate k-NN with filters
If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1:
@ -124,7 +124,7 @@ GET my-knn-index-1/_search
## Spaces
A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). Currently, the k-NN plugin supports the following spaces:
A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). Currently, the k-NN plugin supports the following spaces:
<table>
<thead style="text-align: left">
@ -158,5 +158,5 @@ A space corresponds to the function used to measure the distance between 2 point
</tr>
</table>
The cosine similarity formula does not include the `1 - ` prefix. However, because nmslib equates smaller scores with closer results, they return `1 - cosineSimilarity` for their cosine similarity space---that's why `1 - ` is included in the distance function.
The cosine similarity formula does not include the `1 -` prefix. However, because nmslib equates smaller scores with closer results, they return `1 - cosineSimilarity` for their cosine similarity space---that's why `1 -` is included in the distance function.
{: .note }
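
The conversion described in this section can be sketched in a few lines of Python. The function names are illustrative only, and the sketch applies the general `1 / (1 + distance)` rule stated above; it is not taken from the plugin's source.

```python
import math


def distance_to_score(distance):
    """Convert a k-NN distance (lower is better) to an OpenSearch score (higher is better)."""
    return 1.0 / (1.0 + distance)


def cosine_distance(a, b):
    """Distance for the cosine similarity space: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Identical directions give distance 0 and therefore the maximum score of 1.
print(distance_to_score(cosine_distance([1.0, 0.0], [2.0, 0.0])))
# Orthogonal vectors give distance 1 and a score of 0.5.
print(distance_to_score(cosine_distance([1.0, 0.0], [0.0, 1.0])))
```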

View File

@ -16,19 +16,19 @@ This plugin supports three different methods for obtaining the k-nearest neighbo
1. **Approximate k-NN**
The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320).
The first method takes an approximate nearest neighbor approach---it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320).
Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the script scoring method or painless extensions.
For more details about this method, refer to the [Approximate k-NN section](approximate-knn).
For more details about this method, see [Approximate k-NN search](approximate-knn).
2. **Script Score k-NN**
The second method extends OpenSearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, you can run k-NN search on a subset of vectors in your index (sometimes referred to as a pre-filter search).
This approach should be used for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies.
Use this approach for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies.
For more details about this method, refer to the [k-NN Script Score section](knn-score-script).
For more details about this method, see [Exact k-NN with scoring script](knn-score-script).
3. **Painless extensions**
@ -36,7 +36,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo
This approach has slightly slower query performance compared to the k-NN Script Score. If your use case requires more customization over the final score, you should use this approach over Script Score k-NN.
For more details about this method, refer to the [painless functions section](painless-functions).
For more details about this method, see [Painless scripting functions](painless-functions).
Overall, for larger data sets, you should generally choose the approximate nearest neighbor method because it scales significantly better. For smaller data sets, where you may want to apply a filter, you should choose the custom scoring approach. If you have a more complex use case where you need to use a distance function as part of your scoring method, you should use the painless scripting approach.

View File

@ -1,10 +1,12 @@
---
layout: default
title: JNI Library
title: JNI library
nav_order: 5
parent: k-NN
has_children: false
---
# JNI Library
In order to integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface library. Check out [this wiki](https://en.wikipedia.org/wiki/Java_Native_Interface) to learn more about JNI. This library allows the k-NN plugin to leverage nmslib's functionality. For more information about how we build the JNI library binary and how to get the most of it in your production environment, see [here](https://github.com/opensearch-project/k-NN#jni-library-artifacts).
# JNI library
To integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface library, which lets the k-NN plugin leverage nmslib's functionality. To see how we build the JNI library binary and learn how to get the most out of it in your production environment, see [JNI Library Artifacts](https://github.com/opensearch-project/k-NN#jni-library-artifacts).
For more information about JNI, see [Java Native Interface](https://en.wikipedia.org/wiki/Java_Native_Interface) on Wikipedia.

View File

@ -1,18 +1,24 @@
---
layout: default
title: Exact k-NN with Scoring Script
title: Exact k-NN with scoring script
nav_order: 2
parent: k-NN
has_children: false
has_math: true
---
# Exact k-NN with Scoring Script
The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it does not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach.
# Exact k-NN with scoring script
The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions.
Because the score script approach executes a brute force search, it doesn't scale as well as the [approximate approach](../approximate-knn). In some cases, it might be better to think about refactoring your workflow or index structure to use the approximate approach instead of the score script approach.
## Getting started with the score script for vectors
Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. If you intend to just use the script score approach (and not the approximate approach) `index.knn` can be set to `false` and `index.knn.space_type` does not need to be set. The space type can be chosen during search. See the [spaces section](#spaces) to see what spaces the k-NN score script supports. Here is an example that creates an index with two `knn_vector` fields:
Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields.
If you intend to just use the score script approach (and not the approximate approach), you can set `index.knn` to `false` and not set `index.knn.space_type`. You can choose the space type during search. See [spaces](#spaces) for the spaces the k-NN score script supports.
This example creates an index with two `knn_vector` fields:
```json
PUT my-knn-index-1
@ -95,12 +101,7 @@ All parameters are required.
- `query_value` is the point you want to find the nearest neighbors for. For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively).
- `space_type` corresponds to the distance function. See the [spaces section](#spaces).
In later versions of the k-NN plugin, `vector` was replaced by `query_value` due to the addition of the `bithamming` space.
{ .note }
The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over.
The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents over which to execute the k-nearest neighbor search.
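
To make the difference in ordering concrete, here is a toy Python sketch that contrasts filtering after the neighbor search with filtering before it. The data, field names, and function names are all hypothetical, and an exact squared-L2 scan is used for both paths purely for illustration.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(docs, query, k):
    """Exact k-nearest neighbors by squared L2 distance."""
    return sorted(docs, key=lambda d: l2_squared(d["vector"], query))[:k]


def post_filter_search(docs, query, k, keep):
    """Find k neighbors first, then filter: can return fewer than k results."""
    return [d for d in nearest(docs, query, k) if keep(d)]


def pre_filter_search(docs, query, k, keep):
    """Filter first, then find k neighbors among the remaining documents."""
    return nearest([d for d in docs if keep(d)], query, k)


def is_red(doc):
    return doc["color"] == "red"


docs = [
    {"id": 1, "vector": [1.0, 1.0], "color": "red"},
    {"id": 2, "vector": [2.0, 2.0], "color": "blue"},
    {"id": 3, "vector": [3.0, 3.0], "color": "red"},
]
query = [0.0, 0.0]

print([d["id"] for d in post_filter_search(docs, query, 2, is_red)])  # prints [1]
print([d["id"] for d in pre_filter_search(docs, query, 2, is_red)])   # prints [1, 3]
```

Filtering afterward discards one of the two neighbors already chosen, while filtering first leaves two matching documents for the neighbor search to return.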
This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index:
@ -277,7 +278,7 @@ GET my-long-index/_search
## Spaces
A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. We include the conversions to OpenSearch scores in the table below:
A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. The following table illustrates how OpenSearch converts spaces to scores:
<table>
<thead style="text-align: left">

View File

@ -1,19 +1,19 @@
---
layout: default
title: k-NN Painless Extensions
title: k-NN Painless extensions
nav_order: 3
parent: k-NN
has_children: false
has_math: true
---
# Painless Scripting Functions
# k-NN Painless Scripting extensions
With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script](../knn-score-script), so you can utilize them when you need more customization with respect to your k-NN workload.
With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script](../knn-score-script), so you can use them to customize your k-NN workload.
## Get started with k-NN's Painless Scripting functions
To use k-NN's Painless Scripting functions, first, you must create an index with `knn_vector` fields like in [k-NN score script](../knn-score-script#Getting-started-with-the-score-script). Once the index is created and you have ingested some data, you can use the painless extensions:
To use k-NN's Painless Scripting functions, first create an index with `knn_vector` fields, as in [k-NN score script](../knn-score-script#Getting-started-with-the-score-script). Once the index is created and you ingest some data, you can use the Painless extensions:
```json
GET my-knn-index-2/_search
@ -45,39 +45,19 @@ GET my-knn-index-2/_search
`field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`.
## Function types
The following table contains the available painless functions the k-NN plugin provides:
<table>
<thead style="text-align: left">
<tr>
<th>Function Name</th>
<th>Function Signature</th>
<th>Description</th>
</tr>
</thead>
<tr>
<td>l2Squared</td>
<td><code>float l2Squared (float[] queryVector, doc['vector field'])</code></td>
<td>This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.</td>
</tr>
<tr>
<td>l1Norm</td>
<td><code>float l1Norm (float[] queryVector, doc['vector field'])</code></td>
<td>This function calculates the L1 Norm distance (Manhattan distance) between a given query vector and document vectors.</td>
</tr>
<tr>
<td>cosineSimilarity</td>
<td><code>float cosineSimilarity (float[] queryVector, doc['vector field'])</code></td>
<td>Cosine similarity is an inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass the magnitude of the query vector to improve the performance, instead of calculating the magnitude every time for every filtered document: <code>float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)</code>. In general, range of cosine similarity is [-1, 1], but in the case of information retrieval, the cosine similarity of two documents will range from 0 to 1 because tf-idf cannot be negative. Hence, the k-NN plugin adds 1.0 to always yield a positive cosine similarity score. </td>
</tr>
</table>
The following table describes the available painless functions the k-NN plugin provides:
Function name | Function signature | Description
:--- | :--- | :---
l2Squared | `float l2Squared (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.
l1Norm | `float l1Norm (float[] queryVector, doc['vector field'])` | This function calculates the L1 norm distance (Manhattan distance) between a given query vector and document vectors.
cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document:<br /> `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)` <br />In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Therefore, the k-NN plugin adds 1.0 in order to always yield a positive cosine similarity score.
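
To make the three definitions in the table concrete, here is a minimal plain-Python sketch of what each function computes. It is not the plugin's implementation: the helper names (`l2_squared`, `l1_norm`, `cosine_similarity`) and the sample vectors are assumptions chosen for illustration, and the score transformations simply follow the descriptions in the table.

```python
import math


def l2_squared(query, doc):
    """Square of the Euclidean (L2) distance between two vectors."""
    return sum((q - d) ** 2 for q, d in zip(query, doc))


def l1_norm(query, doc):
    """Manhattan (L1) distance between two vectors."""
    return sum(abs(q - d) for q, d in zip(query, doc))


def cosine_similarity(query, doc, norm_query=None):
    """Inner product of the two vectors after normalizing both to length 1.

    Passing a precomputed norm_query mirrors the three-argument signature in
    the table: the query magnitude is computed once, not once per document.
    Zero-length vectors are not handled in this sketch.
    """
    if norm_query is None:
        norm_query = math.sqrt(sum(q * q for q in query))
    norm_doc = math.sqrt(sum(d * d for d in doc))
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (norm_query * norm_doc)


# Hypothetical sample vectors.
query_vector = [1.0, 2.0]
doc_vector = [4.0, 6.0]

# Shorter distance means more relevant, so invert; add 1 to avoid dividing by zero.
l2_score = 1 / (1 + l2_squared(query_vector, doc_vector))

# Adding 1.0 keeps the cosine-based score positive.
cosine_score = 1.0 + cosine_similarity(query_vector, doc_vector)
```

With these sample vectors, the squared L2 distance is 25 and the L1 distance is 7, so the inverted L2 score is 1/26.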
## Constraints
1. If a document's `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
2. If a vector field doesn't have a value, the function throws an IllegalStateException.
2. If a vector field doesn't have a value, the function throws an <code>IllegalStateException</code>.
You can avoid this situation by first checking if a document has a value in its field:
```
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
Because scores can only be positive, this script ranks documents with vector fields higher than those without.
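The guard in the script above can be mirrored in plain Python to show how the two constraints interact. This is only a sketch under stated assumptions: `score_document` is a hypothetical helper, an empty list stands in for a document without a vector value, and `ValueError` stands in for the `IllegalArgumentException` the plugin throws.

```python
def l2_squared(query, doc):
    """Square of the Euclidean distance between two equal-length vectors."""
    return sum((q - d) ** 2 for q, d in zip(query, doc))


def score_document(query_value, doc_vector):
    """Mimic: doc[field].size() == 0 ? 0 : 1 / (1 + l2Squared(query, doc[field]))."""
    if not doc_vector:
        # No vector value: return 0 instead of letting the function fail.
        return 0.0
    if len(doc_vector) != len(query_value):
        # Stand-in for the plugin's IllegalArgumentException.
        raise ValueError("query and document vectors have different dimensions")
    return 1 / (1 + l2_squared(query_value, doc_vector))


# Hypothetical documents: an exact match, a distant vector, and a missing vector.
query = [1.0, 2.0]
scores = [score_document(query, v) for v in ([1.0, 2.0], [4.0, 6.0], [])]
```

Every document that has a vector scores strictly above 0, so documents without a vector always sort last.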

View File

@ -1,104 +1,111 @@
---
layout: default
title: Performance Tuning
title: Performance tuning
parent: k-NN
nav_order: 7
---
# Performance tuning
This section provides recommendations for performance tuning to improve indexing/search performance for approximate k-NN. From a high level, k-NN works according to these principles:
This topic provides performance tuning recommendations to improve indexing and search performance for approximate k-NN. From a high level, k-NN works according to these principles:
* Graphs are created per knn_vector field / (Lucene) segment pair.
* Queries execute on segments sequentially inside the shard (same as any other OpenSearch query).
* Each graph in the segment returns <=k neighbors.
* Coordinator node picks up final size number of neighbors from the neighbors returned by each shard.
* The coordinator node picks up final size number of neighbors from the neighbors returned by each shard.
Additionally, this section provides recommendations for comparing approximate k-NN to exact k-NN with score script.
This topic also provides recommendations for comparing approximate k-NN to exact k-NN with score script.
## Indexing performance tuning
The following steps can be taken to help improve indexing performance, especially when you plan to index a large number of vectors at once:
1. Disable refresh interval (Default = 1 sec) or set a long duration for refresh interval to avoid creating multiple small segments
Take the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once:
```json
PUT /<index_name>/_settings
{
"index" : {
"refresh_interval" : "-1"
}
}
```
*Note* -- Be sure to reenable refresh_interval after indexing finishes.
* **Disable the refresh interval**
2. Disable Replicas (No OpenSearch replica shard).
Either disable the refresh interval (default = 1 sec), or set a long duration for the refresh interval to avoid creating multiple small segments:
Settings replicas to 0 avoids duplicate construction of graphs in both primary and replicas. When we enable replicas after the indexing, the serialized graphs are directly copied. Having no replicas means that losing a node(s) may incur data loss, so it is important that the data lives elsewhere so that this initial load can be retried in case of an issue.
```json
PUT /<index_name>/_settings
{
"index" : {
"refresh_interval" : "-1"
}
}
```
**Note**: Make sure to reenable `refresh_interval` after indexing finishes.
3. Increase number of indexing threads
* **Disable replicas (no OpenSearch replica shard)**
If the hardware we choose has multiple cores, we can allow multiple threads in graph construction by speeding up the indexing process. You can determine the number of threads to be allotted by using the [knn.algo_param.index_thread_qty](../settings/#Cluster-settings) setting.
Set replicas to `0` to prevent duplicate construction of graphs in both primary and replica shards. When you enable replicas after indexing finishes, the serialized graphs are directly copied. If you have no replicas, losing nodes might cause data loss, so it's important that the data lives elsewhere so this initial load can be retried in case of an issue.
Please keep an eye on CPU utilization and choose the right number of threads. Because graph construction is costly, having multiple threads can put additional load on CPU.
* **Increase the number of indexing threads**
If your hardware has multiple cores, you can speed up the indexing process by allowing multiple threads in graph construction. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty](../settings/#Cluster-settings) setting.
Keep an eye on CPU utilization and choose the correct number of threads. Because graph construction is costly, having multiple threads can cause additional CPU load.
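The indexing steps above can be collected into a small sketch that builds the request bodies for a bulk load and for restoring normal operation afterward. The helper names are hypothetical, the restore values (`1s` refresh interval, one replica) are assumptions you should replace with your own, and the code only prints the requests instead of sending them.

```python
import json


def bulk_load_settings():
    """Index settings to apply before a large indexing job."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Index settings to apply after indexing finishes (values are assumptions)."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


def indexing_thread_settings(thread_qty):
    """Cluster settings body that raises the graph construction thread count."""
    return {"persistent": {"knn.algo_param.index_thread_qty": thread_qty}}


# Print the requests in the order you would send them.
steps = [
    ("PUT /<index_name>/_settings", bulk_load_settings()),
    ("PUT /_cluster/settings", indexing_thread_settings(2)),
    ("PUT /<index_name>/_settings", restore_settings()),
]
for request_line, body in steps:
    print(request_line)
    print(json.dumps(body, indent=2))
```

Keeping the restore step next to the bulk-load step makes it harder to forget to reenable `refresh_interval` and replicas.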
## Search performance tuning
1. Have fewer segments
Take the following steps to improve search performance:
To improve search performance, it is necessary to keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. But, because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over 5 graphs with 100 vectors each and then taking the top size results from 5*k results will take longer than searching over 1 graph with 500 vectors and then taking the top size results from k results. Ideally, having 1 segment per shard will give the optimal performance with respect to search latency. We can configure index to have multiple shards to avoid giant shards and achieve more parallelism.
* **Reduce segment count**
We can control the number of segments either during indexing by asking OpenSearch to slow down segment creation by disabling the refresh interval or choosing larger refresh interval.
To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. However, because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over five graphs with 100 vectors each and then taking the top 'size' results from 5*k results takes longer than searching over one graph with 500 vectors and then taking the top 'size' results from k results.
2. Warm up the index
Ideally, having one segment per shard provides the optimal performance with respect to search latency. You can configure an index to have multiple shards to avoid giant shards and achieve more parallelism.
The graphs are constructed during indexing, but they are loaded into memory during the first search. The way search works in Lucene is that each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top size number of results based on the score would be returned from all of the results returned by segements at a shard level (higher score --> better result).
You can control the number of segments during indexing by disabling the refresh interval or choosing a larger refresh interval, either of which slows down segment creation.
Once a graph is loaded (graphs are loaded outside OpenSearch JVM), we cache the graphs in memory. The initial queries would be expensive in the order of a few seconds, and subsequent queries should be faster in the order of milliseconds (assuming knn circuit breaker is not hit).
* **Warm up the index**
To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices they want to search.
Graphs are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top 'size' number of results based on the score are returned from all the results returned by segments at a shard level (higher score = better result).
### Usage
Once a graph is loaded (graphs are loaded outside the OpenSearch JVM), OpenSearch caches it in memory. Initial queries are expensive and take a few seconds, while subsequent queries are faster and take milliseconds (assuming the k-NN circuit breaker isn't hit).
```json
GET /_opensearch/_knn/warmup/index1,index2,index3?pretty
{
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
}
}
```
To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices you want to search:
The warmup API operation loads all of the graphs for all of the shards (primaries and replicas) for the specified indices into the cache. Thus, there will be no penalty to load graphs during initial searches.
```json
GET /_opensearch/_knn/warmup/index1,index2,index3?pretty
{
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
}
}
```
*Note* - This API only loads the segments of the indices it sees into the cache. If a merge or refresh operation finishes after this API is ran or if new documents are added, this API will need to be re-ran to load those graphs into memory.
The warmup API operation loads all graphs for all shards (primary and replica) for the specified indices into the cache, so there's no penalty to load graphs during initial searches.
3. Avoid reading stored fields
**Note**: This API operation only loads the segments of the indices it ***sees*** into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those graphs into memory.
If the use case is to just read the nearest neighbors' Ids and scores, then we can disable reading stored fields, which can save some time retrieving the vectors from stored fields.
* **Avoid reading stored fields**
## Improving Recall
If your use case is simply to read the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves time retrieving the vectors from stored fields.
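The segment-count argument from the first item in this list can be checked with a rough calculation. The sketch below assumes a deliberately simplified cost model in which searching one graph costs `log2(n)` units for `n` vectors. It ignores constant factors and the cost of merging results, and `relative_search_cost` is a hypothetical helper, not part of the plugin.

```python
import math


def relative_search_cost(num_graphs, vectors_per_graph):
    """Total cost of searching every graph sequentially, in arbitrary units."""
    return num_graphs * math.log2(vectors_per_graph)


# Five segments of 100 vectors each versus one segment of 500 vectors.
five_small_graphs = relative_search_cost(5, 100)
one_large_graph = relative_search_cost(1, 500)

print(f"5 graphs x 100 vectors: {five_small_graphs:.1f} units")
print(f"1 graph x 500 vectors:  {one_large_graph:.1f} units")
```

Under this model, the five small graphs cost roughly 33 units against roughly 9 units for the single graph, which is why fewer, larger segments tend to give lower search latency.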
Recall depends on multiple factors like number of vectors, number of dimensions, segments, etc. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the graph, the more chances of losing recall if you are sticking with smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it is important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.
## Improving recall
Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm params that control recall are m, ef_construction, ef_search. For more details on influence of algorithm parameters on the indexing and search recall, please refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work on a broader set of use cases from our experiments, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, please refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly.
Recall depends on multiple factors like number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the graph, the more chances of losing recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it's important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.
## Estimating Memory Usage
To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.
Typically, in an OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the circuit_breaker_limit cluster setting. By default, the circuit breaker limit is set at 50%.
The default values work for a broad set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../settings#index-settings).
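As a sketch of where these parameters go, the following builds an index creation body with the three HNSW settings. The defaults come from the settings tables in this documentation; the field name `my_vector`, the dimension, and the raised values in the second call are assumptions for illustration, not recommendations.

```python
import json


def knn_index_body(dimension, m=16, ef_construction=512, ef_search=512):
    """Request body for creating a k-NN index with explicit HNSW parameters."""
    return {
        "settings": {
            "index": {
                "knn": True,
                "knn.algo_param.m": m,
                "knn.algo_param.ef_construction": ef_construction,
                "knn.algo_param.ef_search": ef_search,
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field name.
                "my_vector": {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


# Baseline with default values, and a variant that trades memory and
# indexing time for potentially better recall.
baseline = knn_index_body(dimension=256)
higher_recall = knn_index_body(dimension=256, m=32, ef_construction=1024)

print("PUT /<index_name>")
print(json.dumps(higher_recall, indent=2))
```

Creating both variants against the same data set and comparing recall and latency is one way to run the experiments this section recommends.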
## Estimating memory usage
In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%.
The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
As an example, assume that we have 1 Million vectors with a dimension of 256 and M of 16, and the memory required can be estimated as:
As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:
```
1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.26 GB
```
*Note* -- Remember that having a replica will double the total number of vectors.
**Note**: Remember that having a replica doubles the total number of vectors.
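The estimate above is easy to wrap in a small helper so you can try other sizes. This is a sketch of the formula as stated in this section. `graph_memory_bytes` is a hypothetical name, and the `replicas` parameter applies the note about replicas by multiplying the vector count.

```python
def graph_memory_bytes(num_vectors, dimension, m, replicas=0):
    """Estimated graph memory: 1.1 * (4 * dimension + 8 * M) bytes per vector."""
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    # Each replica holds its own copy of every vector.
    total_vectors = num_vectors * (1 + replicas)
    return bytes_per_vector * total_vectors


# The example from the text: one million vectors, dimension 256, M of 16.
primary_only = graph_memory_bytes(1_000_000, 256, 16)
with_one_replica = graph_memory_bytes(1_000_000, 256, 16, replicas=1)

print(f"Primary only:     {primary_only / 1e9:.2f} GB")
print(f"With one replica: {with_one_replica / 1e9:.2f} GB")
```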
## Approximate nearest neighbor vs. score script
## Approximate nearest neighbor versus score script
The standard k-NN query and custom scoring option perform differently. Test with a representative set of documents to see if the search results and latencies match your expectations.
Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../../opensearch/#primary-and-replica-shards).
Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latency, but be sure to keep shard size within the [recommended guidelines](../../opensearch/#primary-and-replica-shards).

View File

@ -5,32 +5,32 @@ parent: k-NN
nav_order: 6
---
# k-NN Settings
# k-NN settings
The k-NN plugin adds several new index and cluster settings.
## Index settings
The default values should work well for most use cases, but you can change these settings when you create the index.
The default values work well for most use cases, but you can change these settings when you create the index.
Setting | Default | Description
:--- | :--- | :---
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during KNN searches. Higher values lead to more accurate, but slower searches.
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during KNN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the KNN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, refer to the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the k-NN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, see the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
## Cluster settings
Setting | Default | Description
:--- | :--- | :---
`knn.algo_param.index_thread_qty` | 1 | The number of threads used for graph creation. Keeping this value low reduces the CPU impact of the KNN plugin, but also reduces indexing performance.
`knn.algo_param.index_thread_qty` | 1 | The number of threads used for graph creation. Keeping this value low reduces the CPU impact of the k-NN plugin, but also reduces indexing performance.
`knn.cache.item.expiry.enabled` | false | Whether to remove graphs that have not been accessed for a certain duration from memory.
`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a graph from memory.
`knn.circuit_breaker.unset.percentage` | 75.0 | The native memory usage threshold for the circuit breaker. Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false.
`knn.circuit_breaker.triggered` | false | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value.
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, KNN removes the least recently used graphs.
`knn.memory.circuit_breaker.enabled` | true | Whether to enable the KNN memory circuit breaker.
`knn.plugin.enabled`| true | Enables or disables the KNN plugin.
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, k-NN removes the least recently used graphs.
`knn.memory.circuit_breaker.enabled` | true | Whether to enable the k-NN memory circuit breaker.
`knn.plugin.enabled`| true | Enables or disables the k-NN plugin.
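
The arithmetic behind `knn.memory.circuit_breaker.limit` and `knn.circuit_breaker.unset.percentage` can be sketched as follows. The helper names are hypothetical, and the calculation only restates the descriptions in the table: the limit is a percentage of the RAM left after the JVM heap, and the unset threshold is a percentage of that limit.

```python
def native_memory_limit_gb(total_ram_gb, jvm_heap_gb, limit_percent=50.0):
    """Memory available to k-NN graphs: a share of the RAM outside the JVM heap."""
    return (total_ram_gb - jvm_heap_gb) * limit_percent / 100.0


def unset_threshold_gb(limit_gb, unset_percent=75.0):
    """Usage level the table ties to `knn.circuit_breaker.unset.percentage`."""
    return limit_gb * unset_percent / 100.0


# The example from the table: 100 GB of memory with a 32 GB JVM heap.
limit_gb = native_memory_limit_gb(100, 32)
threshold_gb = unset_threshold_gb(limit_gb)

print(f"Graph memory limit: {limit_gb} GB")
print(f"Unset threshold:    {threshold_gb} GB")
```

With the default values, the limit is 34 GB and the unset threshold is 25.5 GB.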