---
layout: default
title: Performance tuning
parent: k-NN
nav_order: 8
---

# Performance tuning

This topic provides performance tuning recommendations to improve indexing and search performance for approximate k-NN. At a high level, k-NN works according to these principles:
* Native library indices are created per `knn_vector` field / (Lucene) segment pair.
* Queries execute on segments sequentially inside the shard (same as any other OpenSearch query).
* Each native library index in the segment returns at most k neighbors.
* The coordinator node selects the final `size` neighbors from the neighbors returned by each shard.

This topic also provides recommendations for comparing approximate k-NN to exact k-NN with score script.

## Indexing performance tuning

Take the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once:

* **Disable the refresh interval**

   Either disable the refresh interval (default = 1 sec), or set a long duration for the refresh interval to avoid creating multiple small segments:

   ```json
   PUT /<index_name>/_settings
   {
       "index" : {
           "refresh_interval" : "-1"
       }
   }
   ```
   **Note**: Make sure to reenable `refresh_interval` after indexing finishes.

* **Disable replicas (no OpenSearch replica shard)**

   Set replicas to `0` to prevent duplicate construction of native library indices in both primary and replica shards. When you enable replicas after indexing finishes, the serialized native library indices are directly copied. If you have no replicas, losing a node might cause data loss, so keep a copy of the data elsewhere so that the initial load can be retried if something goes wrong.

* **Increase the number of indexing threads**

   If the hardware you choose has multiple cores, you can speed up the indexing process by allowing multiple threads for native library index construction. Set the number of threads with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting.

   Keep an eye on CPU utilization when you choose the number of threads. Because native library index construction is costly, additional threads increase CPU load.
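
The settings changes above might look like the following sketch. `<index_name>` is a placeholder, and the thread count of `4` is an illustrative value rather than a recommendation. The example also assumes that `knn.algo_param.index_thread_qty` can be updated dynamically through the cluster settings API on your version:

```json
PUT /<index_name>/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}

PUT /_cluster/settings
{
    "persistent" : {
        "knn.algo_param.index_thread_qty" : 4
    }
}
```

After the bulk load finishes, a request along these lines restores the refresh interval and replicas. The values `1s` and `1` are assumptions here, so substitute whatever your index used before:

```json
PUT /<index_name>/_settings
{
    "index" : {
        "refresh_interval" : "1s",
        "number_of_replicas" : 1
    }
}
```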

## Search performance tuning

Take the following steps to improve search performance:

* **Reduce segment count**

   To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results.

   Ideally, having one segment per shard provides the optimal performance with respect to search latency. You can configure an index to have multiple shards to avoid giant shards and achieve more parallelism.

   You can control the number of segments by choosing a larger refresh interval or, during bulk indexing, by disabling the refresh interval so that OpenSearch creates segments less often.

* **Warm up the index**

   Native library indices are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top `size` results by score are returned from all the results returned by segments at a shard level (higher score = better result).

   Once native library indices are loaded (they are loaded outside the OpenSearch JVM), OpenSearch caches them in memory. Initial queries are expensive and take a few seconds, while subsequent queries are faster and take milliseconds (assuming the k-NN circuit breaker isn't hit).

   To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices you want to search:

   ```json
   GET /_plugins/_knn/warmup/index1,index2,index3?pretty
   {
     "_shards" : {
       "total" : 6,
       "successful" : 6,
       "failed" : 0
     }
   }
   ```

   The warmup API operation loads all native library indices for all shards (primary and replica) for the specified indices into the cache, so there's no penalty to load native library indices during initial searches.

   **Note**: This API operation only loads the segments of the indices it ***sees*** into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indices into memory.

* **Avoid reading stored fields**

   If your use case is simply to read the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves time retrieving the vectors from stored fields.
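
As a rough illustration of the last step, a query along the following lines skips stored fields and returns only document IDs and scores. This is a sketch under assumptions: the field name `my_vector`, the three-dimensional query vector, and the values of `size` and `k` are hypothetical, and `docvalue_fields` is used to return `_id` because `"stored_fields": "_none_"` also suppresses metadata fields:

```json
GET /<index_name>/_search
{
    "size" : 5,
    "stored_fields" : "_none_",
    "docvalue_fields" : ["_id"],
    "query" : {
        "knn" : {
            "my_vector" : {
                "vector" : [2.0, 3.0, 5.0],
                "k" : 5
            }
        }
    }
}
```

To reduce segment count on an index that is no longer being written to, one option not covered above is the standard force merge API. It can be expensive on large shards, so treat it as something to test rather than a default step:

```json
POST /<index_name>/_forcemerge?max_num_segments=1
```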

## Improving recall

Recall depends on multiple factors like number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the native library index, the greater the chance of losing recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. For this reason, it's important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.

The default parameters work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#index-settings).
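
For experimentation, the algorithm parameters can be set when the index is created. The following is a sketch only: the field name `my_vector`, the dimension, and the values for `ef_search`, `ef_construction`, and `m` are illustrative assumptions, and the available engines and parameters depend on your k-NN plugin version. Larger values generally trade indexing time and search latency for recall:

```json
PUT /<index_name>
{
    "settings" : {
        "index" : {
            "knn" : true,
            "knn.algo_param.ef_search" : 100
        }
    },
    "mappings" : {
        "properties" : {
            "my_vector" : {
                "type" : "knn_vector",
                "dimension" : 3,
                "method" : {
                    "name" : "hnsw",
                    "space_type" : "l2",
                    "engine" : "nmslib",
                    "parameters" : {
                        "ef_construction" : 128,
                        "m" : 24
                    }
                }
            }
        }
    }
}
```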

## Approximate nearest neighbor versus score script

The standard k-NN query and custom scoring option perform differently. Test with a representative set of documents to see if the search results and latencies match your expectations.

Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latency, but be sure to keep shard size within the [recommended guidelines]({{site.url}}{{site.baseurl}}/opensearch#primary-and-replica-shards).
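
For comparison with an approximate query, an exact k-NN search with the score script might look like the following sketch. The field names `my_vector` and `color`, the filter value, and the query vector are hypothetical. The `term` filter stands in for whatever initial filter narrows your document set before scoring:

```json
GET /<index_name>/_search
{
    "size" : 4,
    "query" : {
        "script_score" : {
            "query" : {
                "bool" : {
                    "filter" : {
                        "term" : {
                            "color" : "BLUE"
                        }
                    }
                }
            },
            "script" : {
                "lang" : "knn",
                "source" : "knn_score",
                "params" : {
                    "field" : "my_vector",
                    "query_value" : [2.0, 3.0, 5.0],
                    "space_type" : "l2"
                }
            }
        }
    }
}
```

Running the same query vector through both this request and a standard `knn` query on representative data gives a direct way to compare results and latency.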