Merge pull request #280 from jmazanec15/knn-faiss-update
Update k-NN documentation for faiss support feature
This commit is contained in:
commit 3a147fa551

@@ -1,6 +1,6 @@
 ---
 layout: default
-title: k-NN API
+title: API
 nav_order: 5
 parent: k-NN
 has_children: false
@@ -8,7 +8,7 @@ has_children: false

 # k-NN plugin API

-The k-NN plugin adds two API operations to help you better manage the plugin's functionality.
+The k-NN plugin adds several APIs for managing, monitoring and optimizing your k-NN workload.

 ## Stats
@@ -23,25 +23,35 @@ GET /_plugins/_knn/nodeId1,nodeId2/stats/statName1,statName2

 Statistic |  Description
 :--- | :---
 `circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
-`total_load_time` | The time in nanoseconds that k-NN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
-`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. This statistic is only relevant to approximate k-NN search. <br /> **Note**: Explicit evictions that occur because of index deletion aren't counted.
-`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph that's already loaded into memory. This statistic is only relevant to approximate k-NN search.
-`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph that isn't loaded into memory yet. This statistic is only relevant to approximate k-NN search.
-`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
-`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
-`graph_index_requests` | The number of requests to add the `knn_vector` field of a document into a graph.
-`graph_index_errors` | The number of requests to add the `knn_vector` field of a document into a graph that have produced an error.
-`graph_query_requests` | The number of graph queries that have been made.
-`graph_query_errors` | The number of graph queries that have produced an error.
+`total_load_time` | The time in nanoseconds that k-NN has taken to load native library indices into the cache. This statistic is only relevant to approximate k-NN search.
+`eviction_count` | The number of native library indices that have been evicted from the cache due to memory constraints or idle time. This statistic is only relevant to approximate k-NN search. <br /> **Note**: Explicit evictions that occur because of index deletion aren't counted.
+`hit_count` | The number of cache hits. A cache hit occurs when a user queries a native library index that's already loaded into memory. This statistic is only relevant to approximate k-NN search.
+`miss_count` | The number of cache misses. A cache miss occurs when a user queries a native library index that isn't loaded into memory yet. This statistic is only relevant to approximate k-NN search.
+`graph_memory_usage` | The amount of native memory native library indices are using on the node in kilobytes.
+`graph_memory_usage_percentage` | The amount of native memory native library indices are using on the node as a percentage of the maximum cache capacity.
+`graph_index_requests` | The number of requests to add the `knn_vector` field of a document into a native library index.
+`graph_index_errors` | The number of requests to add the `knn_vector` field of a document into a native library index that have produced an error.
+`graph_query_requests` | The number of native library index queries that have been made.
+`graph_query_errors` | The number of native library index queries that have produced an error.
 `knn_query_requests` | The number of k-NN query requests received.
 `cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
-`load_success_count` | The number of times k-NN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
-`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
-`indices_in_cache` | For each index that has graphs in the cache, this statistic provides the number of graphs that index has and the total `graph_memory_usage` that index is using, in kilobytes.
+`load_success_count` | The number of times k-NN successfully loaded a native library index into the cache. This statistic is only relevant to approximate k-NN search.
+`load_exception_count` | The number of times an exception occurred when trying to load a native library index into the cache. This statistic is only relevant to approximate k-NN search.
+`indices_in_cache` | For each OpenSearch index with a `knn_vector` field and approximate k-NN turned on, this statistic provides the number of native library indices that OpenSearch index has and the total `graph_memory_usage` that the OpenSearch index is using, in kilobytes.
 `script_compilations` | The number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN score script search.
 `script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
 `script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
 `script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.
+`nmslib_initialized` | Boolean value indicating whether the *nmslib* JNI library has been loaded and initialized on the node.
+`faiss_initialized` | Boolean value indicating whether the *faiss* JNI library has been loaded and initialized on the node.
+`model_index_status` | Status of the model system index. Valid values are "red", "yellow", and "green". If the index does not exist, this value is null.
+`indexing_from_model_degraded` | Boolean value indicating whether indexing from a model is degraded. This happens if there is not enough JVM memory to cache the models.
+`training_requests` | The number of training requests made to the node.
+`training_errors` | The number of training errors that have occurred on the node.
+`training_memory_usage` | The amount of native memory training is using on the node in kilobytes.
+`training_memory_usage_percentage` | The amount of native memory training is using on the node as a percentage of the maximum cache capacity.
+
+**Note**: Some stats contain *graph* in the name. In these cases, *graph* is synonymous with *native library index*. The term *graph* is a legacy detail, coming from when the plugin only supported the HNSW algorithm, which consists of hierarchical graphs.

 ### Usage
@@ -54,37 +64,45 @@ GET /_plugins/_knn/stats?pretty
     "successful" : 1,
     "failed" : 0
   },
-  "cluster_name" : "_run",
+  "cluster_name" : "my-cluster",
   "circuit_breaker_triggered" : false,
+  "model_index_status" : "YELLOW",
   "nodes" : {
-    "HYMrXXsBSamUkcAjhjeN0w" : {
-      "eviction_count" : 0,
-      "miss_count" : 1,
-      "graph_memory_usage" : 1,
-      "graph_memory_usage_percentage" : 3.68,
-      "graph_index_requests" : 7,
-      "graph_index_errors" : 1,
-      "knn_query_requests" : 4,
-      "graph_query_requests" : 30,
-      "graph_query_errors" : 15,
-      "indices_in_cache" : {
-        "myindex" : {
-          "graph_memory_usage" : 2,
-          "graph_memory_usage_percentage" : 3.68,
-          "graph_count" : 2
-        }
-      },
-      "cache_capacity_reached" : false,
-      "load_exception_count" : 0,
-      "hit_count" : 0,
-      "load_success_count" : 1,
-      "total_load_time" : 2878745,
-      "script_compilations" : 1,
-      "script_compilation_errors" : 0,
-      "script_query_requests" : 534,
-      "script_query_errors" : 0
-    }
+    "JdfxIkOS1-43UxqNz98nw" : {
+      "graph_memory_usage_percentage" : 3.68,
+      "graph_query_requests" : 1420920,
+      "graph_memory_usage" : 2,
+      "cache_capacity_reached" : false,
+      "load_success_count" : 179,
+      "training_memory_usage" : 0,
+      "indices_in_cache" : {
+        "myindex" : {
+          "graph_memory_usage" : 2,
+          "graph_memory_usage_percentage" : 3.68,
+          "graph_count" : 2
+        }
+      },
+      "script_query_errors" : 0,
+      "hit_count" : 1420775,
+      "knn_query_requests" : 147092,
+      "total_load_time" : 2436679306,
+      "miss_count" : 179,
+      "training_memory_usage_percentage" : 0.0,
+      "graph_index_requests" : 656,
+      "faiss_initialized" : true,
+      "load_exception_count" : 0,
+      "training_errors" : 0,
+      "eviction_count" : 0,
+      "nmslib_initialized" : false,
+      "script_compilations" : 0,
+      "script_query_requests" : 0,
+      "graph_query_errors" : 0,
+      "indexing_from_model_degraded" : false,
+      "graph_index_errors" : 0,
+      "training_requests" : 17,
+      "script_compilation_errors" : 0
+    }
   }
 }
 ```

@@ -96,7 +114,7 @@ GET /_plugins/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_
     "successful" : 1,
     "failed" : 0
   },
-  "cluster_name" : "_run",
+  "cluster_name" : "my-cluster",
   "circuit_breaker_triggered" : false,
   "nodes" : {
     "HYMrXXsBSamUkcAjhjeN0w" : {
@@ -111,13 +129,13 @@ GET /_plugins/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_
 Introduced 1.0
 {: .label .label-purple }

-The Hierarchical Navigable Small World (HNSW) graphs used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, the plugin needs to load these files into native memory.
+The native library indices used to perform approximate k-Nearest Neighbor (k-NN) search are stored as special files with other Apache Lucene segment files. In order for you to perform a search on these indices using the k-NN plugin, the plugin needs to load these files into native memory.

-If the plugin hasn't loaded the graphs into native memory, it loads them when it receives a search request. The loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
+If the plugin hasn't loaded the files into native memory, it loads them when it receives a search request. The loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the files are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.

-As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory.
+As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the native library files for all of the shards (primaries and replicas) of all the indices specified in the request into native memory.

-After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory.
+After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's native library files are already loaded into memory, this operation has no impact. It only loads files that aren't currently in memory.

 ### Usage
@@ -150,8 +168,212 @@ After the operation has finished, use the [k-NN `_stats` API operation](#stats)

 For the warmup operation to function properly, follow these best practices:

-* Don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are sometimes deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
+* Don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are sometimes deleted. For example, you could encounter a situation in which the warmup API operation loads native library indices A and B into native memory, but segment C is created from segments A and B being merged. The native library indices for A and B would no longer be in memory, and native library index C would also not be in memory. In this case, the initial penalty for loading native library index C is still present.

-* Confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again.
+* Confirm that all native library indices you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again.

-* Don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes.
+* Don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the native library indices until they're searchable. This means that you would have to run the warmup operation again after indexing finishes.

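For reference, a warmup call follows the pattern below. This is a minimal sketch based on the request form described above --- the index names are illustrative, and the response shape (a standard `_shards` summary) is an assumption rather than output copied from a cluster:

```json
GET /_plugins/_knn/warmup/my-index1,my-index2?pretty
{
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
```

Once the call returns, the [stats](#stats) operation's `graph_memory_usage` value shows how much native memory the loaded files consume.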
+## Get Model
+Introduced 1.2
+{: .label .label-purple }
+
+Used to retrieve information about models present in the cluster. Some native library index configurations require a training step before indexing and querying can begin. The output of training is a model that can then be used to initialize native library index files during indexing. The model is serialized in the k-NN model system index.
+
+```
+GET /_plugins/_knn/models/{model_id}
+```
+
+Response Field |  Description
+:--- | :---
+`model_id` | The id of the fetched model.
+`model_blob` | The base64-encoded string of the serialized model.
+`state` | The current state of the model. One of "created", "failed", or "training".
+`timestamp` | The time when the model was created.
+`description` | A user-provided description of the model.
+`error` | An error message explaining why the model is in the failed state.
+`space_type` | The space type for which this model is trained.
+`dimension` | The dimension this model is for.
+`engine` | The native library used to create the model. Either "faiss" or "nmslib".
+
+### Usage
+
+```json
+GET /_plugins/_knn/models/test-model?pretty
+{
+  "model_id" : "test-model",
+  "model_blob" : "SXdGbIAAAAAAAAAAAA...",
+  "state" : "created",
+  "timestamp" : "2021-11-15T18:45:07.505369036Z",
+  "description" : "Default",
+  "error" : "",
+  "space_type" : "l2",
+  "dimension" : 128,
+  "engine" : "faiss"
+}
+```
+
+```json
+GET /_plugins/_knn/models/test-model?pretty&filter_path=model_id,state
+{
+  "model_id" : "test-model",
+  "state" : "created"
+}
+```
+
+## Search Model
+Introduced 1.2
+{: .label .label-purple }
+
+Use an OpenSearch query to search for models in the index.
+
+### Usage
+```json
+GET/POST /_plugins/_knn/models/_search?pretty&_source_excludes=model_blob
+{
+  "query": {
+     ...
+  }
+}
+
+{
+  "took" : 0,
+  "timed_out" : false,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  },
+  "hits" : {
+    "total" : {
+      "value" : 1,
+      "relation" : "eq"
+    },
+    "max_score" : 1.0,
+    "hits" : [
+      {
+        "_index" : ".opensearch-knn-models",
+        "_type" : "_doc",
+        "_id" : "test-model",
+        "_score" : 1.0,
+        "_source" : {
+          "engine" : "faiss",
+          "space_type" : "l2",
+          "description" : "Default",
+          "model_id" : "test-model",
+          "state" : "created",
+          "error" : "",
+          "dimension" : 128,
+          "timestamp" : "2021-11-15T18:45:07.505369036Z"
+        }
+      }
+    ]
+  }
+}
+```

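The `...` placeholder in the request above stands for any standard query DSL clause. As an assumed illustration (any valid query body works here), a `match_all` query lists every model while still excluding the large `model_blob` field:

```json
GET /_plugins/_knn/models/_search?pretty&_source_excludes=model_blob
{
  "query": {
    "match_all": {}
  }
}
```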
+## Delete Model
+Introduced 1.2
+{: .label .label-purple }
+
+Used to delete a particular model in the cluster.
+
+### Usage
+
+```json
+DELETE /_plugins/_knn/models/{model_id}
+{
+  "model_id": {model_id},
+  "acknowledged": true
+}
+```

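As a concrete (illustrative) instance of the template above, deleting a model with the id `my-model` would return the id in the acknowledgment:

```json
DELETE /_plugins/_knn/models/my-model
{
  "model_id": "my-model",
  "acknowledged": true
}
```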
+## Train Model
+Introduced 1.2
+{: .label .label-purple }
+
+Create and train a model that can be used for initializing k-NN native library indices during indexing. This API pulls training data from a `knn_vector` field in a training index, creates and trains a model, and then serializes it to the model system index. Training data must match the dimension passed into the body of the request. This request returns when training begins. To monitor the state of the model, use the [Get model API](#get-model).
+
+Query Parameter |  Description
+:--- | :---
+`model_id` | (Optional) The id to assign to the model. If not specified, a random id is generated.
+`node_id` | (Optional) The preferred node on which to execute training. If set, this node is used to perform training if it is deemed to be capable.
+
+Request Parameter |  Description
+:--- | :---
+`training_index` | The index from which to pull training data.
+`training_field` | The `knn_vector` field in `training_index` from which to pull training data. The dimension of this field must match the `dimension` passed in this request.
+`dimension` | The dimension this model is for.
+`max_training_vector_count` | (Optional) The maximum number of vectors from the training index to use for training. Defaults to all of the vectors in the index.
+`search_size` | (Optional) Training data is pulled from the training index with scroll queries. Defines the number of results to return per scroll query. Defaults to 10,000.
+`description` | (Optional) A user-provided description of the model.
+`method` | The configuration of the ANN method used for search. For more information on possible methods, refer to the [method documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). The method must require training to be valid.

+### Usage
+
+```json
+POST /_plugins/_knn/models/{model_id}/_train?preference={node_id}
+{
+  "training_index": "train-index-name",
+  "training_field": "train-field-name",
+  "dimension": 16,
+  "max_training_vector_count": 1200,
+  "search_size": 100,
+  "description": "My model",
+  "method": {
+    "name": "ivf",
+    "engine": "faiss",
+    "space_type": "l2",
+    "parameters": {
+      "nlists": 128,
+      "encoder": {
+        "name": "pq",
+        "parameters": {
+          "code_size": 8
+        }
+      }
+    }
+  }
+}
+
+{
+  "model_id": "model_x"
+}
+```
+
+```json
+POST /_plugins/_knn/models/_train?preference={node_id}
+{
+  "training_index": "train-index-name",
+  "training_field": "train-field-name",
+  "dimension": 16,
+  "max_training_vector_count": 1200,
+  "search_size": 100,
+  "description": "My model",
+  "method": {
+    "name": "ivf",
+    "engine": "faiss",
+    "space_type": "l2",
+    "parameters": {
+      "nlists": 128,
+      "encoder": {
+        "name": "pq",
+        "parameters": {
+          "code_size": 8
+        }
+      }
+    }
+  }
+}
+
+{
+  "model_id": "dcdwscddscsad"
+}
+```

@@ -9,19 +9,32 @@ has_math: true

 # Approximate k-NN search

-The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the Hierarchical Navigable Small World (HNSW) algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
+The approximate k-NN search method uses nearest neighbor algorithms from *nmslib* and *faiss* to power k-NN search. To see the algorithms that the plugin currently supports, check out the [k-NN Index documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three search methods the plugin provides, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.

-The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
+The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These native library indices are loaded into native memory during search and managed by a cache. To learn more about pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).

-Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
+Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.

 ## Get started with approximate k-NN

-To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
+To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with the setting `index.knn` set to `true`. This setting tells the plugin to create native library indices for the index.

-Additionally, if you're using the approximate k-nearest neighbor method, specify `knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#index-settings).
-
-Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
+Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields, one using *faiss* and the other using *nmslib*:

 ```json
 PUT my-knn-index-1
@@ -52,8 +65,8 @@ PUT my-knn-index-1
         "dimension": 4,
         "method": {
           "name": "hnsw",
-          "space_type": "cosinesimil",
-          "engine": "nmslib",
+          "space_type": "innerproduct",
+          "engine": "faiss",
           "parameters": {
             "ef_construction": 256,
             "m": 48
@@ -65,9 +78,14 @@ PUT my-knn-index-1
 }
 ```

-The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the dimension mapping parameter.
+In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. Learn more about it [here]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type)!

-In OpenSearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to graphs so that the underlying k-NN search library can read it.
+The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the dimension mapping parameter.
+
+In OpenSearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to native library indices so that the underlying k-NN search library can read it.
 {: .tip }

 After you create the index, you can add some data to it:
@@ -112,10 +130,131 @@ GET my-knn-index-1/_search
 }
 ```

-`k` is the number of neighbors the search of each graph will return. You must also include the `size` option, which indicates how many results the query actually returns. The plugin returns `k` amount of results for each shard (and each segment) and `size` amount of results for the entire query. The plugin supports a maximum `k` value of 10,000.
+`k` is the number of neighbors the search of each graph will return. You must also include the `size` option, which indicates how many results the query actually returns. The plugin returns `k` results for each shard (and each segment) and `size` results for the entire query. The plugin supports a maximum `k` value of 10,000.

+### Building a k-NN index from a model
+
+For some of the algorithms that we support, the native library index needs to be trained before it can be used. Training every time a segment is created would be very expensive, so, instead, we introduce the concept of a *model* that is used to initialize the native library index during segment creation. A *model* is created by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing in the source of training data as well as the method definition of the model. Once training is complete, the model will be serialized to a k-NN model system index. Then, during indexing, the model is pulled from this index to initialize the segments.
+
+In order to train a model, we first need an OpenSearch index with training data in it. Training data can come from any `knn_vector` field that has a dimension matching the dimension of the model you want to create. Training data can be the same data that you are going to index or a separate set. Let's create a training index:
+
+```json
+PUT /train-index
+{
+  "settings" : {
+    "number_of_shards" : 3,
+    "number_of_replicas" : 0
+  },
+  "mappings": {
+    "properties": {
+      "train-field": {
+        "type": "knn_vector",
+        "dimension": 4
+      }
+    }
+  }
+}
+```
+
+Notice that `index.knn` is not set in the index settings. This ensures that we do not create native library indices for this index.
+
+Next, let's add some data to it:
+```json
+POST _bulk
+{ "index": { "_index": "train-index", "_id": "1" } }
+{ "train-field": [1.5, 5.5, 4.5, 6.4]}
+{ "index": { "_index": "train-index", "_id": "2" } }
+{ "train-field": [2.5, 3.5, 5.6, 6.7]}
+{ "index": { "_index": "train-index", "_id": "3" } }
+{ "train-field": [4.5, 5.5, 6.7, 3.7]}
+{ "index": { "_index": "train-index", "_id": "4" } }
+{ "train-field": [1.5, 5.5, 4.5, 6.4]}
+...
+```
+
+After indexing into the training index completes, we can call the Train API:
+```json
+POST /_plugins/_knn/models/my-model/_train
+{
+  "training_index": "train-index",
+  "training_field": "train-field",
+  "dimension": 4,
+  "description": "My models description",
+  "search_size": 500,
+  "method": {
+    "name": "hnsw",
+    "engine": "faiss",
+    "parameters": {
+      "encoder": {
+        "name": "pq",
+        "parameters": {
+          "code_size": 8,
+          "m": 8
+        }
+      }
+    }
+  }
+}
+```

+The Train API will return as soon as the training job is started. To check its status, we can use the Get Model API:
+```json
+GET /_plugins/_knn/models/my-model?filter_path=state&pretty
+{
+  "state": "training"
+}
+```
+
+Once the model enters the "created" state, we can create an index that will use this model to initialize its native library indices:
+```json
+PUT /target-index
+{
+  "settings" : {
+    "number_of_shards" : 3,
+    "number_of_replicas" : 1,
+    "index.knn": true
+  },
+  "mappings": {
+    "properties": {
+      "target-field": {
+        "type": "knn_vector",
+        "model_id": "my-model"
+      }
+    }
+  }
+}
+```
+
+Lastly, we can add the documents we want to search to the index:
+```json
+POST _bulk
+{ "index": { "_index": "target-index", "_id": "1" } }
+{ "target-field": [1.5, 5.5, 4.5, 6.4]}
+{ "index": { "_index": "target-index", "_id": "2" } }
+{ "target-field": [2.5, 3.5, 5.6, 6.7]}
+{ "index": { "_index": "target-index", "_id": "3" } }
+{ "target-field": [4.5, 5.5, 6.7, 3.7]}
+{ "index": { "_index": "target-index", "_id": "4" } }
+{ "target-field": [1.5, 5.5, 4.5, 6.4]}
+...
+```
+
+After data is ingested, it can be searched just like any other `knn_vector` field!

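For illustration, a search against the model-backed field uses the same `knn` query clause as any other `knn_vector` field; the vector, `k`, and `size` values below are made up for the sketch:

```json
GET /target-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "target-field": {
        "vector": [2.0, 3.0, 5.0, 6.0],
        "k": 2
      }
    }
  }
}
```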
 ### Using approximate k-NN with filters
 If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1:

 ```json
 GET my-knn-index-1/_search
@@ -142,7 +281,12 @@ GET my-knn-index-1/_search

 ## Spaces

-A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). Currently, the k-NN plugin supports the following spaces:
+A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). The spaces the plugin supports are listed below. Not every method supports each of these spaces. Be sure to check out [the method documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) to make sure the space you are interested in is supported.

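As a quick worked instance of that conversion (the distance value here is made up): an `l2` distance of 3 between a query vector and a neighbor becomes an OpenSearch score of

```
1 / (1 + 3) = 0.25
```

so an exact match (distance 0) receives the maximum score of 1.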
 <table>
 <thead style="text-align: left">

@@ -181,5 +325,7 @@ A space corresponds to the function used to measure the distance between two poi
 </tr>
 </table>
-The cosine similarity formula does not include the `1 -` prefix. However, because nmslib equates smaller scores with closer results, they return `1 - cosineSimilarity` for their cosine similarity space---that's why `1 -` is included in the distance function.
+The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equate smaller scores with closer results, they return `1 - cosineSimilarity` for the cosine similarity space---that's why `1 -` is included in the distance function.
 {: .note }

@@ -18,7 +18,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo

 1. **Approximate k-NN**

-   The first method takes an approximate nearest neighbor approach---it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320).
+   The first method takes an approximate nearest neighbor approach---it uses one of several different algorithms to return the approximate k-nearest neighbors to a query vector. Usually, these algorithms sacrifice indexing speed and search accuracy in return for performance benefits such as lower latency, smaller memory footprints and more scalable search. To learn more about the algorithms, please refer to [*nmslib*](https://github.com/nmslib/nmslib/blob/master/manual/README.md)'s and [*faiss*](https://github.com/facebookresearch/faiss/wiki)'s documentation.

    Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the script scoring method or painless extensions.

@@ -0,0 +1,17 @@
+---
+layout: default
+title: JNI libraries
+nav_order: 6
+parent: k-NN
+has_children: false
+---
+
+# JNI libraries
+
+To integrate [*nmslib*'s](https://github.com/nmslib/nmslib/) and [*faiss*'s](https://github.com/facebookresearch/faiss/) Approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface, which lets the k-NN plugin make calls to the native libraries. To implement this, we created three libraries: `libopensearchknn_nmslib`, the JNI library that interfaces with nmslib; `libopensearchknn_faiss`, the JNI library that interfaces with faiss; and `libopensearchknn_common`, a library containing common shared functionality between the native libraries.
+
+The libraries `libopensearchknn_faiss` and `libopensearchknn_nmslib` are lazily loaded when they are first called in the plugin. This means that if you are only planning on using one of the libraries, the other one will never be loaded.
+
+For building the libraries from source, please refer to the [DEVELOPER_GUIDE](https://github.com/opensearch-project/k-NN/blob/main/DEVELOPER_GUIDE.md).
+
+For more information about JNI, see [Java Native Interface](https://en.wikipedia.org/wiki/Java_Native_Interface) on Wikipedia.
@@ -1,13 +0,0 @@
----
-layout: default
-title: JNI library
-nav_order: 6
-parent: k-NN
-has_children: false
----
-
-# JNI library
-
-To integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface library, which lets the k-NN plugin leverage nmslib's functionality. To see how we build the JNI library binary and learn how to get the most of it in your production environment, see [JNI Library Artifacts](https://github.com/opensearch-project/k-NN#jni-library-artifacts).
-
-For more information about JNI, see [Java Native Interface](https://en.wikipedia.org/wiki/Java_Native_Interface) on Wikipedia.
@@ -11,7 +11,14 @@ has_children: false

 ## knn_vector data type

-The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index.
+The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id.
+
+Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files.

 ```json
 "my_vector": {
@@ -27,19 +34,169 @@ into an OpenSearch index.
 }
 }
 }
 ```

+Model ids are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The model contains the information needed to initialize the native library segment files.
+
+```json
+"my_vector": {
+  "type": "knn_vector",
+  "model_id": "my-model"
+}
+```
+
-Mapping Pararameter | Required | Default | Updateable | Description
+However, if you intend to just use painless scripting or a k-NN score script, you only need to pass the dimension.
+
+```json
+"my_vector": {
+  "type": "knn_vector",
+  "dimension": 128
+}
+```
+
+## Method Definitions
+
+A method definition refers to the underlying configuration of the Approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
+
+A method definition will always contain the name of the method, the space_type the method is built for, the engine (the native library) to use, and a map of parameters.

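Putting those four pieces together, a generic method definition has the following shape; this is an illustrative sketch whose values are the defaults drawn from the tables below, not a prescribed configuration:

```json
"method": {
  "name": "hnsw",
  "space_type": "l2",
  "engine": "nmslib",
  "parameters": {
    "ef_construction": 512,
    "m": 16
  }
}
```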
+Mapping Parameter | Required | Default | Updatable | Description
 :--- | :--- | :--- | :--- | :---
-`type` | true | n/a | false | The type of the field
-`dimension` | true | n/a | false | The vector dimension for the field
-`method` | false | null | false | The configuration for the Approximate nearest neighbor method
-`method.name` | true, if `method` is specified | n/a | false | The identifier for the nearest neighbor method. Currently, "hnsw" is the only valid method.
-`method.space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors. Refer to [here]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn#spaces) to see available spaces.
-`method.engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Currently, "nmslib" is the only valid engine.
-`method.parameters` | false | null | false | The parameters used for the nearest neighbor method.
-`method.parameters.ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed. Only valid for the "hnsw" method.
-`method.parameters.m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100. Only valid for the "hnsw" method.
+`name` | true | n/a | false | The identifier for the nearest neighbor method.
+`space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors.
+`engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Either "faiss" or "nmslib".
+`parameters` | false | null | false | The parameters used for the nearest neighbor method.

+### Supported nmslib methods
+
+Method Name | Requires Training? | Supported Spaces | Description
+:--- | :--- | :--- | :---
+`hnsw` | false | "l2", "innerproduct", "cosinesimil", "l1", "linf" | Hierarchical proximity graph approach to Approximate k-NN search. For more details on the algorithm, [check out this paper](https://arxiv.org/abs/1603.09320)!
+
+#### HNSW Parameters
+
+Parameter Name | Required | Default | Updatable | Description
+:--- | :--- | :--- | :--- | :---
+`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
+`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
+
+**Note** --- For *nmslib*, *ef_search* is set in the [index settings](#index-settings).

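Because *nmslib* reads *ef_search* from the index settings rather than the method definition, it can be changed after index creation; a hedged sketch (the index name and value are illustrative) using the updatable `index.knn.algo_param.ef_search` setting from the [index settings](#index-settings) table:

```json
PUT /my-knn-index-1/_settings
{
  "index": {
    "knn.algo_param.ef_search": 100
  }
}
```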
+### Supported faiss methods
+
+Method Name | Requires Training? | Supported Spaces | Description
+:--- | :--- | :--- | :---
+`hnsw` | false | "l2", "innerproduct"* | Hierarchical proximity graph approach to Approximate k-NN search.
+`ivf` | true | "l2", "innerproduct" | Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets are searched.
+
+**Note** --- For *hnsw*, "innerproduct" is not available when PQ is used.
+
+#### HNSW Parameters
+
+Parameter Name | Required | Default | Updatable | Description
+:--- | :--- | :--- | :--- | :---
+`ef_search` | false | 512 | false | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
+`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
+`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
+`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy.

+#### IVF Parameters
+
+Parameter Name | Required | Default | Updatable | Description
+:--- | :--- | :--- | :--- | :---
+`nlists` | false | 4 | false | Number of buckets to partition vectors into. Higher values may lead to more accurate searches, at the expense of memory and training latency. For more information about choosing the right value, refer to [*faiss*'s documentation](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index).
+`nprobes` | false | 1 | false | Number of buckets to search over during query. Higher values lead to more accurate but slower searches.
+`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy.
+
+For more information about setting these parameters, please refer to [*faiss*'s documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes).
+
+#### IVF training requirements
+
+The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there be `nlists` training data points, but it is [recommended to use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be either the same data that is going to be ingested or a separate set of data.

+### Supported faiss encoders
+
+Encoders can be used to reduce the memory footprint of a k-NN index at the expense of search accuracy. *faiss* has several different encoder types, but currently, the plugin only supports *flat* and *pq* encoding.
+
+An example method definition that specifies an encoder may look something like this:
+```json
+"method": {
+  "name": "hnsw",
+  "engine": "faiss",
+  "parameters": {
+    "encoder": {
+      "name": "pq",
+      "parameters": {
+        "code_size": 8,
+        "m": 8
+      }
+    }
+  }
+}
+```

+Encoder Name | Requires Training? | Description
+:--- | :--- | :---
+`flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint.
+`pq` | true | Short for product quantization, it is a lossy compression technique that encodes a vector into a fixed size of bytes using clustering, with the goal of minimizing the drop in k-NN search accuracy. From a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more details on product quantization, here is a [great blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388)!
+
+#### PQ Parameters
+
+Parameter Name | Required | Default | Updatable | Description
+:--- | :--- | :--- | :--- | :---
+`m` | false | 1 | false | Determines how many sub-vectors to break the vector into. Sub-vectors are encoded independently of each other. The dimension of the vector must be divisible by `m`. Max value is 1024.
+`code_size` | false | 8 | false | Determines the number of bits to encode a sub-vector into. Max value is 8. **Note** --- for IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8.

+### Choosing the right method
+
+There are a lot of options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency.
+
+If memory is not a concern, HNSW offers a very strong query latency/query quality tradeoff.
+
+If you want to use less memory and index faster than HNSW, while maintaining similar query quality, you should evaluate IVF.
+
+If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.

+### Memory Estimation
+
+In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates native library indices to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%.
+
+Having a replica doubles the total number of vectors.
+{: .note }

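The circuit breaker limit can be adjusted cluster-wide; a hedged sketch (the 60% value is illustrative) using the `knn.memory.circuit_breaker.limit` setting referenced in the warmup best practices:

```json
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit": "60%"
  }
}
```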
+#### HNSW memory estimation
+
+The memory required for HNSW is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
+
+As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:
+
+```
+1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.26 GB
+```

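Following the note above that a replica doubles the total number of vectors, the same index with one replica would need roughly twice that amount:

```
1.26 GB * 2 ~= 2.52 GB
```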
+#### IVF memory estimation
+
+The memory required for IVF is estimated to be `1.1 * (((4 * dimension) * num_vectors) + (4 * nlists * dimension))` bytes.
+
+As an example, assume you have a million vectors with a dimension of 256 and `nlists` of 128. The memory requirement can be estimated as follows:
+
+```
+1.1 * (((4 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 1.13 GB
+```

 ## Index settings

@@ -52,8 +209,8 @@ different parameters.

 Setting | Default | Updateable | Description
 :--- | :--- | :--- | :---
-`index.knn` | false | false | Whether the index should build hnsw graphs for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but Approximate k-NN search functionality will be disabled.
-`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
-`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
-`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
-`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
+`index.knn` | false | false | Whether the index should build native library indices for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but Approximate k-NN search functionality will be disabled.
+`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches. Only available for *nmslib*.
+`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
+`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
+`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
@@ -8,9 +8,9 @@ nav_order: 8

 # Performance tuning

 This topic provides performance tuning recommendations to improve indexing and search performance for approximate k-NN. From a high level, k-NN works according to these principles:
-* Graphs are created per knn_vector field / (Lucene) segment pair.
+* Native library indices are created per knn_vector field / (Lucene) segment pair.
 * Queries execute on segments sequentially inside the shard (same as any other OpenSearch query).
-* Each graph in the segment returns <=k neighbors.
+* Each native library index in the segment returns <=k neighbors.
 * The coordinator node picks up final size number of neighbors from the neighbors returned by each shard.

 This topic also provides recommendations for comparing approximate k-NN to exact k-NN with score script.
@ -35,13 +35,13 @@ Take the following steps to improve indexing performance, especially when you pl
|
|||
|
||||
* **Disable replicas (no OpenSearch replica shard)**
|
||||
|
||||
Set replicas to `0` to prevent duplicate construction of graphs in both primary and replica shards. When you enable replicas after indexing finishes, the serialized graphs are directly copied. If you have no replicas, losing nodes might cause data loss, so it's important that the data lives elsewhere so this initial load can be retried in case of an issue.
|
||||
Set replicas to `0` to prevent duplicate construction of native library indices in both primary and replica shards. When you enable replicas after indexing finishes, the serialized native library indices are directly copied. If you have no replicas, losing nodes might cause data loss, so it's important that the data lives elsewhere so this initial load can be retried in case of an issue.
|
||||
|
||||
* **Increase the number of indexing threads**

If the hardware you choose has multiple cores, you can speed up the indexing process by allowing multiple threads in graph construction. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting.

If the hardware you choose has multiple cores, you can speed up the indexing process by allowing multiple threads in native library index construction. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting.

Keep an eye on CPU utilization and choose the correct number of threads. Because graph construction is costly, having multiple threads can cause additional CPU load.

Keep an eye on CPU utilization and choose the correct number of threads. Because native library index construction is costly, having multiple threads can cause additional CPU load.
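For example, a sketch of allotting four indexing threads through the cluster settings API (the value 4 is illustrative; match it to your spare cores):

```
PUT /_cluster/settings
{
  "persistent": {
    "knn.algo_param.index_thread_qty": 4
  }
}
```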
## Search performance tuning
@@ -49,7 +49,7 @@ Take the following steps to improve search performance:
* **Reduce segment count**

To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. However, because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over five graphs with 100 vectors each and then taking the top 'size' results from 5*k results will take longer than searching over one graph with 500 vectors and then taking the top 'size' results from k results.

To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results.

Ideally, having one segment per shard provides the optimal performance with respect to search latency. You can configure an index to have multiple shards to avoid giant shards and achieve more parallelism.
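One common way to reduce segment count after indexing finishes is a force merge; this is a sketch (index name is a placeholder), and merging down to a single segment can be expensive on large shards:

```
POST /my-knn-index/_forcemerge?max_num_segments=1
```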
@@ -57,9 +57,9 @@ Take the following steps to improve search performance:
* **Warm up the index**

Graphs are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top 'size' number of results based on the score are returned from all the results returned by segments at a shard level (higher score = better result).

Native library indices are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top 'size' number of results based on the score are returned from all the results returned by segments at a shard level (higher score = better result).

Once a graph is loaded (graphs are loaded outside the OpenSearch JVM), OpenSearch caches it in memory. Initial queries are expensive and take a few seconds, while subsequent queries are faster and take milliseconds (assuming the k-NN circuit breaker isn't hit).

Once a native library index is loaded (native library indices are loaded outside the OpenSearch JVM), OpenSearch caches it in memory. Initial queries are expensive and take a few seconds, while subsequent queries are faster and take milliseconds (assuming the k-NN circuit breaker isn't hit).

To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices you want to search:
@@ -74,9 +74,9 @@ Take the following steps to improve search performance:

}
```
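For context, a complete warmup request and its response take roughly this shape; the index names and shard counts are illustrative:

```
GET /_plugins/_knn/warmup/index1,index2?pretty
{
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  }
}
```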
The warmup API operation loads all graphs for all shards (primary and replica) for the specified indices into the cache, so there's no penalty to load graphs during initial searches.

The warmup API operation loads all native library indices for all shards (primary and replica) for the specified indices into the cache, so there's no penalty to load native library indices during initial searches.

**Note**: This API operation only loads the segments of the indices it ***sees*** into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those graphs into memory.

**Note**: This API operation only loads the segments of the indices it ***sees*** into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indices into memory.
* **Avoid reading stored fields**

@@ -84,26 +84,9 @@ Take the following steps to improve search performance:
## Improving recall

Recall depends on multiple factors like number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the graph, the more chances of losing recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it's important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.

Recall depends on multiple factors like number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the native library index, the more chances of losing recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it's important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.

To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.

The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#index-settings).
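As a sketch of where these parameters live in a field mapping, an index that sets `m` and `ef_construction` per field might look like this; the index name, field name, and parameter values are illustrative:

```
PUT /my-knn-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 256,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
```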
## Estimating memory usage

In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%.
The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.

As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:

```
1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.26 GB
```
Having a replica doubles the total number of vectors.
{: .note }
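For instance, adding one replica to the example above doubles the estimate:

```
2 * 1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 2.53 GB
```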
The default parameters work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#index-settings).
## Approximate nearest neighbor versus score script
@@ -13,11 +13,13 @@ The k-NN plugin adds several new cluster settings.
Setting | Default | Description
:--- | :--- | :---
`knn.algo_param.index_thread_qty` | 1 | The number of threads used for graph creation. Keeping this value low reduces the CPU impact of the k-NN plugin, but also reduces indexing performance.
`knn.cache.item.expiry.enabled` | false | Whether to remove graphs that have not been accessed for a certain duration from memory.
`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a graph from memory.
`knn.circuit_breaker.unset.percentage` | 75.0 | The native memory usage threshold for the circuit breaker. Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false.
`knn.algo_param.index_thread_qty` | 1 | The number of threads used for native library index creation. Keeping this value low reduces the CPU impact of the k-NN plugin, but also reduces indexing performance.
`knn.cache.item.expiry.enabled` | false | Whether to remove native library indices that have not been accessed for a certain duration from memory.
`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a native library index from memory.
`knn.circuit_breaker.unset.percentage` | 75% | The native memory usage threshold for the circuit breaker. Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false.
`knn.circuit_breaker.triggered` | false | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value.
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, k-NN removes the least recently used graphs.
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for native library indices. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, k-NN removes the least recently used native library indices.
`knn.memory.circuit_breaker.enabled` | true | Whether to enable the k-NN memory circuit breaker.
`knn.plugin.enabled` | true | Enables or disables the k-NN plugin.
`knn.model.index.number_of_shards` | 1 | Number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate k-NN Search.
`knn.model.index.number_of_replicas` | 1 | Number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability.
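As a sketch, these cluster settings are updated through the standard cluster settings API; the 60% value here is illustrative:

```
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit": "60%"
  }
}
```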