Update knn documentation for rc1

Signed-off-by: John Mazanec <jmazane@amazon.com>
John Mazanec 2021-05-28 12:34:56 -07:00
parent 64a78de6db
commit 005168edf2
8 changed files with 113 additions and 35 deletions


@ -1,7 +1,7 @@
--- ---
layout: default layout: default
title: API title: API
nav_order: 4 nav_order: 5
parent: k-NN parent: k-NN
has_children: false has_children: false
--- ---


@ -1,7 +1,7 @@
--- ---
layout: default layout: default
title: Approximate search title: Approximate search
nav_order: 1 nav_order: 2
parent: k-NN parent: k-NN
has_children: false has_children: false
has_math: true has_math: true
@ -19,7 +19,7 @@ Because the graphs are constructed during indexing, it is not possible to apply
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index. To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
Additionally, if you're using the approximate k-nearest neighbor method, specify `knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../settings#index-settings). Additionally, if you're using the approximate k-nearest neighbor method, specify `knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../knn-index#index-settings).
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity: Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
@ -29,18 +29,36 @@ PUT my-knn-index-1
"settings": { "settings": {
"index": { "index": {
"knn": true, "knn": true,
"knn.space_type": "cosinesimil" "knn.algo_param.ef_search": 100
} }
}, },
"mappings": { "mappings": {
"properties": { "properties": {
"my_vector1": { "my_vector1": {
"type": "knn_vector", "type": "knn_vector",
"dimension": 2 "dimension": 4,
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
"parameters": {
"ef_construction": 128,
"m": 24
}
}
}, },
"my_vector2": { "my_vector2": {
"type": "knn_vector", "type": "knn_vector",
"dimension": 4 "dimension": 4,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "nmslib",
"parameters": {
"ef_construction": 256,
"m": 48
}
}
} }
} }
} }
@ -144,6 +162,11 @@ A space corresponds to the function used to measure the distance between two poi
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td> <td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
<td>1 / (1 + Distance Function)</td> <td>1 / (1 + Distance Function)</td>
</tr> </tr>
<tr>
<td>linf</td>
<td>\[ Distance(X, Y) = \max_{i} \lvert X_i - Y_i \rvert \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr> <tr>
<td>cosinesimil</td> <td>cosinesimil</td>
<td>\[ 1 - {A &middot; B \over \|A\| &middot; \|B\|} = 1 - <td>\[ 1 - {A &middot; B \over \|A\| &middot; \|B\|} = 1 -
@ -152,9 +175,9 @@ A space corresponds to the function used to measure the distance between two poi
<td>1 / (1 + Distance Function)</td> <td>1 / (1 + Distance Function)</td>
</tr> </tr>
<tr> <tr>
<td>hammingbit</td> <td>innerproduct</td>
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td> <td>\[ Distance(X, Y) = {A &middot; B} \]</td>
<td>1 / (1 + Distance Function)</td> <td>if (Distance Function >= 0) 1 / (1 + Distance Function) else -Distance Function + 1</td>
</tr> </tr>
</table> </table>
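
As a rough illustration of the table above, the following Python sketch computes the `l1` and `linf` distances and applies the listed distance-to-score conversions. The function names are hypothetical and this is not the plugin's implementation; the distances use the standard absolute-value definitions.

```python
def l1_distance(x, y):
    """Sum of absolute coordinate differences (standard L1 definition)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def linf_distance(x, y):
    """Largest absolute coordinate difference (standard L-infinity definition)."""
    return max(abs(a - b) for a, b in zip(x, y))


def score(space, distance):
    """Convert a distance to a score using the conversions listed in the table.

    Hypothetical helper: `space` is one of the space names from the table.
    """
    if space == "innerproduct":
        # Piecewise rule from the table's innerproduct row.
        return 1 / (1 + distance) if distance >= 0 else -distance + 1
    # l1, linf, and cosinesimil share the same conversion in the table.
    return 1 / (1 + distance)
```

For example, `score("linf", linf_distance([1, 2], [4, 0]))` converts the largest coordinate difference between the two vectors into a score.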


@ -1,7 +1,7 @@
--- ---
layout: default layout: default
title: JNI library title: JNI library
nav_order: 5 nav_order: 6
parent: k-NN parent: k-NN
has_children: false has_children: false
--- ---

docs/knn/knn-index.md (new file, 58 lines)

@ -0,0 +1,58 @@
---
layout: default
title: k-NN Index
nav_order: 1
parent: k-NN
has_children: false
---
# k-NN Index
## `knn_vector` data type
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
into an OpenSearch index.
```json
"my_vector": {
"type": "knn_vector",
"dimension": 4,
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
"parameters": {
"ef_construction": 128,
"m": 24
}
}
}
```
Mapping Parameter | Required | Default | Updateable | Description
:--- | :--- | :--- | :--- | :---
`type` | true | n/a | false | The type of the field
`dimension` | true | n/a | false | The vector dimension for the field
`method` | false | null | false | The configuration for the approximate nearest neighbor method.
`method.name` | true, if `method` is specified | n/a | false | The identifier for the nearest neighbor method. Currently, "hnsw" is the only valid method.
`method.space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors. For the available spaces, see [Spaces](../approximate-knn#spaces).
`method.engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Currently, "nmslib" is the only valid engine.
`method.parameters` | false | null | false | The parameters used for the nearest neighbor method.
`method.parameters.ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed. Only valid for the "hnsw" method.
`method.parameters.m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Changing this value can have a large impact on memory consumption. Keep this value between 2 and 100. Only valid for the "hnsw" method.
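
The parameters in the table above can be assembled programmatically. The following Python sketch is a minimal, hypothetical helper (`knn_vector_mapping` is not part of the plugin or any client library) that fills in the documented defaults and rejects values of `m` outside the recommended range of 2 to 100. The range check is this helper's own guard, not something the plugin is known to enforce.

```python
import json

# Defaults taken from the mapping parameter table.
HNSW_DEFAULTS = {"ef_construction": 512, "m": 16}


def knn_vector_mapping(dimension, space_type="l2", engine="nmslib", **parameters):
    """Build a `knn_vector` field mapping that uses the "hnsw" method."""
    unknown = set(parameters) - set(HNSW_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown hnsw parameters: {sorted(unknown)}")
    params = {**HNSW_DEFAULTS, **parameters}
    if not 2 <= params["m"] <= 100:
        raise ValueError("m should stay between 2 and 100")
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": "hnsw",
            "space_type": space_type,
            "engine": engine,
            "parameters": params,
        },
    }


if __name__ == "__main__":
    # Prints a field mapping equivalent to the example at the top of this page.
    print(json.dumps(knn_vector_mapping(4, ef_construction=128, m=24), indent=2))
```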
## Index settings
The k-NN plugin also introduces several index settings that configure how the k-NN structures are built and searched.
Several of these settings are being deprecated. Set the corresponding parameters in the mapping instead of the
index settings. Parameters set in the mapping override the parameters set in the index settings. Setting the
parameters in the mapping also allows an index to have multiple `knn_vector` fields with different parameters.
Setting | Default | Updateable | Description
:--- | :--- | :--- | :---
`index.knn` | false | false | Whether the index should build HNSW graphs for the `knn_vector` fields. If set to false, the `knn_vector` fields are stored in doc values, but approximate k-NN search functionality is disabled.
`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
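
The precedence rule described in this section (mapping parameters override the deprecated index settings, which override the defaults) can be modeled as follows. This Python sketch is illustrative only: `effective_params` is a hypothetical function, and how the plugin resolves a `method` block that omits `space_type` is an assumption here (the sketch falls back to the index setting).

```python
# Defaults taken from the settings table.
DEFAULTS = {"space_type": "l2", "ef_construction": 512, "m": 16}

# Deprecated index settings and the parameter each one controls.
DEPRECATED_SETTINGS = {
    "index.knn.space_type": "space_type",
    "index.knn.algo_param.ef_construction": "ef_construction",
    "index.knn.algo_param.m": "m",
}


def effective_params(index_settings, method=None):
    """Resolve the parameters for one `knn_vector` field.

    `index_settings` is a flat dict of index settings; `method` is the
    field's `method` mapping block, or None if the field does not define one.
    """
    resolved = dict(DEFAULTS)
    # Index settings override the defaults.
    for setting, name in DEPRECATED_SETTINGS.items():
        if setting in index_settings:
            resolved[name] = index_settings[setting]
    # Mapping parameters override index settings.
    if method:
        if "space_type" in method:
            resolved["space_type"] = method["space_type"]
        resolved.update(method.get("parameters") or {})
    return resolved
```

Because resolution happens per field, two `knn_vector` fields in the same index can end up with different values for `m` or `ef_construction`, which index-level settings alone cannot express.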


@ -1,7 +1,7 @@
--- ---
layout: default layout: default
title: Exact k-NN with scoring script title: Exact k-NN with scoring script
nav_order: 2 nav_order: 3
parent: k-NN parent: k-NN
has_children: false has_children: false
has_math: true has_math: true
@ -298,6 +298,11 @@ A space corresponds to the function used to measure the distance between two poi
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td> <td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
<td>1 / (1 + Distance Function)</td> <td>1 / (1 + Distance Function)</td>
</tr> </tr>
<tr>
<td>linf</td>
<td>\[ Distance(X, Y) = \max_{i} \lvert X_i - Y_i \rvert \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr> <tr>
<td>cosinesimil</td> <td>cosinesimil</td>
<td>\[ {A &middot; B \over \|A\| &middot; \|B\|} = <td>\[ {A &middot; B \over \|A\| &middot; \|B\|} =
@ -305,6 +310,11 @@ A space corresponds to the function used to measure the distance between two poi
where \(\|A\|\) and \(\|B\|\) represent normalized vectors.</td> where \(\|A\|\) and \(\|B\|\) represent normalized vectors.</td>
<td>1 + Distance Function</td> <td>1 + Distance Function</td>
</tr> </tr>
<tr>
<td>innerproduct</td>
<td>\[ Distance(X, Y) = \sum_{i=1}^n X_i Y_i \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr> <tr>
<td>hammingbit</td> <td>hammingbit</td>
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td> <td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>


@ -1,7 +1,7 @@
--- ---
layout: default layout: default
title: k-NN Painless extensions title: k-NN Painless extensions
nav_order: 3 nav_order: 4
parent: k-NN parent: k-NN
has_children: false has_children: false
has_math: true has_math: true


@ -2,7 +2,7 @@
layout: default layout: default
title: Performance tuning title: Performance tuning
parent: k-NN parent: k-NN
nav_order: 7 nav_order: 8
--- ---
# Performance tuning # Performance tuning
@ -88,7 +88,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time. To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.
The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../settings#index-settings). The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../knn-index#index-settings).
## Estimating memory usage ## Estimating memory usage


@ -2,25 +2,12 @@
layout: default layout: default
title: Settings title: Settings
parent: k-NN parent: k-NN
nav_order: 6 nav_order: 7
--- ---
# k-NN settings # k-NN settings
The k-NN plugin adds several new index and cluster settings. The k-NN plugin adds several new cluster settings.
## Index settings
The default values work well for most use cases, but you can change these settings when you create the index.
Setting | Default | Description
:--- | :--- | :---
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the k-NN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, see the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
## Cluster settings ## Cluster settings