Update knn documentation for rc1
Signed-off-by: John Mazanec <jmazane@amazon.com>
parent
64a78de6db
commit
005168edf2
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: API
|
title: API
|
||||||
nav_order: 4
|
nav_order: 5
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
---
|
---
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Approximate search
|
title: Approximate search
|
||||||
nav_order: 1
|
nav_order: 2
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
|
@ -19,7 +19,7 @@ Because the graphs are constructed during indexing, it is not possible to apply
|
||||||
|
|
||||||
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
|
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
|
||||||
|
|
||||||
Additionally, if you're using the approximate k-nearest neighbor method, set `knn.space_type` to the space you're interested in. You can't change this setting after it's set. For the supported spaces, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../settings#index-settings).
|
Additionally, if you're using the approximate k-nearest neighbor method, set `knn.space_type` to the space you're interested in. You can't change this setting after it's set. For the supported spaces, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../knn-index#index-settings).
|
||||||
|
|
||||||
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
|
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
|
||||||
|
|
||||||
|
@ -29,19 +29,37 @@ PUT my-knn-index-1
|
||||||
"settings": {
|
"settings": {
|
||||||
"index": {
|
"index": {
|
||||||
"knn": true,
|
"knn": true,
|
||||||
"knn.space_type": "cosinesimil"
|
"knn.algo_param.ef_search": 100
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"mappings": {
|
"mappings": {
|
||||||
"properties": {
|
"properties": {
|
||||||
"my_vector1": {
|
"my_vector1": {
|
||||||
"type": "knn_vector",
|
"type": "knn_vector",
|
||||||
"dimension": 2
|
"dimension": 4,
|
||||||
},
|
"method": {
|
||||||
"my_vector2": {
|
"name": "hnsw",
|
||||||
"type": "knn_vector",
|
"space_type": "l2",
|
||||||
"dimension": 4
|
"engine": "nmslib",
|
||||||
}
|
"parameters": {
|
||||||
|
"ef_construction": 128,
|
||||||
|
"m": 24
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"my_vector2": {
|
||||||
|
"type": "knn_vector",
|
||||||
|
"dimension": 4,
|
||||||
|
"method": {
|
||||||
|
"name": "hnsw",
|
||||||
|
"space_type": "cosinesimil",
|
||||||
|
"engine": "nmslib",
|
||||||
|
"parameters": {
|
||||||
|
"ef_construction": 256,
|
||||||
|
"m": 48
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -144,6 +162,11 @@ A space corresponds to the function used to measure the distance between two poi
|
||||||
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
|
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
|
||||||
<td>1 / (1 + Distance Function)</td>
|
<td>1 / (1 + Distance Function)</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>linf</td>
|
||||||
|
<td>\[ Distance(X, Y) = \max_{i} |X_i - Y_i| \]</td>
|
||||||
|
<td>1 / (1 + Distance Function)</td>
|
||||||
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>cosinesimil</td>
|
<td>cosinesimil</td>
|
||||||
<td>\[ 1 - {A · B \over \|A\| · \|B\|} = 1 -
|
<td>\[ 1 - {A · B \over \|A\| · \|B\|} = 1 -
|
||||||
|
@ -152,9 +175,9 @@ A space corresponds to the function used to measure the distance between two poi
|
||||||
<td>1 / (1 + Distance Function)</td>
|
<td>1 / (1 + Distance Function)</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>hammingbit</td>
|
<td>innerproduct</td>
|
||||||
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
|
<td>\[ Distance(X, Y) = {X · Y} \]</td>
|
||||||
<td>1 / (1 + Distance Function)</td>
|
<td>if (Distance Function >= 0) 1 / (1 + Distance Function) else -Distance Function + 1</td>
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
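
As a rough illustration of the score column in the table above, the following Python sketch converts a distance into a score. The function names are hypothetical and are not part of the plugin. The score formulas are copied from the table, and the `linf` helper assumes the maximum is taken over absolute coordinate differences.

```python
def linf_distance(x, y):
    """Largest absolute coordinate difference (assumed reading of the linf row)."""
    return max(abs(a - b) for a, b in zip(x, y))


def default_score(distance):
    """Score listed for most rows of the table: 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


def innerproduct_score(distance):
    """Piecewise score listed in the innerproduct row of the table."""
    if distance >= 0:
        return 1.0 / (1.0 + distance)
    return -distance + 1.0
```

For example, `default_score(linf_distance([1, 2], [4, 0]))` combines the two helpers for a pair of two-dimensional vectors.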
|
||||||
|
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: JNI library
|
title: JNI library
|
||||||
nav_order: 5
|
nav_order: 6
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
---
|
---
|
||||||
|
|
|
@ -0,0 +1,58 @@
|
||||||
|
---
|
||||||
|
layout: default
|
||||||
|
title: k-NN Index
|
||||||
|
nav_order: 1
|
||||||
|
parent: k-NN
|
||||||
|
has_children: false
|
||||||
|
---
|
||||||
|
|
||||||
|
# k-NN Index
|
||||||
|
|
||||||
|
## `knn_vector` datatype
|
||||||
|
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
|
||||||
|
into an OpenSearch index.
|
||||||
|
|
||||||
|
```json
|
||||||
|
"my_vector": {
|
||||||
|
"type": "knn_vector",
|
||||||
|
"dimension": 4,
|
||||||
|
"method": {
|
||||||
|
"name": "hnsw",
|
||||||
|
"space_type": "l2",
|
||||||
|
"engine": "nmslib",
|
||||||
|
"parameters": {
|
||||||
|
"ef_construction": 128,
|
||||||
|
"m": 24
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Mapping Parameter | Required | Default | Updateable | Description
|
||||||
|
:--- | :--- | :--- | :--- | :---
|
||||||
|
`type` | true | n/a | false | The type of the field
|
||||||
|
`dimension` | true | n/a | false | The vector dimension for the field
|
||||||
|
`method` | false | null | false | The configuration for the approximate nearest neighbor method.
|
||||||
|
`method.name` | true, if `method` is specified | n/a | false | The identifier for the nearest neighbor method. Currently, "hnsw" is the only valid method.
|
||||||
|
`method.space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors. For the available spaces, see [Spaces](../approximate-knn#spaces).
|
||||||
|
`method.engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Currently, "nmslib" is the only valid engine.
|
||||||
|
`method.parameters` | false | null | false | The parameters used for the nearest neighbor method.
|
||||||
|
`method.parameters.ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed. Only valid for "hnsw" method.
|
||||||
|
`method.parameters.m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100. Only valid for the "hnsw" method.
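
To make the defaults in the table above concrete, here is a minimal Python sketch that assembles a `knn_vector` field mapping. The helper `knn_vector_mapping` is hypothetical and not a plugin API; it only builds the JSON body you would place under `mappings.properties`, using the default values and the 2 to 100 range for `m` listed in the table.

```python
import json

# Values taken from the mapping parameter table: "hnsw" and "nmslib" are the
# only valid method name and engine, and "l2" is the default space type.
METHOD_BASE = {"name": "hnsw", "space_type": "l2", "engine": "nmslib"}
PARAMETER_DEFAULTS = {"ef_construction": 512, "m": 16}


def knn_vector_mapping(dimension, space_type=None, ef_construction=None, m=None):
    """Build a knn_vector field mapping (hypothetical helper, not a plugin API)."""
    parameters = dict(PARAMETER_DEFAULTS)
    if ef_construction is not None:
        parameters["ef_construction"] = ef_construction
    if m is not None:
        parameters["m"] = m
    # The table recommends keeping m between 2 and 100.
    if not 2 <= parameters["m"] <= 100:
        raise ValueError("m should stay between 2 and 100")
    method = dict(METHOD_BASE, parameters=parameters)
    if space_type is not None:
        method["space_type"] = space_type
    return {"type": "knn_vector", "dimension": dimension, "method": method}


# Reproduces the shape of the "my_vector" example shown earlier.
print(json.dumps({"my_vector": knn_vector_mapping(4, ef_construction=128, m=24)}, indent=2))
```

Spelling out every value, including defaults, keeps each field's configuration visible in the mapping, which is where the parameters are meant to live.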
|
||||||
|
|
||||||
|
## Index settings
|
||||||
|
|
||||||
|
The k-NN plugin also introduces several index settings that you can use to configure the k-NN structure.
|
||||||
|
|
||||||
|
Several parameters defined in the index settings are currently being deprecated. Those parameters should be set
|
||||||
|
in the mapping instead of the index settings. Parameters set in the mapping will override the parameters set in the
|
||||||
|
index settings. Setting the parameters in the mapping allows an index to have multiple `knn_vector` fields with
|
||||||
|
different parameters.
|
||||||
|
|
||||||
|
Setting | Default | Updateable | Description
|
||||||
|
:--- | :--- | :--- | :---
|
||||||
|
`index.knn` | false | false | Whether the index should build HNSW graphs for the `knn_vector` fields. If set to false, the `knn_vector` fields are stored in doc values, but approximate k-NN search functionality is disabled.
|
||||||
|
`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
|
||||||
|
`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
|
||||||
|
`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
|
||||||
|
`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to mapping definition.
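
The precedence described above (mapping parameters override the deprecated index settings, which in turn fall back to the defaults) can be sketched in Python as follows. This only models the documented behavior; the function and constant names are hypothetical, and the plugin's actual resolution code is not shown here.

```python
# Defaults taken from the settings table above.
SETTING_DEFAULTS = {"space_type": "l2", "ef_construction": 512, "m": 16}

# Deprecated index setting that corresponds to each mapping parameter.
DEPRECATED_SETTING_KEYS = {
    "space_type": "index.knn.space_type",
    "ef_construction": "index.knn.algo_param.ef_construction",
    "m": "index.knn.algo_param.m",
}


def effective_parameters(index_settings, method):
    """Resolve per-field values: mapping first, then index settings, then defaults."""
    mapping_values = dict(method.get("parameters", {}))
    if "space_type" in method:
        mapping_values["space_type"] = method["space_type"]
    resolved = {}
    for name, setting_key in DEPRECATED_SETTING_KEYS.items():
        if name in mapping_values:
            resolved[name] = mapping_values[name]
        elif setting_key in index_settings:
            resolved[name] = index_settings[setting_key]
        else:
            resolved[name] = SETTING_DEFAULTS[name]
    return resolved
```

Because the lookup runs once per field, two `knn_vector` fields in the same index can resolve to different values, which is the benefit of moving these parameters into the mapping.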
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Exact k-NN with scoring script
|
title: Exact k-NN with scoring script
|
||||||
nav_order: 2
|
nav_order: 3
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
|
@ -298,6 +298,11 @@ A space corresponds to the function used to measure the distance between two poi
|
||||||
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
|
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i) \]</td>
|
||||||
<td>1 / (1 + Distance Function)</td>
|
<td>1 / (1 + Distance Function)</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>linf</td>
|
||||||
|
<td>\[ Distance(X, Y) = \max_{i} |X_i - Y_i| \]</td>
|
||||||
|
<td>1 / (1 + Distance Function)</td>
|
||||||
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>cosinesimil</td>
|
<td>cosinesimil</td>
|
||||||
<td>\[ {A · B \over \|A\| · \|B\|} =
|
<td>\[ {A · B \over \|A\| · \|B\|} =
|
||||||
|
@ -305,6 +310,11 @@ A space corresponds to the function used to measure the distance between two poi
|
||||||
where \(\|A\|\) and \(\|B\|\) represent normalized vectors.</td>
|
where \(\|A\|\) and \(\|B\|\) represent normalized vectors.</td>
|
||||||
<td>1 + Distance Function</td>
|
<td>1 + Distance Function</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>innerproduct</td>
|
||||||
|
<td>\[ Distance(X, Y) = \sum_{i=1}^n X_i Y_i \]</td>
|
||||||
|
<td>1 / (1 + Distance Function)</td>
|
||||||
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>hammingbit</td>
|
<td>hammingbit</td>
|
||||||
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
|
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: k-NN Painless extensions
|
title: k-NN Painless extensions
|
||||||
nav_order: 3
|
nav_order: 4
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
layout: default
|
layout: default
|
||||||
title: Performance tuning
|
title: Performance tuning
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
nav_order: 7
|
nav_order: 8
|
||||||
---
|
---
|
||||||
|
|
||||||
# Performance tuning
|
# Performance tuning
|
||||||
|
@ -88,7 +88,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
|
||||||
|
|
||||||
To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.
|
To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.
|
||||||
|
|
||||||
The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../settings#index-settings).
|
The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../knn-index#index-settings).
|
||||||
|
|
||||||
## Estimating memory usage
|
## Estimating memory usage
|
||||||
|
|
||||||
|
|
|
@ -2,25 +2,12 @@
|
||||||
layout: default
|
layout: default
|
||||||
title: Settings
|
title: Settings
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
nav_order: 6
|
nav_order: 7
|
||||||
---
|
---
|
||||||
|
|
||||||
# k-NN settings
|
# k-NN settings
|
||||||
|
|
||||||
The k-NN plugin adds several new index and cluster settings.
|
The k-NN plugin adds several new cluster settings.
|
||||||
|
|
||||||
|
|
||||||
## Index settings
|
|
||||||
|
|
||||||
The default values work well for most use cases, but you can change these settings when you create the index.
|
|
||||||
|
|
||||||
Setting | Default | Description
|
|
||||||
:--- | :--- | :---
|
|
||||||
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
|
|
||||||
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
|
|
||||||
`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
|
|
||||||
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the k-NN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, see the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
|
|
||||||
|
|
||||||
|
|
||||||
## Cluster settings
|
## Cluster settings
|
||||||
|
|
||||||
|
|