Merge pull request #35 from jmazanec15/knn-mapping-refactor
Update knn documentation for rc1
commit 5eb7f4c299
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: default
|
||||
title: API
|
||||
nav_order: 4
|
||||
nav_order: 5
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
---
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: default
|
||||
title: Approximate search
|
||||
nav_order: 1
|
||||
nav_order: 2
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
has_math: true
|
||||
|
@ -19,7 +19,7 @@ Because the graphs are constructed during indexing, it is not possible to apply
|
|||
|
||||
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.
|
||||
|
||||
Additionally, if you're using the approximate k-nearest neighbor method, specify `knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../settings#index-settings).
|
||||
Additionally, if you're using the approximate k-nearest neighbor method, specify `knn.space_type` to the space you're interested in. You can't change this setting after it's set. To see what spaces we support, see [spaces](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters you can tweak to tune performance, see [Index settings](../knn-index#index-settings).
|
||||
|
||||
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two `knn_vector` fields and uses cosine similarity:
|
||||
|
||||
|
@ -29,19 +29,37 @@ PUT my-knn-index-1
|
|||
"settings": {
|
||||
"index": {
|
||||
"knn": true,
|
||||
"knn.space_type": "cosinesimil"
|
||||
"knn.algo_param.ef_search": 100
|
||||
}
|
||||
},
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"my_vector1": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 2
|
||||
},
|
||||
"my_vector2": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 4
|
||||
}
|
||||
"my_vector1": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 4,
|
||||
"method": {
|
||||
"name": "hnsw",
|
||||
"space_type": "l2",
|
||||
"engine": "nmslib",
|
||||
"parameters": {
|
||||
"ef_construction": 128,
|
||||
"m": 24
|
||||
}
|
||||
}
|
||||
},
|
||||
"my_vector2": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 4,
|
||||
"method": {
|
||||
"name": "hnsw",
|
||||
"space_type": "cosinesimil",
|
||||
"engine": "nmslib",
|
||||
"parameters": {
|
||||
"ef_construction": 256,
|
||||
"m": 48
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -144,6 +162,11 @@ A space corresponds to the function used to measure the distance between two poi
|
|||
<td>\[ Distance(X, Y) = \sum_{i=1}^n |X_i - Y_i| \]</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>linf</td>
|
||||
<td>\[ Distance(X, Y) = \max_{i} |X_i - Y_i| \]</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>cosinesimil</td>
|
||||
<td>\[ 1 - {A · B \over \|A\| · \|B\|} = 1 -
|
||||
|
@ -152,9 +175,9 @@ A space corresponds to the function used to measure the distance between two poi
|
|||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>hammingbit</td>
|
||||
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
<td>innerproduct</td>
|
||||
<td>\[ Distance(X, Y) = {X · Y} \]</td>
|
||||
<td>if (Distance Function >= 0) 1 / (1 + Distance Function) else -Distance Function + 1</td>
|
||||
</tr>
|
||||
</table>
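
The distance and score columns in the table can be illustrated with a short Python sketch. It is not the plugin's implementation: the function names are hypothetical, `l1` and `linf` use the standard absolute-difference definitions, and the `innerproduct` score transcribes the table row literally.

```python
def l1_distance(x, y):
    """l1: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def linf_distance(x, y):
    """linf: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def inner_product(x, y):
    """innerproduct: sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(x, y))


def score(distance):
    """Score column for l1 and linf: 1 / (1 + Distance Function)."""
    return 1.0 / (1.0 + distance)


def inner_product_score(distance):
    """Score column for innerproduct, transcribed from the table row."""
    if distance >= 0:
        return 1.0 / (1.0 + distance)
    return -distance + 1.0


x, y = [1.0, 2.0], [4.0, 6.0]
print(score(l1_distance(x, y)))                  # 1 / (1 + 7)
print(score(linf_distance(x, y)))                # 1 / (1 + 4)
print(inner_product_score(inner_product(x, y)))  # 1 / (1 + 16)
```

Because each score is a decreasing function of a non-negative distance, smaller distances rank higher.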
|
||||
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: default
|
||||
title: JNI library
|
||||
nav_order: 5
|
||||
nav_order: 6
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
---
|
||||
|
|
|
@ -0,0 +1,58 @@
|
|||
---
|
||||
layout: default
|
||||
title: k-NN Index
|
||||
nav_order: 1
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
---
|
||||
|
||||
# k-NN Index
|
||||
|
||||
## `knn_vector` datatype
|
||||
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index.
|
||||
|
||||
```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 4,
  "method": {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "nmslib",
    "parameters": {
      "ef_construction": 128,
      "m": 24
    }
  }
}
```
|
||||
|
||||
Mapping Parameter | Required | Default | Updateable | Description
:--- | :--- | :--- | :--- | :---
`type` | true | n/a | false | The type of the field.
`dimension` | true | n/a | false | The vector dimension for the field.
`method` | false | null | false | The configuration for the approximate nearest neighbor method.
`method.name` | true, if `method` is specified | n/a | false | The identifier for the nearest neighbor method. Currently, "hnsw" is the only valid method.
`method.space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors. See [spaces](../approximate-knn#spaces) for the available spaces.
`method.engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Currently, "nmslib" is the only valid engine.
`method.parameters` | false | null | false | The parameters used for the nearest neighbor method.
`method.parameters.ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed. Only valid for the "hnsw" method.
`method.parameters.m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100. Only valid for the "hnsw" method.
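
As a rough illustration of the table, the following Python sketch builds a `knn_vector` field mapping as a plain dictionary, filling in the documented defaults. The helper name `knn_vector_mapping` is hypothetical and not part of the plugin or any client library, and treating the recommended 2 to 100 range for `m` as a hard check is an assumption of this sketch.

```python
def knn_vector_mapping(dimension, space_type="l2", ef_construction=512, m=16):
    """Build a knn_vector field mapping (hypothetical helper, for illustration).

    Defaults mirror the table: space_type "l2", ef_construction 512, m 16.
    """
    if dimension < 1:
        raise ValueError("dimension must be a positive integer")
    # Assumption: the documented recommendation for m is enforced here.
    if not 2 <= m <= 100:
        raise ValueError("m should be between 2 and 100")
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": "hnsw",      # currently the only valid method
            "space_type": space_type,
            "engine": "nmslib",  # currently the only valid engine
            "parameters": {"ef_construction": ef_construction, "m": m},
        },
    }


# Two fields with different parameters in the same index mapping.
properties = {
    "my_vector1": knn_vector_mapping(4, ef_construction=128, m=24),
    "my_vector2": knn_vector_mapping(
        4, space_type="cosinesimil", ef_construction=256, m=48
    ),
}
print(properties["my_vector2"]["method"]["space_type"])
```

The resulting `properties` dictionary can be serialized with `json.dumps` and used as the `mappings.properties` section of an index creation request.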
|
||||
|
||||
## Index settings
|
||||
|
||||
The k-NN plugin also introduces several index settings that you can use to configure the k-NN structure.

Several parameters defined in the index settings are being deprecated. Set those parameters in the mapping instead of the index settings. Parameters set in the mapping override the parameters set in the index settings. Setting the parameters in the mapping allows an index to have multiple `knn_vector` fields with different parameters.
|
||||
|
||||
Setting | Default | Updateable | Description
:--- | :--- | :--- | :---
`index.knn` | false | false | Whether the index should build HNSW graphs for the `knn_vector` fields. If set to false, the `knn_vector` fields are stored in doc values, but approximate k-NN search functionality is disabled.
`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to the mapping definition.
`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to the mapping definition.
`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Refer to the mapping definition.
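
The precedence rule (mapping parameters override the deprecated index settings) can be sketched in Python as follows. This is an illustration only: `effective_params` is a hypothetical name, and falling back parameter by parameter to the index setting and then to the documented default is an assumption; the text above states only that the mapping wins when both are set.

```python
# Documented defaults for the parameters that exist in both places.
DEFAULTS = {"space_type": "l2", "ef_construction": 512, "m": 16}

# Deprecated index settings that correspond to each mapping parameter.
SETTING_KEYS = {
    "space_type": "index.knn.space_type",
    "ef_construction": "index.knn.algo_param.ef_construction",
    "m": "index.knn.algo_param.m",
}


def effective_params(index_settings, method=None):
    """Resolve parameters for one knn_vector field (hypothetical helper).

    Order assumed here: mapping value, then index setting, then default.
    """
    method = method or {}
    from_mapping = dict(method.get("parameters") or {})
    if "space_type" in method:
        from_mapping["space_type"] = method["space_type"]

    resolved = {}
    for name, default in DEFAULTS.items():
        if name in from_mapping:
            resolved[name] = from_mapping[name]
        elif SETTING_KEYS[name] in index_settings:
            resolved[name] = index_settings[SETTING_KEYS[name]]
        else:
            resolved[name] = default
    return resolved


settings = {"index.knn.space_type": "cosinesimil", "index.knn.algo_param.m": 32}
method = {"name": "hnsw", "space_type": "l2", "parameters": {"ef_construction": 128}}
print(effective_params(settings, method))
```

In this example the field's `space_type` comes from the mapping even though the index setting says `cosinesimil`, which is what lets two fields in one index use different spaces.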
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: default
|
||||
title: Exact k-NN with scoring script
|
||||
nav_order: 2
|
||||
nav_order: 3
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
has_math: true
|
||||
|
@ -298,6 +298,11 @@ A space corresponds to the function used to measure the distance between two poi
|
|||
<td>\[ Distance(X, Y) = \sum_{i=1}^n |X_i - Y_i| \]</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>linf</td>
|
||||
<td>\[ Distance(X, Y) = \max_{i} |X_i - Y_i| \]</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>cosinesimil</td>
|
||||
<td>\[ {A · B \over \|A\| · \|B\|} =
|
||||
|
@ -305,6 +310,11 @@ A space corresponds to the function used to measure the distance between two poi
|
|||
where \(\|A\|\) and \(\|B\|\) represent normalized vectors.</td>
|
||||
<td>1 + Distance Function</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>innerproduct</td>
|
||||
<td>\[ Distance(X, Y) = \sum_{i=1}^n X_i Y_i \]</td>
|
||||
<td>1 / (1 + Distance Function)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>hammingbit</td>
|
||||
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: default
|
||||
title: k-NN Painless extensions
|
||||
nav_order: 3
|
||||
nav_order: 4
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
has_math: true
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
layout: default
|
||||
title: Performance tuning
|
||||
parent: k-NN
|
||||
nav_order: 7
|
||||
nav_order: 8
|
||||
---
|
||||
|
||||
# Performance tuning
|
||||
|
@ -88,7 +88,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
|
|||
|
||||
To configure recall, adjust the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more information about how algorithm parameters influence indexing and search recall, see [HNSW algorithm parameters](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can help recall and lead to better search results, but at the cost of higher memory utilization and increased indexing time.
|
||||
|
||||
The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../settings#index-settings).
|
||||
The default recall values work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings](../knn-index#index-settings).
|
||||
|
||||
## Estimating memory usage
|
||||
|
||||
|
|
|
@ -2,25 +2,12 @@
|
|||
layout: default
|
||||
title: Settings
|
||||
parent: k-NN
|
||||
nav_order: 6
|
||||
nav_order: 7
|
||||
---
|
||||
|
||||
# k-NN settings
|
||||
|
||||
The k-NN plugin adds several new index and cluster settings.
|
||||
|
||||
|
||||
## Index settings
|
||||
|
||||
The default values work well for most use cases, but you can change these settings when you create the index.
|
||||
|
||||
Setting | Default | Description
|
||||
:--- | :--- | :---
|
||||
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
|
||||
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
|
||||
`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
|
||||
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the k-NN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, see the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
|
||||
|
||||
The k-NN plugin adds several new cluster settings.
|
||||
|
||||
## Cluster settings
|
||||
|
||||
|
|