Add k-NN vector field type (#4850)
* Add k-NN vector field type

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rename topic

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
parent fc14355c1f
commit 6bece563ea
@ -27,6 +27,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
|
|||
[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.<br> [`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion.
|
||||
[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.<br>[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape.
|
||||
[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
|
||||
[k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
|
||||
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
|
||||
|
||||
## Arrays
|
||||
|
|
|
@ -0,0 +1,166 @@
|
|||
---
layout: default
title: k-NN vector
nav_order: 58
has_children: false
parent: Supported field types
---
|
||||
|
||||
# k-NN vector
|
||||
|
||||
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or by specifying a model ID.
|
||||
|
||||
## Example
|
||||
|
||||
For example, to map `my_vector1` as a `knn_vector`, use the following request:
|
||||
|
||||
```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
|
||||
|
||||
## Method definitions
|
||||
|
||||
Method definitions are used when the underlying approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* builds the corresponding *hnsw* segment files.
|
||||
|
||||
```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 4,
  "method": {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "nmslib",
    "parameters": {
      "ef_construction": 128,
      "m": 24
    }
  }
}
```
|
||||
|
||||
## Model IDs
|
||||
|
||||
Model IDs are used when the underlying approximate k-NN algorithm requires a training step. As a prerequisite, the model must be created using the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The model contains the information needed to initialize the native library segment files.
|
||||
|
||||
```json
"my_vector": {
  "type": "knn_vector",
  "model_id": "my-model"
}
```
|
||||
|
||||
However, if you intend to use only Painless scripting or a k-NN score script, you need to pass just the dimension:

```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 128
}
```
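
For illustration, a request body for a k-NN score script query against such a dimension-only field might be assembled as in the following Python sketch. The helper name `build_knn_score_query`, the field name `my_vector`, and the query values are hypothetical. The `knn_score` source and `knn` script language reflect the k-NN plugin's score script format as commonly documented, so verify them against your plugin version before relying on this.

```python
import json


def build_knn_score_query(field, query_vector, space_type="l2", size=2):
    """Build a script_score request body that uses the k-NN score script.

    Hypothetical helper: it only assembles the JSON body and does not
    contact a cluster.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score every document; a narrower filter query could go here.
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": list(query_vector),
                        "space_type": space_type,
                    },
                },
            }
        },
    }


# The query vector length must match the mapped dimension (128 in the mapping above).
body = build_knn_score_query("my_vector", [0.5] * 128)
print(json.dumps(body["query"]["script_score"]["script"]["params"]["space_type"]))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a `_search` request using whichever HTTP client you prefer.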
|
||||
|
||||
## Lucene byte vector
|
||||
|
||||
By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
|
||||
|
||||
Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines.
|
||||
{: .note}
|
||||
|
||||
When using `byte` vectors, expect some loss of recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
|
||||
{: .important}
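
As a rough illustration of the storage difference between `float` (4 bytes per dimension) and `byte` (1 byte per dimension) vectors, the following Python sketch compares the raw size of the vector values alone. The function name and the corpus size (1 million vectors with 128 dimensions) are assumptions chosen for the example, and the estimate ignores the HNSW graph and any other index overhead.

```python
# Bytes used per dimension for each data type, as stated in this section.
BYTES_PER_DIMENSION = {"float": 4, "byte": 1}


def raw_vector_bytes(num_vectors, dimension, data_type="float"):
    """Return the bytes needed to store the vector values alone.

    Hypothetical helper: graph structures and other index overhead
    are not included.
    """
    return num_vectors * dimension * BYTES_PER_DIMENSION[data_type]


# Assumed corpus for illustration: 1 million vectors, 128 dimensions each.
float_size = raw_vector_bytes(1_000_000, 128, "float")
byte_size = raw_vector_bytes(1_000_000, 128, "byte")
print(float_size, byte_size, float_size // byte_size)
```

Under these assumptions, the raw values take a quarter of the space when stored as `byte` vectors. Actual index sizes depend on the engine and method parameters.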
|
||||
|
||||
Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.
|
||||
|
||||
To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:
|
||||
|
||||
```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
|
||||
|
||||
Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:
|
||||
|
||||
```json
PUT test-index/_doc/1
{
  "my_vector1": [-126, 28, 127]
}
```
{% include copy-curl.html %}

```json
PUT test-index/_doc/2
{
  "my_vector1": [100, -128, 0]
}
```
{% include copy-curl.html %}
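
A client-side check before ingestion can catch out-of-range values early. The following Python sketch is illustrative only: `validate_byte_vector` is a hypothetical helper, not part of the k-NN plugin or any OpenSearch client, and it assumes that vectors arrive as lists of Python integers.

```python
BYTE_MIN, BYTE_MAX = -128, 127  # signed 8-bit integer range


def validate_byte_vector(vector, dimension):
    """Return the vector as a list if it is a valid byte vector.

    Raises ValueError if the length does not match the mapped dimension
    or if any value is not an integer in the [-128, 127] range.
    """
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")
    for position, value in enumerate(vector):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"value at position {position} is not an integer")
        if not BYTE_MIN <= value <= BYTE_MAX:
            raise ValueError(f"value {value} at position {position} is out of range")
    return list(vector)


# The two example documents above pass the check for a 3-dimensional field.
print(validate_byte_vector([-126, 28, 127], 3))
print(validate_byte_vector([100, -128, 0], 3))
```

The same check can be applied to query vectors before they are sent in a `knn` query.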
|
||||
|
||||
When querying, be sure to use a `byte` vector:
|
||||
|
||||
```json
GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [26, -120, 99],
        "k": 2
      }
    }
  }
}
```
{% include copy-curl.html %}
|
|
@ -1,6 +1,6 @@
|
|||
---
|
||||
layout: default
|
||||
title: Approximate search
|
||||
title: Approximate k-NN search
|
||||
nav_order: 15
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
|
@ -79,7 +79,7 @@ PUT my-knn-index-1
|
|||
}
|
||||
```
|
||||
|
||||
In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type) section.
|
||||
In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) section.
|
||||
|
||||
The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the nmslib and faiss engines, as set by the dimension mapping parameter. The maximum dimension count for the Lucene library is 1,024.
|
||||
|
||||
|
|
|
@ -1,133 +1,15 @@
|
|||
---
|
||||
layout: default
|
||||
title: k-NN Index
|
||||
title: k-NN index
|
||||
nav_order: 5
|
||||
parent: k-NN
|
||||
has_children: false
|
||||
---
|
||||
|
||||
# k-NN Index
|
||||
|
||||
## knn_vector data type
|
||||
# k-NN index
|
||||
|
||||
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
|
||||
into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id.
|
||||
|
||||
Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files.
|
||||
|
||||
```json
|
||||
"my_vector": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 4,
|
||||
"method": {
|
||||
"name": "hnsw",
|
||||
"space_type": "l2",
|
||||
"engine": "nmslib",
|
||||
"parameters": {
|
||||
"ef_construction": 128,
|
||||
"m": 24
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the
|
||||
model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The
|
||||
model contains the information needed to initialize the native library segment files.
|
||||
|
||||
```json
|
||||
"type": "knn_vector",
|
||||
"model_id": "my-model"
|
||||
}
|
||||
```
|
||||
|
||||
However, if you intend to just use painless scripting or a k-NN score script, you only need to pass the dimension.
|
||||
```json
|
||||
"type": "knn_vector",
|
||||
"dimension": 128
|
||||
}
|
||||
```
|
||||
|
||||
### Lucene byte vector
|
||||
|
||||
By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
|
||||
|
||||
Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines.
|
||||
{: .note}
|
||||
|
||||
When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
|
||||
{: .important}
|
||||
|
||||
Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.
|
||||
|
||||
To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:
|
||||
|
||||
```json
|
||||
PUT test-index
|
||||
{
|
||||
"settings": {
|
||||
"index": {
|
||||
"knn": true,
|
||||
"knn.algo_param.ef_search": 100
|
||||
}
|
||||
},
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"my_vector1": {
|
||||
"type": "knn_vector",
|
||||
"dimension": 3,
|
||||
"data_type": "byte",
|
||||
"method": {
|
||||
"name": "hnsw",
|
||||
"space_type": "l2",
|
||||
"engine": "lucene",
|
||||
"parameters": {
|
||||
"ef_construction": 128,
|
||||
"m": 24
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:
|
||||
|
||||
```json
|
||||
PUT test-index/_doc/1
|
||||
{
|
||||
"my_vector1": [-126, 28, 127]
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
```json
|
||||
PUT test-index/_doc/2
|
||||
{
|
||||
"my_vector1": [100, -128, 0]
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
When querying, be sure to use a `byte` vector:
|
||||
|
||||
```json
|
||||
GET test-index/_search
|
||||
{
|
||||
"size": 2,
|
||||
"query": {
|
||||
"knn": {
|
||||
"my_vector1": {
|
||||
"vector": [26, -120, 99],
|
||||
"k": 2
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/).
|
||||
|
||||
## Method definitions
|
||||
|
||||
|
|