Add k-NN vector field type (#4850)
* Add k-NN vector field type

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rename topic

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
parent fc14355c1f
commit 6bece563ea
|
@ -27,6 +27,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
|
||||||
[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.<br> [`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion.
|
[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.<br> [`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion.
|
||||||
[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.<br>[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape.
|
[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.<br>[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape.
|
||||||
[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
|
[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
|
||||||
|
[k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
|
||||||
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
|
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
|
||||||
|
|
||||||
## Arrays
|
## Arrays
|
||||||
|
|
|
@ -0,0 +1,166 @@
|
||||||
|
---
layout: default
title: k-NN vector
nav_order: 58
has_children: false
parent: Supported field types
---
|
||||||
|
|
||||||
|
# k-NN vector
|
||||||
|
|
||||||
|
The k-NN plugin introduces a custom data type, `knn_vector`, that lets you ingest k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, you can build a `knn_vector` field either by providing a method definition or by specifying a model ID.
|
||||||
|
|
||||||
|
## Example
|
||||||
|
|
||||||
|
For example, to map `my_vector1` as a `knn_vector`, use the following request:
|
||||||
|
|
||||||
|
```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
|
||||||
|
{% include copy-curl.html %}
|
||||||
|
|
||||||
|
## Method definitions
|
||||||
|
|
||||||
|
Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that the *nmslib* implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* builds the corresponding *hnsw* segment files.
|
||||||
|
|
||||||
|
```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 4,
  "method": {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "nmslib",
    "parameters": {
      "ef_construction": 128,
      "m": 24
    }
  }
}
```
|
||||||
|
|
||||||
|
## Model IDs
|
||||||
|
|
||||||
|
Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the model must be created using the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The model contains the information needed to initialize the native library segment files.
|
||||||
|
|
||||||
|
```json
"my_vector": {
  "type": "knn_vector",
  "model_id": "my-model"
}
```
|
||||||
|
|
||||||
|
However, if you intend to use only Painless scripting or a k-NN score script, you only need to pass the dimension:
|
||||||
|
```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 128
}
```
|
||||||
|
|
||||||
|
## Lucene byte vector
|
||||||
|
|
||||||
|
By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
|
||||||
|
|
||||||
|
Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines.
|
||||||
|
{: .note}
|
||||||
|
|
||||||
|
When using `byte` vectors, expect some loss of recall compared to `float` vectors. Byte vectors are useful in large-scale applications and in use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
|
||||||
|
{: .important}
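
The storage difference described above can be estimated with simple arithmetic. The following Python sketch counts only the raw vector values (4 bytes per `float` dimension, 1 byte per `byte` dimension) and ignores any index or graph overhead. The helper name `raw_vector_bytes` and the workload size are hypothetical and chosen only for illustration:

```python
FLOAT_BYTES_PER_DIMENSION = 4  # from the text: each float dimension is 4 bytes
BYTE_BYTES_PER_DIMENSION = 1   # a signed 8-bit integer occupies 1 byte


def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_dimension: int) -> int:
    """Return the size of the raw vector values only (no index overhead)."""
    return num_vectors * dimension * bytes_per_dimension


# Hypothetical workload: 1,000,000 vectors with 128 dimensions each.
float_size = raw_vector_bytes(1_000_000, 128, FLOAT_BYTES_PER_DIMENSION)
byte_size = raw_vector_bytes(1_000_000, 128, BYTE_BYTES_PER_DIMENSION)

print(float_size)               # 512000000
print(byte_size)                # 128000000
print(float_size // byte_size)  # 4
```

Under these assumptions, the raw values of `byte` vectors take one quarter of the space of the equivalent `float` vectors. Actual on-disk and in-memory usage also depends on the engine's index structures.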
|
||||||
|
|
||||||
|
Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.
|
||||||
|
|
||||||
|
To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:
|
||||||
|
|
||||||
|
```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
|
||||||
|
{% include copy-curl.html %}
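
If you generate mappings programmatically, you may want to check the constraints described in this change before sending the request. The following Python sketch builds the field mapping shown above and rejects combinations that the text describes as unsupported. The `knn_vector_mapping` helper is hypothetical (it is not part of any OpenSearch client), and the dimension limits are taken from the Approximate k-NN search text in this change, which states them for vectors of floats:

```python
import json

# Maximum dimension per engine, as quoted in the Approximate k-NN search text.
MAX_DIMENSION = {"nmslib": 16000, "faiss": 16000, "lucene": 1024}


def knn_vector_mapping(dimension, engine="lucene", data_type="float"):
    """Build a knn_vector field mapping (hypothetical helper for illustration)."""
    if engine not in MAX_DIMENSION:
        raise ValueError(f"unknown engine: {engine}")
    if not 1 <= dimension <= MAX_DIMENSION[engine]:
        raise ValueError(f"dimension must be in [1, {MAX_DIMENSION[engine]}] for {engine}")
    if data_type not in ("float", "byte"):
        raise ValueError(f"unknown data_type: {data_type}")
    if data_type == "byte" and engine != "lucene":
        raise ValueError("byte vectors are supported only for the lucene engine")

    field = {"type": "knn_vector", "dimension": dimension}
    if data_type != "float":
        # float is the default, so data_type is written only when it differs.
        field["data_type"] = data_type
    field["method"] = {
        "name": "hnsw",
        "space_type": "l2",
        "engine": engine,
        "parameters": {"ef_construction": 128, "m": 24},
    }
    return field


body = {
    "settings": {"index": {"knn": True, "knn.algo_param.ef_search": 100}},
    "mappings": {"properties": {"my_vector1": knn_vector_mapping(3, data_type="byte")}},
}
print(json.dumps(body, indent=2))
```

The printed JSON is intended to match the body of the `PUT test-index` request above. The checks are a client-side convenience only. OpenSearch validates the mapping itself.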
|
||||||
|
|
||||||
|
Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:
|
||||||
|
|
||||||
|
```json
PUT test-index/_doc/1
{
  "my_vector1": [-126, 28, 127]
}
```
|
||||||
|
{% include copy-curl.html %}
|
||||||
|
|
||||||
|
```json
PUT test-index/_doc/2
{
  "my_vector1": [100, -128, 0]
}
```
|
||||||
|
{% include copy-curl.html %}
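
To catch out-of-range values before ingestion, you can validate vectors on the client side. The following Python sketch checks that a vector has the expected dimension and that every value is an integer in the [-128, 127] range. The function name `check_byte_vector` is hypothetical and not part of any OpenSearch client:

```python
BYTE_MIN, BYTE_MAX = -128, 127  # signed 8-bit integer range from the text


def check_byte_vector(vector, dimension):
    """Return the vector unchanged if it is a valid byte vector, else raise."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")
    for position, value in enumerate(vector):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"value at position {position} is not an integer: {value!r}")
        if not BYTE_MIN <= value <= BYTE_MAX:
            raise ValueError(f"value at position {position} is out of range: {value}")
    return vector


print(check_byte_vector([-126, 28, 127], 3))  # [-126, 28, 127]
print(check_byte_vector([100, -128, 0], 3))   # [100, -128, 0]
```

A vector such as `[100, 128, 0]` raises a `ValueError` because 128 is outside the supported range.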
|
||||||
|
|
||||||
|
When querying, be sure to use a `byte` vector:
|
||||||
|
|
||||||
|
```json
GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [26, -120, 99],
        "k": 2
      }
    }
  }
}
```
|
||||||
|
{% include copy-curl.html %}
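
To build intuition for the order in which this query returns documents, you can compute an exact ranking by hand. The following Python sketch assumes that the `l2` space type ranks documents by Euclidean distance and compares the two documents ingested above against the query vector using a brute-force search. It is not how the `hnsw` method works internally, and it does not compute OpenSearch relevance scores:

```python
def squared_l2(a, b):
    """Squared Euclidean distance; it orders vectors the same way as the distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# The two documents ingested above, keyed by document ID.
documents = {"1": [-126, 28, 127], "2": [100, -128, 0]}
query = [26, -120, 99]

distances = {doc_id: squared_l2(vector, query) for doc_id, vector in documents.items()}
ranked = sorted(distances, key=distances.get)

print(distances)  # {'1': 45792, '2': 15341}
print(ranked)     # ['2', '1']
```

Under this assumption, document 2 is the nearest neighbor of the query vector, followed by document 1. Because `k` is 2 and the index contains only two documents, both are expected in the results.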
|
|
@ -1,6 +1,6 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Approximate search
|
title: Approximate k-NN search
|
||||||
nav_order: 15
|
nav_order: 15
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
|
@ -79,7 +79,7 @@ PUT my-knn-index-1
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type) section.
|
In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) section.
|
||||||
|
|
||||||
The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the nmslib and faiss engines, as set by the dimension mapping parameter. The maximum dimension count for the Lucene library is 1,024.
|
The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the nmslib and faiss engines, as set by the dimension mapping parameter. The maximum dimension count for the Lucene library is 1,024.
|
||||||
|
|
||||||
|
|
|
@ -1,133 +1,15 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: k-NN Index
|
title: k-NN index
|
||||||
nav_order: 5
|
nav_order: 5
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
---
|
---
|
||||||
|
|
||||||
# k-NN Index
|
# k-NN index
|
||||||
|
|
||||||
## knn_vector data type
|
|
||||||
|
|
||||||
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
|
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors
|
||||||
into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id.
|
into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/).
|
||||||
|
|
||||||
Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files.
|
|
||||||
|
|
||||||
```json
|
|
||||||
"my_vector": {
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 4,
|
|
||||||
"method": {
|
|
||||||
"name": "hnsw",
|
|
||||||
"space_type": "l2",
|
|
||||||
"engine": "nmslib",
|
|
||||||
"parameters": {
|
|
||||||
"ef_construction": 128,
|
|
||||||
"m": 24
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the
|
|
||||||
model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The
|
|
||||||
model contains the information needed to initialize the native library segment files.
|
|
||||||
|
|
||||||
```json
|
|
||||||
"type": "knn_vector",
|
|
||||||
"model_id": "my-model"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
However, if you intend to just use painless scripting or a k-NN score script, you only need to pass the dimension.
|
|
||||||
```json
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 128
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Lucene byte vector
|
|
||||||
|
|
||||||
By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
|
|
||||||
|
|
||||||
Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines.
|
|
||||||
{: .note}
|
|
||||||
|
|
||||||
When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
|
|
||||||
{: .important}
|
|
||||||
|
|
||||||
Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.
|
|
||||||
|
|
||||||
To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:
|
|
||||||
|
|
||||||
```json
|
|
||||||
PUT test-index
|
|
||||||
{
|
|
||||||
"settings": {
|
|
||||||
"index": {
|
|
||||||
"knn": true,
|
|
||||||
"knn.algo_param.ef_search": 100
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"mappings": {
|
|
||||||
"properties": {
|
|
||||||
"my_vector1": {
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 3,
|
|
||||||
"data_type": "byte",
|
|
||||||
"method": {
|
|
||||||
"name": "hnsw",
|
|
||||||
"space_type": "l2",
|
|
||||||
"engine": "lucene",
|
|
||||||
"parameters": {
|
|
||||||
"ef_construction": 128,
|
|
||||||
"m": 24
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
{% include copy-curl.html %}
|
|
||||||
|
|
||||||
Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:
|
|
||||||
|
|
||||||
```json
|
|
||||||
PUT test-index/_doc/1
|
|
||||||
{
|
|
||||||
"my_vector1": [-126, 28, 127]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
{% include copy-curl.html %}
|
|
||||||
|
|
||||||
```json
|
|
||||||
PUT test-index/_doc/2
|
|
||||||
{
|
|
||||||
"my_vector1": [100, -128, 0]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
{% include copy-curl.html %}
|
|
||||||
|
|
||||||
When querying, be sure to use a `byte` vector:
|
|
||||||
|
|
||||||
```json
|
|
||||||
GET test-index/_search
|
|
||||||
{
|
|
||||||
"size": 2,
|
|
||||||
"query": {
|
|
||||||
"knn": {
|
|
||||||
"my_vector1": {
|
|
||||||
"vector": [26, -120, 99],
|
|
||||||
"k": 2
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
{% include copy-curl.html %}
|
|
||||||
|
|
||||||
## Method definitions
|
## Method definitions
|
||||||
|
|
||||||
|
|