diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 700e05f7..8d0b29af 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -27,6 +27,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): [Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. [Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. [Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). +[k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search. Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query. ## Arrays diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md new file mode 100644 index 00000000..4e447b15 --- /dev/null +++ b/_field-types/supported-field-types/knn-vector.md @@ -0,0 +1,166 @@ +--- +layout: default +title: k-NN vector +nav_order: 58 +has_children: false +parent: Supported field types +--- + +# k-NN vector + +The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors +into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id. + +## Example + +For example, to map `my_vector1` as a `knn_vector`, use the following request: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "lucene", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Method definitions + +Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files. + +```json +"my_vector": { + "type": "knn_vector", + "dimension": 4, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "nmslib", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } +} +``` + +## Model IDs + +Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the +model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The +model contains the information needed to initialize the native library segment files. + +```json + "type": "knn_vector", + "model_id": "my-model" +} +``` + +However, if you intend to just use painless scripting or a k-NN score script, you only need to pass the dimension. + ```json + "type": "knn_vector", + "dimension": 128 + } + ``` + +## Lucene byte vector + +By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range. + +Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines. +{: .note} + +When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. +{: .important} + +Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. + +To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index: + + ```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "data_type": "byte", + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "lucene", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: + +```json +PUT test-index/_doc/1 +{ + "my_vector1": [-126, 28, 127] +} +``` +{% include copy-curl.html %} + +```json +PUT test-index/_doc/2 +{ + "my_vector1": [100, -128, 0] +} +``` +{% include copy-curl.html %} + +When querying, be sure to use a `byte` vector: + +```json +GET test-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector1": { + "vector": [26, -120, 99], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} \ No newline at end of file diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md index 13505b04..0bc00870 100644 --- a/_search-plugins/knn/approximate-knn.md +++ b/_search-plugins/knn/approximate-knn.md @@ -1,6 +1,6 @@ --- layout: default -title: Approximate search +title: Approximate k-NN search nav_order: 15 parent: k-NN has_children: false @@ -79,7 +79,7 @@ PUT my-knn-index-1 } ``` -In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type) section. +In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) section. The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the nmslib and faiss engines, as set by the dimension mapping parameter. The maximum dimension count for the Lucene library is 1,024. diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index baaa4628..c0be704e 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -1,133 +1,15 @@ --- layout: default -title: k-NN Index +title: k-NN index nav_order: 5 parent: k-NN has_children: false --- -# k-NN Index - -## knn_vector data type +# k-NN index The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors -into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id. - -Method definitions are used when the underlying Approximate k-NN algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for Approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files. - -```json -"my_vector": { - "type": "knn_vector", - "dimension": 4, - "method": { - "name": "hnsw", - "space_type": "l2", - "engine": "nmslib", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } -} -``` - -Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the -model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The -model contains the information needed to initialize the native library segment files. - -```json - "type": "knn_vector", - "model_id": "my-model" -} -``` - -However, if you intend to just use painless scripting or a k-NN score script, you only need to pass the dimension. - ```json - "type": "knn_vector", - "dimension": 128 - } - ``` - -### Lucene byte vector - -By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range. - -Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines. -{: .note} - -When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. -{: .important} - -Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. - -To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index: - - ```json -PUT test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "data_type": "byte", - "method": { - "name": "hnsw", - "space_type": "l2", - "engine": "lucene", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: - -```json -PUT test-index/_doc/1 -{ - "my_vector1": [-126, 28, 127] -} -``` -{% include copy-curl.html %} - -```json -PUT test-index/_doc/2 -{ - "my_vector1": [100, -128, 0] -} -``` -{% include copy-curl.html %} - -When querying, be sure to use a `byte` vector: - -```json -GET test-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector1": { - "vector": [26, -120, 99], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} +into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). ## Method definitions