Change terminology from sparse to neural sparse (#5714)

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
kolchfa-aws 2023-11-30 10:43:12 -05:00 committed by GitHub
parent a8bd47e07b
commit f524e6891f
7 changed files with 15 additions and 16 deletions

@ -35,7 +35,7 @@ The following table lists the required and optional parameters for the `sparse_e
| Name | Data type | Required | Description |
|:---|:---|:---|:---|
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
`field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to a `rank_features` field.
`field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating vector embeddings.
`field_map.<vector_field>` | String | Required | The name of the vector field in which to store the generated vector embeddings.
@ -145,6 +145,6 @@ The response confirms that in addition to the `passage_text` field, the processo
## Next steps
- To learn how to use the `neural_sparse` query for a sparse search, see [Neural sparse query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/).
- To learn more about sparse search, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
- To learn more about sparse search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).

@ -268,7 +268,7 @@ The response contains the tokens and weights:
To learn how to set up a vector index and use text embedding models for search, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
## Supported pretrained models

@ -31,7 +31,7 @@ The top-level `vector_field` specifies the vector field against which to run a s
Field | Data type | Required/Optional | Description
:--- | :--- | :---
`query_text` | String | Required | The query text from which to generate vector embeddings.
`model_id` | String | Required | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
`model_id` | String | Required | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
`max_token_score` | Float | Optional | The theoretical upper bound of the score for all tokens in the vocabulary (required for performance optimization). For OpenSearch-provided [pretrained sparse embedding models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models), we recommend setting `max_token_score` to 2 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` and to 3.5 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1`.
#### Example request

@ -2,7 +2,7 @@
layout: default
title: Hybrid search
has_children: false
nav_order: 40
nav_order: 60
---
# Hybrid search

@ -31,7 +31,7 @@ OpenSearch supports the following search methods:
- [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses multimodal embedding models to search text and image data.
- [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data.
- [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data.
- [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines traditional search and vector search to improve search relevance.

@ -1,7 +1,7 @@
---
layout: default
title: Multimodal search
nav_order: 60
nav_order: 40
has_children: false
redirect_from:
- /search-plugins/neural-multimodal-search/

@ -1,18 +1,17 @@
---
layout: default
title: Sparse search
parent: Search
title: Neural sparse search
nav_order: 50
has_children: false
redirect_from:
- /search-plugins/neural-sparse-search/
- /search-plugins/sparse-search/
---
# Sparse search
# Neural sparse search
Introduced 2.11
{: .label .label-purple }
[Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, sparse search is implemented using an inverted index and is thus as efficient as BM25. Sparse search is facilitated by sparse embedding models. When you perform a sparse search, it creates a sparse vector (a list of `token: weight` key-value pairs representing an entry and its weight) and ingests data into a rank features index.
[Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, neural sparse search is implemented using an inverted index and is thus as efficient as BM25. Neural sparse search is facilitated by sparse embedding models. When you perform a neural sparse search, it creates a sparse vector (a list of `token: weight` key-value pairs representing an entry and its weight) and ingests data into a rank features index.
When selecting a model, choose one of the following options:
@ -20,12 +19,12 @@ When selecting a model, choose one of the following options:
- Use a sparse encoding model at ingestion time and a tokenizer model at search time (low performance, relatively low latency).
**PREREQUISITE**<br>
Before using sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using sparse search
## Using neural sparse search
To use sparse search, follow these steps:
To use neural sparse search, follow these steps:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
@ -144,7 +143,7 @@ Before the document is ingested into the index, the ingest pipeline runs the `sp
## Step 4: Search the index using neural search
To perform a sparse vector search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries.
To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries.
The following example request uses a `neural_sparse` query to search for relevant documents:
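
The hunk context ends before the request itself. As a rough illustration only (not the example from the documentation file), the Python sketch below assembles a request body with a `neural_sparse` clause from the parameters listed in the query table: `query_text`, `model_id`, and the optional `max_token_score`. The helper name `build_neural_sparse_query`, the field name `passage_embedding`, and the placeholder model ID are assumptions made for this sketch.

```python
import json


def build_neural_sparse_query(vector_field, query_text, model_id, max_token_score=None):
    """Build a search request body containing a single neural_sparse clause.

    Hypothetical helper: the parameter names mirror the query table,
    but the function itself is not part of OpenSearch or its clients.
    """
    clause = {"query_text": query_text, "model_id": model_id}
    # max_token_score is optional, so include it only when provided.
    if max_token_score is not None:
        clause["max_token_score"] = max_token_score
    # The top-level key under neural_sparse is the vector field to search.
    return {"query": {"neural_sparse": {vector_field: clause}}}


if __name__ == "__main__":
    # Field name and model ID are placeholders, not values from the source.
    body = build_neural_sparse_query("passage_embedding", "Hi world", "<model_id>", 2)
    print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request against an index whose `passage_embedding` field is a `rank_features` field, assuming a sparse encoding or tokenizer model is already deployed.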