Add an overview of search methods and pages for each search method (#5636)

* Restructuring TOC

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Resolve merge conflicts

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More foundational rewrites of ML

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* TOC restructure

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rename and rewrite search pages and add keyword search

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Small wording change

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Small wording change

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Updated response

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Small rewording

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Move neural search to top of vector search list

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Change terminology

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Reorganize search methods list

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rename links

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More link renames

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
kolchfa-aws 2023-11-29 15:28:20 -05:00 committed by GitHub
parent fa5bc07a69
commit f999e0a8a8
56 changed files with 947 additions and 323 deletions

View File

@ -1,9 +1,10 @@
---
layout: default
title: Search templates
nav_order: 50
nav_order: 80
redirect_from:
- /opensearch/search-template/
- /search-plugins/search-template/
---
# Search templates

View File

@ -12,7 +12,7 @@ redirect_from:
The `sparse_encoding` processor is used to generate a sparse vector/token and weights from text fields for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) using sparse retrieval.
**PREREQUISITE**<br>
Before using the `sparse_encoding` processor, you must set up a machine learning (ML) model. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
Before using the `sparse_encoding` processor, you must set up a machine learning (ML) model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
The following is the syntax for the `sparse_encoding` processor:
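A hedged sketch of the expected shape (placeholder names are shown in angle brackets):

```json
{
  "sparse_encoding": {
    "model_id": "<model_id>",
    "field_map": {
      "<input_field>": "<vector_field>"
    }
  }
}
```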
@ -35,7 +35,7 @@ The following table lists the required and optional parameters for the `sparse_e
| Name | Data type | Required | Description |
|:---|:---|:---|:---|
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
`field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to a `rank_features` field.
`field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating vector embeddings.
`field_map.<vector_field>` | String | Required | The name of the vector field in which to store the generated vector embeddings.
@ -44,7 +44,7 @@ The following table lists the required and optional parameters for the `sparse_e
## Using the processor
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
**Step 1: Create a pipeline.**
@ -140,9 +140,11 @@ The response confirms that in addition to the `passage_text` field, the processo
}
```
---
## Next steps
- To learn how to use the `neural_sparse` query for a sparse search, see [Neural sparse query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/).
- To learn more about sparse neural search, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/).
- To learn more about using models in OpenSearch, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- For a semantic search tutorial, see [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
- To learn more about sparse search, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).

View File

@ -12,7 +12,7 @@ redirect_from:
The `text_embedding` processor is used to generate vector embeddings from text fields for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/).
**PREREQUISITE**<br>
Before using the `text_embedding` processor, you must set up a machine learning (ML) model. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
Before using the `text_embedding` processor, you must set up a machine learning (ML) model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
The following is the syntax for the `text_embedding` processor:
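A hedged sketch of the expected shape (placeholder names are shown in angle brackets):

```json
{
  "text_embedding": {
    "model_id": "<model_id>",
    "field_map": {
      "<input_field>": "<vector_field>"
    }
  }
}
```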
@ -35,7 +35,7 @@ The following table lists the required and optional parameters for the `text_emb
| Name | Data type | Required | Description |
|:---|:---|:---|:---|
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
`field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to a vector field.
`field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating text embeddings.
`field_map.<vector_field>` | String | Required | The name of the vector field in which to store the generated text embeddings.
@ -44,7 +44,7 @@ The following table lists the required and optional parameters for the `text_emb
## Using the processor
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
**Step 1: Create a pipeline.**
@ -124,6 +124,6 @@ The response confirms that in addition to the `passage_text` field, the processo
## Next steps
- To learn how to use the `neural` query for text search, see [Neural query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/).
- To learn more about neural text search, see [Text search]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/).
- To learn more about using models in OpenSearch, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- For a semantic search tutorial, see [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
- To learn more about neural text search, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).

View File

@ -12,7 +12,7 @@ redirect_from:
The `text_image_embedding` processor is used to generate combined vector embeddings from text and image fields for [multimodal neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-multimodal-search/).
**PREREQUISITE**<br>
Before using the `text_image_embedding` processor, you must set up a machine learning (ML) model. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
Before using the `text_image_embedding` processor, you must set up a machine learning (ML) model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
The following is the syntax for the `text_image_embedding` processor:
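A hedged sketch of the expected shape (placeholder names are shown in angle brackets):

```json
{
  "text_image_embedding": {
    "model_id": "<model_id>",
    "embedding": "<vector_field>",
    "field_map": {
      "text": "<input_text_field>",
      "image": "<input_image_field>"
    }
  }
}
```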
@ -37,7 +37,7 @@ The following table lists the required and optional parameters for the `text_ima
| Name | Data type | Required | Description |
|:---|:---|:---|:---|
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/).
`embedding` | String | Required | The name of the vector field in which to store the generated embeddings. A single embedding is generated for both `text` and `image` fields.
`field_map` | Object | Required | Contains key-value pairs that specify the fields from which to generate embeddings.
`field_map.text` | String | Optional | The name of the field from which to obtain text for generating vector embeddings. You must specify at least one `text` or `image`.
@ -47,7 +47,7 @@ The following table lists the required and optional parameters for the `text_ima
## Using the processor
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
**Step 1: Create a pipeline.**
@ -134,6 +134,6 @@ The response confirms that in addition to the `image_description` and `image_bin
## Next steps
- To learn how to use the `neural` query for a multimodal search, see [Neural query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/).
- To learn more about multimodal neural search, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/neural-multimodal-search/).
- To learn more about using models in OpenSearch, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- For a semantic search tutorial, see [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
- To learn more about multimodal neural search, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).

View File

@ -1,8 +1,8 @@
---
layout: default
title: Supported Algorithms
title: Supported algorithms
has_children: false
nav_order: 30
nav_order: 125
---
# Supported algorithms

View File

@ -8,7 +8,7 @@ nav_order: 10
# Create a connector
Creates a standalone connector. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
Creates a standalone connector. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
## Path and HTTP methods
@ -18,7 +18,7 @@ POST /_plugins/_ml/connectors/_create
#### Example request
To create a standalone connector, send a request to the `connectors/_create` endpoint and provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/):
To create a standalone connector, send a request to the `connectors/_create` endpoint and provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/):
```json
POST /_plugins/_ml/connectors/_create

View File

@ -8,7 +8,7 @@ nav_order: 30
# Delete a connector
Deletes a standalone connector. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
Deletes a standalone connector. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
## Path and HTTP methods

View File

@ -14,4 +14,4 @@ ML Commons supports the following connector APIs:
- [Search for a connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/connector-apis/get-connector/)
- [Delete connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/connector-apis/delete-connector/)
For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).

View File

@ -162,7 +162,7 @@ POST /_plugins/_ml/models/_register
## Register a model hosted on a third-party platform
To register a model hosted on a third-party platform, you can either first create a standalone connector and provide the ID of that connector or specify an internal connector for the model. For more information, see [Creating connectors for third-party ML platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
To register a model hosted on a third-party platform, you can either first create a standalone connector and provide the ID of that connector or specify an internal connector for the model. For more information, see [Creating connectors for third-party ML platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
### Request fields
@ -172,8 +172,8 @@ Field | Data type | Required/Optional | Description
:--- | :--- | :---
`name`| String | Required | The model name. |
`function_name` | String | Required | Set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`.
`connector_id` | Optional | Required | The connector ID of a standalone connector to a model hosted on a third-party platform. For more information, see [Standalone connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/#standalone-connector). You must provide either `connector_id` or `connector`.
`connector` | Object | Required | Contains specifications for an internal connector to a model that is hosted on a third-party platform. For more information, see [Internal connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/#internal-connector). You must provide either `connector_id` or `connector`.
`connector_id` | String | Required | The connector ID of a standalone connector for a model hosted on a third-party platform. For more information, see [Standalone connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/#creating-a-standalone-connector). You must provide either `connector_id` or `connector`.
`connector` | Object | Required | Contains specifications for a connector for a model hosted on a third-party platform. For more information, see [Creating a connector for a specific model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/#creating-a-connector-for-a-specific-model). You must provide either `connector_id` or `connector`.
`description` | String | Optional| The model description. |
`model_group_id` | String | Optional | The model group ID of the model group to register this model to.
@ -191,7 +191,7 @@ POST /_plugins/_ml/models/_register
```
{% include copy-curl.html %}
#### Example request: Remote model with an internal connector
#### Example request: Remote model with a connector specified as part of the model
```json
POST /_plugins/_ml/models/_register

View File

@ -2,6 +2,7 @@
layout: default
title: Custom models
parent: Using ML models within OpenSearch
grand_parent: Integrating ML models
nav_order: 120
---
@ -286,8 +287,4 @@ The response contains the tokens and weights:
## Step 5: Use the model for search
To learn how to set up a vector index and use text embedding models for search, see [Neural text search]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
To learn how to set up a vector index and use multimodal embedding models for search, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
To learn how to use the model for vector search, see [Set up neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#set-up-neural-search).

View File

@ -2,6 +2,7 @@
layout: default
title: GPU acceleration
parent: Using ML models within OpenSearch
grand_parent: Integrating ML models
nav_order: 150
---

View File

@ -1,26 +1,30 @@
---
layout: default
title: About ML Commons
title: Machine learning
nav_order: 1
has_children: false
has_toc: false
nav_exclude: true
---
# ML Commons plugin
# Machine learning
ML Commons for OpenSearch simplifies the development of machine learning (ML) features by providing a set of ML algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to use existing open-source ML algorithms and reduce the effort required to develop new ML features.
The [ML Commons plugin](https://github.com/opensearch-project/ml-commons/) provides machine learning (ML) features in OpenSearch.
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/index/) or [`ad`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#ad) and [`kmeans`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#kmeans) Piped Processing Language (PPL) commands.
## Integrating ML models
[Models trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train/) through the ML Commons plugin support model-based algorithms, such as k-means. After you've trained a model to your precision requirements, use the model to [make predictions]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/).
For ML-model-powered search, you can use a pretrained model provided by OpenSearch, upload your own model to the OpenSearch cluster, or connect to a foundation model hosted on an external platform.
If you don't want to use a model, you can use the [Train and Predict API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train-and-predict/) to test your model without having to evaluate the model's performance.
For more information, see [Integrating ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/).
## Using ML Commons
## Managing ML models in OpenSearch Dashboards
1. Ensure that you've appropriately set the cluster settings described in [ML Commons cluster settings]({{site.url}}{{site.baseurl}}/ml-commons-plugin/cluster-settings/).
2. Set up model access as described in [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/).
3. Start using models:
- [Run your custom models within an OpenSearch cluster]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- [Integrate models hosted on an external platform]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/index/).
Administrators of ML clusters can use OpenSearch Dashboards to review and manage the status of ML models running inside a cluster. For more information, see [Managing ML models in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-dashboard/).
## Support for algorithms
ML Commons supports various algorithms to help train ML models and make predictions or test data-driven predictions without a model. For more information, see [Supported algorithms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/algorithms/).
## ML Commons API
ML Commons provides its own set of REST APIs. For more information, see [ML Commons API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/index/).
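For example, a hedged sketch of one such call (the Stats API), which returns ML statistics for the nodes in the cluster:

```json
GET /_plugins/_ml/stats
```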

View File

@ -0,0 +1,56 @@
---
layout: default
title: Integrating ML models
nav_order: 15
has_children: true
---
# Integrating ML models
OpenSearch offers support for machine learning (ML) models that you can use in conjunction with k-NN search to retrieve semantically similar documents. This semantic search capability improves search relevance for your applications.
Before you get started, you'll need to [set up]({{site.url}}{{site.baseurl}}/quickstart/) and [secure]({{site.url}}{{site.baseurl}}/security/index/) your cluster.
{: .tip}
## Choosing a model
To integrate an ML model into your search workflow, choose one of the following options:
1. **Local model**: Upload a model to the OpenSearch cluster and use it locally. This option allows you to serve the model in your OpenSearch cluster but may require significant system resources.
    1. **Pretrained model provided by OpenSearch**: This option requires minimal setup and avoids the time and effort required to train a custom model. For a list of supported models and information about using a pretrained model provided by OpenSearch, see [Pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/).
    1. **Custom model**: This option offers customization for your specific use case. For information about uploading your model, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
1. **Remote model**: This option allows you to connect to a model hosted on a third-party platform. It requires more setup but allows the use of models that are already hosted on a service other than OpenSearch. To connect to an externally hosted model, you need to set up a connector:
    - For a walkthrough with detailed steps, see [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
    - For more information about supported connectors, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
    - For information about creating your own connector, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/).
## Tutorial
For a step-by-step tutorial, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
## Using a model
You can use an ML model in one of the following ways:
- [Make predictions](#making-predictions).
- [Use a model for search](#using-a-model-for-search).
### Making predictions
[Models trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train/) through the ML Commons plugin support model-based algorithms, such as k-means. After you've trained a model to your precision requirements, use the model to [make predictions]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/).
If you don't want to use a model, you can use the [Train and Predict API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train-and-predict/) to test your model without having to evaluate the model's performance.
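As an illustration, a hedged sketch of training a k-means model on an existing index (the index name, fields, and parameter values are placeholders):

```json
POST /_plugins/_ml/_train/kmeans
{
  "parameters": {
    "centroids": 3,
    "iterations": 10,
    "distance_type": "COSINE"
  },
  "input_query": {
    "_source": ["petal_length_in_cm", "petal_width_in_cm"],
    "size": 10000
  },
  "input_index": ["iris_data"]
}
```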
### Using a model for search
OpenSearch supports multiple search methods that integrate with ML models. For more information, see [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/index/#search-methods).

View File

@ -1,6 +1,7 @@
---
layout: default
title: Model access control
parent: Integrating ML models
has_children: false
nav_order: 20
---

View File

@ -2,6 +2,7 @@
layout: default
title: Pretrained models
parent: Using ML models within OpenSearch
grand_parent: Integrating ML models
nav_order: 120
---
@ -265,9 +266,9 @@ The response contains the tokens and weights:
## Step 5: Use the model for search
To learn how to set up a vector index and use text embedding models for search, see [Neural text search]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/).
To learn how to set up a vector index and use text embedding models for search, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
## Supported pretrained models

View File

@ -4,6 +4,9 @@ title: Connector blueprints
has_children: false
nav_order: 65
parent: Connecting to remote models
grand_parent: Integrating ML models
redirect_from:
- /ml-commons-plugin/remote-models/blueprints/
---
# Connector blueprints
@ -61,7 +64,7 @@ The following configuration options are **required** in order to build a connect
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameters` | JSON object | The default connector parameters, including `endpoint` and `model`. Any parameters indicated in this field can be overridden by parameters specified in a predict request. |
| `credential` | JSON object | Defines any credential variables required in order to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption to encrypt your credentials. When the connection to the cluster first starts, OpenSearch creates a random 32-byte encryption key that persists in OpenSearch's system index. Therefore, you do not need to manually set the encryption key. |
| `actions` | JSON array | Define what actions can run within the connector. If you're an administrator making a connection, add the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/) for your desired connection. |
| `actions` | JSON array | Defines what actions can run within the connector. If you're an administrator creating a connection, add the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/) for your desired connection. |
| `backend_roles` | JSON array | A list of OpenSearch backend roles. For more information about setting up backend roles, see [Assigning backend roles to users]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#assigning-backend-roles-to-users). |
| `access_mode` | String | Sets the access mode for the model, either `public`, `restricted`, or `private`. Default is `private`. For more information about `access_mode`, see [Model groups]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#model-groups). |
| `add_all_backend_roles` | Boolean | When set to `true`, adds all `backend_roles` to the access list, which only a user with admin permissions can adjust. When set to `false`, non-admins can add `backend_roles`. |
@ -72,7 +75,7 @@ The `action` parameter supports the following options.
| :--- | :--- | :--- |
| `action_type` | String | Required. Sets the ML Commons API operation to use upon connection. As of OpenSearch 2.9, only `predict` is supported. |
| `method` | String | Required. Defines the HTTP method for the API call. Supports `POST` and `GET`. |
| `url` | String | Required. Sets the connection endpoint at which the action takes place. This must match the regex expression for the connection used when [adding trusted endpoints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/index#adding-trusted-endpoints). |
| `url` | String | Required. Sets the connection endpoint at which the action occurs. This must match the regex expression for the connection used when [adding trusted endpoints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index#adding-trusted-endpoints). |
| `headers` | JSON object | Sets the headers used inside the request or response body. Default is `ContentType: application/json`. If your third-party ML tool requires access control, define the required `credential` parameters in the `headers` parameter. |
| `request_body` | String | Required. Sets the parameters contained inside the request body of the action. The parameters must include `\"inputText\"`, which specifies how users of the connector should construct the request payload for the `action_type`. |
| `pre_process_function` | String | Optional. A built-in or custom Painless script used to preprocess the input data. OpenSearch provides the following built-in preprocess functions that you can call directly:<br> - `connector.pre_process.cohere.embedding` for [Cohere](https://cohere.com/) embedding models<br> - `connector.pre_process.openai.embedding` for [OpenAI](https://openai.com/) embedding models <br> - `connector.pre_process.default.embedding`, which you can use to preprocess documents in neural search requests so that they are in the format that ML Commons can process with the default preprocessor (OpenSearch 2.11 or later). For more information, see [built-in functions](#built-in-pre--and-post-processing-functions). |
@ -215,4 +218,4 @@ POST /_plugins/_ml/connectors/_create
## Next step
For examples of creating various connectors, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
For examples of creating various connectors, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
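A hedged, minimal blueprint sketch combining the options described above for an `aws_sigv4` connection (all names, endpoints, and credential values are placeholders):

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon SageMaker connector",
  "description": "Connector for a model hosted on Amazon SageMaker",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker"
  },
  "credential": {
    "access_key": "<YOUR_ACCESS_KEY>",
    "secret_key": "<YOUR_SECRET_KEY>",
    "session_token": "<YOUR_SESSION_TOKEN>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/<endpoint_name>/invocations",
      "headers": {
        "content-type": "application/json"
      },
      "request_body": "<your model input format>"
    }
  ]
}
```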

View File

@ -5,6 +5,9 @@ has_children: false
has_toc: false
nav_order: 61
parent: Connecting to remote models
grand_parent: Integrating ML models
redirect_from:
- /ml-commons-plugin/remote-models/connectors/
---
# Creating connectors for third-party ML platforms
@ -15,9 +18,9 @@ Connectors facilitate access to remote models hosted on third-party platforms.
You can provision connectors in two ways:
1. Create a [standalone connector](#standalone-connector): A standalone connector can be reused and shared by multiple remote models but requires access to both the model and connector in OpenSearch and the third-party platform, such as OpenAI or Amazon SageMaker, that the connector is accessing. Standalone connectors are saved in a connector index.
1. [Create a standalone connector](#creating-a-standalone-connector): A standalone connector can be reused and shared by multiple remote models but requires access to both the model and connector in OpenSearch and the third-party platform, such as OpenAI or Amazon SageMaker, that the connector is accessing. Standalone connectors are saved in a connector index.
2. Create a remote model with an [internal connector](#internal-connector): An internal connector can only be used with the remote model in which it was created. To access an internal connector, you only need access to the model itself because the connection is established inside the model. Internal connectors are saved in the model index.
2. [Create a connector for a specific remote model](#creating-a-connector-for-a-specific-model): Alternatively, you can create a connector that can only be used with the remote model for which it was created. To access such a connector, you only need access to the model itself because the connection is established inside the model. These connectors are saved in the model index.
## Supported connectors
@ -32,14 +35,14 @@ All connectors consist of a JSON blueprint created by machine learning (ML) deve
You can find blueprints for each connector in the [ML Commons repository](https://github.com/opensearch-project/ml-commons/tree/2.x/docs/remote_inference_blueprints).
For more information about blueprint parameters, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/).
For more information about blueprint parameters, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/).
Admins are only required to enter their `credential` settings, such as `"openAI_key"`, for the service they are connecting to. All other parameters are defined within the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/).
Admins are only required to enter their `credential` settings, such as `"openAI_key"`, for the service they are connecting to. All other parameters are defined within the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/).
{: .note}
## Standalone connector
## Creating a standalone connector
To create a standalone connector, send a request to the `connectors/_create` endpoint and provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/):
Standalone connectors can be used by multiple models. To create a standalone connector, send a request to the `connectors/_create` endpoint and provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/):
```json
POST /_plugins/_ml/connectors/_create
@ -70,14 +73,14 @@ POST /_plugins/_ml/connectors/_create
```
{% include copy-curl.html %}
## Internal connector
## Creating a connector for a specific model
To create an internal connector, provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/) within the `connector` object of a request to the `models/_register` endpoint:
To create a connector for a specific model, provide all of the parameters described in [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/) within the `connector` object of a request to the `models/_register` endpoint:
```json
POST /_plugins/_ml/models/_register
{
"name": "openAI-GPT-3.5: internal connector",
"name": "openAI-GPT-3.5 model with a connector",
"function_name": "remote",
"model_group_id": "lEFGL4kB4ubqQRzegPo2",
"description": "test model",
@ -113,8 +116,7 @@ POST /_plugins/_ml/models/_register
## OpenAI chat connector
The following example creates a standalone OpenAI chat connector. The same options can be used for an internal connector under the `connector` parameter:
The following example shows how to create a standalone OpenAI chat connector:
```json
POST /_plugins/_ml/connectors/_create
@ -149,7 +151,7 @@ After creating the connector, you can retrieve the `task_id` and `connector_id`
## Amazon SageMaker connector
The following example creates a standalone Amazon SageMaker connector. The same options can be used for an internal connector under the `connector` parameter:
The following example shows how to create a standalone Amazon SageMaker connector:
```json
POST /_plugins/_ml/connectors/_create
@ -267,5 +269,5 @@ POST /_plugins/_ml/connectors/_create
## Next steps
- To learn more about using models in OpenSearch, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- To learn more about using models in OpenSearch, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
- To learn more about model access control and model groups, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/).

View File

@ -1,21 +1,24 @@
---
layout: default
title: Connecting to remote models
parent: Integrating ML models
has_children: true
has_toc: false
nav_order: 60
redirect_from:
- /ml-commons-plugin/remote-models/index/
---
# Connecting to remote models
**Introduced 2.9**
{: .label .label-purple }
Machine learning (ML) extensibility enables ML developers to create integrations with other ML services, such as Amazon SageMaker or OpenAI. These integrations provide system administrators and data scientists the ability to run ML workloads outside of their OpenSearch cluster.
Machine learning (ML) remote models enable ML developers to create integrations with other ML services, such as Amazon SageMaker or OpenAI. These integrations allow system administrators and data scientists to run ML workloads outside of their OpenSearch cluster.
To get started with ML extensibility, choose from the following options:
To get started with ML remote models, choose from the following options:
- If you're an ML developer wanting to integrate with your specific ML services, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/).
- If you're a system administrator or data scientist wanting to create a connection to an ML service, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
- If you're an ML developer wanting to create integrations with your specific ML services, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/).
- If you're a system administrator or data scientist wanting to create a connection to an ML service, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
## Prerequisites
@ -123,7 +126,7 @@ To learn more about model groups, see [Model access control]({{site.url}}{{site.
## Step 2: Create a connector
You can create a standalone connector or an internal connector as part of a specific model. For more information about connectors and connector examples, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
You can create a standalone connector that can be reused for multiple models. Alternatively, you can specify a connector when creating a model so that it can be used only for that model. For more information and example connectors, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
The Connectors Create API, `/_plugins/_ml/connectors/_create`, creates connectors that facilitate registering and deploying external models in OpenSearch. Using the `endpoint` parameter, you can connect ML Commons to any supported ML tool by using its specific API endpoint. For example, you can connect to a ChatGPT model by using the `api.openai.com` endpoint:
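For example, a hedged sketch of such a connector (the credential value is a placeholder):

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI Chat Connector",
  "description": "The connector to a public OpenAI model service for GPT 3.5",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "api.openai.com",
    "model": "gpt-3.5-turbo"
  },
  "credential": {
    "openAI_key": "<YOUR_OPENAI_KEY>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```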
@ -321,14 +324,11 @@ The response contains the inference results provided by the OpenAI model:
## Step 6: Use the model for search
To learn how to set up a vector index and use text embedding models for search, see [Neural text search]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/).
To learn how to set up a vector index and use sparse encoding models for search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
To learn how to set up a vector index and use multimodal embedding models for search, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
To learn how to use the model for vector search, see [Set up neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#set-up-neural-search).
## Next steps
- For more information about connectors, including connector examples, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
- For more information about connector parameters, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/blueprints/).
- For more information about connectors, including example connectors, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
- For more information about connector parameters, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/).
- For more information about managing ML models in OpenSearch, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
- For more information about interacting with ML models in OpenSearch, see [Managing ML models in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-dashboard/).

View File

@ -1,10 +1,12 @@
---
layout: default
title: Using ML models within OpenSearch
parent: Integrating ML models
has_children: true
nav_order: 50
redirect_from:
- /ml-commons-plugin/model-serving-framework/
- /ml-commons-plugin/using-ml-models/
---
# Using ML models within OpenSearch

View File

@ -12,9 +12,13 @@ You can use a hybrid query to combine relevance scores from multiple queries int
## Example
Before using a `hybrid` query, you must configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (see [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor#example)).
Before using a `hybrid` query, you must set up a machine learning (ML) model, ingest documents, and configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/).
To try out the example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial).
To learn how to set up an ML model, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
Once you set up an ML model, learn how to use the `hybrid` query by following the steps in [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search).
For a comprehensive example, follow the [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
## Parameters

View File

@ -31,7 +31,7 @@ The top-level `vector_field` specifies the vector field against which to run a s
Field | Data type | Required/Optional | Description
:--- | :--- | :---
`query_text` | String | Required | The query text from which to generate vector embeddings.
`model_id` | String | Required | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`model_id` | String | Required | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/).
`max_token_score` | Float | Optional | The theoretical upper bound of the score for all tokens in the vocabulary (required for performance optimization). For OpenSearch-provided [pretrained sparse embedding models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models), we recommend setting `max_token_score` to 2 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` and to 3.5 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1`.
#### Example request
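A hedged sketch of a `neural_sparse` query, assuming a `passage_embedding` rank features field (the model ID is a placeholder):

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "Hi world",
        "model_id": "aP2Q8ooBpBj3wT4HVS8a",
        "max_token_score": 2
      }
    }
  }
}
```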

View File

@ -31,7 +31,7 @@ Field | Data type | Required/Optional | Description
:--- | :--- | :---
`query_text` | String | Optional | The query text from which to generate vector embeddings. You must specify at least one `query_text` or `query_image`.
`query_image` | String | Optional | A base-64 encoded string that corresponds to the query image from which to generate vector embeddings. You must specify at least one `query_text` or `query_image`.
`model_id` | String | Required if the default model ID is not set. For more information, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/#setting-a-default-model-on-an-index-or-field). | The ID of the model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`model_id` | String | Required if the default model ID is not set. For more information, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/#setting-a-default-model-on-an-index-or-field). | The ID of the model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/).
`k` | Integer | Optional | The number of results returned by the k-NN search. Default is 10.
#### Example request
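A hedged sketch of a multimodal `neural` query, assuming a `vector_embedding` k-NN field (the model ID and base64-encoded image string are placeholders):

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "vector_embedding": {
        "query_text": "Wild west",
        "query_image": "iVBORw0KGgoAAAAN...",
        "model_id": "-fYQAosBQkdnhhBsK593",
        "k": 5
      }
    }
  }
}
```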

View File

@ -2,6 +2,7 @@
layout: default
title: Asynchronous search
nav_order: 51
parent: Improving search performance
has_children: true
redirect_from:
- /search-plugins/async/

View File

@ -3,6 +3,7 @@ layout: default
title: Asynchronous search security
nav_order: 2
parent: Asynchronous search
grand_parent: Improving search performance
has_children: false
redirect_from:
- /search-plugins/async/security/

View File

@ -2,6 +2,7 @@
layout: default
title: Settings
parent: Asynchronous search
grand_parent: Improving search performance
nav_order: 4
---

View File

@ -1,6 +1,7 @@
---
layout: default
title: Concurrent segment search
parent: Improving search performance
nav_order: 53
---

View File

@ -2,7 +2,10 @@
layout: default
title: Conversational search
has_children: false
nav_order: 200
nav_order: 70
redirect_from:
- /ml-commons-plugin/conversational-search/
- /search-plugins/search-methods/conversational-search/
---
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [forum topic](https://forum.opensearch.org/t/feedback-conversational-search-and-retrieval-augmented-generation-using-search-pipeline-experimental-release/16073).
@ -241,7 +244,7 @@ PUT /_cluster/settings
### Connecting the model
RAG requires an LLM in order to function. We recommend using a [connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
RAG requires an LLM in order to function. We recommend using a [connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
Use the following steps to set up an HTTP connector using the OpenAI GPT 3.5 model:
@ -406,6 +409,6 @@ If your LLM includes a set token limit, set the `size` field in your OpenSearch
## Next steps
- To learn more about connecting to models on external platforms, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/extensibility/connectors/).
- To learn more about using custom models within your OpenSearch cluster, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
- To learn more about connecting to models on external platforms, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/).
- To learn more about using custom models within your OpenSearch cluster, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).

View File

@ -0,0 +1,220 @@
---
layout: default
title: Hybrid search
has_children: false
nav_order: 40
redirect_from:
- /search-plugins/search-methods/hybrid-search/
---
# Hybrid search
**Introduced 2.11**
{: .label .label-purple }
Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization_processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques.
**PREREQUISITE**<br>
Before using hybrid search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using hybrid search
To use hybrid search, follow these steps:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Configure a search pipeline](#step-4-configure-a-search-pipeline).
1. [Search the index using hybrid search](#step-5-search-the-index-using-hybrid-search).
## Step 1: Create an ingest pipeline
To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`text_embedding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
The following example request creates an ingest pipeline that converts the text from `passage_text` to text embeddings and stores the embeddings in `passage_embedding`:
```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"description": "A text embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "bQ1J8ooBpBj3wT4HVUsb",
"field_map": {
"passage_text": "passage_embedding"
}
}
}
]
}
```
{% include copy-curl.html %}
## Step 2: Create an index for ingestion
In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`.
The following example request creates a k-NN index that is set up with a default ingest pipeline:
```json
PUT /my-nlp-index
{
"settings": {
"index.knn": true,
"default_pipeline": "nlp-ingest-pipeline"
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "knn_vector",
"dimension": 768,
"method": {
"engine": "lucene",
"space_type": "l2",
"name": "hnsw",
"parameters": {}
}
},
"passage_text": {
"type": "text"
}
}
}
}
```
{% include copy-curl.html %}
For more information about creating a k-NN index and using supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).
## Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send the following requests:
```json
PUT /my-nlp-index/_doc/1
{
"passage_text": "Hello world",
"id": "s1"
}
```
{% include copy-curl.html %}
```json
PUT /my-nlp-index/_doc/2
{
"passage_text": "Hi planet",
"id": "s2"
}
```
{% include copy-curl.html %}
Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings.
## Step 4: Configure a search pipeline
To configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/), use the following request. The normalization technique in the processor is set to `min_max`, and the combination technique is set to `arithmetic_mean`. The `weights` array specifies the weights assigned to each query clause as decimal percentages:
```json
PUT /_search/pipeline/nlp-search-pipeline
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {
"weights": [
0.3,
0.7
]
}
}
}
}
]
}
```
{% include copy-curl.html %}
## Step 5: Search the index using hybrid search
To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search.
The following example request combines two query clauses---a neural query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter:
```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"_source": {
"exclude": [
"passage_embedding"
]
},
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": {
"query": "Hi world"
}
}
},
{
"neural": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "aVeif4oB5Vm0Tdw8zYO2",
"k": 5
}
}
}
]
}
}
}
```
{% include copy-curl.html %}
Alternatively, you can set a default search pipeline for the `my-nlp-index` index. For more information, see [Default search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/#default-search-pipeline).
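For example, the following request uses the index settings API to make `nlp-search-pipeline` the default search pipeline for the index so that the `search_pipeline` query parameter can be omitted:
```json
PUT /my-nlp-index/_settings
{
  "index.search.default_pipeline": "nlp-search-pipeline"
}
```
{% include copy-curl.html %}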
The response contains the matching document:
```json
{
"took" : 36,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2251667,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 1.2251667,
"_source" : {
"passage_text" : "Hello world",
"id" : "s1"
}
}
]
}
}
```


@ -0,0 +1,14 @@
---
layout: default
title: Improving search performance
nav_order: 220
has_children: true
---
# Improving search performance
OpenSearch offers several ways to improve search performance:
- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).


@ -9,24 +9,75 @@ nav_exclude: true
# Search
OpenSearch provides many features for customizing your search use cases and improving search relevance.
## Search methods
OpenSearch supports the following search methods:
- **Traditional lexical search**
  - [Keyword (BM25) search]({{site.url}}{{site.baseurl}}/search-plugins/keyword-search/): Searches the document corpus for words that appear in the query.
- **Machine learning (ML)-powered search**
  - **Vector search**
    - [k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/): Searches for k-nearest neighbors to a search term across an index of vectors.
  - **Neural search**: [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) facilitates generating vector embeddings at ingestion time and searching them at search time. Neural search lets you integrate ML models into your search and serves as a framework for implementing other search methods. The following search methods are built on top of neural search:
    - [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Considers the meaning of the words in the search context. Uses dense retrieval based on text embedding models to search text data.
    - [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses multimodal embedding models to search text and image data.
    - [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data.
    - [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines traditional search and vector search to improve search relevance.
    - [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): Implements a retrieval-augmented generative search.
## Query languages
In OpenSearch, you can use the following query languages to search your data:
- [Query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/): The primary OpenSearch query language that supports creating complex, fully customizable queries.
- [Query string query language]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/): A scaled-down query language that you can use in a query parameter of a search request or in OpenSearch Dashboards (see the example following this list).
- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/): A traditional query language that bridges the gap between traditional relational database concepts and the flexibility of OpenSearch's document-oriented data storage.
- [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/): The primary language used with observability in OpenSearch. PPL uses a pipe syntax that chains commands into a query.
- [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/dql/): A simple text-based query language for filtering data in OpenSearch Dashboards.
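For example, a query string query fits entirely into the `q` parameter of a search request. The following minimal sketch (the index and field names are illustrative) matches documents whose `speaker` field contains `HAMLET`:
```json
GET shakespeare/_search?q=speaker:HAMLET
```
{% include copy-curl.html %}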
## Search performance
OpenSearch offers several ways to improve search performance:
- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/): Runs resource-intensive queries asynchronously.
- [Concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/): Searches segments concurrently.
## Search relevance
OpenSearch provides the following search relevance features:
- [Compare Search Results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/): A search comparison tool in OpenSearch Dashboards that you can use to compare results from two queries side by side.
- [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/): Offers query rewriting capability.
## Search results
OpenSearch supports the following commonly used operations on search results (a combined example follows the list):
- [Paginate]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/)
- [Paginate with Point in Time]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time/)
- [Sort]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/)
- [Highlight search terms]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/highlight/)
- [Autocomplete]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/autocomplete/)
- [Did-you-mean]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/did-you-mean/)
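The following sketch (the index and field names are illustrative) combines two of these operations, sorting matching documents by `line_id` and returning the first page of 10 results:
```json
GET shakespeare/_search
{
  "from": 0,
  "size": 10,
  "sort": [
    {
      "line_id": "asc"
    }
  ],
  "query": {
    "match": {
      "speaker": "HAMLET"
    }
  }
}
```
{% include copy-curl.html %}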
## Search pipelines
You can process search queries and search results with [search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/).


@ -0,0 +1,183 @@
---
layout: default
title: Keyword search
has_children: false
nav_order: 10
redirect_from:
- /search-plugins/search-methods/keyword-search/
---
# Keyword search
By default, OpenSearch calculates document scores using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm. BM25 is a keyword-based algorithm that performs lexical search for words that appear in the query.
When determining a document's relevance, BM25 considers [term frequency/inverse document frequency (TF/IDF)](https://en.wikipedia.org/wiki/Tf%E2%80%93idf):
- _Term frequency_ stipulates that documents in which the search term appears more frequently are more relevant.
- _Inverse document frequency_ gives less weight to the words that commonly appear in all documents in the corpus (for example, articles like "the").
## Example
The following example query searches for the words `long live king` in the `shakespeare` index:
```json
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": "long live king"
}
}
}
```
{% include copy-curl.html %}
The response contains the matching documents, each with a relevance score in the `_score` field:
```json
{
"took": 113,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2352,
"relation": "eq"
},
"max_score": 18.781435,
"hits": [
{
"_index": "shakespeare",
"_id": "32437",
"_score": 18.781435,
"_source": {
"type": "line",
"line_id": 32438,
"play_name": "Hamlet",
"speech_number": 3,
"line_number": "1.1.3",
"speaker": "BERNARDO",
"text_entry": "Long live the king!"
}
},
{
"_index": "shakespeare",
"_id": "83798",
"_score": 16.523308,
"_source": {
"type": "line",
"line_id": 83799,
"play_name": "Richard III",
"speech_number": 42,
"line_number": "3.7.242",
"speaker": "BUCKINGHAM",
"text_entry": "Long live Richard, Englands royal king!"
}
},
{
"_index": "shakespeare",
"_id": "82994",
"_score": 15.588365,
"_source": {
"type": "line",
"line_id": 82995,
"play_name": "Richard III",
"speech_number": 24,
"line_number": "3.1.80",
"speaker": "GLOUCESTER",
"text_entry": "live long."
}
},
{
"_index": "shakespeare",
"_id": "7199",
"_score": 15.586321,
"_source": {
"type": "line",
"line_id": 7200,
"play_name": "Henry VI Part 2",
"speech_number": 12,
"line_number": "2.2.64",
"speaker": "BOTH",
"text_entry": "Long live our sovereign Richard, Englands king!"
}
}
...
]
}
}
```
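To see how BM25 arrived at a particular score, including the term frequency, inverse document frequency, and length normalization factors, you can pass the same query to the Explain API for one of the returned documents:
```json
GET shakespeare/_explain/32437
{
  "query": {
    "match": {
      "text_entry": "long live king"
    }
  }
}
```
{% include copy-curl.html %}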
## Similarity algorithms
The following table lists the supported similarity algorithms.

Algorithm | Description
:--- | :---
`BM25` | The default OpenSearch [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) similarity algorithm.
`boolean` | Assigns terms a score equal to their boost value. Use `boolean` similarity when you want the document scores to be based on the binary value of whether the terms match.
## Specifying similarity
You can specify the similarity algorithm in the `similarity` parameter when configuring mappings at the field level.
For example, the following query specifies the `boolean` similarity for the `boolean_field`. The `bm25_field` is assigned the default `BM25` similarity:
```json
PUT /testindex
{
"mappings": {
"properties": {
"bm25_field": {
"type": "text"
},
"boolean_field": {
"type": "text",
"similarity": "boolean"
}
}
}
}
```
{% include copy-curl.html %}
## Configuring BM25 similarity
You can configure BM25 similarity parameters at the index level as follows:
```json
PUT /testindex
{
"settings": {
"index": {
"similarity": {
"custom_similarity": {
"type": "BM25",
"k1": 1.2,
"b": 0.75,
"discount_overlaps": "true"
}
}
}
}
}
```
`BM25` similarity supports the following parameters. A combined configuration example follows the table.

Parameter | Data type | Description
:--- | :--- | :---
`k1` | Float | Determines non-linear term frequency normalization (saturation) properties. The default value is `1.2`.
`b` | Float | Determines the degree to which document length normalizes TF values. The default value is `0.75`.
`discount_overlaps` | Boolean | Determines whether overlap tokens (tokens with zero position increment) are ignored when computing the norm. Default is `true` (overlap tokens do not count when computing the norm).
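A named similarity configured in the index settings takes effect only for fields whose mappings reference it. The following sketch (the field name and parameter values are illustrative, not tuning recommendations) defines `custom_similarity` and assigns it to a field in a single request:
```json
PUT /testindex
{
  "settings": {
    "index": {
      "similarity": {
        "custom_similarity": {
          "type": "BM25",
          "k1": 1.4,
          "b": 0.9
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tuned_field": {
        "type": "text",
        "similarity": "custom_similarity"
      }
    }
  }
}
```
{% include copy-curl.html %}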
---
## Next steps
- Learn about [query and filter context]({{site.url}}{{site.baseurl}}/query-dsl/query-filter-context/).
- Learn about the [query types]({{site.url}}{{site.baseurl}}/query-dsl/index/) OpenSearch supports.


@ -2,7 +2,8 @@
layout: default
title: API
nav_order: 30
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
---


@ -2,7 +2,8 @@
layout: default
title: Approximate k-NN search
nav_order: 15
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---


@ -2,7 +2,8 @@
layout: default
title: k-NN search with filters
nav_order: 20
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---


@ -1,14 +1,14 @@
---
layout: default
title: k-NN
nav_order: 50
title: k-NN search
nav_order: 20
has_children: true
has_toc: false
redirect_from:
- /search-plugins/knn/
---
# k-NN search
Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, you can specify the space (the distance function) you want to use to measure the distance between points.


@ -2,7 +2,8 @@
layout: default
title: JNI libraries
nav_order: 35
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
redirect_from:
- /search-plugins/knn/jni-library/


@ -2,7 +2,8 @@
layout: default
title: k-NN index
nav_order: 5
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
---


@ -2,7 +2,8 @@
layout: default
title: Exact k-NN with scoring script
nav_order: 10
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---


@ -2,7 +2,8 @@
layout: default
title: k-NN Painless extensions
nav_order: 25
parent: k-NN
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---


@ -1,7 +1,8 @@
---
layout: default
title: Performance tuning
parent: k-NN
parent: k-NN search
grand_parent: Search methods
nav_order: 45
---


@ -1,7 +1,8 @@
---
layout: default
title: Settings
parent: k-NN
parent: k-NN search
grand_parent: Search methods
nav_order: 40
---


@ -1,9 +1,11 @@
---
layout: default
title: Multimodal search
nav_order: 20
nav_order: 60
has_children: false
parent: Neural search
redirect_from:
- /search-plugins/neural-multimodal-search/
- /search-plugins/search-methods/multimodal-search/
---
# Multimodal search
@ -13,7 +15,7 @@ Introduced 2.11
Use multimodal search to search text and image data. In neural search, multimodal search is facilitated by multimodal embedding models.
**PREREQUISITE**<br>
Before using multimodal search, you must set up a multimodal embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using multimodal search


@ -1,15 +1,17 @@
---
layout: default
title: Semantic search
title: Neural search tutorial
has_children: false
nav_order: 140
nav_order: 30
redirect_from:
- /ml-commons-plugin/semantic-search/
---
# Neural search tutorial
By default, OpenSearch calculates document scores using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm. BM25 is a keyword-based algorithm that performs well on queries containing keywords but fails to capture the semantic meaning of the query terms. Semantic search, unlike keyword-based search, takes into account the meaning of the query in the search context. Thus, semantic search performs well when a query requires natural language understanding.
In this tutorial, you'll learn how to use neural search to:
- Implement semantic search in OpenSearch.
- Implement hybrid search by combining semantic and keyword search to improve search relevance.
@ -18,7 +20,6 @@ In this tutorial, you'll learn how to:
It's helpful to understand the following terms before starting this tutorial:
- _Neural search_: Facilitates vector search at ingestion time and at search time:
- At ingestion time, neural search uses language models to generate vector embeddings from the text fields in the document. The documents containing both the original text field and the vector embedding of the field are then indexed in a k-NN index, as shown in the following diagram.
@ -26,6 +27,9 @@ It's helpful to understand the following terms before starting this tutorial:
- At search time, when you then use a _neural query_, the query text is passed through a language model, and the resulting vector embeddings are compared with the document text vector embeddings to find the most relevant results, as shown in the following diagram.
![Neural search at search time diagram]({{site.url}}{{site.baseurl}}/images/neural-search-query.png)
- _Semantic search_: Employs neural search in order to determine the intention of the user's query in the search context, thereby improving search relevance.
- _Hybrid search_: Combines semantic and keyword search to improve search relevance.
## OpenSearch components for semantic search
@ -65,9 +69,9 @@ PUT _cluster/settings
#### Advanced
For a [custom local model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/) setup, note the following requirements:
- To register a custom local model, you need to specify an additional `"allow_registering_model_via_url": "true"` cluster setting.
- In production, it's best practice to separate the workloads by having dedicated ML nodes. On clusters with dedicated ML nodes, specify `"only_run_on_ml_node": "true"` for improved performance.
For more information about ML-related cluster settings, see [ML Commons cluster settings]({{site.url}}{{site.baseurl}}/ml-commons-plugin/cluster-settings/).
@ -110,13 +114,21 @@ For this tutorial, you'll use the [DistilBERT](https://huggingface.co/docs/trans
- The model version is `1.0.1`.
- The number of dimensions for this model is `768`.
Take note of the dimensionality of the model because you'll need it when you set up a k-NN index.
{: .important}
#### Advanced: Using a different model
Alternatively, you can choose one of the following options for your model:
- Use any other pretrained model provided by OpenSearch. For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/).
- Upload your own model to OpenSearch. For more information, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/).
- Connect to a foundation model hosted on an external platform. For more information, see [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
For information about choosing a model, see [Further reading](#further-reading).
### Step 1(b): Register a model group
For access control, models are organized into model groups (collections of versions of a particular model). Each model group name in the cluster must be globally unique. Registering a model group ensures the uniqueness of the model group name.
@ -314,25 +326,28 @@ To register a custom model, you must provide a model configuration in the regist
```json
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
"version": "1.0.1",
"name": "sentence-transformers/msmarco-distilbert-base-tas-b",
"version": "1.0.1",
"description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
"model_task_type": "TEXT_EMBEDDING",
"model_format": "ONNX",
"model_content_size_in_bytes": 266291330,
"model_content_hash_value": "a3c916f24239fbe32c43be6b24043123d49cd2c41b312fc2b29f2fc65e3c424c",
"model_config": {
"model_type": "distilbert",
"embedding_dimension": 768,
"framework_type": "huggingface_transformers",
"pooling_mode": "CLS",
"normalize_result": false,
"all_config": "{\"_name_or_path\":\"old_models/msmarco-distilbert-base-tas-b/0_Transformer\",\"activation\":\"gelu\",\"architectures\":[\"DistilBertModel\"],\"attention_dropout\":0.1,\"dim\":768,\"dropout\":0.1,\"hidden_dim\":3072,\"initializer_range\":0.02,\"max_position_embeddings\":512,\"model_type\":\"distilbert\",\"n_heads\":12,\"n_layers\":6,\"pad_token_id\":0,\"qa_dropout\":0.1,\"seq_classif_dropout\":0.2,\"sinusoidal_pos_embds\":false,\"tie_weights_\":true,\"transformers_version\":\"4.7.0\",\"vocab_size\":30522}"
},
"created_time": 1676074079195,
"model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2",
"description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
"model_task_type": "TEXT_EMBEDDING",
"model_format": "TORCH_SCRIPT",
"model_content_size_in_bytes": 266352827,
"model_content_hash_value": "acdc81b652b83121f914c5912ae27c0fca8fabf270e6f191ace6979a19830413",
"model_config": {
"model_type": "distilbert",
"embedding_dimension": 768,
"framework_type": "sentence_transformers",
"all_config": """{"_name_or_path":"old_models/msmarco-distilbert-base-tas-b/0_Transformer","activation":"gelu","architectures":["DistilBertModel"],"attention_dropout":0.1,"dim":768,"dropout":0.1,"hidden_dim":3072,"initializer_range":0.02,"max_position_embeddings":512,"model_type":"distilbert","n_heads":12,"n_layers":6,"pad_token_id":0,"qa_dropout":0.1,"seq_classif_dropout":0.2,"sinusoidal_pos_embds":false,"tie_weights_":true,"transformers_version":"4.7.0","vocab_size":30522}"""
},
"created_time": 1676073973126
"url": "https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/msmarco-distilbert-base-tas-b/1.0.1/onnx/sentence-transformers_msmarco-distilbert-base-tas-b-1.0.1-onnx.zip"
}
```
For more information, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
### Step 1(d): Deploy the model


@ -1,8 +1,8 @@
---
layout: default
title: Neural search
nav_order: 200
has_children: true
nav_order: 25
has_children: false
has_toc: false
redirect_from:
- /neural-search-plugin/index/
@ -16,20 +16,39 @@ Before you ingest documents into an index, documents are passed through a machin
## Prerequisite
Before using neural search, you must set up an ML model. When selecting a model, you have the following options:
- Use a pretrained model provided by OpenSearch. For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/).
- Upload your own model to OpenSearch. For more information, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/).
- Connect to a foundation model hosted on an external platform. For more information, see [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
## Tutorial
For a step-by-step tutorial, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
## Using an ML model for neural search
Once you set up an ML model, choose one of the following search methods to use your model for neural search.
### Semantic search
Semantic search uses dense retrieval based on text embedding models to search text data. For detailed setup instructions, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
### Hybrid search
Hybrid search combines keyword and neural search to improve search relevance. For detailed setup instructions, see [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/search-methods/hybrid-search/).
### Multimodal search
Multimodal search uses neural search with multimodal embedding models to search text and image data. For detailed setup instructions, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/search-methods/multimodal-search/).
### Sparse search
Sparse search uses neural search with sparse retrieval based on sparse embedding models to search text data. For detailed setup instructions, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
### Conversational search
With conversational search, you can ask questions in natural language, receive a text response, and ask additional clarifying questions. For detailed setup instructions, see [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/search-methods/conversational-search/).


@ -1,6 +1,7 @@
---
layout: default
title: Querqy
parent: Search relevance
has_children: false
redirect_from:
- /search-plugins/querqy/


@ -9,7 +9,7 @@ grand_parent: Search pipelines
# Neural query enricher processor
The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
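For example, the following sketch (the model IDs are illustrative) creates a search pipeline in which `default_model_id` sets the index-level default model and `neural_field_default_id` overrides it for a specific field:
```json
PUT /_search/pipeline/default_model_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "neural_field_default_id": {
          "passage_embedding": "aVeif4oB5Vm0Tdw8zYO2"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}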
## Request fields


@ -38,7 +38,9 @@ Field | Data type | Description
## Example
The following example demonstrates using a search pipeline with a `normalization-processor`.
For a comprehensive example, follow the [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
### Creating a search pipeline
@ -108,7 +110,7 @@ GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
```
{% include copy-curl.html %}
For more information about setting up hybrid search, see [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search).
## Search tuning recommendations


@ -0,0 +1,168 @@
---
layout: default
title: Compare Search Results
nav_order: 55
parent: Search relevance
has_children: true
has_toc: false
redirect_from:
- /search-plugins/search-relevance/
---
# Compare Search Results
With Compare Search Results in OpenSearch Dashboards, you can compare results from two queries side by side to determine whether one query produces better results than the other. Using this tool, you can evaluate search quality by experimenting with queries.
For example, you can see how results change when you apply one of the following query changes:
- Weighting fields differently
- Different stemming or lemmatization strategies
- Shingling
## Prerequisites
Before you get started, you must index data in OpenSearch. To learn how to create a new index, see [Index data]({{site.url}}{{site.baseurl}}/opensearch/index-data/).
Alternatively, you can add sample data in OpenSearch Dashboards using the following steps:
1. On the top menu bar, go to **OpenSearch Dashboards > Overview**.
1. Select **View app directory**.
1. Select **Add sample data**.
1. Choose one of the built-in datasets and select **Add data**.
## Using Compare Search Results in OpenSearch Dashboards
To compare search results in OpenSearch Dashboards, perform the following steps.
**Step 1:** On the top menu bar, go to **OpenSearch Plugins > Search Relevance**.
**Step 2:** Enter the search text in the search bar.
**Step 3:** Select an index for **Query 1** and enter a query (request body only) in [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). The `GET` HTTP method and the `_search` endpoint are implicit. Use the `%SearchText%` variable to refer to the text in the search bar.
The following is an example query:
```json
{
"query": {
"multi_match": {
"query": "%SearchText%",
"fields": [ "description", "item_name" ]
}
}
}
```
**Step 4:** Select an index for **Query 2** and enter a query (request body only).
The following example query boosts the `item_name` field in the search results:
```json
{
"query": {
"multi_match": {
"query": "%SearchText%",
"fields": [ "description", "item_name^3" ]
}
}
}
```
**Step 5:** Select **Search** and compare **Result 1** and **Result 2**.
The following example screen shows a search for the word "cup" in the `description` and `item_name` fields with and without boosting the `item_name`.
<img src="{{site.url}}{{site.baseurl}}/images/search_relevance.png" alt="Compare search results"/>{: .img-fluid }
If a result in Result 1 also appears in Result 2, the `Up` and `Down` indicators below the result number show how many places that result moved relative to its position in the other result set. In this example, the document with the ID 2 is marked `Up 1` in Result 2 because it moved up one place compared to Result 1 and `Down 1` in Result 1 because it moved down one place compared to Result 2.
## Changing the number of results
By default, OpenSearch returns the top 10 results. To change the number of returned results to a different value, specify the `size` parameter in the query:
```json
{
"size": 15,
"query": {
"multi_match": {
"query": "%SearchText%",
"fields": [ "title^3", "text" ]
}
}
}
```
Setting `size` to a high value (for example, larger than 250 documents) may degrade performance.
{: .note}
You cannot save a given comparison for future use, so Compare Search Results is not suitable for systematic testing.
{: .note}
## Comparing OpenSearch search results with reranked results
One use case for Compare Search Results is the comparison of raw OpenSearch results with the same results processed by a reranking application. OpenSearch currently integrates with the following two rerankers:
- [Amazon Kendra Intelligent Ranking for OpenSearch](#reranking-results-with-amazon-kendra-intelligent-ranking-for-opensearch)
- [Amazon Personalize Search Ranking](#personalizing-search-results-with-amazon-personalize-search-ranking)
### Reranking results with Amazon Kendra Intelligent Ranking for OpenSearch
An example of a reranker is **Amazon Kendra Intelligent Ranking for OpenSearch**, contributed by the Amazon Kendra team. This plugin takes search results from OpenSearch and applies Amazon Kendra's semantic relevance rankings calculated using vector embeddings and other semantic search techniques. For many applications, this provides better result rankings.
To try Amazon Kendra Intelligent Ranking, you must first set up the Amazon Kendra service. To get started, see [Amazon Kendra](https://aws.amazon.com/kendra/). For detailed information, including plugin setup instructions, see [Amazon Kendra Intelligent Ranking for self-managed OpenSearch](https://docs.aws.amazon.com/kendra/latest/dg/opensearch-rerank.html).
### Comparing search results with reranked results in OpenSearch Dashboards
To compare search results with reranked results in OpenSearch Dashboards, enter a query in **Query 1** and enter the same query using a reranker in **Query 2**. Then compare the OpenSearch results with the reranked results.
The following example demonstrates searching for the text "snacking nuts" in the `abo` index. The documents in the index contain snack descriptions in the `bullet_point` array.
<img src="{{site.url}}{{site.baseurl}}/images/kendra_query.png" alt="OpenSearch Intelligent Ranking query"/>{: .img-fluid }
1. Enter `snacking nuts` in the search bar.
1. Enter the following query, which searches the `bullet_point` field for the search text "snacking nuts", in **Query 1**:
```json
{
"query": {
"match": {
"bullet_point": "%SearchText%"
}
},
"size": 25
}
```
1. Enter the same query with a reranker in **Query 2**. This example uses Amazon Kendra Intelligent Ranking:
```json
{
"query" : {
"match" : {
"bullet_point": "%SearchText%"
}
},
"size": 25,
"ext": {
"search_configuration":{
"result_transformer" : {
"kendra_intelligent_ranking": {
"order": 1,
"properties": {
"title_field": "item_name",
"body_field": "bullet_point"
}
}
}
}
}
}
```
In the preceding query, `body_field` refers to the body field of the documents in the index, which Amazon Kendra Intelligent Ranking uses to rank the results. The `body_field` is required, while the `title_field` is optional.
1. Select **Search** and compare the results in **Result 1** and **Result 2**.
### Personalizing search results with Amazon Personalize Search Ranking
Another example of a reranker is **Amazon Personalize Search Ranking**, contributed by the Amazon Personalize team. Amazon Personalize uses machine learning (ML) techniques to generate custom recommendations for your users. The plugin takes OpenSearch search results and applies a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) to rerank them according to their Amazon Personalize ranking. The Amazon Personalize rankings are based on the user's past behavior and metadata about the search items and the user. This workflow improves the search experience for your users by personalizing their search results.
To try Amazon Personalize Search Ranking, you must first set up Amazon Personalize. To get started, see [Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/setup.html). For detailed information, including plugin setup instructions, see [Personalizing search results from OpenSearch](https://docs.aws.amazon.com/personalize/latest/dg/personalize-opensearch.html).


@ -1,7 +1,7 @@
---
layout: default
title: Search relevance
nav_order: 55
nav_order: 80
has_children: true
has_toc: false
redirect_from:
@ -10,162 +10,10 @@ redirect_from:
# Search relevance
Search relevance evaluates the accuracy of the search results returned by a query. The higher the relevance, the better the search engine.
OpenSearch provides the following search relevance features:
- [Compare Search Results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/) in OpenSearch Dashboards lets you compare results from two queries side by side.
- [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/) offers query rewriting capability.


@ -2,7 +2,8 @@
layout: default
title: Search Relevance Stats API
nav_order: 65
parent: Search relevance
parent: Compare Search Results
grand_parent: Search relevance
has_children: false
---


@ -2,7 +2,8 @@
layout: default
title: Paginate results
parent: Searching data
nav_order: 21
has_children: true
nav_order: 10
redirect_from:
- /opensearch/search/paginate/
---


@ -4,8 +4,10 @@ title: Point in Time API
nav_order: 59
has_children: false
parent: Point in Time
grand_parent: Searching data
redirect_from:
- /opensearch/point-in-time-api/
- /search-plugins/point-in-time-api/
---
# Point in Time API


@ -1,11 +1,13 @@
---
layout: default
title: Point in Time
nav_order: 58
parent: Searching data
nav_order: 20
has_children: true
has_toc: false
redirect_from:
- /opensearch/point-in-time/
- /search-plugins/point-in-time/
---
# Point in Time


@ -1,22 +1,24 @@
---
layout: default
title: Text search
nav_order: 10
title: Semantic search
nav_order: 35
has_children: false
parent: Neural search
redirect_from:
- /search-plugins/neural-text-search/
- /search-plugins/search-methods/semantic-search/
---
# Semantic search
Semantic search considers the context and intent of a query. In OpenSearch, semantic search is facilitated by neural search with text embedding models. Semantic search creates a dense vector (a list of floats) and ingests data into a k-NN index.
**PREREQUISITE**<br>
Before using semantic search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using semantic search
To use semantic search, follow these steps:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).


@ -1,16 +1,19 @@
---
layout: default
title: Sparse search
nav_order: 30
parent: Search
nav_order: 50
has_children: false
parent: Neural search
redirect_from:
- /search-plugins/neural-sparse-search/
- /search-plugins/search-methods/sparse-search/
---
# Sparse search
Introduced 2.11
{: .label .label-purple }
[Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/search-methods/semantic-search/) relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, sparse search is implemented using an inverted index and is thus as efficient as BM25. Sparse search is facilitated by sparse embedding models. When you perform a sparse search, it creates a sparse vector (a list of `token: weight` key-value pairs representing an entry and its weight) and ingests data into a rank features index.
When selecting a model, choose one of the following options:
@ -18,7 +21,7 @@ When selecting a model, choose one of the following options:
- Use a sparse encoding model at ingestion time and a tokenizer model at search time (low performance, relatively low latency).
**PREREQUISITE**<br>
Before using sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using sparse search


@ -1,7 +1,7 @@
---
layout: default
title: SQL and PPL
nav_order: 38
nav_order: 230
has_children: true
has_toc: false
redirect_from: