Add documentation for new reranking feature in 2.12 (#6368)

* Create reranking.md

document new reranking feature in 2.12

Signed-off-by: HenryL27 <hmlindeman@yahoo.com>

* Doc review and address comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/rerank-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/rerank-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
HenryL27 2024-02-13 11:24:08 -08:00 committed by GitHub
parent b6b2ed7fa0
commit 6d884b6db3
5 changed files with 241 additions and 4 deletions

@ -0,0 +1,116 @@
---
layout: default
title: Rerank
nav_order: 25
has_children: false
parent: Search processors
grand_parent: Search pipelines
---
# Rerank processor
The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. Then the processor orders documents in the search results based on their new scores.
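Conceptually, the reordering step can be sketched in Python as follows. This is only an illustration of the idea, not the processor's implementation: `toy_score` is a hypothetical stand-in for the cross-encoder model (it simply counts shared words), and the hit structure is simplified.

```python
from typing import Callable, Dict, List


def toy_score(query_text: str, document_text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: counts shared lowercase words."""
    query_words = set(query_text.lower().split())
    document_words = set(document_text.lower().split())
    return float(len(query_words & document_words))


def rerank_hits(
    hits: List[Dict],
    query_text: str,
    document_field: str,
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[Dict]:
    """Return copies of the hits with new scores, ordered from highest to lowest."""
    rescored = []
    for hit in hits:
        context = hit["_source"].get(document_field, "")
        rescored.append({**hit, "_score": score_fn(query_text, context)})
    # sorted() is stable, so hits with equal scores keep their original order.
    return sorted(rescored, key=lambda hit: hit["_score"], reverse=True)


hits = [
    {"_id": "1", "_score": 2.0, "_source": {"title": "Weather in Santa Fe"}},
    {"_id": "2", "_score": 1.0, "_source": {"title": "Where Albuquerque is located"}},
]
reranked = rerank_hits(hits, "Where is Albuquerque?", "title")
print([hit["_id"] for hit in reranked])
```

In this sketch, the hit that originally ranked second moves to the top because the stand-in scorer finds more overlap with the query text.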
## Request fields
The following table lists all available request fields.
Field | Data type | Description
:--- | :--- | :---
`<reranker_type>` | Object | The reranker type provides the rerank processor with static information needed across all reranking calls. Required.
`context` | Object | Provides the rerank processor with information necessary for generating reranking context at query time. Required for the `ml_opensearch` reranker type.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
### The `ml_opensearch` reranker type
The `ml_opensearch` reranker type is designed to work with the cross-encoder model provided by OpenSearch. For this reranker type, specify the following fields.
Field | Data type | Description
:--- | :--- | :---
`ml_opensearch` | Object | Provides the rerank processor with model information. Required.
`ml_opensearch.model_id` | String | The model ID for the cross-encoder model. Required. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model. Required.
## Example
The following example demonstrates using a search pipeline with a `rerank` processor.
### Creating a search pipeline
The following request creates a search pipeline with a `rerank` response processor:
```json
PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "gnDIbI0BfUsSoeNT_jAw"
        },
        "context": {
          "document_fields": ["title", "text_representation"]
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
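If you generate pipeline definitions programmatically, a small helper along these lines may be useful. It is a minimal sketch: the function name `build_rerank_pipeline` is hypothetical, and the checks only mirror the fields marked as required in the preceding tables.

```python
import json
from typing import Dict, List


def build_rerank_pipeline(model_id: str, document_fields: List[str]) -> Dict:
    """Build a search pipeline body containing one ml_opensearch rerank processor."""
    if not model_id:
        raise ValueError("ml_opensearch.model_id is required")
    if not document_fields:
        raise ValueError("context.document_fields is required")
    return {
        "response_processors": [
            {
                "rerank": {
                    "ml_opensearch": {"model_id": model_id},
                    "context": {"document_fields": list(document_fields)},
                }
            }
        ]
    }


pipeline_body = build_rerank_pipeline(
    "gnDIbI0BfUsSoeNT_jAw", ["title", "text_representation"]
)
# The printed JSON can be used as the body of the PUT request.
print(json.dumps(pipeline_body, indent=2))
```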
### Using a search pipeline
Combine an OpenSearch query with an `ext` object that contains the query context for the cross-encoder model. Provide the `query_text` that will be used to rerank the results:
```json
POST /_search?search_pipeline=rerank_pipeline
{
  "query": {
    "match": {
      "text_representation": "Where is Albuquerque?"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "Where is Albuquerque?"
      }
    }
  }
}
```
{% include copy-curl.html %}
Instead of specifying `query_text`, you can provide a full path to the field containing text to use for reranking. For example, if you specify a subfield `query` in the `text_representation` object, specify its path in the `query_text_path` parameter:
```json
POST /_search?search_pipeline=rerank_pipeline
{
  "query": {
    "match": {
      "text_representation": {
        "query": "Where is Albuquerque?"
      }
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text_path": "query.match.text_representation.query"
      }
    }
  }
}
```
{% include copy-curl.html %}
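To see what the path in the preceding request points to, the following Python sketch walks a dot-separated path through a request body. It assumes a plain key-by-key lookup for illustration; how OpenSearch resolves the path internally is not described here, and `resolve_path` is a hypothetical helper.

```python
from typing import Any, Dict


def resolve_path(body: Dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path through nested dictionaries."""
    current: Any = body
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"Path segment '{key}' not found in request body")
        current = current[key]
    return current


request_body = {
    "query": {
        "match": {"text_representation": {"query": "Where is Albuquerque?"}}
    },
    "ext": {
        "rerank": {
            "query_context": {
                "query_text_path": "query.match.text_representation.query"
            }
        }
    },
}

path = request_body["ext"]["rerank"]["query_context"]["query_text_path"]
query_text = resolve_path(request_body, path)
print(query_text)
```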
The `query_context` object contains the following fields.
Field name | Description
:--- | :---
`query_text` | The natural language text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required.
`query_text_path` | The full JSON path to the text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. The maximum number of characters in the path is `1000`.
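The constraints in the preceding table can be checked on the client side before sending a request. The following is a minimal sketch: `validate_query_context` is a hypothetical helper, its error messages are illustrative, and OpenSearch performs its own validation regardless.

```python
from typing import Dict

# Maximum path length stated in the table above.
MAX_QUERY_TEXT_PATH_LENGTH = 1000


def validate_query_context(query_context: Dict[str, str]) -> None:
    """Raise ValueError if the query_context breaks the documented constraints."""
    has_text = "query_text" in query_context
    has_path = "query_text_path" in query_context
    # Exactly one of the two fields must be present.
    if has_text == has_path:
        raise ValueError("Provide exactly one of query_text or query_text_path")
    if has_path and len(query_context["query_text_path"]) > MAX_QUERY_TEXT_PATH_LENGTH:
        raise ValueError("query_text_path exceeds 1000 characters")


validate_query_context({"query_text": "Where is Albuquerque?"})
validate_query_context({"query_text_path": "query.match.text_representation.query"})
```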
For more information about setting up reranking, see [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).

@ -39,6 +39,7 @@ Processor | Description | Earliest available version
:--- | :--- | :---
[`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
[`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12

@ -1,6 +1,6 @@
---
layout: default
title: Compare Search Results
title: Comparing search results
nav_order: 55
parent: Search relevance
has_children: true
@ -9,7 +9,7 @@ redirect_from:
- /search-plugins/search-relevance/
---
# Compare Search Results
# Comparing search results
With Compare Search Results in OpenSearch Dashboards, you can compare results from two queries side by side to determine whether one query produces better results than the other. Using this tool, you can evaluate search quality by experimenting with queries.

@ -14,6 +14,8 @@ Search relevance evaluates the accuracy of the search results returned by a quer
OpenSearch provides the following search relevance features:
- [Compare Search Results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/) in OpenSearch Dashboards lets you compare results from two queries side by side.
- [Comparing search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/) from two queries side by side in OpenSearch Dashboards.
- [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/) offers query rewriting capability.
- [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/) using a cross-encoder reranker.
- Rewriting queries using [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/).

@ -0,0 +1,118 @@
---
layout: default
title: Reranking search results
parent: Search relevance
has_children: false
nav_order: 60
---
# Reranking search results
Introduced 2.12
{: .label .label-purple }
You can rerank search results using a cross-encoder reranker in order to improve search relevance. To implement reranking, you need to configure a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results and applies the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) to them. The `rerank` processor evaluates the search results and sorts them based on the new scores provided by the cross-encoder model.
**PREREQUISITE**<br>
Before configuring reranking, you must set up a cross-encoder model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Running a search with reranking
To run a search with reranking, follow these steps:
1. [Configure a search pipeline](#step-1-configure-a-search-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search using reranking](#step-4-search-using-reranking).
## Step 1: Configure a search pipeline
First, configure a search pipeline with a [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/).
The following example request creates a search pipeline with an `ml_opensearch` rerank processor. In the request, provide a model ID for the cross-encoder model and the document fields to use as context:
```json
PUT /_search/pipeline/my_pipeline
{
  "description": "Pipeline for reranking with a cross-encoder",
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "gnDIbI0BfUsSoeNT_jAw"
        },
        "context": {
          "document_fields": [
            "passage_text"
          ]
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
For more information about the request fields, see [Request fields]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#request-fields).
## Step 2: Create an index for ingestion
In order to use the rerank processor defined in your pipeline, create an OpenSearch index and add the pipeline created in the previous step as the default pipeline:
```json
PUT /my-index
{
  "settings": {
    "index.search.default_pipeline": "my_pipeline"
  },
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}
## Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send the following bulk request:
```json
POST /_bulk
{ "index": { "_index": "my-index" } }
{ "passage_text" : "I said welcome to them and we entered the house" }
{ "index": { "_index": "my-index" } }
{ "passage_text" : "I feel welcomed in their family" }
{ "index": { "_index": "my-index" } }
{ "passage_text" : "Welcoming gifts are great" }
```
{% include copy-curl.html %}
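If the passages come from application code rather than a hand-written request, the bulk body can be assembled programmatically. The following Python sketch is one way to do so; `build_bulk_body` is a hypothetical helper that only produces the newline-delimited body and does not send it.

```python
import json
from typing import Iterable


def build_bulk_body(index_name: str, passages: Iterable[str]) -> str:
    """Build a newline-delimited bulk body with one index action per passage."""
    lines = []
    for passage in passages:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"passage_text": passage}))
    # The Bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(
    "my-index",
    [
        "I said welcome to them and we entered the house",
        "I feel welcomed in their family",
        "Welcoming gifts are great",
    ],
)
print(bulk_body, end="")
```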
## Step 4: Search using reranking
To run a search with reranking on your index, use any OpenSearch query and provide an additional `ext.rerank` field:
```json
POST /my-index/_search
{
  "query": {
    "match": {
      "passage_text": "how to welcome in family"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "how to welcome in family"
      }
    }
  }
}
```
{% include copy-curl.html %}
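The same search can be prepared from Python using only the standard library. This sketch makes several assumptions: the cluster is reachable at `http://localhost:9200` without authentication, and `build_rerank_search_request` is a hypothetical helper. It only constructs the request; sending it requires a running cluster.

```python
import json
import urllib.request


def build_rerank_search_request(
    base_url: str, index_name: str, field: str, query_text: str
) -> urllib.request.Request:
    """Prepare a search request that reuses the query text as the rerank context."""
    body = {
        "query": {"match": {field: query_text}},
        "ext": {"rerank": {"query_context": {"query_text": query_text}}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


search_request = build_rerank_search_request(
    "http://localhost:9200",  # assumed local cluster address
    "my-index",
    "passage_text",
    "how to welcome in family",
)
print(search_request.get_method(), search_request.full_url)
# To send the request: urllib.request.urlopen(search_request)
```

Because the index uses `my_pipeline` as its default search pipeline, the request does not need a `search_pipeline` query parameter.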
Alternatively, you can provide the full path to the field containing the query text by using the `query_text_path` parameter. For more information, see [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example).