Add documentation for new reranking feature in 2.12 (#6368)

* Create reranking.md document new reranking feature in 2.12

Signed-off-by: HenryL27 <hmlindeman@yahoo.com>

* Doc review and address comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/rerank-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/rerank-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
2024-02-13 11:24:08 -08:00 · 6d884b6db3
parent b6b2ed7fa0
commit 6d884b6db3
5 changed files with 241 additions and 4 deletions
--- a/_search-plugins/search-pipelines/rerank-processor.md
+++ b/_search-plugins/search-pipelines/rerank-processor.md
@@ -0,0 +1,116 @@
+---
+layout: default
+title: Rerank
+nav_order: 25
+has_children: false
+parent: Search processors
+grand_parent: Search pipelines
+---
+
+# Rerank processor
+
+The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. Then the processor orders documents in the search results based on their new scores.
+
+## Request fields
+
+The following table lists all available request fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`<reranker_type>` | Object | The reranker type provides the rerank processor with static information needed across all reranking calls. Required.
+`context` | Object | Provides the rerank processor with information necessary for generating reranking context at query time.
+`tag` | String | The processor's identifier. Optional.
+`description` | String | A description of the processor. Optional.
+`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+
+### The `ml_opensearch` reranker type
+
+The `ml_opensearch` reranker type is designed to work with the cross-encoder model provided by OpenSearch. For this reranker type, specify the following fields.
+
+Field  | Data type | Description
+:--- | :---  | :--- 
+`ml_opensearch` | Object | Provides the rerank processor with model information. Required.
+`ml_opensearch.model_id` | String | The model ID for the cross-encoder model. Required. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
+`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model. Required.
+
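+As an illustration of how the optional fields sit alongside the reranker type, the following sketch adds `tag`, `description`, and `ignore_failure` to a `rerank` processor. The pipeline name `rerank_pipeline_tagged`, the tag and description values, and the model ID are placeholders chosen for this example; substitute the ID of your own deployed cross-encoder model:
+
+```json
+PUT /_search/pipeline/rerank_pipeline_tagged
+{
+  "response_processors": [
+    {
+      "rerank": {
+        "tag": "rerank_titles",
+        "description": "Reranks hits using the title field",
+        "ignore_failure": true,
+        "ml_opensearch": {
+          "model_id": "gnDIbI0BfUsSoeNT_jAw"
+        },
+        "context": {
+          "document_fields": [ "title" ]
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+With `ignore_failure` set to `true`, a failure in this processor does not stop the remaining processors in the search pipeline from running.
+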
+## Example 
+
+The following example demonstrates using a search pipeline with a `rerank` processor.
+
+### Creating a search pipeline
+
+The following request creates a search pipeline with a `rerank` response processor:
+
+```json
+PUT /_search/pipeline/rerank_pipeline
+{
+  "response_processors": [
+    {
+      "rerank": {
+        "ml_opensearch": {
+          "model_id": "gnDIbI0BfUsSoeNT_jAw"
+        },
+        "context": {
+          "document_fields": [ "title", "text_representation"]
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+### Using a search pipeline
+
+Combine an OpenSearch query with an `ext` object that contains the query context for the cross-encoder model. In `query_text`, provide the text that will be used to rerank the results:
+
+```json
+POST /_search?search_pipeline=rerank_pipeline
+{
+  "query": {
+    "match": {
+      "text_representation": "Where is Albuquerque?"
+    }
+  },
+  "ext": {
+    "rerank": {
+      "query_context": {
+        "query_text": "Where is Albuquerque?"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Instead of specifying `query_text`, you can provide the full path, within the search request, to the field containing the text to use for reranking. For example, if the query text is in the `query` subfield of the `text_representation` object, specify the path to that subfield in the `query_text_path` parameter:
+
+```json
+POST /_search?search_pipeline=rerank_pipeline
+{
+  "query": {
+    "match": {
+      "text_representation": {
+        "query": "Where is Albuquerque?"
+      }
+    }
+  },
+  "ext": {
+    "rerank": {
+      "query_context": {
+        "query_text_path": "query.match.text_representation.query"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The `query_context` object contains the following fields. 
+
+Field name  | Description
+:--- | :---  
+`query_text` | The natural language text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required.
+`query_text_path` | The full JSON path to the text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. The maximum number of characters in the path is `1000`.
+
+For more information about setting up reranking, see [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).
--- a/_search-plugins/search-pipelines/search-processors.md
+++ b/_search-plugins/search-pipelines/search-processors.md
@@ -39,6 +39,7 @@ Processor | Description | Earliest available version
 :--- | :--- | :---
 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
+[`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
 [`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor.  | 2.12

--- a/_search-plugins/search-relevance/compare-search-results.md
+++ b/_search-plugins/search-relevance/compare-search-results.md
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: Compare Search Results
+title: Comparing search results
 nav_order: 55
 parent: Search relevance
 has_children: true
@@ -9,7 +9,7 @@ redirect_from:
  - /search-plugins/search-relevance/
 ---

-# Compare Search Results
+# Comparing search results

 With Compare Search Results in OpenSearch Dashboards, you can compare results from two queries side by side to determine whether one query produces better results than the other. Using this tool, you can evaluate search quality by experimenting with queries. 

--- a/_search-plugins/search-relevance/index.md
+++ b/_search-plugins/search-relevance/index.md
@@ -14,6 +14,8 @@ Search relevance evaluates the accuracy of the search results returned by a quer

 OpenSearch provides the following search relevance features:

- [Compare Search Results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/) in OpenSearch Dashboards lets you compare results from two queries side by side. 
+- [Comparing search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/) from two queries side by side in OpenSearch Dashboards. 

- [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/) offers query rewriting capability.
+- [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/) using a cross-encoder reranker. 
+
+- Rewriting queries using [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/).
--- a/_search-plugins/search-relevance/reranking-search-results.md
+++ b/_search-plugins/search-relevance/reranking-search-results.md
@@ -0,0 +1,118 @@
+---
+layout: default
+title: Reranking search results
+parent: Search relevance
+has_children: false
+nav_order: 60
+---
+
+# Reranking search results
+Introduced 2.12
+{: .label .label-purple }
+
+You can rerank search results using a cross-encoder reranker in order to improve search relevance. To implement reranking, you need to configure a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results and applies the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) to them. The `rerank` processor evaluates the search results and sorts them based on the new scores provided by the cross-encoder model. 
+
+**PREREQUISITE**<br>
+Before using reranking, you must set up a cross-encoder model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
+{: .note}
+
+## Running a search with reranking
+
+To run a search with reranking, follow these steps:
+
+1. [Configure a search pipeline](#step-1-configure-a-search-pipeline).
+1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
+1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
+1. [Search using reranking](#step-4-search-using-reranking).
+
+## Step 1: Configure a search pipeline
+
+First, configure a search pipeline with a [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/).
+
+The following example request creates a search pipeline with an `ml_opensearch` rerank processor. In the request, provide a model ID for the cross-encoder model and the document fields to use as context:
+
+```json
+PUT /_search/pipeline/my_pipeline
+{
+  "description": "Pipeline for reranking with a cross-encoder",
+  "response_processors": [
+    {
+      "rerank": {
+        "ml_opensearch": {
+          "model_id": "gnDIbI0BfUsSoeNT_jAw"
+        },
+        "context": {
+          "document_fields": [
+            "passage_text"
+          ]
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+For more information about the request fields, see [Request fields]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#request-fields).
+
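+To confirm that the pipeline was stored as intended, you can retrieve it by name. The following request is a small check that assumes the pipeline name `my_pipeline` from the preceding request:
+
+```json
+GET /_search/pipeline/my_pipeline
+```
+{% include copy-curl.html %}
+
+The response should contain the pipeline definition, including the `rerank` processor configuration.
+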
+## Step 2: Create an index for ingestion
+
+In order to use the rerank processor defined in your pipeline, create an OpenSearch index and add the pipeline created in the previous step as the default pipeline:
+
+```json
+PUT /my-index
+{
+  "settings": {
+    "index.search.default_pipeline" : "my_pipeline"
+  },
+  "mappings": {
+    "properties": {
+      "passage_text": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
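+If the index already exists, one option is to set the default search pipeline by updating the index settings instead of recreating the index. The following sketch assumes an existing index named `my-index`:
+
+```json
+PUT /my-index/_settings
+{
+  "index.search.default_pipeline" : "my_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+Alternatively, you can name the pipeline on an individual request using the `search_pipeline` query parameter, as shown in the [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example).
+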
+## Step 3: Ingest documents into the index
+
+To ingest documents into the index created in the previous step, send the following bulk request:
+
+```json
+POST /_bulk
+{ "index": { "_index": "my-index" } }
+{ "passage_text" : "I said welcome to them and we entered the house" }
+{ "index": { "_index": "my-index" } }
+{ "passage_text" : "I feel welcomed in their family" }
+{ "index": { "_index": "my-index" } }
+{ "passage_text" : "Welcoming gifts are great" }
+
+```
+{% include copy-curl.html %}
+
+## Step 4: Search using reranking
+
+To run a search with reranking on your index, use any OpenSearch query and provide an additional `ext.rerank` object containing the query context:
+
+```json
+POST /my-index/_search
+{
+  "query": {
+    "match": {
+      "passage_text": "how to welcome in family"
+    }
+  },
+  "ext": {
+    "rerank": {
+      "query_context": {
+         "query_text": "how to welcome in family"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Alternatively, you can provide the full path to the field containing the context. For more information, see [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example).
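+
+As a sketch of that alternative applied to the index from this tutorial, the following request points `query_text_path` at the `query` subfield of the `match` clause. The path `query.match.passage_text.query` is derived from the structure of this particular request body, so it changes if the query is shaped differently:
+
+```json
+POST /my-index/_search
+{
+  "query": {
+    "match": {
+      "passage_text": {
+        "query": "how to welcome in family"
+      }
+    }
+  },
+  "ext": {
+    "rerank": {
+      "query_context": {
+        "query_text_path": "query.match.passage_text.query"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}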