opensearch-docs-cn/_search-plugins/hybrid-search.md

219 lines
7.0 KiB
Markdown
Raw Permalink Normal View History

Add an overview of search methods and pages for each search method (#5636) * Restructuring TOC Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Resolve merge conflicts Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * More foundational rewrites of ML Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * TOC restructure Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Rename and rewrite search pages and add keyword search Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Small wording change Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Small wording change Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Updated response Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Small rewording Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Move neural search to top of vector search list Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Change terminology Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Reorganize search methods list Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Rename links Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * More link renames Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Implemented editorial comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> --------- Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-11-29 15:28:20 -05:00
---
layout: default
title: Hybrid search
has_children: false
nav_order: 60
Add an overview of search methods and pages for each search method (#5636) * Restructuring TOC Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Resolve merge conflicts Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * More foundational rewrites of ML Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * TOC restructure Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Rename and rewrite search pages and add keyword search Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Small wording change Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Small wording change Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Updated response Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Small rewording Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Move neural search to top of vector search list Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Change terminology Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Reorganize search methods list Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Rename links Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * More link renames Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Implemented editorial comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> --------- Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-11-29 15:28:20 -05:00
---
# Hybrid search
Introduced 2.11
{: .label .label-purple }
Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization_processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques.
**PREREQUISITE**<br>
Before using hybrid search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
## Using hybrid search
To use hybrid search, follow these steps:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Configure a search pipeline](#step-4-configure-a-search-pipeline).
1. [Search the index using hybrid search](#step-5-search-the-index-using-hybrid-search).
## Step 1: Create an ingest pipeline
To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`text_embedding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
The following example request creates an ingest pipeline that converts the text from `passage_text` to text embeddings and stores the embeddings in `passage_embedding`:
```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"description": "A text embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "bQ1J8ooBpBj3wT4HVUsb",
"field_map": {
"passage_text": "passage_embedding"
}
}
}
]
}
```
{% include copy-curl.html %}
## Step 2: Create an index for ingestion
In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`.
The following example request creates a k-NN index that is set up with a default ingest pipeline:
```json
PUT /my-nlp-index
{
"settings": {
"index.knn": true,
"default_pipeline": "nlp-ingest-pipeline"
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "knn_vector",
"dimension": 768,
"method": {
"engine": "lucene",
"space_type": "l2",
"name": "hnsw",
"parameters": {}
}
},
"passage_text": {
"type": "text"
}
}
}
}
```
{% include copy-curl.html %}
For more information about creating a k-NN index and using supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).
## Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send the following requests:
```json
PUT /my-nlp-index/_doc/1
{
"passage_text": "Hello world",
"id": "s1"
}
```
{% include copy-curl.html %}
```json
PUT /my-nlp-index/_doc/2
{
"passage_text": "Hi planet",
"id": "s2"
}
```
{% include copy-curl.html %}
Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings.
## Step 4: Configure a search pipeline
To configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/), use the following request. The normalization technique in the processor is set to `min_max`, and the combination technique is set to `arithmetic_mean`. The `weights` array specifies the weights assigned to each query clause as decimal percentages:
```json
PUT /_search/pipeline/nlp-search-pipeline
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {
"weights": [
0.3,
0.7
]
}
}
}
}
]
}
```
{% include copy-curl.html %}
## Step 5: Search the index using hybrid search
To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search.
The following example request combines two query clauses---a neural query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter:
```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"_source": {
"exclude": [
"passage_embedding"
]
},
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": {
"query": "Hi world"
}
}
},
{
"neural": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "aVeif4oB5Vm0Tdw8zYO2",
"k": 5
}
}
}
]
}
}
}
```
{% include copy-curl.html %}
Alternatively, you can set a default search pipeline for the `my-nlp-index` index. For more information, see [Default search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/#default-search-pipeline).
The response contains the matching document:
```json
{
"took" : 36,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2251667,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 1.2251667,
"_source" : {
"passage_text" : "Hello world",
"id" : "s1"
}
}
]
}
}
```