---
layout: default
title: Hybrid search
has_children: false
nav_order: 60
---

# Hybrid search
Introduced 2.11
{: .label .label-purple }

Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results at an intermediate stage of search execution (between the query and fetch phases) and applies the [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization-processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques.

**PREREQUISITE**<br>
Before using hybrid search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}

## Using hybrid search

To use hybrid search, follow these steps:

1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Configure a search pipeline](#step-4-configure-a-search-pipeline).
1. [Search the index using hybrid search](#step-5-search-the-index-using-hybrid-search).

## Step 1: Create an ingest pipeline

To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`text_embedding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/), which converts the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.

The following example request creates an ingest pipeline that converts the text from `passage_text` to text embeddings and stores the embeddings in `passage_embedding`:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "A text embedding pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

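The following Python sketch shows how `field_map` works on a single document. For each input field listed in `field_map`, it computes an embedding and writes it to the mapped output field, leaving the original text in place. This is a local simulation only, not the processor's code. `toy_embed` is a hypothetical stand-in for the deployed model, which returns a fixed-length vector (the example mapping in Step 2 uses 768 dimensions). Skipping documents that lack an input field is also an assumption made for illustration.

```python
from typing import Callable, Dict, List


# Hypothetical stand-in for the deployed text embedding model.
# A real model returns a dense vector with a fixed dimension (e.g., 768).
def toy_embed(text: str, dimension: int = 4) -> List[float]:
    """Deterministic toy embedding: character codes accumulated into `dimension` slots."""
    vector = [0.0] * dimension
    for i, ch in enumerate(text):
        vector[i % dimension] += ord(ch) / 1000.0
    return vector


def apply_field_map(
    doc: Dict[str, object],
    field_map: Dict[str, str],
    embed: Callable[[str], List[float]],
) -> Dict[str, object]:
    """Return a copy of `doc` with an embedding added for each mapped input field.

    Assumption: input fields that are missing from the document are skipped.
    """
    result = dict(doc)  # the original text fields are kept unchanged
    for input_field, output_field in field_map.items():
        text = doc.get(input_field)
        if isinstance(text, str):
            result[output_field] = embed(text)
    return result


field_map = {"passage_text": "passage_embedding"}
indexed = apply_field_map({"passage_text": "Hello world", "id": "s1"}, field_map, toy_embed)
print(sorted(indexed))                    # ['id', 'passage_embedding', 'passage_text']
print(len(indexed["passage_embedding"]))  # 4
```

The indexed document gains a `passage_embedding` field but keeps `passage_text`. For this reason, the output field must be mapped as a `knn_vector` whose dimension equals the model's output length.
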
## Step 2: Create an index for ingestion

To use the text embedding processor defined in your pipeline, create a k-NN index and set the pipeline created in the previous step as the index's default pipeline. Ensure that the fields defined in the `field_map` are mapped with the correct types. Continuing with the example, the `passage_embedding` field must be mapped as a `knn_vector` whose dimension matches the model's embedding dimension (the example mapping uses 768). Similarly, the `passage_text` field should be mapped as `text`.

The following example request creates a k-NN index that is set up with a default ingest pipeline:

```json
PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}

For more information about creating a k-NN index and using supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).

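If the mapped dimension differs from the length of the vectors the model produces, indexing the generated embeddings is expected to fail. The following Python sketch shows one way to check an index request body against a `field_map` before you send it. It works on plain dictionaries shaped like the request body above, trimmed to the relevant parts. The function `check_mapping` and its `model_dimension` argument are hypothetical: take the dimension from your model's documentation or configuration. The sketch is not part of any OpenSearch client.

```python
from typing import Dict, List

# Request body from Step 2 as a Python dict (the "method" block is omitted here).
index_body = {
    "settings": {"index.knn": True, "default_pipeline": "nlp-ingest-pipeline"},
    "mappings": {
        "properties": {
            "id": {"type": "text"},
            "passage_embedding": {"type": "knn_vector", "dimension": 768},
            "passage_text": {"type": "text"},
        }
    },
}


def check_mapping(body: Dict, field_map: Dict[str, str], model_dimension: int) -> List[str]:
    """Hypothetical helper: list mismatches between a field_map and an index mapping."""
    properties = body.get("mappings", {}).get("properties", {})
    problems = []
    for input_field, output_field in field_map.items():
        # Input fields hold the raw text that the model embeds.
        if properties.get(input_field, {}).get("type") != "text":
            problems.append(f"{input_field}: expected type 'text'")
        # Output fields hold the vectors and must match the model's dimension.
        out = properties.get(output_field, {})
        if out.get("type") != "knn_vector":
            problems.append(f"{output_field}: expected type 'knn_vector'")
        elif out.get("dimension") != model_dimension:
            problems.append(
                f"{output_field}: dimension {out.get('dimension')} != model dimension {model_dimension}"
            )
    return problems


field_map = {"passage_text": "passage_embedding"}
print(check_mapping(index_body, field_map, 768))  # []
print(check_mapping(index_body, field_map, 384))
# ['passage_embedding: dimension 768 != model dimension 384']
```
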
## Step 3: Ingest documents into the index

To ingest documents into the index created in the previous step, send the following requests:

```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "id": "s1"
}
```
{% include copy-curl.html %}

```json
PUT /my-nlp-index/_doc/2
{
  "passage_text": "Hi planet",
  "id": "s2"
}
```
{% include copy-curl.html %}

Before each document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on it, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings.

## Step 4: Configure a search pipeline

To configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/), use the following request. The normalization technique in the processor is set to `min_max`, and the combination technique is set to `arithmetic_mean`. The `weights` array assigns a weight, expressed as a decimal fraction, to each query clause. Weights are applied in the order in which the clauses appear in the `hybrid` query's `queries` array, so in this example the first clause receives `0.3` and the second receives `0.7`:

```json
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

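The following Python sketch illustrates this configuration using the textbook formulas. First, each query clause's scores are min-max normalized to [0, 1] independently, using (score - min) / (max - min). Then each document's normalized scores are combined as a weighted arithmetic mean, using sum(w_i * n_i) / sum(w_i), with the weights taken from the `weights` array. This is a simplified illustration, not the plugin's implementation. Its handling of edge cases is an assumption: when all scores in a clause are equal, each is treated as 1, and a document missing from a clause's results is scored 0 for that clause. The raw scores are made up.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one clause's scores to [0, 1]: (score - min) / (max - min)."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Assumption: when every score is equal, treat each one as the top score.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def weighted_arithmetic_mean(
    per_clause: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Normalize each clause, then combine per document: sum(w_i * n_i) / sum(w_i)."""
    normalized = [min_max_normalize(clause) for clause in per_clause]
    doc_ids = set().union(*normalized)  # every document returned by any clause
    total_weight = sum(weights)
    return {
        # Assumption: a document absent from a clause's results contributes 0 for it.
        doc_id: sum(w * clause.get(doc_id, 0.0) for w, clause in zip(weights, normalized))
        / total_weight
        for doc_id in doc_ids
    }


# Made-up raw scores, listed in the same order as the clauses in the hybrid query.
match_scores = {"1": 0.4, "2": 1.2, "3": 0.0}   # keyword clause
neural_scores = {"1": 0.9, "2": 0.6, "3": 0.5}  # semantic clause
combined = weighted_arithmetic_mean([match_scores, neural_scores], weights=[0.3, 0.7])

# Print documents from highest to lowest combined score.
for doc_id, score in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 3))
```

With these weights, the semantic clause dominates. Document 1 ranks first (about 0.8) even though document 2 has the highest keyword score (document 2 combines to about 0.475).
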
## Step 5: Search the index using hybrid search

To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search.

The following example request combines two query clauses: a `match` query and a `neural` query. It specifies the search pipeline created in the previous step as a query parameter:

```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "exclude": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "passage_text": {
              "query": "Hi world"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "Hi world",
              "model_id": "bQ1J8ooBpBj3wT4HVUsb",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}

Alternatively, you can set a default search pipeline for the `my-nlp-index` index. For more information, see [Default search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/#default-search-pipeline).

The response contains the matching document:

```json
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2251667,
    "hits" : [
      {
        "_index" : "my-nlp-index",
        "_id" : "1",
        "_score" : 1.2251667,
        "_source" : {
          "passage_text" : "Hello world",
          "id" : "s1"
        }
      }
    ]
  }
}
```

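If you send hybrid queries from application code, building the body in one place helps keep the clause order aligned with the `weights` array in the search pipeline. The following Python sketch assembles a hybrid search body as a dictionary. The `match` clause targets `passage_text`, and the `neural` clause uses the ingestion model from Step 1. The function name `build_hybrid_query` is hypothetical, and sending the request (for example, with an OpenSearch client or an HTTP library) is left out.

```python
import json
from typing import Dict, List


def build_hybrid_query(query_text: str, model_id: str, k: int = 5) -> Dict:
    """Hypothetical helper: build a hybrid search request body.

    Clause order matters: the search pipeline applies weights[0] to the first
    clause (match) and weights[1] to the second clause (neural).
    """
    queries: List[Dict] = [
        # Keyword clause: scored against the original text field.
        {"match": {"passage_text": {"query": query_text}}},
        # Semantic clause: the query text is embedded with the same model used at ingestion.
        {"neural": {"passage_embedding": {"query_text": query_text, "model_id": model_id, "k": k}}},
    ]
    return {
        "_source": {"exclude": ["passage_embedding"]},  # omit the large vector field from hits
        "query": {"hybrid": {"queries": queries}},
    }


body = build_hybrid_query("Hi world", model_id="bQ1J8ooBpBj3wT4HVUsb")
print(json.dumps(body, indent=2))
```

When sending this body, pass `search_pipeline=nlp-search-pipeline` as a query parameter unless you have configured a default search pipeline for the index.
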