Add documentation about setting a default model for neural search (#5121)
* Add documentation about setting a default model for neural search
* Add new processor to the processor list
* More tweaks
* Refactor search pipeline documentation
* Refactor retrieving search pipelines
* Add working examples
* Implement tech review comments
* Add responses to documentation
* Update _search-plugins/search-pipelines/neural-query-enricher.md
* Apply suggestions from code review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent b149493bea
commit 06527a2772

---
layout: default
title: Neural search
nav_order: 200
has_children: false
has_toc: false
redirect_from:
  - /neural-search-plugin/index/
---

# Neural search

Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. During ingestion, neural search transforms document text into vector embeddings and indexes both the text and its vector embeddings in a k-NN index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results.

The Neural Search plugin comes bundled with OpenSearch and is generally available as of OpenSearch 2.9. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins).

## Using neural search

To use neural search, follow these steps:

1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-search).
## Step 1: Create an ingest pipeline

To generate vector embeddings for text fields, you need to create a neural search [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). An ingest pipeline consists of a series of processors that manipulate documents during ingestion, allowing the documents to be vectorized.

### Path and HTTP method

The following API operation creates a neural search ingest pipeline:

```json
PUT _ingest/pipeline/<pipeline_name>
```

### Path parameter

Use `pipeline_name` to create a name for your neural search ingest pipeline.

### Request fields

In the pipeline request body, you must set up a `text_embedding` processor (the only processor supported by neural search), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings:

```json
"text_embedding": {
  "model_id": "<model_id>",
  "field_map": {
    "<input_field>": "<vector_field>"
  }
}
```

The following table lists the `text_embedding` processor request fields.

Field | Data type | Description
:--- | :--- | :---
`model_id` | String | The ID of the model that will be used to generate the embeddings. The model must be indexed in OpenSearch before it can be used in neural search. For more information, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`field_map.<input_field>` | String | The name of the field from which to obtain text for generating text embeddings.
`field_map.<vector_field>` | String | The name of the vector field in which to store the generated text embeddings.

### Example request

The following example request creates an ingest pipeline in which the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
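Because the pipeline definition is plain JSON, it can also be assembled programmatically before being sent with any HTTP client. The following Python sketch (the helper name is hypothetical, not part of any OpenSearch client) builds the same request body as the example above:

```python
def build_text_embedding_pipeline(description, model_id, field_map):
    """Assemble the request body for a neural search ingest pipeline.

    field_map maps each input text field to the vector field that will
    store its embeddings, mirroring the processor's `field_map` setting.
    """
    return {
        "description": description,
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    "field_map": dict(field_map),
                }
            }
        ],
    }

# Build the body shown in the example request above.
body = build_text_embedding_pipeline(
    "An NLP ingest pipeline",
    "bQ1J8ooBpBj3wT4HVUsb",
    {"passage_text": "passage_embedding"},
)
```

You would then `PUT` this body to `/_ingest/pipeline/<pipeline_name>`.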

## Step 2: Create an index for ingestion

In order to use the text embedding processor defined in your pipeline, create a k-NN index with mapping data that aligns with the maps specified in your pipeline. For example, the `<vector_field>` defined in the `field_map` of your processor must be mapped as a k-NN vector field with a dimension that matches the model dimension. Similarly, the `<input_field>` defined in your processor should be mapped as `text` in your index.

### Example request

The following example request creates a k-NN index that is set up with a default ingest pipeline:

```json
PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}

For more information about creating a k-NN index and the methods it supports, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).

## Step 3: Ingest documents into the index

To ingest documents into the index created in the previous step, send a request for each document:

```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "id": "s1"
}
```
{% include copy-curl.html %}

```json
PUT /my-nlp-index/_doc/2
{
  "passage_text": "Hi planet",
  "id": "s2"
}
```
{% include copy-curl.html %}

Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document contains the `passage_text` field that has the original text and the `passage_embedding` field that has the vector embeddings.

## Step 4: Search the index using neural search

To perform vector search on your index, use the `neural` query clause either in the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. You can refine the results by using a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/).

### Neural query request fields

Include the following request fields under the `neural` query clause:

```json
"neural": {
  "<vector_field>": {
    "query_text": "<query_text>",
    "model_id": "<model_id>",
    "k": 100
  }
}
```

The top-level `vector_field` specifies the vector field against which to run a search query. The following table lists the other neural query fields.

Field | Data type | Description
:--- | :--- | :---
`query_text` | String | The query text from which to generate text embeddings.
`model_id` | String | The ID of the model that will be used to generate text embeddings from the query text. The model must be indexed in OpenSearch before it can be used in neural search.
`k` | Integer | The number of results returned by the k-NN search.

### Example request

The following example request uses a Boolean query to combine a filter clause and two query clauses: a neural query and a `match` query. The `script_score` query assigns custom weights to the query clauses:

```json
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": {
        "wildcard": { "id": "*1" }
      },
      "should": [
        {
          "script_score": {
            "query": {
              "neural": {
                "passage_embedding": {
                  "query_text": "Hi world",
                  "model_id": "bQ1J8ooBpBj3wT4HVUsb",
                  "k": 100
                }
              }
            },
            "script": {
              "source": "_score * 1.5"
            }
          }
        },
        {
          "script_score": {
            "query": {
              "match": {
                "passage_text": "Hi world"
              }
            },
            "script": {
              "source": "_score * 1.7"
            }
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}
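The `neural` clause shape described above is easy to generate programmatically. The following Python helper is an illustrative sketch (the function name is hypothetical, not an official client API); it also allows omitting `model_id`, which is valid once a default model is configured on the index:

```python
def neural_clause(vector_field, query_text, model_id=None, k=100):
    """Build a `neural` query clause for the given vector field.

    model_id may be omitted when a default model is set on the index
    via a search pipeline with a neural_query_enricher processor.
    """
    inner = {"query_text": query_text, "k": k}
    if model_id is not None:
        inner["model_id"] = model_id
    return {"neural": {vector_field: inner}}

# Reproduces the neural clause used in the example request above.
clause = neural_clause("passage_embedding", "Hi world", "bQ1J8ooBpBj3wT4HVUsb")
```

The returned dictionary can be embedded under `"query"` (or inside a `bool` clause) in a search request body.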

The response contains the matching document:

```json
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2251667,
    "hits" : [
      {
        "_index" : "my-nlp-index",
        "_id" : "1",
        "_score" : 1.2251667,
        "_source" : {
          "passage_text" : "Hello world",
          "id" : "s1"
        }
      }
    ]
  }
}
```

### Setting a default model on an index or field

To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field.

First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence:

```json
PUT /_search/pipeline/default_model_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher" : {
        "default_model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "neural_field_default_id": {
          "my_field_1": "uZj0qYoBMtvQlfhaYeud",
          "my_field_2": "upj0qYoBMtvQlfhaZOuM"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
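To make the precedence rule concrete, the following Python sketch models how the effective model ID for a field could be resolved (a simplification for illustration, not the actual OpenSearch implementation): an explicit `model_id` in the query wins over both defaults, and a field-level `neural_field_default_id` entry wins over the index-level `default_model_id`:

```python
def resolve_model_id(query_model_id, field,
                     default_model_id=None, neural_field_default_id=None):
    """Return the model ID a neural query on `field` will effectively use."""
    if query_model_id is not None:
        # An explicit model_id in the query always wins.
        return query_model_id
    field_defaults = neural_field_default_id or {}
    if field in field_defaults:
        # Field-level default takes precedence over the index-level default.
        return field_defaults[field]
    return default_model_id

# With the pipeline above, a query on my_field_1 with no model_id
# resolves to the field-level default.
effective = resolve_model_id(
    None, "my_field_1",
    default_model_id="bQ1J8ooBpBj3wT4HVUsb",
    neural_field_default_id={"my_field_1": "uZj0qYoBMtvQlfhaYeud"},
)
```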

Then set the default model for your index:

```json
PUT /my-nlp-index/_settings
{
  "index.search.default_pipeline" : "default_model_pipeline"
}
```
{% include copy-curl.html %}

You can now omit the model ID when searching:

```json
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "Hi world",
        "k": 100
      }
    }
  }
}
```
{% include copy-curl.html %}

The response contains both documents:

```json
{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.22762,
    "hits" : [
      {
        "_index" : "my-nlp-index",
        "_id" : "2",
        "_score" : 1.22762,
        "_source" : {
          "passage_text" : "Hi planet",
          "id" : "s2"
        }
      },
      {
        "_index" : "my-nlp-index",
        "_id" : "1",
        "_score" : 1.2251667,
        "_source" : {
          "passage_text" : "Hello world",
          "id" : "s1"
        }
      }
    ]
  }
}
```

---
layout: default
title: Creating a search pipeline
nav_order: 10
has_children: false
parent: Search pipelines
grand_parent: Search
---

# Creating a search pipeline

Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful when debugging error messages, especially if you add multiple processors of the same type.

#### Example request

The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "filter_query" : {
        "tag" : "tag1",
        "description" : "This processor is going to restrict to publicly visible documents",
        "query" : {
          "term": {
            "visibility": "public"
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}

## Ignoring processor failures

By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:

```json
"filter_query" : {
  "tag" : "tag1",
  "description" : "This processor is going to restrict to publicly visible documents",
  "ignore_failure": true,
  "query" : {
    "term": {
      "visibility": "public"
    }
  }
}
```

If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/).
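The `ignore_failure` behavior can be pictured with a toy pipeline runner. This is a deliberate simplification for illustration, not the actual OpenSearch implementation:

```python
def run_request_processors(processors, request):
    """Apply processors in order; skip failures only when ignore_failure is set."""
    for proc in processors:
        try:
            request = proc["func"](request)
        except Exception:
            if not proc.get("ignore_failure", False):
                # Default behavior: a failing processor stops the pipeline.
                raise
            # With ignore_failure=true, the failure is logged and the
            # remaining processors still run.
    return request

def failing_processor(_request):
    raise RuntimeError("processor failed")

processors = [
    {"func": failing_processor, "ignore_failure": True},
    {"func": lambda req: {**req, "filtered": True}},
]

# The first processor fails but is ignored; the second still runs.
result = run_request_processors(processors, {"query": {}})
```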

## Updating a search pipeline

To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API.

#### Example request

The following example request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "filter_query": {
        "tag": "tag1",
        "description": "This processor returns only publicly visible documents",
        "query": {
          "term": {
            "visibility": "public"
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}

## Search pipeline versions

When creating your pipeline, you can specify a version for it in the `version` parameter:

```json
PUT _search/pipeline/my_pipeline
{
  "version": 1234,
  "request_processors": [
    {
      "script": {
        "source": """
        if (ctx._source['size'] > 100) {
          ctx._source['explain'] = false;
        }
        """
      }
    }
  ]
}
```
{% include copy-curl.html %}

The version is provided in all subsequent responses to `get pipeline` requests:

```json
GET _search/pipeline/my_pipeline
```

The response contains the pipeline version:

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "my_pipeline": {
    "version": 1234,
    "request_processors": [
      {
        "script": {
          "source": """
          if (ctx._source['size'] > 100) {
            ctx._source['explain'] = false;
          }
          """
        }
      }
    ]
  }
}
```
</details>

Field | Data type | Description
:--- | :--- | :---
`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

Both request and response processing for the pipeline are performed on the coordinating node.

To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).

## Example

To create a search pipeline, send a request to the search pipeline endpoint specifying an ordered list of processors, which will be applied sequentially:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "filter_query" : {
        "tag" : "tag1",
        "description" : "This processor is going to restrict to publicly visible documents",
        "query" : {
          "term": {
            "visibility": "public"
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}

For more information about creating and updating a search pipeline, see [Creating a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/).

To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}

Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/).

## Retrieving search pipelines

To learn about retrieving details for an existing search pipeline, see [Retrieving search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/retrieving-search-pipeline/).

## Search pipeline metrics
@ -0,0 +1,47 @@
|
---
layout: default
title: Neural query enricher
nav_order: 12
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Neural query enricher processor

The `neural_query_enricher` search request processor sets a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).

## Request fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify either `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence.
`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify either `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.

## Example

The following example request creates a search pipeline with a `neural_query_enricher` search request processor. The processor sets a default model ID at the index level and provides different default model IDs for two specific fields in the index:

```json
PUT /_search/pipeline/default_model_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher" : {
        "tag": "tag1",
        "description": "Sets the default model ID at index and field levels",
        "default_model_id": "u5j0qYoBMtvQlfhaxOsa",
        "neural_field_default_id": {
          "my_field_1": "uZj0qYoBMtvQlfhaYeud",
          "my_field_2": "upj0qYoBMtvQlfhaZOuM"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
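With a default model configured, a `neural` query can omit its `model_id`. The following sketch assumes a hypothetical k-NN index named `my_index` with a vector field `passage_embedding`; the query runs through the pipeline created in the example, so the processor supplies the field-level default model ID:

```json
GET /my_index/_search?search_pipeline=default_model_pipeline
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "k": 100
      }
    }
  }
}
```
{% include copy-curl.html %}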

@@ -27,7 +27,7 @@ Field | Data type | Description
 `iam_role_arn` | String | If you use multiple roles to restrict permissions for different groups of users in your organization, specify the ARN of the role that has permission to access Amazon Personalize. If you use only the AWS credentials in your OpenSearch keystore, you can omit this field. Optional.
 `tag` | String | The processor's identifier. Optional.
 `description` | String | A description of the processor. Optional.
-`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
 
 ## Example
@@ -21,7 +21,7 @@ Field | Data type | Description
 `target_field` | String | The new field name. Required.
 `tag` | String | The processor's identifier.
 `description` | String | A description of the processor.
-`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
 
 ## Example
@@ -0,0 +1,61 @@
---
layout: default
title: Retrieving search pipelines
nav_order: 25
has_children: false
parent: Search pipelines
grand_parent: Search
---

# Retrieving search pipelines

To retrieve the details of an existing search pipeline, use the Search Pipeline API.

To view all search pipelines, use the following request:

```json
GET /_search/pipeline
```
{% include copy-curl.html %}

The response contains the pipeline that you set up in the previous section:

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "my_pipeline" : {
    "request_processors" : [
      {
        "filter_query" : {
          "tag" : "tag1",
          "description" : "This processor is going to restrict to publicly visible documents",
          "query" : {
            "term" : {
              "visibility" : "public"
            }
          }
        }
      }
    ]
  }
}
```
</details>

To view a particular pipeline, specify the pipeline name as a path parameter:

```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}

You can also use wildcard patterns to view a subset of pipelines, for example:

```json
GET /_search/pipeline/my*
```
{% include copy-curl.html %}
@@ -34,7 +34,7 @@ Field | Data type | Description
 `lang` | String | The script language. Optional. Only `painless` is supported.
 `tag` | String | The processor's identifier. Optional.
 `description` | String | A description of the processor. Optional.
-`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
 
 ## Example
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Search pipeline metrics
-nav_order: 40
+nav_order: 50
 has_children: false
 parent: Search pipelines
 grand_parent: Search
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Search processors
-nav_order: 50
+nav_order: 40
 has_children: true
 parent: Search pipelines
 grand_parent: Search

@@ -23,8 +23,9 @@ The following table lists all supported search request processors.
 
 Processor | Description | Earliest available version
 :--- | :--- | :---
-[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
 [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
+[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11
+[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
 
 ## Search response processors

@@ -34,8 +35,8 @@ The following table lists all supported search response processors.
 
 Processor | Description | Earliest available version
 :--- | :--- | :---
-[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
+[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
 
 ## Search phase results processors
@@ -17,7 +17,7 @@ You can use a search pipeline in the following ways:
 
 ## Specifying an existing search pipeline for a request
 
-After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index#creating-a-search-pipeline), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
+After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
 
 ```json
 GET /my_index/_search?search_pipeline=my_pipeline