Add documentation about setting a default model for neural search (#5121)

* Add documentation about setting a default model for neural search

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add new processor to the processor list

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More tweaks

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Refactor search pipeline documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Refactor retrieving search pipelines

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add working examples

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implement tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add responses to documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update _search-plugins/search-pipelines/neural-query-enricher.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
kolchfa-aws 2023-10-04 13:35:37 -04:00 committed by GitHub
parent b149493bea
commit 06527a2772
12 changed files with 530 additions and 274 deletions

View File

@@ -1,6 +1,6 @@
---
layout: default
title: Neural Search plugin
title: Neural search
nav_order: 200
has_children: false
has_toc: false
@@ -8,169 +8,196 @@ redirect_from:
- /neural-search-plugin/index/
---
# Neural Search plugin
# Neural search
The Neural Search plugin is generally available as of OpenSearch 2.9.
{: .note}
Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. During ingestion, neural search transforms document text into vector embeddings and indexes both the text and its vector embeddings in a k-NN index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results.
The OpenSearch Neural Search plugin enables the integration of machine learning (ML) language models into your search workloads. During ingestion and search, the Neural Search plugin transforms text into vectors. Then, Neural Search uses the transformed vectors in vector-based search.
The Neural Search plugin comes bundled with OpenSearch and is generally available as of OpenSearch 2.9. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins).
The Neural Search plugin comes bundled with OpenSearch. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins).
## Using neural search
## Ingest data with Neural Search
To use neural search, follow these steps:
In order to ingest vectorized documents, you need to create a Neural Search ingest _pipeline_. An ingest pipeline consists of a series of processors that manipulate documents during ingestion, allowing the documents to be vectorized. The following API operation creates a Neural Search ingest pipeline:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-search).
## Step 1: Create an ingest pipeline
To generate vector embeddings for text fields, you need to create a neural search [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). An ingest pipeline consists of a series of processors that manipulate documents during ingestion, allowing the documents to be vectorized.
### Path and HTTP method
The following API operation creates a neural search ingest pipeline:
```json
PUT _ingest/pipeline/<pipeline_name>
```
In the pipeline request body, the `text_embedding` processor, the only processor supported by Neural Search, converts a document's text to vector embeddings. `text_embedding` uses a `field_map` to determine which fields to generate vector embeddings from and in which field to store each embedding.
### Path parameter
Use `pipeline_name` to create a name for your Neural Search ingest pipeline.
Use `pipeline_name` to create a name for your neural search ingest pipeline.
### Request fields
In the pipeline request body, you must set up a `text_embedding` processor (the only processor supported by neural search), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings:
```json
"text_embedding": {
"model_id": "<model_id>",
"field_map": {
"<input_field>": "<vector_field>"
}
}
```
The following table lists the `text_embedding` processor request fields.
Field | Data type | Description
:--- | :--- | :---
`model_id` | String | The ID of the model that will be used to generate the embeddings. The model must be indexed in OpenSearch before it can be used in neural search. For more information, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/).
`field_map.<input_field>` | String | The name of the field from which to obtain text for generating text embeddings.
`field_map.<vector_field>` | String | The name of the vector field in which to store the generated text embeddings.
### Example request
Use the following example request to create a pipeline:
The following example request creates an ingest pipeline where the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`:
```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
### Example response
OpenSearch responds with an acknowledgment of the pipeline's creation:
```json
{
  "acknowledged": true
}
```
## Step 2: Create an index for ingestion
## Create an index for ingestion
In order to use the text embedding processor defined in your pipelines, create an index with mapping data that aligns with the maps specified in your pipeline. For example, the `output_fields` defined in the `field_map` field of your processor request must map to the k-NN vector fields with a dimension that matches the model. Similarly, the `text_fields` defined in your processor should map to the `text_fields` in your index.
In order to use the text embedding processor defined in your pipelines, create a k-NN index with mapping data that aligns with the maps specified in your pipeline. For example, the `<vector_field>` defined in the `field_map` of your processor must be mapped as a k-NN vector field with a dimension that matches the model dimension. Similarly, the `<input_field>` defined in your processor should be mapped as `text` in your index.
### Example request
The following example request creates an index that attaches to a Neural Search pipeline. Because the index maps to k-NN vector fields, the index setting field `index-knn` is set to `true`. To match the maps defined in the Neural Search pipeline, `mapping` settings use [k-NN method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions).
The following example request creates a k-NN index that is set up with a default ingest pipeline:
```json
PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}
For more information about creating a k-NN index and the methods it supports, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).
## Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send a POST request for each document:
```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "id": "s1"
}
```
## Ingest documents into Neural Search
OpenSearch's [Ingest API]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) manages document ingestion, similar to other OpenSearch indexes. For example, you can ingest a document that contains the `passage_text: "Hello world"` with a simple POST method:
{% include copy-curl.html %}
```json
PUT /my-nlp-index/_doc/2
{
  "passage_text": "Hi planet",
  "id": "s2"
}
```
{% include copy-curl.html %}
Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document contains the `passage_text` field that has the original text and the `passage_embedding` field that has the vector embeddings.
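To verify this, you can retrieve one of the ingested documents (a quick check, assuming ingestion through the pipeline above succeeded):

```json
GET /my-nlp-index/_doc/1
```
{% include copy-curl.html %}

The returned `_source` contains the original `passage_text` value along with the generated `passage_embedding` vector; the actual vector values depend on the model.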
## Step 4: Search the index using neural search
To perform vector search on your index, use the `neural` query clause either in the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. You can refine the results by using a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/).
### Neural query request fields
Include the following request fields under the `neural` query clause:
```json
"neural": {
"<vector_field>": {
"query_text": "<query_text>",
"model_id": "<model_id>",
"k": 100
}
}
```
With the text_embedding processor in place through a Neural Search ingest pipeline, the example indexes "Hello world" as a `text_field` and converts "Hello world" into an associated k-NN vector field.
## Search a neural index
To convert a text query into a k-NN vector query by using a language model, use the `neural` query fields in your query. The neural query request fields can be used in both the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-model) and [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/). Furthermore, you can use a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/) to refine your neural search query.
### Neural request fields
Include the following request fields under the `neural` field in your query:
The top-level `vector_field` specifies the vector field against which to run a search query. The following table lists the other neural query fields.
Field | Data type | Description
:--- | :--- | :---
`query_text` | String | The query text from which to generate text embeddings.
`model_id` | String | The ID of the model that will be used to generate text embeddings from the query text. The model must be indexed in OpenSearch before it can be used in neural search.
`k` | Integer | The number of results returned by the k-NN search.
### Example request
The following example request uses a search query that returns vectors for the "Hello World" query text:
The following example request uses a Boolean query to combine a filter clause and two query clauses---a neural query and a `match` query. The `script_score` query assigns custom weights to the query clauses:
```json
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": {
        "wildcard": { "id": "*1" }
      },
      "should": [
        {
          "script_score": {
            "query": {
              "neural": {
                "passage_embedding": {
                  "query_text": "Hi world",
                  "model_id": "bQ1J8ooBpBj3wT4HVUsb",
                  "k": 100
                }
              }
@@ -179,12 +206,13 @@ GET my_index/_search
"source": "_score * 1.5"
}
}
},
{
"script_score": {
"query": {
"match": {
  "passage_text": "Hi world"
}
},
"script": {
"source": "_score * 1.7"
@@ -196,7 +224,135 @@ GET my_index/_search
}
}
```
{% include copy-curl.html %}
The response contains the matching document:
```json
{
"took" : 36,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2251667,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 1.2251667,
"_source" : {
"passage_text" : "Hello world",
"id" : "s1"
}
}
]
}
}
```
### Setting a default model on an index or field
To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field.
First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence:
```json
PUT /_search/pipeline/default_model_pipeline
{
"request_processors": [
{
"neural_query_enricher" : {
"default_model_id": "bQ1J8ooBpBj3wT4HVUsb",
"neural_field_default_id": {
"my_field_1": "uZj0qYoBMtvQlfhaYeud",
"my_field_2": "upj0qYoBMtvQlfhaZOuM"
}
}
}
]
}
```
{% include copy-curl.html %}
Then set the default model for your index:
```json
PUT /my-nlp-index/_settings
{
"index.search.default_pipeline" : "default_model_pipeline"
}
```
{% include copy-curl.html %}
You can now omit the model ID when searching:
```json
GET /my-nlp-index/_search
{
"_source": {
"excludes": [
"passage_embedding"
]
},
"query": {
"neural": {
"passage_embedding": {
"query_text": "Hi world",
"k": 100
}
}
}
}
```
{% include copy-curl.html %}
The response contains both documents:
```json
{
"took" : 41,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.22762,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "2",
"_score" : 1.22762,
"_source" : {
"passage_text" : "Hi planet",
"id" : "s2"
}
},
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 1.2251667,
"_source" : {
"passage_text" : "Hello world",
"id" : "s1"
}
}
]
}
}
```

View File

@@ -0,0 +1,156 @@
---
layout: default
title: Creating a search pipeline
nav_order: 10
has_children: false
parent: Search pipelines
grand_parent: Search
---
# Creating a search pipeline
Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful when debugging error messages, especially if you add multiple processors of the same type.
#### Example request
The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:
```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
```
{% include copy-curl.html %}
## Ignoring processor failures
By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:
```json
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"ignore_failure": true,
"query" : {
"term": {
"visibility": "public"
}
}
}
```
If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/).
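As a sketch, assuming the pipeline has processed at least one request, per-processor statistics (including failure counts) can be retrieved from the Nodes Stats API:

```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}

The response aggregates request and response processor statistics per pipeline, which makes it possible to spot processors that are failing silently under `ignore_failure`.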
## Updating a search pipeline
To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API.
#### Example request
The following example request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor:
```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"filter_query": {
"tag": "tag1",
"description": "This processor returns only publicly visible documents",
"query": {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
```
{% include copy-curl.html %}
## Search pipeline versions
When creating your pipeline, you can specify a version for it in the `version` parameter:
```json
PUT _search/pipeline/my_pipeline
{
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
```
{% include copy-curl.html %}
The version is provided in all subsequent responses to `get pipeline` requests:
```json
GET _search/pipeline/my_pipeline
```
The response contains the pipeline version:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"my_pipeline": {
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
}
```
</details>

View File

@@ -20,7 +20,7 @@ Field | Data type | Description
`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

View File

@@ -29,13 +29,10 @@ Both request and response processing for the pipeline are performed on the coord
To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).
## Creating a search pipeline
Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful for debugging error messages, especially if you add multiple processors of the same type.
## Example
#### Example request
The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:
To create a search pipeline, send a request to the search pipeline endpoint specifying an ordered list of processors, which will be applied sequentially:
```json
PUT /_search/pipeline/my_pipeline
@@ -65,26 +62,7 @@ PUT /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}
### Ignoring processor failures
By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:
```json
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"ignore_failure": true,
"query" : {
"term": {
"visibility": "public"
}
}
}
```
If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics).
## Using search pipelines
For more information about creating and updating a search pipeline, see [Creating a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/).
To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:
@@ -95,151 +73,8 @@ GET /my_index/_search?search_pipeline=my_pipeline
Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/).
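For example, to make `my_pipeline` the default for an index so that queries no longer need the `search_pipeline` parameter, you can set the `index.search.default_pipeline` index setting:

```json
PUT /my_index/_settings
{
  "index.search.default_pipeline": "my_pipeline"
}
```
{% include copy-curl.html %}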
## Retrieving search pipelines
To learn about retrieving details for an existing search pipeline, see [Retrieving search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/retrieving-search-pipeline/).
To retrieve the details of an existing search pipeline, use the Search Pipeline API.
To view all search pipelines, use the following request:
```json
GET /_search/pipeline
```
{% include copy-curl.html %}
The response contains the pipeline that you set up in the previous section:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"my_pipeline" : {
"request_processors" : [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term" : {
"visibility" : "public"
}
}
}
}
]
}
}
```
</details>
To view a particular pipeline, specify the pipeline name as a path parameter:
```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}
You can also use wildcard patterns to view a subset of pipelines, for example:
```json
GET /_search/pipeline/my*
```
{% include copy-curl.html %}
## Updating a search pipeline
To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API.
#### Example request
The following request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor:
```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"filter_query": {
"tag": "tag1",
"description": "This processor returns only publicly visible documents",
"query": {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
```
{% include copy-curl.html %}
## Search pipeline versions
When creating your pipeline, you can specify a version for it in the `version` parameter:
```json
PUT _search/pipeline/my_pipeline
{
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
```
{% include copy-curl.html %}
The version is provided in all subsequent responses to `get pipeline` requests:
```json
GET _search/pipeline/my_pipeline
```
The response contains the pipeline version:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"my_pipeline": {
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
}
```
</details>
## Search pipeline metrics

View File

@@ -0,0 +1,47 @@
---
layout: default
title: Neural query enricher
nav_order: 12
has_children: false
parent: Search processors
grand_parent: Search pipelines
---
# Neural query enricher processor
The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
## Request fields
The following table lists all available request fields.
Field | Data type | Description
:--- | :--- | :---
`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence.
`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
## Example
The following example request creates a search pipeline with a `neural_query_enricher` search request processor. The processor sets a default model ID at the index level and provides different default model IDs for two specific fields in the index:
```json
PUT /_search/pipeline/default_model_pipeline
{
"request_processors": [
{
"neural_query_enricher" : {
"tag": "tag1",
"description": "Sets the default model ID at index and field levels",
"default_model_id": "u5j0qYoBMtvQlfhaxOsa",
"neural_field_default_id": {
"my_field_1": "uZj0qYoBMtvQlfhaYeud",
"my_field_2": "upj0qYoBMtvQlfhaZOuM"
}
}
}
]
}
```
{% include copy-curl.html %}

View File

@@ -27,7 +27,7 @@ Field | Data type | Description
`iam_role_arn` | String | If you use multiple roles to restrict permissions for different groups of users in your organization, specify the ARN of the role that has permission to access Amazon Personalize. If you use only the AWS credentials in your OpenSearch keystore, you can omit this field. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

View File

@@ -21,7 +21,7 @@ Field | Data type | Description
`target_field` | String | The new field name. Required.
`tag` | String | The processor's identifier.
`description` | String | A description of the processor.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

View File

@@ -0,0 +1,61 @@
---
layout: default
title: Retrieving search pipelines
nav_order: 25
has_children: false
parent: Search pipelines
grand_parent: Search
---
# Retrieving search pipelines
To retrieve the details of an existing search pipeline, use the Search Pipeline API.
To view all search pipelines, use the following request:
```json
GET /_search/pipeline
```
{% include copy-curl.html %}
The response contains all configured search pipelines, for example:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"my_pipeline" : {
"request_processors" : [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term" : {
"visibility" : "public"
}
}
}
}
]
}
}
```
</details>
To view a particular pipeline, specify the pipeline name as a path parameter:
```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}
You can also use wildcard patterns to view a subset of pipelines, for example:
```json
GET /_search/pipeline/my*
```
{% include copy-curl.html %}

View File

@@ -34,7 +34,7 @@ Field | Data type | Description
`lang` | String | The script language. Optional. Only `painless` is supported.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

View File

@@ -1,7 +1,7 @@
---
layout: default
title: Search pipeline metrics
nav_order: 40
nav_order: 50
has_children: false
parent: Search pipelines
grand_parent: Search

View File

@@ -1,7 +1,7 @@
---
layout: default
title: Search processors
nav_order: 50
nav_order: 40
has_children: true
parent: Search pipelines
grand_parent: Search
@@ -23,8 +23,9 @@ The following table lists all supported search request processors.
Processor | Description | Earliest available version
:--- | :--- | :---
[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
[`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11
[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
## Search response processors
@@ -34,8 +35,8 @@ The following table lists all supported search response processors.
Processor | Description | Earliest available version
:--- | :--- | :---
[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
[`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
## Search phase results processors

View File

@@ -17,7 +17,7 @@ You can use a search pipeline in the following ways:
## Specifying an existing search pipeline for a request
After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index#creating-a-search-pipeline), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
```json
GET /my_index/_search?search_pipeline=my_pipeline