Refactor search pipeline documentation (#4908)

* Refactor search pipeline documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
kolchfa-aws 2023-08-28 11:56:11 -04:00 committed by GitHub
parent 000aecc1a3
commit 87ed0af851
8 changed files with 444 additions and 383 deletions

@ -3,8 +3,8 @@ layout: default
title: Filter query processor
nav_order: 10
has_children: false
parent: Search pipelines
grand_parent: Search
parent: Search processors
grand_parent: Search pipelines
---
# Filter query processor

@ -24,87 +24,9 @@ The following is a list of search pipeline terminology:
Both request and response processing for the pipeline are performed on the coordinator node, so there is no shard-level processing.
{: .note}
## Search request processors
## Processors
OpenSearch supports the following search request processors:
- [`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/): Adds a script that is run on newly indexed documents.
- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/): Adds a filtering query that is used to filter requests.
## Search response processors
OpenSearch supports the following search response processors:
- [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/): Renames an existing field.
- [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/): Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service).
## Viewing available processor types
You can use the Nodes Search Pipelines API to view the available processor types:
```json
GET /_nodes/search_pipelines
```
{% include copy-curl.html %}
The response contains the `search_pipelines` object that lists the available request and response processors:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "runTask",
"nodes" : {
"36FHvCwHT6Srbm2ZniEPhA" : {
"name" : "runTask-0",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "3.0.0",
"build_type" : "tar",
"build_hash" : "unknown",
"roles" : [
"cluster_manager",
"data",
"ingest",
"remote_cluster_client"
],
"attributes" : {
"testattr" : "test",
"shard_indexing_pressure_enabled" : "true"
},
"search_pipelines" : {
"request_processors" : [
{
"type" : "filter_query"
},
{
"type" : "script"
}
],
"response_processors" : [
{
"type" : "rename_field"
}
]
}
}
}
}
```
</details>
In addition to the processors provided by OpenSearch, additional processors may be provided by plugins.
{: .note}
To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).
## Creating a search pipeline
@ -161,46 +83,16 @@ By default, a search pipeline stops if one of its processors fails. If you want
If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics).
## Using a temporary search pipeline for a request
## Using search pipelines
As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:
To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:
```json
POST /my-index/_search
{
"query" : {
"match" : {
"text_field" : "some search text"
}
},
"pipeline" : {
"request_processors": [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
}
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}
With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/).
## Retrieving search pipelines
@ -255,110 +147,6 @@ GET /_search/pipeline/my*
```
{% include copy-curl.html %}
## Using a search pipeline
To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:
```json
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}
For a complete example of using a search pipeline with a `filter_query` processor, see [`filter_query` processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor#example).
## Default search pipeline
For convenience, you can set a default search pipeline for an index. Once your index has a default pipeline, you don't need to specify the `search_pipeline` query parameter in every search request.
### Setting a default search pipeline for an index
To set a default search pipeline for an index, specify the `index.search.default_pipeline` in the index's settings:
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : "my_pipeline"
}
```
{% include copy-curl.html %}
After setting the default pipeline for `my_index`, you can try the same search for all documents:
```json
GET /my_index/_search
```
{% include copy-curl.html %}
The response contains only the public document, indicating that the pipeline was applied by default:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"message" : "This is a public message",
"visibility" : "public"
}
}
]
}
}
```
</details>
### Disabling the default pipeline for a request
If you want to run a search request without applying the default pipeline, you can set the `search_pipeline` query parameter to `_none`:
```json
GET /my_index/_search?search_pipeline=_none
```
{% include copy-curl.html %}
### Removing the default pipeline
To remove the default pipeline from an index, set it to `null` or `_none`:
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : null
}
```
{% include copy-curl.html %}
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : "_none"
}
```
{% include copy-curl.html %}
## Updating a search pipeline
To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API.
@ -454,160 +242,4 @@ The response contains the pipeline version:
## Search pipeline metrics
To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):
```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}
The response contains statistics for all search pipelines:
```json
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "runTask",
"nodes" : {
"CpvTK7KuRD6Oww8TTp8g2Q" : {
"timestamp" : 1689007282929,
"name" : "runTask-0",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1:9300",
"roles" : [
"cluster_manager",
"data",
"ingest",
"remote_cluster_client"
],
"attributes" : {
"testattr" : "test",
"shard_indexing_pressure_enabled" : "true"
},
"search_pipeline" : {
"total_request" : {
"count" : 5,
"time_in_millis" : 158,
"current" : 0,
"failed" : 0
},
"total_response" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
},
"pipelines" : {
"public_info" : {
"request" : {
"count" : 3,
"time_in_millis" : 71,
"current" : 0,
"failed" : 0
},
"response" : {
"count" : 0,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
},
"request_processors" : [
{
"filter_query:abc" : {
"type" : "filter_query",
"stats" : {
"count" : 1,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query" : {
"type" : "filter_query",
"stats" : {
"count" : 4,
"time_in_millis" : 2,
"current" : 0,
"failed" : 0
}
}
}
],
"response_processors" : [ ]
},
"guest_pipeline" : {
"request" : {
"count" : 2,
"time_in_millis" : 87,
"current" : 0,
"failed" : 0
},
"response" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
},
"request_processors" : [
{
"script" : {
"type" : "script",
"stats" : {
"count" : 2,
"time_in_millis" : 86,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query:abc" : {
"type" : "filter_query",
"stats" : {
"count" : 1,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query" : {
"type" : "filter_query",
"stats" : {
"count" : 3,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
}
],
"response_processors" : [
{
"rename_field" : {
"type" : "rename_field",
"stats" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
}
}
}
]
}
}
}
}
}
}
```
For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
For information about retrieving search pipeline statistics, see [Search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/).

@ -3,8 +3,8 @@ layout: default
title: Personalize search ranking processor
nav_order: 40
has_children: false
parent: Search pipelines
grand_parent: Search
parent: Search processors
grand_parent: Search pipelines
---
# Personalize search ranking processor

@ -3,8 +3,8 @@ layout: default
title: Rename field processor
nav_order: 20
has_children: false
parent: Search pipelines
grand_parent: Search
parent: Search processors
grand_parent: Search pipelines
---
# Rename field processor

@ -3,8 +3,8 @@ layout: default
title: Script processor
nav_order: 30
has_children: false
parent: Search pipelines
grand_parent: Search
parent: Search processors
grand_parent: Search pipelines
---
# Script processor

@ -0,0 +1,168 @@
---
layout: default
title: Search pipeline metrics
nav_order: 40
has_children: false
parent: Search pipelines
grand_parent: Search
---
# Search pipeline metrics
To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):
```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}
The response contains statistics for all search pipelines:
```json
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "runTask",
"nodes" : {
"CpvTK7KuRD6Oww8TTp8g2Q" : {
"timestamp" : 1689007282929,
"name" : "runTask-0",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1:9300",
"roles" : [
"cluster_manager",
"data",
"ingest",
"remote_cluster_client"
],
"attributes" : {
"testattr" : "test",
"shard_indexing_pressure_enabled" : "true"
},
"search_pipeline" : {
"total_request" : {
"count" : 5,
"time_in_millis" : 158,
"current" : 0,
"failed" : 0
},
"total_response" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
},
"pipelines" : {
"public_info" : {
"request" : {
"count" : 3,
"time_in_millis" : 71,
"current" : 0,
"failed" : 0
},
"response" : {
"count" : 0,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
},
"request_processors" : [
{
"filter_query:abc" : {
"type" : "filter_query",
"stats" : {
"count" : 1,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query" : {
"type" : "filter_query",
"stats" : {
"count" : 4,
"time_in_millis" : 2,
"current" : 0,
"failed" : 0
}
}
}
],
"response_processors" : [ ]
},
"guest_pipeline" : {
"request" : {
"count" : 2,
"time_in_millis" : 87,
"current" : 0,
"failed" : 0
},
"response" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
},
"request_processors" : [
{
"script" : {
"type" : "script",
"stats" : {
"count" : 2,
"time_in_millis" : 86,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query:abc" : {
"type" : "filter_query",
"stats" : {
"count" : 1,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
},
{
"filter_query" : {
"type" : "filter_query",
"stats" : {
"count" : 3,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
}
}
}
],
"response_processors" : [
{
"rename_field" : {
"type" : "rename_field",
"stats" : {
"count" : 2,
"time_in_millis" : 1,
"current" : 0,
"failed" : 0
}
}
}
]
}
}
}
}
}
}
```
For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
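
The per-processor `stats` objects are nested several levels deep, so it can help to flatten them before looking for failures. The following Python sketch assumes the Nodes Stats response has already been fetched and parsed into a dictionary. The function name `processor_stats` and the trimmed `sample` are illustrative and are not part of OpenSearch:

```python
def processor_stats(stats_response):
    """Flatten per-processor stats from a parsed Nodes Stats response."""
    rows = []
    for node in stats_response.get("nodes", {}).values():
        pipelines = node.get("search_pipeline", {}).get("pipelines", {})
        for pipeline_name, pipeline in pipelines.items():
            for phase in ("request_processors", "response_processors"):
                for entry in pipeline.get(phase, []):
                    # Each entry is a single-key object, for example
                    # {"filter_query:abc": {"type": ..., "stats": {...}}}.
                    for key, value in entry.items():
                        stats = value.get("stats", {})
                        rows.append({
                            "node": node.get("name"),
                            "pipeline": pipeline_name,
                            "phase": phase,
                            "processor": key,
                            "count": stats.get("count", 0),
                            "failed": stats.get("failed", 0),
                        })
    return rows


# Trimmed copy of the response shown above (hypothetical subset).
sample = {
    "nodes": {
        "CpvTK7KuRD6Oww8TTp8g2Q": {
            "name": "runTask-0",
            "search_pipeline": {
                "pipelines": {
                    "guest_pipeline": {
                        "request_processors": [
                            {"script": {"type": "script", "stats": {
                                "count": 2, "time_in_millis": 86,
                                "current": 0, "failed": 0}}},
                        ],
                        "response_processors": [
                            {"rename_field": {"type": "rename_field", "stats": {
                                "count": 2, "time_in_millis": 1,
                                "current": 0, "failed": 0}}},
                        ],
                    }
                }
            },
        }
    }
}

rows = processor_stats(sample)
failures = [row for row in rows if row["failed"] > 0]
for row in rows:
    print(row["pipeline"], row["phase"], row["processor"], row["count"], row["failed"])
```

Filtering the rows on `failed > 0`, as in `failures`, gives a quick list of processors to investigate.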

@ -0,0 +1,101 @@
---
layout: default
title: Search processors
nav_order: 50
has_children: true
parent: Search pipelines
grand_parent: Search
---
# Search processors
Search processors can be of the following types:
- [Search request processors](#search-request-processors)
- [Search response processors](#search-response-processors)
## Search request processors
The following table lists all supported search request processors.
Processor | Description | Earliest available version
:--- | :--- | :---
[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on incoming search requests. | 2.8
[`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query to the search request in order to filter the results. | 2.8
## Search response processors
The following table lists all supported search response processors.
Processor | Description | Earliest available version
:--- | :--- | :---
[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/) | Renames an existing field. | 2.8
[`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
## Viewing available processor types
You can use the Nodes Search Pipelines API to view the available processor types:
```json
GET /_nodes/search_pipelines
```
{% include copy-curl.html %}
The response contains the `search_pipelines` object that lists the available request and response processors:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "runTask",
"nodes" : {
"36FHvCwHT6Srbm2ZniEPhA" : {
"name" : "runTask-0",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "3.0.0",
"build_type" : "tar",
"build_hash" : "unknown",
"roles" : [
"cluster_manager",
"data",
"ingest",
"remote_cluster_client"
],
"attributes" : {
"testattr" : "test",
"shard_indexing_pressure_enabled" : "true"
},
"search_pipelines" : {
"request_processors" : [
{
"type" : "filter_query"
},
{
"type" : "script"
}
],
"response_processors" : [
{
"type" : "rename_field"
}
]
}
}
}
}
```
</details>
In addition to the processors provided by OpenSearch, additional processors may be provided by plugins.
{: .note}
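
Because each node reports its own processor list, a cluster-wide view requires combining the per-node lists. The following Python sketch assumes the `GET /_nodes/search_pipelines` response has already been parsed into a dictionary. The function name `available_processor_types` and the trimmed `sample` are illustrative only:

```python
def available_processor_types(nodes_response):
    """Return sorted request and response processor types seen on any node."""
    request_types, response_types = set(), set()
    for node in nodes_response.get("nodes", {}).values():
        pipelines = node.get("search_pipelines", {})
        request_types.update(
            p["type"] for p in pipelines.get("request_processors", []))
        response_types.update(
            p["type"] for p in pipelines.get("response_processors", []))
    return sorted(request_types), sorted(response_types)


# Trimmed copy of the response shown above (hypothetical subset).
sample = {
    "nodes": {
        "36FHvCwHT6Srbm2ZniEPhA": {
            "name": "runTask-0",
            "search_pipelines": {
                "request_processors": [
                    {"type": "filter_query"},
                    {"type": "script"},
                ],
                "response_processors": [
                    {"type": "rename_field"},
                ],
            },
        }
    }
}

request_types, response_types = available_processor_types(sample)
print(request_types)
print(response_types)
```

This sketch takes the union across nodes, so a type installed on only some nodes still appears in the result. Use an intersection instead if you need the types available on every node.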

@ -0,0 +1,160 @@
---
layout: default
title: Using a search pipeline
nav_order: 20
has_children: false
parent: Search pipelines
grand_parent: Search
---
# Using a search pipeline
You can use a search pipeline in the following ways:
- [Specify an existing pipeline](#specifying-an-existing-search-pipeline-for-a-request) for a request.
- [Use a temporary pipeline](#using-a-temporary-search-pipeline-for-a-request) for a request.
- [Set a default pipeline](#default-search-pipeline) for all requests to an index.
## Specifying an existing search pipeline for a request
After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index#creating-a-search-pipeline), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
```json
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}
For a complete example of using a search pipeline with a `filter_query` processor, see [`filter_query` processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor#example).
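
If you build search requests in client code, the pipeline name is simply one more query parameter. The following Python sketch uses only the standard library to construct the request path. The helper name `search_path` is hypothetical, and sending the request is left to whichever HTTP client you use:

```python
from urllib.parse import quote, urlencode


def search_path(index, pipeline=None):
    """Build the path and query string for a search request.

    `pipeline` is the value of the `search_pipeline` query parameter.
    Leave it as None to omit the parameter entirely.
    """
    path = "/" + quote(index, safe="") + "/_search"
    if pipeline is not None:
        path += "?" + urlencode({"search_pipeline": pipeline})
    return path


print(search_path("my_index", "my_pipeline"))  # /my_index/_search?search_pipeline=my_pipeline
print(search_path("my_index"))                 # /my_index/_search
```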
## Using a temporary search pipeline for a request
As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:
```json
POST /my-index/_search
{
"query" : {
"match" : {
"text_field" : "some search text"
}
},
"pipeline" : {
"request_processors": [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
}
```
{% include copy-curl.html %}
With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
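
A temporary pipeline is part of the request body, so a client can assemble it alongside the query. The following Python sketch builds a body equivalent to the preceding request, without the optional `tag` and `description` fields. The helper name `with_temporary_pipeline` is hypothetical:

```python
import json


def with_temporary_pipeline(query, request_processors=(), response_processors=()):
    """Return a search body that carries an inline (temporary) pipeline."""
    body = {"query": query, "pipeline": {}}
    if request_processors:
        body["pipeline"]["request_processors"] = list(request_processors)
    if response_processors:
        body["pipeline"]["response_processors"] = list(response_processors)
    return body


body = with_temporary_pipeline(
    query={"match": {"text_field": "some search text"}},
    request_processors=[
        {"filter_query": {"query": {"term": {"visibility": "public"}}}},
    ],
    response_processors=[
        {"rename_field": {"field": "message", "target_field": "notification"}},
    ],
)
print(json.dumps(body, indent=2))
```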
## Default search pipeline
For convenience, you can set a default search pipeline for an index. Once your index has a default pipeline, you don't need to specify the `search_pipeline` query parameter in every search request.
### Setting a default search pipeline for an index
To set a default search pipeline for an index, specify the `index.search.default_pipeline` setting in the index's settings:
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : "my_pipeline"
}
```
{% include copy-curl.html %}
After setting the default pipeline for `my_index`, you can search for all documents without specifying the `search_pipeline` query parameter:
```json
GET /my_index/_search
```
{% include copy-curl.html %}
The response contains only the public document, indicating that the pipeline was applied by default:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"message" : "This is a public message",
"visibility" : "public"
}
}
]
}
}
```
</details>
### Disabling the default pipeline for a request
If you want to run a search request without applying the default pipeline, you can set the `search_pipeline` query parameter to `_none`:
```json
GET /my_index/_search?search_pipeline=_none
```
{% include copy-curl.html %}
### Removing the default pipeline
To remove the default pipeline from an index, set `index.search.default_pipeline` to `null` or `_none`:
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : null
}
```
{% include copy-curl.html %}
```json
PUT /my_index/_settings
{
"index.search.default_pipeline" : "_none"
}
```
{% include copy-curl.html %}
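
Setting and removing the default pipeline use the same settings body and differ only in the value. The following Python sketch serializes that body and relies on `json.dumps` mapping Python's `None` to JSON `null`. The helper name `default_pipeline_settings` is hypothetical:

```python
import json

SETTING = "index.search.default_pipeline"


def default_pipeline_settings(pipeline):
    """Serialize the settings body; pass None or "_none" to remove the default."""
    return json.dumps({SETTING: pipeline})


print(default_pipeline_settings("my_pipeline"))  # {"index.search.default_pipeline": "my_pipeline"}
print(default_pipeline_settings(None))           # {"index.search.default_pipeline": null}
```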