Search pipeline GA documentation (#4553)
* Search pipeline GA documentation
* Add ad-hoc pipelines and ignore failure flag
* Rewording
* Update script-processor.md
* Update _search-plugins/search-pipelines/index.md (co-authored-by: Nathan Bower <nbower@amazon.com>)
* Update _search-plugins/search-pipelines/index.md (co-authored-by: Nathan Bower <nbower@amazon.com>)

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Parent: 2e1773f85e
Commit: 480589cda1

@@ -9,9 +9,6 @@ grand_parent: Search
# Filter query processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `filter_query` search request processor intercepts a search request and applies an additional query to the request, filtering the results. This is useful when you don't want to rewrite existing queries in your application but need additional filtering of the results.
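
As a quick orientation, the following minimal sketch shows how a `filter_query` processor might be declared when creating a search pipeline with the Search Pipeline API; the pipeline name `filter_active_pipeline` and the `status` field are illustrative only:

```json
PUT /_search/pipeline/filter_active_pipeline
{
  "request_processors": [
    {
      "filter_query": {
        "description": "Return only documents whose status field is set to active",
        "query": {
          "term": {
            "status": "active"
          }
        }
      }
    }
  ]
}
```

The supplied query is applied in addition to whatever query the incoming search request contains, so only documents matching both are returned.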

## Request fields

@@ -23,6 +20,7 @@ Field | Data type | Description

`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

@@ -8,20 +8,8 @@ has_toc: false

# Search pipelines

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application.

## Enabling search pipelines

Search pipeline functionality is disabled by default. To enable it, edit the configuration in `opensearch.yml` and then restart your cluster:

1. Navigate to the OpenSearch config directory.
1. Open the `opensearch.yml` configuration file.
1. Add `opensearch.experimental.feature.search_pipeline.enabled: true` and save the configuration file, as shown in the snippet following this list.
1. Restart your cluster.
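
For reference, the line added in step 3 would appear in `opensearch.yml` as follows (shown in isolation here; the rest of the file is omitted):

```yml
# Experimental flag that enables search pipeline functionality; requires a cluster restart
opensearch.experimental.feature.search_pipeline.enabled: true
```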

## Terminology

The following is a list of search pipeline terminology:

@@ -123,7 +111,7 @@ Search pipelines are stored in the cluster state. To create a search pipeline, y

#### Example request

The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "filter_query" : {
        "tag" : "tag1",
        "description" : "This processor is going to restrict to publicly visible documents",
        "query" : {
          "term": {
            "visibility": "public"
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Ignoring processor failures

By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:

```json
"filter_query" : {
  "tag" : "tag1",
  "description" : "This processor is going to restrict to publicly visible documents",
  "ignore_failure": true,
  "query" : {
    "term": {
      "visibility": "public"
    }
  }
}
```

If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics).

## Using a temporary search pipeline for a request

As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:

```json
POST /my-index/_search
{
  "query" : {
    "match" : {
      "text_field" : "some search text"
    }
  },
  "pipeline" : {
    "request_processors": [
      {
        "filter_query" : {
          "tag" : "tag1",
          "description" : "This processor is going to restrict to publicly visible documents",
          "query" : {
            "term": {
              "visibility": "public"
            }
          }
        }
      }
    ],
    "response_processors": [
      {
        "rename_field": {
          "field": "message",
          "target_field": "notification"
        }
      }
    ]
  }
}
```
{% include copy-curl.html %}

With this syntax, the pipeline does not persist and is used only for the query for which it is specified.

## Retrieving search pipelines

To retrieve the details of an existing search pipeline, use the Search Pipeline API.
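
For example, a request along the following lines (reusing the `my_pipeline` name from the earlier example) returns the details of that pipeline; as the wildcard example further below suggests, a pattern such as `my*` matches every pipeline whose name starts with `my`:

```json
GET /_search/pipeline/my_pipeline
```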

@@ -201,7 +257,7 @@ GET /_search/pipeline/my*

## Using a search pipeline

To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
```

@@ -393,4 +449,164 @@ The response contains the pipeline version:

```json
  }
}
```
</details>
</details>

## Search pipeline metrics

To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):

```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}

The response contains statistics for all search pipelines:

```json
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "runTask",
  "nodes" : {
    "CpvTK7KuRD6Oww8TTp8g2Q" : {
      "timestamp" : 1689007282929,
      "name" : "runTask-0",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "cluster_manager",
        "data",
        "ingest",
        "remote_cluster_client"
      ],
      "attributes" : {
        "testattr" : "test",
        "shard_indexing_pressure_enabled" : "true"
      },
      "search_pipeline" : {
        "total_request" : {
          "count" : 5,
          "time_in_millis" : 158,
          "current" : 0,
          "failed" : 0
        },
        "total_response" : {
          "count" : 2,
          "time_in_millis" : 1,
          "current" : 0,
          "failed" : 0
        },
        "pipelines" : {
          "public_info" : {
            "request" : {
              "count" : 3,
              "time_in_millis" : 71,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 0,
              "time_in_millis" : 0,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 4,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [ ]
          },
          "guest_pipeline" : {
            "request" : {
              "count" : 2,
              "time_in_millis" : 87,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 2,
              "time_in_millis" : 1,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "script" : {
                  "type" : "script",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 86,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 3,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [
              {
                "rename_field" : {
                  "type" : "rename_field",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).

@@ -9,9 +9,6 @@ grand_parent: Search

# Rename field processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `rename_field` search response processor intercepts a search response and renames the specified field. This is useful when your index and your application use different names for the same field. For example, if you rename a field in your index, the `rename_field` processor can change the new name to the old one before sending the response to your application.
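
As a quick orientation, a minimal sketch of a pipeline containing only a `rename_field` response processor might look like the following; it mirrors the `message`-to-`notification` example from the search pipelines index page, and the pipeline name is illustrative:

```json
PUT /_search/pipeline/rename_message_pipeline
{
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```

Any search that runs with this pipeline returns documents in which the `message` field has been renamed to `notification`.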

## Request fields

@@ -24,6 +21,7 @@ Field | Data type | Description

`target_field` | String | The new field name. Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

@@ -9,9 +9,6 @@ grand_parent: Search

# Script processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `script` search request processor intercepts a search request and adds an inline Painless script that is run on incoming requests. The script can only run on the following request fields:

- `from`

@@ -37,6 +34,7 @@ Field | Data type | Description

`lang` | String | The script language. Optional. Only `painless` is supported.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

@@ -57,4 +55,3 @@ PUT /_search/pipeline/explain_one_result

```json
}
```
{% include copy-curl.html %}