---
layout: default
title: Oversample
nav_order: 17
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Oversample processor

The `oversample` request processor multiplies the `size` parameter of the search request by a specified `sample_factor` (>= 1.0), saving the original value in the `original_size` pipeline variable. The `oversample` processor is designed to work with the [`truncate_hits` response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) but may be used on its own.

## Request fields

The following table lists all request fields.

Field | Data type | Description
:--- | :--- | :---
`sample_factor` | Float | The multiplicative factor (>= 1.0) that will be applied to the `size` parameter before processing the search request. Required.
`context_prefix` | String | May be used to scope the `original_size` variable in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
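
The pairing with `truncate_hits` described above can be sketched as follows. This is a minimal illustration, not a definitive configuration: the pipeline name `my_oversample_truncate_pipeline` and the `context_prefix` value `my_context` are hypothetical, and the sketch assumes that `truncate_hits` accepts a matching `context_prefix` and falls back to the stored `original_size` when no explicit target size is configured. Under those assumptions, the request processor fetches twice as many hits as requested, and the response processor trims the results back to the original `size`:

```json
PUT /_search/pipeline/my_oversample_truncate_pipeline
{
  "request_processors": [
    {
      "oversample" : {
        "sample_factor" : 2.0,
        "context_prefix" : "my_context"
      }
    }
  ],
  "response_processors": [
    {
      "truncate_hits" : {
        "context_prefix" : "my_context"
      }
    }
  ]
}
```
{% include copy-curl.html %}

Using the same `context_prefix` in both processors keeps this pair's `original_size` value separate from the value written by any other `oversample` processor in the same pipeline.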

## Example

The following example demonstrates using a search pipeline with an `oversample` processor.

### Setup

Create an index named `my_index` and add ten documents to it:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
```
{% include copy-curl.html %}

### Creating a search pipeline

The following request creates a search pipeline named `my_pipeline` with an `oversample` request processor that requests 50% more hits than specified in `size`:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "oversample" : {
        "tag" : "oversample_1",
        "description" : "This processor will multiply `size` by 1.5.",
        "sample_factor" : 1.5
      }
    }
  ]
}
```
{% include copy-curl.html %}
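
To check that the pipeline was stored as intended, you can retrieve it by name. This is a small sketch that assumes the standard search pipeline retrieval endpoint is available in your OpenSearch version:

```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}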

### Using a search pipeline

Search for documents in `my_index` without a search pipeline:

```json
POST /my_index/_search
{
  "size": 5
}
```
{% include copy-curl.html %}
The response contains five hits:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      }
    ]
  }
}
```
</details>

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

```json
POST /my_index/_search?search_pipeline=my_pipeline
{
  "size": 5
}
```
{% include copy-curl.html %}

The response contains eight hits (5 * 1.5 = 7.5, rounded up to 8):

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 6"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 7"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 8"
          }
        }
      }
    ]
  }
}
```
</details>
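
The rounding behavior shown above can be explored with other `size` values. As an illustrative sketch, the following request reuses `my_pipeline` with a `size` of 3. Assuming the same rounding applies, the effective size is 3 * 1.5 = 4.5, rounded up to 5, so the response should contain up to five hits:

```json
POST /my_index/_search?search_pipeline=my_pipeline
{
  "size": 3
}
```
{% include copy-curl.html %}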