Add documentation for collapse, oversample, truncate_hits processors (#5881)

* Add documentation for collapse, oversample, truncate_hits processors

Signed-off-by: Michael Froh <froh@amazon.com>

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/oversample-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/collapse-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/oversample-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/truncate-hits-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/collapse-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/collapse-processor.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/search-pipelines/truncate-hits-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* More editorial comments and link fixes

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add oversample and deduplicate to vale and format files nicely

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update _search-plugins/search-pipelines/truncate-hits-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
This commit is contained in:
Michael Froh 2024-02-01 19:02:26 +00:00 committed by GitHub
parent 6af66500eb
commit 83f91acd5b
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
6 changed files with 958 additions and 1 deletions

View File

@ -75,6 +75,7 @@ Levenshtein
[Mm]ultivalued
[Mm]ultiword
[Nn]amespace
[Oo]versamples?
pebibyte
[Pp]luggable
[Pp]reconfigure

View File

@ -0,0 +1,144 @@
---
layout: default
title: Collapse
nav_order: 7
has_children: false
parent: Search processors
grand_parent: Search pipelines
---
# Collapse processor
The `collapse` response processor discards hits that have the same value for a particular field as a previous document in the result set.
This is similar to passing the `collapse` parameter in a search request, but the response processor is applied to the
response after fetching from all shards. The `collapse` response processor may be used in conjunction with the `rescore` search
request parameter or may be applied after a reranking response processor.
Using the `collapse` response processor will likely result in fewer than `size` results being returned because hits are discarded
from a set whose size is already less than or equal to `size`. To increase the likelihood of returning `size` hits, use the
`oversample` request processor and `truncate_hits` response processor, as shown in [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/#oversample-collapse-and-truncate-hits).
## Request fields
The following table lists all request fields.
Field | Data type | Description
:--- | :--- | :---
`field` | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required.
`context_prefix` | String | May be used to read the `original_size` variable from a specific scope in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example
The following example demonstrates using a search pipeline with a `collapse` processor.
### Setup
Create many documents containing a field to use for collapsing:
```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
```
{% include copy-curl.html %}
Create a pipeline that only collapses on the `color` field:
```json
PUT /_search/pipeline/collapse_pipeline
{
"response_processors": [
{
"collapse" : {
"field": "color"
}
}
]
}
```
{% include copy-curl.html %}
### Using a search pipeline
In this example, you request the top three documents before collapsing on the `color` field. Because the first two documents have the same `color`, the second one is discarded,
and the request returns the first and third documents:
```json
POST /my_index/_search?search_pipeline=collapse_pipeline
{
"size": 3
}
```
{% include copy-curl.html %}
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
```
</details>

View File

@ -0,0 +1,292 @@
---
layout: default
title: Oversample
nav_order: 17
has_children: false
parent: Search processors
grand_parent: Search pipelines
---
# Oversample processor
The `oversample` request processor multiplies the `size` parameter of the search request by a specified `sample_factor` (>= 1.0), saving the original value in the `original_size` pipeline variable. The `oversample` processor is designed to work with the [`truncate_hits` response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) but may be used on its own.
## Request fields
The following table lists all request fields.
Field | Data type | Description
:--- | :--- | :---
`sample_factor` | Float | The multiplicative factor (>= 1.0) that will be applied to the `size` parameter before processing the search request. Required.
`context_prefix` | String | May be used to scope the `original_size` variable in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example
The following example demonstrates using a search pipeline with an `oversample` processor.
### Setup
Create an index named `my_index` containing many documents:
```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
```
{% include copy-curl.html %}
### Creating a search pipeline
The following request creates a search pipeline named `my_pipeline` with an `oversample` request processor that requests 50% more hits than specified in `size`:
```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"oversample" : {
"tag" : "oversample_1",
"description" : "This processor will multiply `size` by 1.5.",
"sample_factor" : 1.5
}
}
]
}
```
{% include copy-curl.html %}
### Using a search pipeline
Search for documents in `my_index` without a search pipeline:
```json
POST /my_index/_search
{
"size": 5
}
```
{% include copy-curl.html %}
The response contains five hits:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
}
]
}
}
```
</details>
To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
```json
POST /my_index/_search?search_pipeline=my_pipeline
{
"size": 5
}
```
{% include copy-curl.html %}
The response contains 8 documents (5 * 1.5 = 7.5, rounded up to 8):
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
},
{
"_index" : "my_index",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 6"
}
}
},
{
"_index" : "my_index",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 7"
}
}
},
{
"_index" : "my_index",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 8"
}
}
}
]
}
}
```
</details>

View File

@ -1,7 +1,7 @@
---
layout: default
title: Personalize search ranking
nav_order: 40
nav_order: 18
has_children: false
parent: Search processors
grand_parent: Search pipelines

View File

@ -26,6 +26,8 @@ Processor | Description | Earliest available version
[`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11
[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
[`oversample`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) | Increases the search request `size` parameter, storing the original value in the pipeline state. | 2.12
## Search response processors
@ -37,6 +39,8 @@ Processor | Description | Earliest available version
:--- | :--- | :---
[`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12
## Search phase results processors

View File

@ -0,0 +1,516 @@
---
layout: default
title: Truncate hits
nav_order: 35
has_children: false
parent: Search processors
grand_parent: Search pipelines
---
# Truncate hits processor
The `truncate_hits` response processor discards returned search hits after a given hit count is reached. The `truncate_hits` processor is designed to work with the [`oversample` request processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) but may be used on its own.
The `target_size` parameter (which specifies where to truncate) is optional. If it is not specified, then OpenSearch uses the `original_size` variable set by the
`oversample` processor (if available).
The following is a common usage pattern:
1. Add the `oversample` processor to a request pipeline to fetch a larger set of results.
1. In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the `collapse` processor (which may discard results after deduplication).
1. Apply the `truncate` processor to return (at most) the originally requested number of hits.
## Request fields
The following table lists all request fields.
Field | Data type | Description
:--- | :--- | :---
`target_size` | Integer | The maximum number of search hits to return (>=0). If not specified, the processor will try to read the `original_size` variable and will fail if it is not available. Optional.
`context_prefix` | String | May be used to read the `original_size` variable from a specific scope in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example
The following example demonstrates using a search pipeline with a `truncate` processor.
### Setup
Create an index named `my_index` containing many documents:
```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
```
{% include copy-curl.html %}
### Creating a search pipeline
The following request creates a search pipeline named `my_pipeline` with a `truncate_hits` response processor that discards hits after the first five:
```json
PUT /_search/pipeline/my_pipeline
{
"response_processors": [
{
"truncate_hits" : {
"tag" : "truncate_1",
"description" : "This processor will discard results after the first 5.",
"target_size" : 5
}
}
]
}
```
{% include copy-curl.html %}
### Using a search pipeline
Search for documents in `my_index` without a search pipeline:
```json
POST /my_index/_search
{
"size": 8
}
```
{% include copy-curl.html %}
The response contains eight hits:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
},
{
"_index" : "my_index",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 6"
}
}
},
{
"_index" : "my_index",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 7"
}
}
},
{
"_index" : "my_index",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 8"
}
}
}
]
}
}
```
</details>
To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
```json
POST /my_index/_search?search_pipeline=my_pipeline
{
"size": 8
}
```
{% include copy-curl.html %}
The response contains only 5 hits, even though 8 were requested and 10 were available:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
}
]
}
}
```
</details>
## Oversample, collapse, and truncate hits
The following is a more realistic example in which you use `oversample` to request many candidate documents, use `collapse` to remove documents that duplicate a particular field (to get more diverse results), and then use `truncate` to return the originally requested document count (to avoid returning a large result payload from the cluster).
### Setup
Create many documents containing a field that you'll use for collapsing:
```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
```
{% include copy-curl.html %}
Create a pipeline that collapses only on the `color` field:
```json
PUT /_search/pipeline/collapse_pipeline
{
"response_processors": [
{
"collapse" : {
"field": "color"
}
}
]
}
```
{% include copy-curl.html %}
Create another pipeline that oversamples, collapses, and then truncates results:
```json
PUT /_search/pipeline/oversampling_collapse_pipeline
{
"request_processors": [
{
"oversample": {
"sample_factor": 3
}
}
],
"response_processors": [
{
"collapse" : {
"field": "color"
}
},
{
"truncate_hits": {
"description": "Truncates back to the original size before oversample increased it."
}
}
]
}
```
{% include copy-curl.html %}
### Collapse without oversample
In this example, you request the top three documents before collapsing on the `color` field. Because the first two documents have the same `color`, the second one is discarded, and the request returns the first and third documents:
```json
POST /my_index/_search?search_pipeline=collapse_pipeline
{
"size": 3
}
```
{% include copy-curl.html %}
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
```
</details>
### Oversample, collapse, and truncate
Now you will use the `oversampling_collapse_pipeline`, which requests the top 9 documents (multiplying the size by 3), deduplicates by `color`, and then returns the top 3 hits:
```json
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
"size": 3
}
```
{% include copy-curl.html %}
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"title" : "document 5",
"color" : "yellow"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
```
</details>