Add documentation for collapse, oversample, truncate_hits processors (#5881)
* Add documentation for collapse, oversample, truncate_hits processors
* Apply suggestions from code review
* Update _search-plugins/search-pipelines/oversample-processor.md
* Update _search-plugins/search-pipelines/collapse-processor.md
* Update _search-plugins/search-pipelines/truncate-hits-processor.md
* More editorial comments and link fixes
* Add oversample and deduplicate to vale and format files nicely

---------

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
This commit is contained in:
parent 6af66500eb
commit 83f91acd5b
@@ -75,6 +75,7 @@ Levenshtein
 [Mm]ultivalued
 [Mm]ultiword
 [Nn]amespace
+[Oo]versamples?
 pebibyte
 [Pp]luggable
 [Pp]reconfigure
@@ -0,0 +1,144 @@
---
layout: default
title: Collapse
nav_order: 7
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Collapse processor

The `collapse` response processor discards hits that have the same value for a particular field as a previous document in the result set. This is similar to passing the `collapse` parameter in a search request, but the response processor is applied to the response after fetching from all shards. The `collapse` response processor may be used in conjunction with the `rescore` search request parameter or may be applied after a reranking response processor.

Using the `collapse` response processor will likely result in fewer than `size` results being returned because hits are discarded from a set whose size is already less than or equal to `size`. To increase the likelihood of returning `size` hits, use the `oversample` request processor and `truncate_hits` response processor, as shown in [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/#oversample-collapse-and-truncate-hits).
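Conceptually, the processor walks the ranked hit list once and keeps only the first hit for each distinct value of the collapse field. The following Python sketch illustrates that deduplication logic (an illustrative approximation, not the plugin's actual implementation):

```python
def collapse_hits(hits, field):
    """Keep only the first hit for each distinct value of `field`,
    preserving the original ranked order of hits."""
    seen = set()
    collapsed = []
    for hit in hits:
        value = hit["_source"].get(field)
        if value not in seen:
            seen.add(value)
            collapsed.append(hit)
    return collapsed

hits = [
    {"_id": "1", "_source": {"title": "document 1", "color": "blue"}},
    {"_id": "2", "_source": {"title": "document 2", "color": "blue"}},
    {"_id": "3", "_source": {"title": "document 3", "color": "red"}},
]
print([h["_id"] for h in collapse_hits(hits, "color")])  # ['1', '3']
```

Because duplicates are discarded rather than replaced, the collapsed list can be shorter than the requested `size`, which is why oversampling first increases the chance of returning a full page of results.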

## Request fields

The following table lists all request fields.

Field | Data type | Description
:--- | :--- | :---
`field` | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required.
`context_prefix` | String | May be used to read the `original_size` variable from a specific scope in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

The following example demonstrates using a search pipeline with a `collapse` processor.

### Setup

Create many documents containing a field to use for collapsing:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
```
{% include copy-curl.html %}

Create a pipeline that collapses on the `color` field:

```json
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Using a search pipeline

In this example, you request the top three documents before collapsing on the `color` field. Because the first two documents have the same `color`, the second one is discarded, and the request returns the first and third documents:

```json
POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}
```
{% include copy-curl.html %}

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
```
</details>
@@ -0,0 +1,292 @@
---
layout: default
title: Oversample
nav_order: 17
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Oversample processor

The `oversample` request processor multiplies the `size` parameter of the search request by a specified `sample_factor` (>= 1.0), saving the original value in the `original_size` pipeline variable. The `oversample` processor is designed to work with the [`truncate_hits` response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) but may be used on its own.
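The size adjustment itself is simple arithmetic: the new `size` is the original `size` multiplied by `sample_factor` and rounded up, while the original value is stashed for later processors to read. The following Python sketch models that behavior (function and variable names are illustrative, not taken from the plugin source):

```python
import math

def oversample(size, sample_factor):
    """Return the enlarged request size plus the saved pipeline state."""
    if sample_factor < 1.0:
        raise ValueError("sample_factor must be >= 1.0")
    pipeline_state = {"original_size": size}  # read later by truncate_hits
    return math.ceil(size * sample_factor), pipeline_state

print(oversample(5, 1.5))  # (8, {'original_size': 5})
```

Note that the product is rounded up, so a `size` of 5 with a `sample_factor` of 1.5 requests 8 hits, matching the example later on this page.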

## Request fields

The following table lists all request fields.

Field | Data type | Description
:--- | :--- | :---
`sample_factor` | Float | The multiplicative factor (>= 1.0) that will be applied to the `size` parameter before processing the search request. Required.
`context_prefix` | String | May be used to scope the `original_size` variable in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

The following example demonstrates using a search pipeline with an `oversample` processor.

### Setup

Create an index named `my_index` containing many documents:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
```
{% include copy-curl.html %}

### Creating a search pipeline

The following request creates a search pipeline named `my_pipeline` with an `oversample` request processor that requests 50% more hits than specified in `size`:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "oversample" : {
        "tag" : "oversample_1",
        "description" : "This processor will multiply `size` by 1.5.",
        "sample_factor" : 1.5
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Using a search pipeline

Search for documents in `my_index` without a search pipeline:

```json
POST /my_index/_search
{
  "size": 5
}
```
{% include copy-curl.html %}

The response contains five hits:

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      }
    ]
  }
}
```
</details>

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

```json
POST /my_index/_search?search_pipeline=my_pipeline
{
  "size": 5
}
```
{% include copy-curl.html %}

The response contains eight documents (5 * 1.5 = 7.5, rounded up to 8):

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 6"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 7"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 8"
          }
        }
      }
    ]
  }
}
```
</details>
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Personalize search ranking
-nav_order: 40
+nav_order: 18
 has_children: false
 parent: Search processors
 grand_parent: Search pipelines
@@ -26,6 +26,8 @@ Processor | Description | Earliest available version
 [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
 [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11
 [`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
+[`oversample`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) | Increases the search request `size` parameter, storing the original value in the pipeline state. | 2.12

 ## Search response processors

@@ -37,6 +39,8 @@ Processor | Description | Earliest available version
 :--- | :--- | :---
 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/) | Renames an existing field. | 2.8
+[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/) | Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
+[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) | Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12

 ## Search phase results processors
@@ -0,0 +1,516 @@
---
layout: default
title: Truncate hits
nav_order: 35
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Truncate hits processor

The `truncate_hits` response processor discards returned search hits after a given hit count is reached. The `truncate_hits` processor is designed to work with the [`oversample` request processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) but may be used on its own.

The `target_size` parameter (which specifies where to truncate) is optional. If it is not specified, then OpenSearch uses the `original_size` variable set by the `oversample` processor (if available).

The following is a common usage pattern:

1. Add the `oversample` processor to a request pipeline to fetch a larger set of results.
1. In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the `collapse` processor (which may discard results after deduplication).
1. Apply the `truncate_hits` processor to return (at most) the originally requested number of hits.
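In pseudocode terms, the processor resolves the cutoff from `target_size` or, failing that, from the saved `original_size`, and then drops everything past it. The following minimal Python sketch shows that fallback logic (illustrative only, not the plugin's actual implementation):

```python
def truncate_hits(hits, pipeline_state, target_size=None):
    """Truncate `hits` to `target_size`, falling back to the
    `original_size` value saved by an earlier oversample processor."""
    if target_size is None:
        if "original_size" not in pipeline_state:
            raise ValueError("no target_size and no original_size available")
        target_size = pipeline_state["original_size"]
    if target_size < 0:
        raise ValueError("target_size must be >= 0")
    return hits[:target_size]

eight_hits = [{"_id": str(i)} for i in range(1, 9)]
print(len(truncate_hits(eight_hits, {"original_size": 5})))  # 5
```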

## Request fields

The following table lists all request fields.

Field | Data type | Description
:--- | :--- | :---
`target_size` | Integer | The maximum number of search hits to return (>= 0). If not specified, the processor will try to read the `original_size` variable and will fail if it is not available. Optional.
`context_prefix` | String | May be used to read the `original_size` variable from a specific scope in order to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
## Example

The following example demonstrates using a search pipeline with a `truncate_hits` processor.

### Setup

Create an index named `my_index` containing many documents:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
```
{% include copy-curl.html %}

### Creating a search pipeline

The following request creates a search pipeline named `my_pipeline` with a `truncate_hits` response processor that discards hits after the first five:

```json
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "truncate_hits" : {
        "tag" : "truncate_1",
        "description" : "This processor will discard results after the first 5.",
        "target_size" : 5
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Using a search pipeline

Search for documents in `my_index` without a search pipeline:

```json
POST /my_index/_search
{
  "size": 8
}
```
{% include copy-curl.html %}

The response contains eight hits:

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 6"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 7"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 8"
          }
        }
      }
    ]
  }
}
```
</details>

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

```json
POST /my_index/_search?search_pipeline=my_pipeline
{
  "size": 8
}
```
{% include copy-curl.html %}

The response contains only five hits, even though eight were requested and 10 were available:

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      }
    ]
  }
}
```
</details>

## Oversample, collapse, and truncate hits

The following is a more realistic example in which you use `oversample` to request many candidate documents, use `collapse` to remove documents that duplicate a particular field (to get more diverse results), and then use `truncate_hits` to return the originally requested document count (to avoid returning a large result payload from the cluster).
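Putting the three processors together, the request is enlarged on the way in, deduplicated on the way out, and cut back to the caller's original `size` at the end. The following Python sketch chains the three steps over an in-memory document list (a conceptual model with hypothetical helper names, not the plugin API):

```python
import math

def search_with_pipeline(docs, size, sample_factor, collapse_field):
    """Model of an oversample -> collapse -> truncate_hits pipeline."""
    # Request processor: enlarge the fetch size, remember the original.
    original_size = size
    fetched = docs[: math.ceil(size * sample_factor)]
    # Response processor 1: keep the first hit per distinct field value.
    seen, collapsed = set(), []
    for doc in fetched:
        if doc[collapse_field] not in seen:
            seen.add(doc[collapse_field])
            collapsed.append(doc)
    # Response processor 2: cut back to the originally requested size.
    return collapsed[:original_size]

docs = [{"id": i, "color": c} for i, c in enumerate(
    ["blue", "blue", "red", "red", "yellow", "yellow",
     "orange", "orange", "green", "green"], start=1)]
# With size=3 and sample_factor=3, nine candidates are fetched,
# collapsed to one document per color, then truncated back to three.
print(search_with_pipeline(docs, size=3, sample_factor=3,
                           collapse_field="color"))  # ids 1, 3, 5
```

This mirrors the end-to-end example below, where documents 1, 3, and 5 (blue, red, and yellow) survive the pipeline.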

### Setup

Create many documents containing a field that you'll use for collapsing:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
```
{% include copy-curl.html %}

Create a pipeline that collapses on the `color` field:

```json
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
```
{% include copy-curl.html %}

Create another pipeline that oversamples, collapses, and then truncates results:

```json
PUT /_search/pipeline/oversampling_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 3
      }
    }
  ],
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    },
    {
      "truncate_hits": {
        "description": "Truncates back to the original size before oversample increased it."
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Collapse without oversample

In this example, you request the top three documents before collapsing on the `color` field. Because the first two documents have the same `color`, the second one is discarded, and the request returns the first and third documents:

```json
POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}
```
{% include copy-curl.html %}

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
```
</details>

### Oversample, collapse, and truncate

Now you will use the `oversampling_collapse_pipeline`, which requests the top nine documents (multiplying the size by 3), deduplicates by `color`, and then returns the top three hits:

```json
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
  "size": 3
}
```
{% include copy-curl.html %}

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 5",
          "color" : "yellow"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
```
</details>