Add max_docs_per_value to diversified sampler documentation (#6134)

* Add max_docs_per_value to diversified sampler documentation

Signed-off-by: Jay Deng <jayd0104@gmail.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Jay Deng <jayd0104@gmail.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Copy edits

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Jay Deng 2024-01-26 08:09:49 -08:00 committed by GitHub
parent b2fd154a0e
commit e3ebb35942
GPG Key ID: B5690EEEBB952194
2 changed files with 11 additions and 5 deletions

_aggregations/bucket/diversified-sampler.md

@@ -8,9 +8,11 @@ redirect_from:
- /query-dsl/aggregations/bucket/diversified-sampler/
---
# Diversified sampler aggregations
# Diversified sampler
The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool. You can use the `field` setting to control the maximum number of documents collected on any one shard which shares a common value:
The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool by deduplicating documents that share the same value in `field`. It does so by using the `field` and `max_docs_per_value` settings, which limit the number of documents collected on each shard for any one value of `field`. The optional `max_docs_per_value` setting determines the maximum number of documents collected per `field` value. The default value of this setting is `1`.
Similarly to the [`sampler` aggregation]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/), you can use the `shard_size` setting to control the maximum number of documents collected on any one shard, as shown in the following example:
```json
GET opensearch_dashboards_sample_data_logs/_search
@@ -18,7 +20,7 @@ GET opensearch_dashboards_sample_data_logs/_search
"size": 0,
"aggs": {
"sample": {
"diversified_sampler": {
"diversified_": {
"shard_size": 1000,
"field": "response.keyword"
},
@@ -57,6 +59,8 @@ GET opensearch_dashboards_sample_data_logs/_search
]
}
}
}
}
```
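To make the effect of `max_docs_per_value` concrete, here is a small Python model of how one shard might build a diversified sample. It is a sketch under stated assumptions, not OpenSearch's implementation: it assumes each shard visits documents from highest to lowest score, skips any document whose `field` value already has `max_docs_per_value` documents in the sample, and stops once `shard_size` documents are collected. The function name `diversified_sample` and the `(doc_id, score, value)` tuples are hypothetical.

```python
from collections import defaultdict


def diversified_sample(docs, shard_size, max_docs_per_value=1):
    """Simplified model of one shard's diversified sample (assumption, not OpenSearch code).

    docs: list of hypothetical (doc_id, score, value) tuples, where `value`
    stands in for the document's value in the aggregation's `field`.
    """
    per_value = defaultdict(int)  # documents kept so far for each field value
    sample = []
    # Visit documents from highest to lowest score (ties broken by doc_id).
    for doc_id, score, value in sorted(docs, key=lambda d: (-d[1], d[0])):
        if len(sample) == shard_size:
            break  # shard_size reached: stop collecting on this shard
        if per_value[value] < max_docs_per_value:
            per_value[value] += 1
            sample.append(doc_id)
    return sample


# Hypothetical documents from one shard: (doc_id, score, response code).
shard_docs = [
    ("a", 3.0, "200"),
    ("b", 2.5, "200"),
    ("c", 2.0, "200"),
    ("d", 1.5, "404"),
    ("e", 1.0, "503"),
]

print(diversified_sample(shard_docs, shard_size=1000))  # → ['a', 'd', 'e']
print(diversified_sample(shard_docs, shard_size=1000, max_docs_per_value=2))  # → ['a', 'b', 'd', 'e']
print(diversified_sample(shard_docs, shard_size=3, max_docs_per_value=2))  # → ['a', 'b', 'd']
```

With the default of `1`, the three `200` documents collapse to the single highest-scoring one; raising `max_docs_per_value` to `2` admits a second, and `shard_size` still caps the total. Under this model, a shard's sample can never exceed the number of distinct `field` values multiplied by `max_docs_per_value`.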

_aggregations/bucket/sampler.md

@@ -8,7 +8,7 @@ nav_order: 170
# Sampler aggregations
If you're aggregating over millions of documents, you can use a `sampler` aggregation to reduce its scope to a small sample of documents for a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
If you're aggregating a very large number of documents, you can use a `sampler` aggregation to reduce the scope to a small sample of documents, resulting in a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable.
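The per-shard behavior described above can be modeled in a few lines of Python. This sketch assumes that each shard contributes only its `shard_size` highest-scoring documents and that a `terms`-style count then runs over that combined sample; the function names, shards, and agent values are hypothetical, and OpenSearch's actual implementation differs in detail.

```python
import heapq
from collections import Counter


def shard_sample(docs, shard_size):
    """Return the `shard_size` highest-scoring documents from one shard.

    docs: list of hypothetical (doc_id, score, agent) tuples.
    """
    return heapq.nlargest(shard_size, docs, key=lambda d: d[1])


def sampled_terms(shards, shard_size):
    """Simplified model of a `sampler` with a `terms` sub-aggregation on agent.

    Bucket counts are computed only over the documents each shard sampled,
    not over the full data set.
    """
    counts = Counter()
    for docs in shards:
        for _doc_id, _score, agent in shard_sample(docs, shard_size):
            counts[agent] += 1
    return counts


# Two hypothetical shards of (doc_id, score, agent) tuples.
shards = [
    [("s0-1", 0.9, "firefox"), ("s0-2", 0.4, "chrome"), ("s0-3", 0.7, "firefox")],
    [("s1-1", 0.8, "chrome"), ("s1-2", 0.2, "safari"), ("s1-3", 0.6, "chrome")],
]

print(dict(sampled_terms(shards, shard_size=2)))  # → {'firefox': 2, 'chrome': 2}
print(dict(sampled_terms(shards, shard_size=3)))  # → {'firefox': 2, 'chrome': 3, 'safari': 1}
```

With `shard_size=2`, the low-scoring `safari` document is never collected, so its bucket is missing from the result; this is why sampled results are approximate.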
@@ -25,6 +25,8 @@ The basic syntax is:
}
```
## Shard size property
The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard.
The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation:
@@ -79,4 +81,4 @@ GET opensearch_dashboards_sample_data_logs/_search
}
}
}
```
```