Add max_docs_per_value to diversified sampler documentation (#6134)
* Add max_docs_per_value to diversified sampler documentation
  Signed-off-by: Jay Deng <jayd0104@gmail.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: Jay Deng <jayd0104@gmail.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Copy edits
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent b2fd154a0e, commit e3ebb35942
`_aggregations/bucket/diversified-sampler.md`

````diff
@@ -8,9 +8,11 @@ redirect_from:
 - /query-dsl/aggregations/bucket/diversified-sampler/
 ---
 
-# Diversified sampler aggregations
+# Diversified sampler
 
-The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool. You can use the `field` setting to control the maximum number of documents collected on any one shard which shares a common value:
+The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool by deduplicating documents containing the same `field`. It does so by using the `max_docs_per_value` and `field` settings, which limit the maximum number of documents collected on a shard for the provided `field`. The `max_docs_per_value` setting is an optional parameter used to determine the maximum number of documents that will be returned per `field`. The default value of this setting is `1`.
 
+Similarly to the [`sampler` aggregation]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/), you can use the `shard_size` setting to control the maximum number of documents collected on any one shard, as shown in the following example:
+
 ```json
 GET opensearch_dashboards_sample_data_logs/_search
@@ -18,7 +20,7 @@ GET opensearch_dashboards_sample_data_logs/_search
 "size": 0,
 "aggs": {
 "sample": {
-"diversified_sampler": {
+"diversified_": {
 "shard_size": 1000,
 "field": "response.keyword"
 },
@@ -57,6 +59,8 @@ GET opensearch_dashboards_sample_data_logs/_search
 ]
 }
 }
+
 }
 }
 ```
+
````
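To make the interaction between `max_docs_per_value` and `shard_size` concrete, the following is a minimal Python sketch of how one shard's diversified sample could be chosen. It is a simplified model, not OpenSearch's implementation. It assumes that documents are ranked by score and kept greedily while fewer than `max_docs_per_value` already-kept documents share their `field` value, with the total capped at `shard_size`. The `Doc` type, `diversified_sample`, and `build_request_body` are hypothetical names, and `agent.keyword` as the terms field is an illustrative assumption. Only `diversified_sampler`, `field`, `shard_size`, `max_docs_per_value`, and the default of `1` come from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a matching document on one shard."""
    doc_id: str
    score: float
    value: str  # value of the diversity `field`, for example a response code


def diversified_sample(docs, shard_size, max_docs_per_value=1):
    """Simplified model (not OpenSearch's collector) of a per-shard diversified sample.

    Walks documents from highest to lowest score and keeps a document only if
    fewer than max_docs_per_value already-kept documents share its field value.
    Stops once shard_size documents have been kept. The default of 1 mirrors
    the documented default for max_docs_per_value.
    """
    kept = []
    per_value = defaultdict(int)  # field value -> documents kept so far
    # sorted() is stable, so documents with equal scores keep their input order.
    for doc in sorted(docs, key=lambda d: d.score, reverse=True):
        if len(kept) >= shard_size:
            break
        if per_value[doc.value] < max_docs_per_value:
            per_value[doc.value] += 1
            kept.append(doc)
    return kept


def build_request_body(field, shard_size, max_docs_per_value, terms_field):
    """Hypothetical helper: search body with a diversified_sampler and a terms sub-aggregation."""
    return {
        "size": 0,
        "aggs": {
            "sample": {
                "diversified_sampler": {
                    "shard_size": shard_size,
                    "field": field,
                    "max_docs_per_value": max_docs_per_value,
                },
                # The sub-aggregation name "terms" is arbitrary.
                "aggs": {"terms": {"terms": {"field": terms_field}}},
            }
        },
    }


if __name__ == "__main__":
    import json

    shard = [
        Doc("a", 3.0, "200"),
        Doc("b", 2.5, "200"),
        Doc("c", 2.0, "404"),
        Doc("d", 1.5, "200"),
        Doc("e", 1.0, "503"),
    ]
    for limit in (1, 2):
        sample = diversified_sample(shard, shard_size=3, max_docs_per_value=limit)
        print(limit, [d.doc_id for d in sample])
    # agent.keyword is an assumed terms field, used only for illustration.
    body = build_request_body("response.keyword", 1000, 2, "agent.keyword")
    print(json.dumps(body, indent=2))
```

With `max_docs_per_value=1`, the sample is `a`, `c`, `e`. Documents `b` and `d` are skipped because a `200` document is already in the sample, which leaves room for the low-scoring `e`. With a limit of 2, `b` is admitted and the three-document cap is reached before `e`. The aggregation type key must be spelled `diversified_sampler`, as in the prose above; a truncated key such as `"diversified_"` does not name a valid aggregation.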
`_aggregations/bucket/sampler.md`

````diff
@@ -8,7 +8,7 @@ nav_order: 170
 
 # Sampler aggregations
 
-If you're aggregating over millions of documents, you can use a `sampler` aggregation to reduce its scope to a small sample of documents for a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
+If you're aggregating a very large number of documents, you can use a `sampler` aggregation to reduce the scope to a small sample of documents, resulting in a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
 
 The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable.
 
@@ -25,6 +25,8 @@ The basic syntax is:
 }
 ```
 
+## Shard size property
+
 The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard.
 
 The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation:
@@ -79,4 +81,4 @@ GET opensearch_dashboards_sample_data_logs/_search
 }
 }
 }
-```
+```
````
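The sampling step described in the `sampler.md` text (collect at most `shard_size` top-scoring documents per shard, then bucket them with a `terms` aggregation) can be modeled in a few lines of Python. This is a conceptual sketch, not OpenSearch's code. `Doc`, `sample_shard`, and `sampled_terms` are hypothetical names, and each shard is assumed to be a plain list of scored documents. The sketch also shows why the results are approximate: documents that fall below the per-shard cut never reach the bucket counts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a matching document on one shard."""
    doc_id: str
    score: float
    term: str  # value that the terms sub-aggregation buckets on


def sample_shard(docs, shard_size):
    """Keep at most shard_size documents from one shard, highest score first."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:shard_size]


def sampled_terms(shards, shard_size):
    """Approximate terms buckets: count terms only over each shard's sample."""
    counts = Counter()
    for docs in shards:
        counts.update(d.term for d in sample_shard(docs, shard_size))
    return counts


if __name__ == "__main__":
    shards = [
        [Doc("1", 0.9, "x"), Doc("2", 0.8, "y"), Doc("3", 0.1, "y")],
        [Doc("4", 0.7, "x"), Doc("5", 0.6, "x"), Doc("6", 0.2, "z")],
    ]
    print("sampled:", dict(sampled_terms(shards, shard_size=2)))
    # A shard_size at least as large as every shard reproduces the exact counts.
    print("exact:  ", dict(sampled_terms(shards, shard_size=3)))
```

With two shards and `shard_size=2`, the `y` bucket counts only one of its two documents and `z` disappears entirely. Raising the limit to 3 recovers the exact counts. The 1,000-document limit in the example described above trades accuracy for speed in the same way.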