Add max_docs_per_value to diversified sampler documentation (#6134)
* Add max_docs_per_value to diversified sampler documentation

Signed-off-by: Jay Deng <jayd0104@gmail.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Jay Deng <jayd0104@gmail.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Copy edits

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/diversified-sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/bucket/sampler.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update diversified-sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update sampler.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent b2fd154a0e
commit e3ebb35942
|
@ -8,9 +8,11 @@ redirect_from:
|
||||||
- /query-dsl/aggregations/bucket/diversified-sampler/
|
- /query-dsl/aggregations/bucket/diversified-sampler/
|
||||||
---
|
---
|
||||||
|
|
||||||
# Diversified sampler aggregations
|
# Diversified sampler
|
||||||
|
|
||||||
The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool. You can use the `field` setting to control the maximum number of documents collected on any one shard which shares a common value:
|
The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool by deduplicating documents that share the same value for a given `field`. It does so by using the `max_docs_per_value` and `field` settings, which cap the number of documents collected on a shard for each distinct value of the provided `field`. The `max_docs_per_value` setting is optional and determines the maximum number of documents collected per `field` value. Its default value is `1`.
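
As a minimal sketch of how `max_docs_per_value` might be set explicitly, the following request allows up to three documents per distinct `response.keyword` value on each shard. The value `3`, the nested `terms` sub-aggregation, and the `agent.keyword` field are illustrative assumptions and are not part of this change:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword",
        "max_docs_per_value": 3
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "agent.keyword"
          }
        }
      }
    }
  }
}
```

Omitting `max_docs_per_value` falls back to the default of `1`, so at most one document per `response.keyword` value would be sampled on each shard.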
|
||||||
|
|
||||||
|
Similarly to the [`sampler` aggregation]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/), you can use the `shard_size` setting to control the maximum number of documents collected on any one shard, as shown in the following example:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
GET opensearch_dashboards_sample_data_logs/_search
|
GET opensearch_dashboards_sample_data_logs/_search
|
||||||
|
@ -18,7 +20,7 @@ GET opensearch_dashboards_sample_data_logs/_search
|
||||||
"size": 0,
|
"size": 0,
|
||||||
"aggs": {
|
"aggs": {
|
||||||
"sample": {
|
"sample": {
|
||||||
"diversified_sampler": {
|
"diversified_": {
|
||||||
"shard_size": 1000,
|
"shard_size": 1000,
|
||||||
"field": "response.keyword"
|
"field": "response.keyword"
|
||||||
},
|
},
|
||||||
|
@ -57,6 +59,8 @@ GET opensearch_dashboards_sample_data_logs/_search
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ nav_order: 170
|
||||||
|
|
||||||
# Sampler aggregations
|
# Sampler aggregations
|
||||||
|
|
||||||
If you're aggregating over millions of documents, you can use a `sampler` aggregation to reduce its scope to a small sample of documents for a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
|
If you're aggregating a very large number of documents, you can use a `sampler` aggregation to reduce the scope to a small sample of documents, resulting in a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
|
||||||
|
|
||||||
The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable.
|
The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable.
|
||||||
|
|
||||||
|
@ -25,6 +25,8 @@ The basic syntax is:
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Shard size property
|
||||||
|
|
||||||
The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard.
|
The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard.
|
||||||
|
|
||||||
The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation:
|
The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation:
|
||||||
|
|