Add max_docs_per_value to diversified sampler documentation (#6134)

* Add max_docs_per_value to diversified sampler documentation
  Signed-off-by: Jay Deng <jayd0104@gmail.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: Jay Deng <jayd0104@gmail.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Copy edits
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/diversified-sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _aggregations/bucket/sampler.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update diversified-sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update sampler.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
2025-02-21 05:56:39 +00:00 · 2024-01-26 08:09:49 -08:00 · 2024-01-26 08:09:49 -08:00 · e3ebb35942
commit e3ebb35942
parent b2fd154a0e
2 changed files with 11 additions and 5 deletions
--- a/_aggregations/bucket/diversified-sampler.md
+++ b/_aggregations/bucket/diversified-sampler.md
@@ -8,9 +8,11 @@ redirect_from:
  - /query-dsl/aggregations/bucket/diversified-sampler/
 ---

-# Diversified sampler aggregations
+# Diversified sampler

-The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool. You can use the `field` setting to control the maximum number of documents collected on any one shard which shares a common value:
+The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool by deduplicating documents that share the same value for a given `field`. It does so by using the `max_docs_per_value` and `field` settings, which limit the number of documents collected on a shard for each value of the provided `field`. The `max_docs_per_value` setting is optional and sets the maximum number of documents returned per `field` value. Its default value is `1`.
+
+Similarly to the [`sampler` aggregation]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/), you can use the `shard_size` setting to control the maximum number of documents collected on any one shard, as shown in the following example:

 ```json
 GET opensearch_dashboards_sample_data_logs/_search
@@ -18,7 +20,7 @@ GET opensearch_dashboards_sample_data_logs/_search
  "size": 0,
  "aggs": {
    "sample": {
-      "diversified_sampler": {
+      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword"
      },
@@ -57,6 +59,8 @@ GET opensearch_dashboards_sample_data_logs/_search
      ]
    }
  }
+
 }
 }
 ```
+ 
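
As a sketch of the `max_docs_per_value` setting described in this change, the following request keeps up to three documents per distinct `response.keyword` value on each shard instead of the default of one. The value `3` and the nested `terms` aggregation on `agent.keyword` are illustrative assumptions and are not taken from the commit:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword",
        "max_docs_per_value": 3
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "agent.keyword"
          }
        }
      }
    }
  }
}
```

Raising `max_docs_per_value` admits more documents that share a `field` value into the sample, so it trades some of the deduplication for a larger sample pool.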
--- a/_aggregations/bucket/sampler.md
+++ b/_aggregations/bucket/sampler.md
@@ -8,7 +8,7 @@ nav_order: 170

 # Sampler aggregations

-If you're aggregating over millions of documents, you can use a `sampler` aggregation to reduce its scope to a small sample of documents for a faster response. The `sampler` aggregation selects the samples by top-scoring documents.
+If you're aggregating a very large number of documents, you can use a `sampler` aggregation to reduce the scope to a small sample of documents, resulting in a faster response. The `sampler` aggregation selects the samples by top-scoring documents.

 The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable.

@@ -25,6 +25,8 @@ The basic syntax is:
 }
 ```

+## Shard size property
+
 The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard.

 The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation:
@@ -79,4 +81,4 @@ GET opensearch_dashboards_sample_data_logs/_search
  }
 }
 }
-```
+```