opensearch-docs-cn/_aggregations/bucket/diversified-sampler.md
Jay Deng e3ebb35942
Add max_docs_per_value to diversified sampler documentation (#6134)

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
2024-01-26 09:09:49 -07:00


---
layout: default
title: Diversified sampler
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 40
redirect_from:
  - /query-dsl/aggregations/bucket/diversified-sampler/
---

# Diversified sampler

The `diversified_sampler` aggregation lets you reduce bias in the distribution of the sample pool by deduplicating documents that share the same value in a given field. It does so using the `field` and `max_docs_per_value` settings, which limit the number of documents collected on a shard for each unique value of the specified field. The `max_docs_per_value` setting is optional and determines the maximum number of documents collected per field value. Its default value is `1`.
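
To make the per-value cap concrete, here is a simplified Python model of what might happen on a single shard. It is a sketch under stated assumptions, not OpenSearch's implementation: it assumes that documents are considered in descending `_score` order, that every document has exactly one value for the diversified field, and that `shard_size` caps the total number of documents kept. The function name `diversified_sample` and the `logs` data are hypothetical.

```python
from collections import Counter


def diversified_sample(docs, field, shard_size=100, max_docs_per_value=1):
    """Simplified, hypothetical model of diversified sampling on one shard.

    Assumes each doc is a dict with a "_score" key and a single value for `field`.
    Keeps at most `max_docs_per_value` docs per field value and at most
    `shard_size` docs in total, preferring higher-scoring docs.
    """
    kept_per_value = Counter()  # number of docs already kept for each field value
    sample = []
    # Visit documents from the highest to the lowest score.
    for doc in sorted(docs, key=lambda d: d["_score"], reverse=True):
        value = doc[field]
        if kept_per_value[value] >= max_docs_per_value:
            continue  # this value has used up its quota, so skip the duplicate
        kept_per_value[value] += 1
        sample.append(doc)
        if len(sample) == shard_size:
            break  # the total sample on this shard is capped by shard_size
    return sample


# Hypothetical log documents for illustration only.
logs = [
    {"_score": 3.0, "response": "200", "agent": "A"},
    {"_score": 2.5, "response": "200", "agent": "B"},
    {"_score": 2.0, "response": "404", "agent": "A"},
    {"_score": 1.0, "response": "503", "agent": "C"},
]

picked = diversified_sample(logs, "response", shard_size=1000)
print([d["response"] for d in picked])  # ['200', '404', '503']
```

With the default `max_docs_per_value=1`, the second `200` document is dropped even though it outscores the `404` and `503` documents. Passing `max_docs_per_value=2` would keep it, yielding four documents.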

As with the `sampler` aggregation, you can use the `shard_size` setting to control the maximum number of documents collected on any one shard, as shown in the following example:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword"
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "agent.keyword"
          }
        }
      }
    }
  }
}
```

{% include copy-curl.html %}

## Example response

```json
  ...
  "aggregations" : {
    "sample" : {
      "doc_count" : 3,
      "terms" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
            "doc_count" : 2
          },
          {
            "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
```
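
The `terms` sub-aggregation runs only on the sampled documents, so its buckets describe the sample rather than the whole index. The following Python sketch reads the sample out of an already parsed response. The dictionary reproduces only the values shown in the example response; how you obtain the parsed JSON (for example, from an HTTP client) is left out, and the variable names are illustrative. The sum check assumes a single-valued field that is present on every sampled document.

```python
# Only the values shown in the example response; the rest of the response is omitted.
response = {
    "aggregations": {
        "sample": {
            "doc_count": 3,
            "terms": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
                     "doc_count": 2},
                    {"key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
                     "doc_count": 1},
                ],
            },
        }
    }
}

sample = response["aggregations"]["sample"]
buckets = sample["terms"]["buckets"]

# Bucket counts plus documents in unreturned buckets account for every sampled
# document (assuming a single-valued field present on all of them).
bucketed = sum(b["doc_count"] for b in buckets) + sample["terms"]["sum_other_doc_count"]
print(f"sampled docs: {sample['doc_count']}, bucketed docs: {bucketed}")  # sampled docs: 3, bucketed docs: 3

for b in buckets:
    print(b["doc_count"], b["key"])
```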