opensearch-docs-cn/_query-dsl/aggregations/bucket/terms.md

156 lines
4.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: default
title: Terms
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 200
---
# Terms aggregations
The `terms` aggregation dynamically creates a bucket for each unique term of a field.
The following example uses the `terms` aggregation to find the number of documents per response code in web log data:
```json
GET opensearch_dashboards_sample_data_logs/_search
{
"size": 0,
"aggs": {
"response_codes": {
"terms": {
"field": "response.keyword",
"size": 10
}
}
}
}
```
{% include copy-curl.html %}
#### Example response
```json
...
"aggregations" : {
"response_codes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "200",
"doc_count" : 12832
},
{
"key" : "404",
"doc_count" : 801
},
{
"key" : "503",
"doc_count" : 441
}
]
}
}
}
```
The values are returned with the key `key`.
`doc_count` specifies the number of documents in each bucket. By default, the buckets are sorted in descending order of `doc-count`.
The response also includes two keys named `doc_count_error_upper_bound` and `sum_other_doc_count`.
The `terms` aggregation returns the top unique terms. So, if the data has many unique terms, then some of them might not appear in the results. The `sum_other_doc_count` field is the sum of the documents that are left out of the response. In this case, the number is 0 because all the unique values appear in the response.
The `doc_count_error_upper_bound` field represents the maximum possible count for a unique value that's left out of the final results. Use this field to estimate the error margin for the count.
The count might not be accurate. A coordinating node thats responsible for the aggregation prompts each shard for its top unique terms. Imagine a scenario where the `size` parameter is 3.
The `terms` aggregation requests each shard for its top 3 unique terms. The coordinating node takes each of the results and aggregates them to compute the final result. If a shard has an object thats not part of the top 3, then it won't show up in the response.
This is especially true if `size` is set to a low number. Because the default size is 10, an error is unlikely to happen. If you dont need high accuracy and want to increase the performance, you can reduce the size.
## Account for pre-aggregated data
While the `doc_count` field provides a representation of the number of individual documents aggregated in a bucket, `doc_count` by itself does not have a way to correctly increment documents that store pre-aggregated data. To account for pre-aggregated data and accurately calculate the number of documents in a bucket, you can use the `_doc_count` field to add the number of documents in a single summary field. When a document includes the `_doc_count` field, all bucket aggregations recognize its value and increase the bucket `doc_count` cumulatively. Keep these considerations in mind when using the `_doc_count` field:
* The field does not support nested arrays; only positive integers can be used.
* If a document does not contain the `_doc_count` field, aggregation uses the document to increase the count by 1.
OpenSearch features that rely on an accurate document count illustrate the importance of using the `_doc_count` field. To see how this field can be used to support other search tools, refer to [Index rollups](https://opensearch.org/docs/latest/im-plugin/index-rollups/index/), an OpenSearch feature for the Index Management (IM) plugin that stores documents with pre-aggregated data in rollup indexes.
{: .tip}
#### Example request
```json
PUT /my_index/_doc/1
{
"response_code": 404,
"date":"2022-08-05",
"_doc_count": 20
}
PUT /my_index/_doc/2
{
"response_code": 404,
"date":"2022-08-06",
"_doc_count": 10
}
PUT /my_index/_doc/3
{
"response_code": 200,
"date":"2022-08-06",
"_doc_count": 300
}
GET /my_index/_search
{
"size": 0,
"aggs": {
"response_codes": {
"terms": {
"field" : "response_code"
}
}
}
}
```
#### Example response
```json
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"response_codes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 200,
"doc_count" : 300
},
{
"key" : 404,
"doc_count" : 30
}
]
}
}
}
```