“fix#829-PreAgg-field_doc_count” (#839)

* “fix#829-PreAgg-field_doc_count”

Signed-off-by: cwillum <cwmmoore@amazon.com>

cwillum 2022-08-08 10:40:17 -07:00 committed by GitHub
parent 23bc23dd86
commit 07eda05688
1 changed files with 86 additions and 0 deletions

@ -74,6 +74,92 @@ The `terms` aggregation requests each shard for its top 3 unique terms. The coor
This is especially true if `size` is set to a low number. Because the default size is 10, an error is unlikely to happen. If you don't need high accuracy and want to increase performance, you can reduce the size.
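To illustrate why a low `size` can produce inaccurate counts, the following Python sketch simulates a coordinator that merges only the top `size` terms from each shard. It is a simplified model, not OpenSearch's implementation: the function name `merged_counts` and the shard contents are hypothetical, and the sketch ignores the `shard_size` parameter, which lets each shard return more candidates than `size`.

```python
from collections import Counter

def merged_counts(shards, size):
    """Sum the counts of each shard's top `size` terms (simplified model)."""
    merged = Counter()
    for shard in shards:
        # Each shard reports only its own top `size` terms.
        for term, count in Counter(shard).most_common(size):
            merged[term] += count
    return merged

# Hypothetical shard contents: each list holds one term per document.
shard_a = ["x"] * 5 + ["y"] * 4 + ["z"] * 1
shard_b = ["y"] * 5 + ["z"] * 4 + ["x"] * 1
shards = [shard_a, shard_b]

exact = Counter(shard_a + shard_b)   # true counts across both shards
low = merged_counts(shards, size=1)  # each shard reports one term
full = merged_counts(shards, size=3) # each shard reports all of its terms

print(exact["y"], low["y"], full["y"])
```

In this model, shard A never reports `y` when `size` is 1, so the merged count for `y` is lower than the true count. With `size` set to 3, every term is reported by every shard and the merged counts match the exact ones.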
### Account for pre-aggregated data
The `doc_count` field reports the number of documents aggregated in a bucket, but by itself it cannot correctly count documents that store pre-aggregated data, because each such document is counted only once. To account for pre-aggregated data and accurately calculate the number of documents in a bucket, use the `_doc_count` field to record, in a single summary document, the number of documents it represents. When a document includes the `_doc_count` field, all bucket aggregations recognize its value and increase the bucket's `doc_count` by that value. Keep these considerations in mind when using the `_doc_count` field:
* The field does not support nested arrays; only positive integers can be used.
* If a document does not contain the `_doc_count` field, aggregation uses the document to increase the count by 1.
OpenSearch features that rely on an accurate document count illustrate the importance of using the `_doc_count` field. To see how this field can be used to support other search tools, refer to [Index rollups](https://opensearch.org/docs/latest/im-plugin/index-rollups/index/), an OpenSearch feature for the Index Management (IM) plugin that stores documents with pre-aggregated data in rollup indexes.
{: .tip}
### Example usage
```json
PUT /my_index/_doc/1
{
  "response_code": 404,
  "date": "2022-08-05",
  "_doc_count": 20
}

PUT /my_index/_doc/2
{
  "response_code": 404,
  "date": "2022-08-06",
  "_doc_count": 10
}

PUT /my_index/_doc/3
{
  "response_code": 200,
  "date": "2022-08-06",
  "_doc_count": 300
}

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "response_codes": {
      "terms": {
        "field": "response_code"
      }
    }
  }
}
```
#### Sample response
```json
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "response_codes" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 200,
          "doc_count" : 300
        },
        {
          "key" : 404,
          "doc_count" : 30
        }
      ]
    }
  }
}
```
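The bucket counts in the sample response follow from the counting rule described earlier: each document adds its `_doc_count` value to its bucket, or 1 if the field is absent. The following Python sketch reproduces that arithmetic outside of OpenSearch. The function name `terms_doc_counts` is hypothetical and the code only models the rule; it does not show how OpenSearch implements aggregations.

```python
from collections import defaultdict

def terms_doc_counts(docs, field):
    """Model a terms aggregation's doc_count, honoring `_doc_count`."""
    counts = defaultdict(int)
    for doc in docs:
        # A document without `_doc_count` increases the count by 1.
        counts[doc[field]] += doc.get("_doc_count", 1)
    return dict(counts)

# The three documents indexed in the example request.
docs = [
    {"response_code": 404, "date": "2022-08-05", "_doc_count": 20},
    {"response_code": 404, "date": "2022-08-06", "_doc_count": 10},
    {"response_code": 200, "date": "2022-08-06", "_doc_count": 300},
]

print(terms_doc_counts(docs, "response_code"))
```

The two 404 documents contribute 20 + 10 = 30 to their bucket, and the single 200 document contributes 300, matching the `doc_count` values in the response even though `hits.total.value` is 3.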
## Multi-terms
Similar to the `terms` bucket aggregation, you can also search for multiple terms using the `multi_terms` aggregation. Multi-terms aggregations are useful when you need to sort by document count, or when you need to sort by a metric aggregation on a composite key and get the top `n` results. For example, you could search for a specific number of documents (e.g., 1000) and the number of servers per location that show CPU usage greater than 90%. The top number of results would be returned for this multi-term query.