[[search-aggregations-metrics-avg-aggregation]]
=== Avg Aggregation
A `single-value` metrics aggregation that computes the average of numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents or generated by a provided script.

Assuming the data consists of documents representing the exam grades (between 0
and 100) of students, we can average their scores with:

[source,console]
--------------------------------------------------
POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": { "avg": { "field": "grade" } }
  }
}
--------------------------------------------------
// TEST[setup:exams]

The above aggregation computes the average grade over all documents. The aggregation type is `avg`, and the `field` setting defines the numeric field of the documents that the average is computed on. The above request returns the following:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "avg_grade": {
      "value": 75.0
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

The name of the aggregation (`avg_grade` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
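As a rough illustration of retrieving the result by name, the sketch below reads the value from a response body that has already been parsed into a Python dictionary. The `response` literal mirrors the example output above with the elided metadata left out, and no particular client library is assumed.

```python
# Parsed search response, reduced to the part shown in the example output.
response = {
    "aggregations": {
        "avg_grade": {"value": 75.0}
    }
}

# The aggregation name used in the request is the key in the response.
avg_grade = response["aggregations"]["avg_grade"]["value"]
print(avg_grade)
```

Renaming the aggregation in the request changes the key under `aggregations` accordingly.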
==== Script
Computing the average grade based on a script:
[source,console]
--------------------------------------------------
POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": {
      "avg": {
        "script": {
          "source": "doc.grade.value"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]

This will interpret the `script` parameter as an `inline` script with the `painless` script language and no script parameters. To use a stored script, use the following syntax:

[source,console]
--------------------------------------------------
POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": {
      "avg": {
        "script": {
          "id": "my_script",
          "params": {
            "field": "grade"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams,stored_example_script]

===== Value Script
It turned out that the exam was way above the level of the students and a grade correction needs to be applied. We can use a value script to get the new average:

[source,console]
--------------------------------------------------
POST /exams/_search?size=0
{
  "aggs": {
    "avg_corrected_grade": {
      "avg": {
        "field": "grade",
        "script": {
          "lang": "painless",
          "source": "_value * params.correction",
          "params": {
            "correction": 1.2
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
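The effect of this value script can be sketched outside Elasticsearch. The snippet below is only an illustration in Python: the `grades` list is hypothetical sample data rather than the contents of the `exams` index, and `value_script` is a made-up stand-in for the Painless expression `_value * params.correction`.

```python
# Hypothetical grades; in Elasticsearch these come from the `grade` field.
grades = [50.0, 100.0]
params = {"correction": 1.2}


def value_script(value, params):
    """Stand-in for the Painless source `_value * params.correction`."""
    return value * params["correction"]


# Apply the script to every value, then average the corrected values.
corrected = [value_script(g, params) for g in grades]
avg_corrected_grade = sum(corrected) / len(corrected)
print(avg_corrected_grade)
```

Because the correction is a constant factor, the corrected average equals the uncorrected average multiplied by that factor.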
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they
had a value.

[source,console]
--------------------------------------------------
POST /exams/_search?size=0
{
  "aggs": {
    "grade_avg": {
      "avg": {
        "field": "grade",
        "missing": 10 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
<1> Documents without a value in the `grade` field are treated as if they had the value `10` when the average is computed.
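To make the difference concrete, the following Python sketch compares the default behaviour with `missing` set to `10`. The `grades` list is hypothetical sample data, with `None` standing in for a document that has no `grade` value; it illustrates the arithmetic only and does not call Elasticsearch.

```python
# Hypothetical documents; None marks a document with no `grade` value.
grades = [90.0, 60.0, None]
MISSING = 10  # same value as the `missing` parameter in the request

# Default behaviour: documents without a value are ignored.
present = [g for g in grades if g is not None]
avg_ignoring = sum(present) / len(present)

# With `missing`: such documents count as if they held the given value.
filled = [MISSING if g is None else g for g in grades]
avg_with_missing = sum(filled) / len(filled)

print(avg_ignoring, avg_with_missing)
```

With `missing` set, the document count in the denominator grows as well, so the average can shift noticeably when many documents lack the field.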
[[search-aggregations-metrics-avg-aggregation-histogram-fields]]
==== Histogram fields
When the average is computed on <<histogram,histogram fields>>, the result of the aggregation is the weighted average
of all elements in the `values` array, where each element is weighted by the number in the same position in the `counts` array.

For example, consider the following index that stores pre-aggregated histograms with latency metrics for different networks:

[source,console]
--------------------------------------------------
PUT metrics_index/_doc/1
{
  "network.name" : "net-1",
  "latency_histo" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
      "counts" : [3, 7, 23, 12, 6] <2>
   }
}

PUT metrics_index/_doc/2
{
  "network.name" : "net-2",
  "latency_histo" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
      "counts" : [8, 17, 8, 7, 6] <2>
   }
}

POST /metrics_index/_search?size=0
{
  "aggs": {
    "avg_latency": { "avg": { "field": "latency_histo" } }
  }
}
--------------------------------------------------

For each histogram field, the `avg` aggregation sums each number in the `values` array <1> multiplied by its associated count
in the `counts` array <2>. It then divides that total by the sum of all counts across all histograms and returns the following result:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "avg_latency": {
      "value": 0.29690721649
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:test not setup]
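The result can be checked by hand. The Python sketch below repeats the weighted-average arithmetic on the two histograms indexed above; it is a plain re-computation for illustration and is not meant to describe how Elasticsearch implements the aggregation internally.

```python
# Histograms copied from the two documents indexed above.
histograms = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [8, 17, 8, 7, 6]},
]

weighted_sum = 0.0  # sum of value * count over every bucket
total_count = 0     # sum of all counts over every histogram

for histo in histograms:
    for value, count in zip(histo["values"], histo["counts"]):
        weighted_sum += value * count
        total_count += count

avg_latency = weighted_sum / total_count
print(avg_latency)
```

The weighted sums are 16.4 and 12.4, the counts add up to 51 and 46, and 28.8 / 97 is approximately 0.2969, which agrees with the `value` in the response.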