[[search-aggregations-metrics-extendedstats-aggregation]]
=== Extended Stats Aggregation

A `multi-value` metrics aggregation that computes statistics over numeric values extracted from the aggregated documents. These values can be extracted from specific numeric fields in the documents or generated by a provided script.

The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation that adds metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.

Assume the data consists of documents representing students' exam grades (between 0 and 100):

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : { "extended_stats" : { "field" : "grade" } }
    }
}
--------------------------------------------------
// TEST[setup:exams]

The above aggregation computes grade statistics over all documents. The aggregation type is `extended_stats`, and the `field` setting defines the numeric field on which the stats are computed. The above returns the following:

[source,js]
--------------------------------------------------
{
    ...

    "aggregations": {
        "grades_stats": {
            "count": 2,
            "min": 50.0,
            "max": 100.0,
            "avg": 75.0,
            "sum": 150.0,
            "sum_of_squares": 12500.0,
            "variance": 625.0,
            "std_deviation": 25.0,
            "std_deviation_bounds": {
                "upper": 125.0,
                "lower": 25.0
            }
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
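
The figures in the sample response can be checked by hand. The following Python sketch recomputes them from the two grades implied by the response (50 and 100), assuming the population form of the variance (dividing by `count`), which is what the sample numbers are consistent with. The function name `extended_stats` and the dictionary layout are illustrative only; this is not the server's implementation or part of any client library.

```python
import math


def extended_stats(values, sigma=2.0):
    """Recompute extended_stats-style metrics for a list of numbers (illustrative)."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance: mean of the squares minus the square of the mean.
    # Clamp at zero to guard against tiny negative results from rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# The two grades implied by the sample response (count 2, min 50, max 100).
stats = extended_stats([50.0, 100.0])
print(stats)
```

With these inputs, `sum_of_squares` is 50² + 100² = 12500, the variance is 12500 / 2 - 75² = 625, and the standard deviation is 25, matching the response shown above.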

==== Standard Deviation Bounds
By default, the `extended_stats` metric returns an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
three standard deviations, you can set `sigma` in the request:

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "sigma" : 3 <1>
            }
        }
    }
}
--------------------------------------------------
// TEST[setup:exams]
<1> `sigma` controls how many standard deviations +/- from the mean should be displayed.

`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
return the average for both the `upper` and `lower` bounds.
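
The effect of `sigma` is plain arithmetic on the mean and standard deviation. The sketch below uses the `avg` (75.0) and `std_deviation` (25.0) from the sample response earlier in this section; the helper name `std_deviation_bounds` is hypothetical and only mirrors the field name in the response.

```python
def std_deviation_bounds(avg, std_deviation, sigma=2.0):
    """Return the interval avg +/- sigma * std_deviation (illustrative helper)."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return {
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


# avg and std_deviation taken from the sample response.
for sigma in (2, 3, 1.5, 0):
    print(sigma, std_deviation_bounds(75.0, 25.0, sigma=sigma))
```

With `sigma` set to `3` the interval widens to 0.0 through 150.0, and with `sigma` set to `0` both bounds collapse to the average, 75.0.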

.Standard Deviation and Bounds require normality
[NOTE]
=====
The standard deviation and its bounds are displayed by default, but they are not applicable to all data sets. Your data must
be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so
if your data is heavily skewed left or right, the values returned will be misleading.
=====
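
To see how skew can make the bounds misleading, consider a made-up set of grades in which four students scored 0 and one scored 100. These values are hypothetical and chosen only to keep the arithmetic simple. The sketch uses Python's `statistics` module with the population standard deviation.

```python
import statistics

# Hypothetical, heavily skewed grades: most students at 0, one at 100.
skewed_grades = [0.0, 0.0, 0.0, 0.0, 100.0]

avg = statistics.mean(skewed_grades)
std_deviation = statistics.pstdev(skewed_grades)  # population standard deviation

# Default interval of two standard deviations around the mean.
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(f"min={min(skewed_grades)} avg={avg} std_deviation={std_deviation}")
print(f"lower={lower} upper={upper}")
```

Here the mean is 20 and the standard deviation is 40, so the lower bound lands at about -60, well below the smallest grade in the data and outside the 0 to 100 range a grade can take.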

==== Script

To compute the grade stats based on a script:

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "script" : {
                    "source" : "doc['grade'].value",
                    "lang" : "painless"
                }
            }
        }
    }
}
--------------------------------------------------
// TEST[setup:exams]

This will interpret the `script` parameter as an `inline` script with the `painless` script language and no script parameters. To use a stored script, use the following syntax:

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "script" : {
                    "id": "my_script",
                    "params": {
                        "field": "grade"
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// TEST[setup:exams,stored_example_script]

===== Value Script

It turned out that the exam was well above the students' level, and a grade correction needs to be applied. We can use a value script to get the new stats:

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "script" : {
                    "lang" : "painless",
                    "source": "_value * params.correction",
                    "params" : {
                        "correction" : 1.2
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// TEST[setup:exams]
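
Because the value script multiplies every grade by the same factor, the resulting stats scale predictably: the average and standard deviation scale by the factor, and the variance by its square. The Python sketch below mirrors only the arithmetic of `_value * params.correction`, applied to the two grades from the sample response (50 and 100). The helper name `corrected_stats` is hypothetical.

```python
import statistics


def corrected_stats(grades, correction):
    """Apply a multiplicative correction to each grade, then summarize (illustrative)."""
    corrected = [grade * correction for grade in grades]
    return {
        "avg": statistics.mean(corrected),
        "variance": statistics.pvariance(corrected),   # population variance
        "std_deviation": statistics.pstdev(corrected),  # population std deviation
    }


grades = [50.0, 100.0]
original = corrected_stats(grades, 1.0)
adjusted = corrected_stats(grades, 1.2)

print("original:", original)
print("adjusted:", adjusted)
```

With a correction of 1.2, the average moves from 75 to about 90, the standard deviation from 25 to about 30, and the variance from 625 to about 900 (625 × 1.2²), up to floating-point rounding.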

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored, but it is also possible to treat them as if they
had a value.

[source,console]
--------------------------------------------------
GET /exams/_search
{
    "size": 0,
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "missing": 0 <1>
            }
        }
    }
}
--------------------------------------------------
// TEST[setup:exams]

<1> Documents without a value in the `grade` field will be treated as if they had the value `0`.
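
The difference between ignoring documents and substituting a value shows up directly in `count` and `avg`. The sketch below assumes a hypothetical third document with no `grade` (represented as `None`) alongside the two grades from the sample response. The helper names are illustrative and do not correspond to any Elasticsearch API.

```python
def resolve_missing(values, missing=None):
    """Drop absent values by default, or substitute `missing` for them (illustrative)."""
    if missing is None:
        return [v for v in values if v is not None]
    return [missing if v is None else v for v in values]


def count_and_avg(values):
    """Return the two metrics most visibly affected by the `missing` setting."""
    return {"count": len(values), "avg": sum(values) / len(values)}


# Two real grades plus one hypothetical document that has no grade.
raw_grades = [50.0, 100.0, None]

ignored = count_and_avg(resolve_missing(raw_grades))            # default behaviour
filled = count_and_avg(resolve_missing(raw_grades, missing=0))  # "missing": 0

print("ignored:", ignored)
print("filled: ", filled)
```

Ignoring the document leaves a count of 2 and an average of 75, while substituting `0` raises the count to 3 and pulls the average down to 50.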