[[search-aggregations-metrics-extendedstats-aggregation]]
=== Extended Stats Aggregation

A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.

The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation, which adds metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.

Assuming the data consists of documents representing exam grades (between 0 and 100) of students:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grades_stats" : { "extended_stats" : { "field" : "grade" } }
    }
}
--------------------------------------------------

The above aggregation computes the grade statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on. The above will return the following:

[source,js]
--------------------------------------------------
{
    ...

    "aggregations": {
        "grades_stats": {
            "count": 9,
            "min": 72,
            "max": 99,
            "avg": 86,
            "sum": 774,
            "sum_of_squares": 67028,
            "variance": 51.55555555555556,
            "std_deviation": 7.180219742846005,
            "std_deviation_bounds": {
                "upper": 100.36043948569201,
                "lower": 71.63956051430799
            }
        }
    }
}
--------------------------------------------------

The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.

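The derived metrics in the sample response can be reproduced from `count`, `sum` and `sum_of_squares` alone. The following Python sketch is not the aggregation's implementation; it assumes a population variance (dividing by `count`), which is consistent with the numbers shown in the response. The function name `derive_stats` is hypothetical.

```python
import math


def derive_stats(count, total, sum_of_squares):
    """Recompute avg, variance and std_deviation from the basic sums.

    Assumption: population variance (divide by count), chosen because it
    matches the sample response in this section.
    """
    avg = total / count
    variance = sum_of_squares / count - avg * avg
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


# Values copied from the sample response.
stats = derive_stats(count=9, total=774, sum_of_squares=67028)
print(stats)
```
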
==== Standard Deviation Bounds
By default, the `extended_stats` metric will return an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
deviations from the mean. This can be a useful way to visualize variance of your data. If you want a different boundary, for example
three standard deviations, you can set `sigma` in the request:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "sigma" : 3 <1>
            }
        }
    }
}
--------------------------------------------------
<1> `sigma` controls how many standard deviations +/- from the mean should be displayed

`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
return the average for both `upper` and `lower` bounds.

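The effect of `sigma` can be sketched in a few lines of Python. This is an illustration of the interval described above, not the server-side code; the function name `std_deviation_bounds` and the validation of negative values are assumptions made for the example, and the inputs are the `avg` and `std_deviation` from the sample response.

```python
def std_deviation_bounds(avg, std_deviation, sigma=2.0):
    """Return the interval avg +/- sigma * std_deviation.

    Assumption: sigma defaults to 2, mirroring the default described above.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    offset = sigma * std_deviation
    return {"upper": avg + offset, "lower": avg - offset}


# avg and std_deviation copied from the sample response.
avg, std = 86.0, 7.180219742846005

default_bounds = std_deviation_bounds(avg, std)          # two standard deviations
wide_bounds = std_deviation_bounds(avg, std, sigma=3)    # three standard deviations
zero_bounds = std_deviation_bounds(avg, std, sigma=0)    # collapses to the average

print(default_bounds, wide_bounds, zero_bounds)
```
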
.Standard Deviation and Bounds require normality
[NOTE]
=====
The standard deviation and its bounds are displayed by default, but they are not always applicable to all data-sets. Your data must
be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so
if your data is skewed heavily left or right, the value returned will be misleading.
=====

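To see how skew can make the bounds misleading, consider a small, hypothetical data set (not taken from this documentation) in which one outlier dominates. The sketch assumes population standard deviation and the default of two standard deviations.

```python
import statistics

# Hypothetical, heavily right-skewed values: nine 1s and a single 100.
values = [1] * 9 + [100]

avg = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation (assumption)

lower = avg - 2 * std
upper = avg + 2 * std

# The lower bound falls below the smallest value in the data set,
# so the interval describes a range that the data never occupies.
print(avg, std, lower, upper, min(values))
```
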
==== Script

Computing the grades stats based on a script:

[source,js]
--------------------------------------------------
{
    ...,

    "aggs" : {
        "grades_stats" : { "extended_stats" : { "script" : "doc['grade'].value" } }
    }
}
--------------------------------------------------

This will interpret the `script` parameter as an `inline` script with the default script language and no script parameters. To use a file script, use the following syntax:

[source,js]
--------------------------------------------------
{
    ...,

    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "script" : {
                    "file": "my_script",
                    "params": {
                        "field": "grade"
                    }
                }
            }
        }
    }
}
--------------------------------------------------

TIP: For indexed scripts, replace the `file` parameter with an `id` parameter.

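The three ways of specifying a script differ only in the value of the `script` key. A minimal Python sketch that builds each request body as a dictionary is shown here; the helper name `extended_stats_script_agg` is hypothetical, and the script name and parameters are the ones used in the examples above.

```python
import json


def extended_stats_script_agg(name, script):
    """Build an `extended_stats` request body that uses a script.

    `script` is either a plain string (treated as an inline script) or an
    object with a `file` or `id` key plus optional `params`.
    """
    return {"aggs": {name: {"extended_stats": {"script": script}}}}


# Inline script given as a bare string.
inline_body = extended_stats_script_agg("grades_stats", "doc['grade'].value")

# File script with parameters.
file_body = extended_stats_script_agg(
    "grades_stats", {"file": "my_script", "params": {"field": "grade"}}
)

# Indexed script: `id` takes the place of `file`.
indexed_body = extended_stats_script_agg(
    "grades_stats", {"id": "my_script", "params": {"field": "grade"}}
)

print(json.dumps(file_body, indent=4))
```
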
===== Value Script

It turned out that the exam was well above the level of the students, so a grade correction needs to be applied. We can use a value script to get the new stats:

[source,js]
--------------------------------------------------
{
    ...,

    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "script" : {
                    "inline": "_value * correction",
                    "params" : {
                        "correction" : 1.2
                    }
                }
            }
        }
    }
}
--------------------------------------------------

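Conceptually, a value script transforms each extracted field value before the stats are computed. The Python sketch below emulates `_value * correction` on a hypothetical list of grades; the function name and the sample grades are assumptions made for illustration only.

```python
def apply_value_script(values, correction):
    """Emulate the value script `_value * correction` for each field value."""
    return [value * correction for value in values]


grades = [50, 60, 70]  # hypothetical grades, not from the sample response
corrected = apply_value_script(grades, correction=1.2)

original_avg = sum(grades) / len(grades)
corrected_avg = sum(corrected) / len(corrected)

# The stats are computed over the corrected values, so the average
# scales by the same correction factor.
print(original_avg, corrected_avg)
```
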
==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored, but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "grades_stats" : {
            "extended_stats" : {
                "field" : "grade",
                "missing": 0 <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `grade` field will be treated as if they had the value `0`.

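The difference between ignoring documents and substituting a value can be sketched in Python. The documents and the helper `collect_values` below are hypothetical and only emulate the behaviour described above.

```python
def collect_values(docs, field, missing=None):
    """Gather the values of `field`, optionally substituting `missing`.

    With missing=None, documents lacking the field are skipped (the default
    behaviour described above); otherwise they contribute `missing`.
    """
    values = []
    for doc in docs:
        if doc.get(field) is not None:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    return values


# Hypothetical documents; the third one has no `grade` field.
docs = [{"grade": 80}, {"grade": 90}, {"name": "no grade recorded"}]

ignored = collect_values(docs, "grade")
with_default = collect_values(docs, "grade", missing=0)

print(len(ignored), sum(ignored) / len(ignored))
print(len(with_default), sum(with_default) / len(with_default))
```

Substituting `0` raises the count and lowers the average, so the choice of `missing` value directly changes every metric that is reported.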