[[search-aggregations-matrix-stats-aggregation]]
=== Matrix Stats

The `matrix_stats` aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

[horizontal]
`count`:: Number of samples per field included in the calculation.
`mean`:: The average value for each field.
`variance`:: Per-field measurement of how spread out the samples are from the mean.
`skewness`:: Per-field measurement quantifying the asymmetry of the distribution around the mean.
`kurtosis`:: Per-field measurement quantifying the shape of the distribution.
`covariance`:: A matrix that quantitatively describes how changes in one field are associated with changes in another.
`correlation`:: The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions.
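As a rough illustration of the per-field and pairwise statistics listed above, the following Python sketch computes `count`, `mean`, `variance`, `covariance`, and `correlation` for two small samples. The helper name `matrix_stats` is hypothetical, the sample values mirror the three documents indexed by this page's test setup, and the sample (n - 1) estimators are an assumption of the sketch rather than a statement about the aggregation's internals. Skewness and kurtosis are omitted for brevity.

```python
from math import sqrt

def matrix_stats(samples):
    """Illustrative per-field and pairwise statistics.

    `samples` maps a field name to a list of numbers; all lists are
    assumed to have the same length (one value per document).
    """
    names = list(samples)
    n = len(samples[names[0]])
    means = {f: sum(samples[f]) / n for f in names}

    def cov(a, b):
        # Sample covariance (divides by n - 1); an assumption of this sketch.
        return sum((x - means[a]) * (y - means[b])
                   for x, y in zip(samples[a], samples[b])) / (n - 1)

    results = {}
    for f in names:
        covariance = {g: cov(f, g) for g in names}
        results[f] = {
            "count": n,
            "mean": means[f],
            # The variance is the covariance of a field with itself.
            "variance": covariance[f],
            "covariance": covariance,
            # Correlation rescales covariance by both standard deviations.
            "correlation": {g: covariance[g] / sqrt(cov(f, f) * cov(g, g))
                            for g in names},
        }
    return results

stats = matrix_stats({
    "poverty": [24.0, 13.0, 69.0],
    "income": [50000.0, 95687.0, 7890.0],
})
print(stats["poverty"]["correlation"]["income"])
```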

//////////////////////////

[source,js]
--------------------------------------------------
PUT /statistics/_doc/0
{"poverty": 24.0, "income": 50000.0}

PUT /statistics/_doc/1
{"poverty": 13.0, "income": 95687.0}

PUT /statistics/_doc/2
{"poverty": 69.0, "income": 7890.0}

POST /_refresh
--------------------------------------------------
// NOTCONSOLE
// TESTSETUP

//////////////////////////

The following example demonstrates the use of matrix stats to describe the relationship between income and poverty.

[source,console]
--------------------------------------------------
GET /_search
{
    "aggs": {
        "statistics": {
            "matrix_stats": {
                "fields": ["poverty", "income"]
            }
        }
    }
}
--------------------------------------------------
// TEST[s/_search/_search\?filter_path=aggregations/]

The aggregation type is `matrix_stats` and the `fields` setting defines the set of fields (as an array) for computing
the statistics. The above request returns the following response:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "statistics": {
            "doc_count": 50,
            "fields": [{
                "name": "income",
                "count": 50,
                "mean": 51985.1,
                "variance": 7.383377037755103E7,
                "skewness": 0.5595114003506483,
                "kurtosis": 2.5692365287787124,
                "covariance": {
                    "income": 7.383377037755103E7,
                    "poverty": -21093.65836734694
                },
                "correlation": {
                    "income": 1.0,
                    "poverty": -0.8352655256272504
                }
            }, {
                "name": "poverty",
                "count": 50,
                "mean": 12.732000000000001,
                "variance": 8.637730612244896,
                "skewness": 0.4516049811903419,
                "kurtosis": 2.8615929677997767,
                "covariance": {
                    "income": -21093.65836734694,
                    "poverty": 8.637730612244896
                },
                "correlation": {
                    "income": -0.8352655256272504,
                    "poverty": 1.0
                }
            }]
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
// TESTRESPONSE[s/: (\-)?[0-9\.E]+/: $body.$_path/]

The `doc_count` field indicates the number of documents involved in the computation of the statistics.
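The relationship between the `covariance` and `correlation` entries in the response above can be checked by hand. The Python sketch below copies the values from that response and recomputes the income/poverty correlation, assuming the usual definition of correlation as the covariance divided by the product of the two standard deviations.

```python
from math import isclose, sqrt

# Values copied from the example response above.
var_income = 7.383377037755103E7
var_poverty = 8.637730612244896
cov_income_poverty = -21093.65836734694
reported_correlation = -0.8352655256272504

# Correlation = covariance / (stddev(income) * stddev(poverty)).
correlation = cov_income_poverty / sqrt(var_income * var_poverty)

print(correlation)
print(isclose(correlation, reported_correlation, rel_tol=1e-6))
```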

==== Multi Value Fields

The `matrix_stats` aggregation treats each document field as an independent sample. The `mode` parameter controls which
value the aggregation uses for array or multi-valued fields. This parameter can take one of the following values:

[horizontal]
`avg`:: (default) Use the average of all values.
`min`:: Pick the lowest value.
`max`:: Pick the highest value.
`sum`:: Use the sum of all values.
`median`:: Use the median of all values.
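To make the `mode` options concrete, the Python sketch below reduces a multi-valued field to the single value each mode would select. The helper `reduce_multi_value` and the sample values are hypothetical and only mirror the descriptions above. In particular, how the aggregation computes the median of an even number of values is an assumption here (Python's `statistics.median` averages the two middle values).

```python
from statistics import mean, median

# Hypothetical reducers, one per documented `mode` value.
MODES = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}

def reduce_multi_value(values, mode="avg"):
    """Collapse a field's array of values into one sample, per `mode`."""
    return MODES[mode](values)

incomes = [30000.0, 50000.0, 100000.0]  # illustrative multi-valued field
for mode in MODES:
    print(mode, reduce_multi_value(incomes, mode))
```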

==== Missing Values

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they had a value.
This is done by adding a set of `fieldname : value` mappings that specify default values per field.

[source,console]
--------------------------------------------------
GET /_search
{
    "aggs": {
        "matrixstats": {
            "matrix_stats": {
                "fields": ["poverty", "income"],
                "missing": {"income" : 50000} <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `income` field will have the default value `50000`.
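The effect of `missing` can be sketched in Python as a preprocessing step applied before any statistics are computed. The helper `apply_missing` and the sample documents are hypothetical. The sketch assumes that a document lacking any requested field, with no default configured for that field, is skipped as a whole.

```python
def apply_missing(docs, fields, missing=None):
    """Return the rows that would feed the statistics.

    Documents lacking a requested field are skipped unless `missing`
    supplies a default value for that field.
    """
    missing = missing or {}
    rows = []
    for doc in docs:
        row = {}
        for field in fields:
            if field in doc:
                row[field] = doc[field]
            elif field in missing:
                row[field] = missing[field]  # substitute the configured default
            else:
                break  # no value and no default: ignore this document
        else:
            # Reached only when every requested field was resolved.
            rows.append(row)
    return rows

docs = [
    {"poverty": 24.0, "income": 50000.0},
    {"poverty": 13.0},  # no `income` value
]
fields = ["poverty", "income"]

ignored = apply_missing(docs, fields)
defaulted = apply_missing(docs, fields, missing={"income": 50000})
print(len(ignored), len(defaulted))
```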

==== Script

This aggregation family does not yet support scripting.