
[[search-aggregations-matrix-stats-aggregation]]
=== Matrix Stats

The `matrix_stats` aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

[horizontal]
`count`:: Number of per-field samples included in the calculation.
`mean`:: The average value for each field.
`variance`:: Per-field measurement of how spread out the samples are from the mean.
`skewness`:: Per-field measurement quantifying the asymmetry of the distribution around the mean.
`kurtosis`:: Per-field measurement quantifying the shape of the distribution, in particular the weight of its tails.
`covariance`:: A matrix that quantitatively describes how changes in one field are associated with changes in another.
`correlation`:: The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions.

//////////////////////////

[source,js]
--------------------------------------------------
PUT /statistics/_doc/0
{"poverty": 24.0, "income": 50000.0}

PUT /statistics/_doc/1
{"poverty": 13.0, "income": 95687.0}

PUT /statistics/_doc/2
{"poverty": 69.0, "income": 7890.0}

POST /_refresh
--------------------------------------------------
// NOTCONSOLE
// TESTSETUP

//////////////////////////
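To make the relationships between these statistics concrete, the following Python sketch computes them for a few in-memory documents. It is an illustration, not the OpenSearch implementation: the `matrix_stats` function is hypothetical, and its estimators are assumptions. It divides variance and covariance by `count - 1`, computes skewness and kurtosis as moment ratios (kurtosis is not reduced by 3), and skips any document that lacks a value for one of the requested fields. Because `correlation` divides each covariance by the product of the two standard deviations, its diagonal is always 1.

```python
import math


def matrix_stats(docs, fields):
    """Sketch of matrix_stats-style statistics over a list of dict documents.

    Assumptions (not taken from the OpenSearch source):
    - a document is used only if it has a value for every field in `fields`;
    - variance and covariance divide by (count - 1);
    - skewness and kurtosis are moment ratios; kurtosis is not "excess" kurtosis.
    """
    rows = [[float(d[f]) for f in fields] for d in docs if all(f in d for f in fields)]
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two documents with every field present")

    # Column view: field name -> list of sample values.
    cols = {f: [row[i] for row in rows] for i, f in enumerate(fields)}
    means = {f: sum(values) / n for f, values in cols.items()}

    def central_sum(f, power):
        # Sum of (x - mean)^power for one field.
        return sum((x - means[f]) ** power for x in cols[f])

    def covariance(a, b):
        # Sample covariance between two fields.
        return sum((x - means[a]) * (y - means[b])
                   for x, y in zip(cols[a], cols[b])) / (n - 1)

    variances = {f: covariance(f, f) for f in fields}
    result = []
    for f in fields:
        m2, m3, m4 = central_sum(f, 2), central_sum(f, 3), central_sum(f, 4)
        cov = {g: covariance(f, g) for g in fields}
        result.append({
            "name": f,
            "count": n,
            "mean": means[f],
            "variance": variances[f],
            "skewness": math.sqrt(n) * m3 / m2 ** 1.5,
            "kurtosis": n * m4 / m2 ** 2,
            "covariance": cov,
            # Covariance scaled by both standard deviations, so it lies in [-1, 1].
            "correlation": {g: cov[g] / math.sqrt(variances[f] * variances[g])
                            for g in fields},
        })
    return {"doc_count": n, "fields": result}


docs = [
    {"poverty": 24.0, "income": 50000.0},
    {"poverty": 13.0, "income": 95687.0},
    {"poverty": 69.0, "income": 7890.0},
]
stats = matrix_stats(docs, ["poverty", "income"])
for entry in stats["fields"]:
    print(entry["name"], entry["mean"], entry["correlation"])
```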

The following example uses the `matrix_stats` aggregation to describe the relationship between income and poverty.

[source,console]
--------------------------------------------------
GET /_search
{
    "aggs": {
        "statistics": {
            "matrix_stats": {
                "fields": ["poverty", "income"]
            }
        }
    }
}
--------------------------------------------------
// TEST[s/_search/_search\?filter_path=aggregations/]

The aggregation type is `matrix_stats`, and the `fields` setting is an array of the fields over which the statistics are
computed. The request above returns the following response:

[source,console-result]
--------------------------------------------------
{
    ...
    "aggregations": {
        "statistics": {
            "doc_count": 50,
            "fields": [{
                "name": "income",
                "count": 50,
                "mean": 51985.1,
                "variance": 7.383377037755103E7,
                "skewness": 0.5595114003506483,
                "kurtosis": 2.5692365287787124,
                "covariance": {
                    "income": 7.383377037755103E7,
                    "poverty": -21093.65836734694
                },
                "correlation": {
                    "income": 1.0,
                    "poverty": -0.8352655256272504
                }
            }, {
                "name": "poverty",
                "count": 50,
                "mean": 12.732000000000001,
                "variance": 8.637730612244896,
                "skewness": 0.4516049811903419,
                "kurtosis": 2.8615929677997767,
                "covariance": {
                    "income": -21093.65836734694,
                    "poverty": 8.637730612244896
                },
                "correlation": {
                    "income": -0.8352655256272504,
                    "poverty": 1.0
                }
            }]
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
// TESTRESPONSE[s/: (\-)?[0-9\.E]+/: $body.$_path/]

The `doc_count` field indicates the number of documents involved in the computation of the statistics.
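In the response, `fields` is an array with one object per field rather than a map keyed by field name, so client code has to locate an entry by its `name` before reading a value such as a correlation. The following Python sketch does this against a trimmed copy of the response above, assumed to be already parsed from JSON; the `correlation` helper and its parameters are hypothetical.

```python
def correlation(response, agg_name, field_a, field_b):
    """Return the correlation between two fields from a matrix_stats response.

    `fields` is a list of per-field objects, so the entry for `field_a` is
    located by its `name` before its `correlation` map is indexed.
    """
    aggregation = response["aggregations"][agg_name]
    for entry in aggregation["fields"]:
        if entry["name"] == field_a:
            return entry["correlation"][field_b]
    raise KeyError(f"{field_a!r} is not in the {agg_name!r} aggregation")


# Trimmed copy of the example response (only the keys used here).
response = {
    "aggregations": {
        "statistics": {
            "doc_count": 50,
            "fields": [
                {"name": "income",
                 "correlation": {"income": 1.0, "poverty": -0.8352655256272504}},
                {"name": "poverty",
                 "correlation": {"income": -0.8352655256272504, "poverty": 1.0}},
            ],
        }
    }
}
print(correlation(response, "statistics", "income", "poverty"))
```

When many values are read, building a dictionary keyed by `name` once avoids repeating the linear search.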

==== Multi Value Fields

The `matrix_stats` aggregation treats each document field as an independent sample. The `mode` parameter controls which
value the aggregation uses for array or multi-valued fields. It can take one of the following values:

[horizontal]
`avg`:: (default) Use the average of all values.
`min`:: Pick the lowest value.
`max`:: Pick the highest value.
`sum`:: Use the sum of all values.
`median`:: Use the median of all values.

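The reduction each `mode` applies can be sketched in Python. This only illustrates the documented behavior and is not OpenSearch code: the `REDUCERS` table and `reduce_values` helper are hypothetical, and the sketch assumes that the median of an even number of values is the mean of the two middle values, which is what Python's `statistics.median` returns.

```python
import math
import statistics

# Hypothetical mapping from each documented `mode` value to a reducer.
REDUCERS = {
    "avg": statistics.fmean,
    "min": min,
    "max": max,
    "sum": math.fsum,
    "median": statistics.median,
}


def reduce_values(value, mode="avg"):
    """Collapse one document field to a single number according to `mode`."""
    if mode not in REDUCERS:
        raise ValueError(f"unknown mode: {mode!r}")
    # A single-valued field is treated as a one-element array.
    values = value if isinstance(value, list) else [value]
    if not values:
        raise ValueError("cannot reduce an empty list")
    return float(REDUCERS[mode](values))


doc = {"poverty": [12.0, 15.0, 30.0], "income": 48000.0}
for mode in REDUCERS:
    print(mode, reduce_values(doc["poverty"], mode))
```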
==== Missing Values

The `missing` parameter defines how documents that are missing a value are treated.
By default, they are ignored, but it is also possible to treat them as if they had a value.
To do so, provide an object that maps each field name to the default value to use for that field.

[source,console]
--------------------------------------------------
GET /_search
{
    "aggs": {
        "matrixstats": {
            "matrix_stats": {
                "fields": ["poverty", "income"],
                "missing": {"income" : 50000} <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `income` field will have the default value `50000`.

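The effect of `missing` on which documents contribute to the statistics can be sketched in Python. The `fill_missing` helper is hypothetical, and it assumes that a document which still lacks a value for any requested field after the defaults are applied is skipped, as in the default behavior described above.

```python
def fill_missing(docs, fields, missing=None):
    """Return the rows that would feed the statistics, with defaults applied.

    Assumption of this sketch: a document that still lacks a value for any of
    `fields` after applying `missing` is skipped entirely.
    """
    missing = missing or {}
    rows = []
    for doc in docs:
        row = {}
        for f in fields:
            value = doc.get(f)
            if value is None:
                value = missing.get(f)  # None when no default is configured
            row[f] = value
        if all(v is not None for v in row.values()):
            rows.append(row)
    return rows


docs = [
    {"poverty": 24.0, "income": 50000.0},
    {"poverty": 13.0},  # no income value
    {"poverty": 69.0, "income": 7890.0},
]
print(len(fill_missing(docs, ["poverty", "income"])))  # prints 2: the document without income is skipped
print(len(fill_missing(docs, ["poverty", "income"], {"income": 50000})))  # prints 3
```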
==== Script

This aggregation family does not yet support scripting.