---
id: stats
title: "Stats aggregator"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->


This Apache Druid extension provides statistics-related aggregators, including variance and standard deviation. To use it, [include](../../development/extensions.md#loading-extensions) `druid-stats` in the extensions load list.

## Variance aggregator

The aggregator uses the same algorithm as Apache Hive. The following description is taken from `GenericUDAFVariance` in Hive.

Evaluate the variance using the algorithm described by Chan, Golub, and LeVeque in
"Algorithms for computing the sample variance: analysis and recommendations"
The American Statistician, 37 (1983) pp. 242--247.

`variance = variance1 + variance2 + n/(m*(m+n)) * pow(((m/n)*t1 - t2),2)`

where:

- `variance` is `sum((x - avg)^2)` (this is actually n times the variance) and is updated at every step.
- `n` is the count of elements in chunk1.
- `m` is the count of elements in chunk2.
- `t1` is the sum of elements in chunk1 and `t2` is the sum of elements in chunk2.

This algorithm was proven to be numerically stable by J.L. Barlow in
"Error analysis of a pairwise summation algorithm to compute sample variance"
Numer. Math, 58 (1991) pp. 583--590
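
As a quick sanity check of the merge formula above, the following worked example combines two small chunks. The sample values are made up for illustration and do not come from Hive or Druid.

```latex
% chunk1 = {1, 2, 3}: n = 3, t_1 = 6, variance1 = 2
% chunk2 = {4, 5}:    m = 2, t_2 = 9, variance2 = 0.5
\begin{aligned}
\text{variance} &= 2 + 0.5 + \frac{3}{2\,(2+3)}\left(\frac{2}{3}\cdot 6 - 9\right)^{2} \\
                &= 2.5 + 0.3 \cdot (-5)^{2} \\
                &= 10
\end{aligned}
```

Computing the sum of squared deviations directly over {1, 2, 3, 4, 5} (mean 3) also gives 10. Dividing by the count of 5 yields a population variance of 2, and dividing by 4 yields a sample variance of 2.5.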

> As with all [aggregators](../../querying/sql.md#aggregation-functions), the order of operations across segments is
> non-deterministic. This means that if this aggregator operates with an input type of "float" or "double", the result
> of the aggregation may not be precisely the same across multiple runs of the query.
>
> To produce consistent results, round the variance to a fixed number of decimal places so that the results are 
> precisely the same across query runs.

### Pre-aggregating variance at ingestion time

To use this feature, include a "variance" aggregator at indexing time.
The ingestion aggregator applies only to numeric values. With "variance",
any input row that is missing the value is treated as having a value of 0.

You can specify the expected input type for ingestion as one of "float", "double", "long", or "variance". The default is "float".

```json
{
  "type" : "variance",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "inputType" : <input_type>,
  "estimator" : <string>
}
```
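
For illustration, a filled-in ingestion aggregator might look like the following. The input column `index` and the output name `index_var` are assumptions borrowed from the query examples in this document, and "double" is just one of the input types listed above; substitute your own values.

```json
{
  "type" : "variance",
  "name" : "index_var",
  "fieldName" : "index",
  "inputType" : "double"
}
```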

To query the results, include in the query either a "variance" aggregator with the "variance" input type or simply a "varianceFold" aggregator.

```json
{
  "type" : "varianceFold",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "estimator" : <string>
}
```

|Property                 |Description                   |Default                           |
|-------------------------|------------------------------|----------------------------------|
|`estimator`|Set to "population" to compute the population variance (variance_pop) instead of the sample variance (variance_sample), which is the default.|null|
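
As a sketch, the following "varianceFold" aggregator reads a pre-aggregated variance column and requests the population variance. The names `index_var` and `index_var_pop` are hypothetical and assume a column that was ingested with a "variance" aggregator.

```json
{
  "type" : "varianceFold",
  "name" : "index_var_pop",
  "fieldName" : "index_var",
  "estimator" : "population"
}
```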


### Standard deviation post-aggregator

To derive the standard deviation from the variance, use the "stddev" post-aggregator.

```json
{
  "type": "stddev",
  "name": "<output_name>",
  "fieldName": "<aggregator_name>",
  "estimator": <string>
}
```

## Query examples:

### Timeseries query

```json
{
  "queryType": "timeseries",
  "dataSource": "testing",
  "granularity": "day",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-01T00:00:00.000/2016-03-20T00:00:00.000"
  ]
}
```

### TopN query

```json
{
  "queryType": "topN",
  "dataSource": "testing",
  "dimension": "alias",
  "metric": "index_var",
  "threshold": 5,
  "granularity": "all",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index"
    }
  ],
  "postAggregations": [
    {
      "type": "stddev",
      "name": "index_stddev",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-06T00:00:00/2016-03-06T23:59:59"
  ]
}
```

### GroupBy query

```json
{
  "queryType": "groupBy",
  "dataSource": "testing",
  "dimensions": ["alias"],
  "granularity": "all",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index"
    }
  ],
  "postAggregations": [
    {
      "type": "stddev",
      "name": "index_stddev",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-06T00:00:00/2016-03-06T23:59:59"
  ]
}
```