---
id: stats
title: "Stats aggregator"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->


This Apache Druid (incubating) extension provides statistics-related aggregators, including variance and standard deviation. To use it, [include](../../development/extensions.md#loading-extensions) `druid-stats` as an extension.

## Variance aggregator

The aggregator uses the same algorithm as Apache Hive. The following description is taken from `GenericUDAFVariance` in Hive.

Evaluate the variance using the algorithm described by Chan, Golub, and LeVeque in
"Algorithms for computing the sample variance: analysis and recommendations",
The American Statistician, 37 (1983) pp. 242--247.

`variance = variance1 + variance2 + n/(m*(m+n)) * pow(((m/n)*t1 - t2),2)`

where:

- `variance` is `sum((x - avg)^2)` (this is actually the element count times the variance) and is updated at every step.
- `n` is the count of elements in chunk1.
- `m` is the count of elements in chunk2.
- `t1` is the sum of elements in chunk1, and `t2` is the sum of elements in chunk2.

This algorithm was proven to be numerically stable by J.L. Barlow in
"Error analysis of a pairwise summation algorithm to compute sample variance",
Numer. Math, 58 (1991) pp. 583--590.
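
The merge step can be sketched in a few lines of Python. This is an illustration of the formula quoted above, not the extension's actual implementation; the names `VarianceState`, `from_values`, and `merge` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class VarianceState:
    """Partial aggregate for one chunk (hypothetical helper type)."""
    count: int = 0
    total: float = 0.0
    nvariance: float = 0.0  # sum((x - avg)^2), i.e. count times the variance


def from_values(values):
    """Build a state directly from raw values (two passes, for illustration only)."""
    n = len(values)
    if n == 0:
        return VarianceState()
    total = float(sum(values))
    avg = total / n
    return VarianceState(n, total, sum((x - avg) ** 2 for x in values))


def merge(a, b):
    """Combine the states of chunk1 (a) and chunk2 (b) with the pairwise formula."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n, m = a.count, b.count
    t1, t2 = a.total, b.total
    correction = n / (m * (m + n)) * ((m / n) * t1 - t2) ** 2
    return VarianceState(n + m, t1 + t2, a.nvariance + b.nvariance + correction)


# Merging two chunks should match computing over all values at once.
merged = merge(from_values([1, 2, 3]), from_values([4, 5, 6, 7]))
whole = from_values([1, 2, 3, 4, 5, 6, 7])
print(merged.nvariance, whole.nvariance)
```

Because each chunk is summarized by only a count, a sum, and a sum of squared deviations, partial results can be combined in any order without revisiting the raw values.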

### Pre-aggregating variance at ingestion time

To use this feature, a "variance" aggregator must be included at indexing time.
The ingestion aggregator can only be applied to numeric values. If you use "variance",
any input row that is missing the value is treated as having a value of 0.
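
The effect of the zero default is easy to underestimate. The following Python sketch uses the standard-library `statistics` module, not Druid itself, to show how a single missing value counted as 0 changes the sample variance. The `rows` data and the `with_zero_default` helper are hypothetical.

```python
import statistics

# Hypothetical input rows; None marks a row where the metric is missing.
rows = [10.0, None, 10.0]


def with_zero_default(values):
    """Replace missing values with 0, as described for the ingestion aggregator."""
    return [0.0 if v is None else v for v in values]


# Sample variance if the missing row were skipped: both remaining values are equal.
skipped = statistics.variance([v for v in rows if v is not None])

# Sample variance when the missing row counts as 0.
zero_filled = statistics.variance(with_zero_default(rows))

print(skipped, zero_filled)
```

If missing values should not influence the result, it may be worth filtering or filling them before ingestion.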

You can specify the expected input type for ingestion as one of "float", "long", or "variance". The default is "float".

```json
{
  "type" : "variance",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "inputType" : <input_type>,
  "estimator" : <string>
}
```

To query for results, include in the query either a "variance" aggregator with the "variance" input type or simply a "varianceFold" aggregator.

```json
{
  "type" : "varianceFold",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "estimator" : <string>
}
```

|Property                 |Description                   |Default                           |
|-------------------------|------------------------------|----------------------------------|
|`estimator`|Set to "population" to compute the population variance (`variance_pop`) instead of the sample variance (`variance_sample`), which is the default.|null|
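
The two estimators differ only in the divisor applied to the accumulated sum of squared deviations. The Python sketch below uses the textbook definitions. The function name `finalize_variance` is hypothetical, and returning 0.0 for fewer than two values is an assumption of this example rather than documented Druid behavior.

```python
def finalize_variance(nvariance, count, estimator=None):
    """Turn an accumulated sum of squared deviations into a variance.

    estimator="population" divides by count; any other value, including
    None, divides by count - 1 (the sample variance, the documented default).
    """
    if count < 2:
        # Assumption of this sketch: too few values yields 0.0.
        return 0.0
    if estimator == "population":
        return nvariance / count
    return nvariance / (count - 1)


# Seven values with sum((x - avg)^2) == 28, for example 1 through 7.
print(finalize_variance(28.0, 7, "population"))
print(finalize_variance(28.0, 7))
```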


### Standard deviation post-aggregator

To obtain the standard deviation from a variance, use the "stddev" post-aggregator.

```json
{
  "type": "stddev",
  "name": "<output_name>",
  "fieldName": "<aggregator_name>",
  "estimator": <string>
}
```
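
The standard deviation is the square root of the variance under the chosen estimator. To sanity-check query results on a small data set, one option is to compare them against Python's standard-library `statistics` module, as in this sketch. The `values` list is hypothetical sample data.

```python
import math
import statistics

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical metric values

sample_variance = statistics.variance(values)       # divides by n - 1
population_variance = statistics.pvariance(values)  # divides by n

# The standard deviation is the square root of the matching variance.
sample_stddev = math.sqrt(sample_variance)
population_stddev = math.sqrt(population_variance)

print(sample_stddev, population_stddev)
```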

## Query examples

### Timeseries query

```json
{
  "queryType": "timeseries",
  "dataSource": "testing",
  "granularity": "day",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-01T00:00:00.000/2016-03-20T00:00:00.000"
  ]
}
```

### TopN query

```json
{
  "queryType": "topN",
  "dataSource": "testing",
  "dimensions": ["alias"],
  "threshold": 5,
  "granularity": "all",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index"
    }
  ],
  "postAggregations": [
    {
      "type": "stddev",
      "name": "index_stddev",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-06T00:00:00/2016-03-06T23:59:59"
  ]
}
```

### GroupBy query

```json
{
  "queryType": "groupBy",
  "dataSource": "testing",
  "dimensions": ["alias"],
  "granularity": "all",
  "aggregations": [
    {
      "type": "variance",
      "name": "index_var",
      "fieldName": "index"
    }
  ],
  "postAggregations": [
    {
      "type": "stddev",
      "name": "index_stddev",
      "fieldName": "index_var"
    }
  ],
  "intervals": [
    "2016-03-06T00:00:00/2016-03-06T23:59:59"
  ]
}
```