---
layout: doc_page
---
# Aggregations
Aggregations specify how to process the metrics available in Druid.
Available aggregations are:

### Count aggregator

`count` computes the number of rows that match the filters.

```json
{ "type" : "count", "name" : <output_name> }
```

### Sum aggregators

#### `longSum` aggregator

Computes the sum of values as a 64-bit signed integer.

```json
{ "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
```

* `name` – output name for the summed value
* `fieldName` – name of the metric column to sum over

#### `doubleSum` aggregator

Computes the sum of values as a 64-bit floating-point value. Otherwise similar to `longSum`.

```json
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
```
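The practical difference between the two sum aggregators is the arithmetic they use. The sketch below is only an illustration in plain JavaScript, not Druid code: `BigInt` stands in for a 64-bit signed integer, `Number` for a 64-bit float, and the metric values are hypothetical.

```javascript
// Hypothetical metric values, chosen to sit at the edge of exact double precision (2^53).
const metricValues = [9007199254740992n, 1n, 1n];

// longSum-style: exact integer arithmetic, truncated to a signed 64-bit range.
const longSum = metricValues.reduce((acc, v) => BigInt.asIntN(64, acc + v), 0n);

// doubleSum-style: 64-bit floating-point arithmetic, which rounds large integers.
const doubleSum = metricValues.reduce((acc, v) => acc + Number(v), 0);

console.log(longSum.toString()); // exact integer result
console.log(doubleSum);          // the two small additions are lost to rounding
```

For integer metrics that can grow large, integer summation keeps every unit, while floating-point summation trades exactness for the ability to represent fractional values.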

### Min / Max aggregators

#### `min` aggregator

`min` computes the minimum metric value.

```json
{ "type" : "min", "name" : <output_name>, "fieldName" : <metric_name> }
```

#### `max` aggregator

`max` computes the maximum metric value.

```json
{ "type" : "max", "name" : <output_name>, "fieldName" : <metric_name> }
```

### JavaScript aggregator

Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions).

All JavaScript functions must return numerical values.

```json
{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames"  : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine"   : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset"     : "function()                   { return <initial value>; }"
}
```

**Example**

```json
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b)      { return current + (Math.log(a) * b); }",
  "fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset"     : "function()                   { return 10; }"
}
```
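One way to read the three functions is as reset, per-row update, and merge of partial results. The sketch below runs the example functions in plain JavaScript over hypothetical rows split into two hypothetical partitions. How and when Druid actually invokes each function is not described here, so the driver loop is an assumption made for illustration only.

```javascript
// The three functions from the example above.
const fnAggregate = function (current, a, b) { return current + (Math.log(a) * b); };
const fnCombine = function (partialA, partialB) { return partialA + partialB; };
const fnReset = function () { return 10; };

// Hypothetical rows with columns x and y, split across two partitions.
const partitionA = [{ x: 1, y: 2 }, { x: Math.E, y: 3 }];
const partitionB = [{ x: Math.E, y: 4 }];

// Assumed driver: start from fnReset(), then fold every row into the partial aggregate.
function aggregatePartition(rows) {
  let current = fnReset();
  for (const row of rows) {
    current = fnAggregate(current, row.x, row.y);
  }
  return current;
}

// Merge the two partial aggregates into one result.
// In this simplified model every partial starts from fnReset(),
// so the constant 10 is included once per partition.
const combined = fnCombine(aggregatePartition(partitionA), aggregatePartition(partitionB));
console.log(combined);
```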

### Cardinality aggregator

Computes the cardinality of a set of Druid dimensions, using HyperLogLog to estimate the cardinality.

```json
{
  "type": "cardinality",
  "name": "<output_name>",
  "fieldNames": [ <dimension1>, <dimension2>, ... ],
  "byRow": <false | true> # (optional, defaults to false)
}
```

#### Cardinality by value

When `byRow` is `false` (the default), the aggregator computes the cardinality of the set formed by the union of all values of all the given dimensions.

* For a single dimension, this is equivalent to

```sql
SELECT COUNT(DISTINCT(dimension)) FROM <datasource>
```

* For multiple dimensions, this is roughly equivalent to

```sql
SELECT COUNT(DISTINCT(value)) FROM (
  SELECT dim_1 as value FROM <datasource>
  UNION
  SELECT dim_2 as value FROM <datasource>
  UNION
  SELECT dim_3 as value FROM <datasource>
)
```

#### Cardinality by row

When `byRow` is `true`, the aggregator computes the cardinality by row, i.e. the number of distinct dimension combinations.
This is roughly equivalent to

```sql
SELECT COUNT(*) FROM ( SELECT DIM1, DIM2, DIM3 FROM <datasource> GROUP BY DIM1, DIM2, DIM3 )
```

**Example**

Determine the number of distinct categories items are assigned to.

```json
{
  "type": "cardinality",
  "name": "distinct_values",
  "fieldNames": [ "main_category", "secondary_category" ]
}
```

Determine the number of distinct category combinations items are assigned to.

```json
{
  "type": "cardinality",
  "name": "distinct_values",
  "fieldNames": [ "", "secondary_category" ],
  "byRow" : true
}
```
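To make the difference between the two `byRow` modes concrete, the following sketch counts both ways over a handful of hypothetical rows. It uses exact `Set` counting in plain JavaScript, whereas the cardinality aggregator returns a HyperLogLog estimate, so treat it as a model of the semantics only.

```javascript
// Hypothetical rows; the dimension names follow the examples above.
const rows = [
  { main_category: "books", secondary_category: "fiction" },
  { main_category: "books", secondary_category: "history" },
  { main_category: "music", secondary_category: "books" },
  { main_category: "books", secondary_category: "fiction" },
];
const fieldNames = ["main_category", "secondary_category"];

// byRow = false: cardinality of the union of all values across the given dimensions.
// "books" occurs in both dimensions but is counted once.
const byValue = new Set(
  rows.flatMap((row) => fieldNames.map((field) => row[field]))
).size;

// byRow = true: cardinality of distinct value combinations, one combination per row.
const byRowCount = new Set(
  rows.map((row) => JSON.stringify(fieldNames.map((field) => row[field])))
).size;

console.log(byValue);    // 4 distinct values: books, fiction, history, music
console.log(byRowCount); // 3 distinct (main_category, secondary_category) pairs
```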

## Complex Aggregations

### HyperUnique aggregator

Uses [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf) to compute the estimated cardinality of a dimension that has been aggregated as a "hyperUnique" metric at indexing time.

```json
{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }
```

## Miscellaneous Aggregations

### Filtered Aggregator

A filtered aggregator wraps any given aggregator, but only aggregates the values for which the given dimension filter matches.

This makes it possible to compute the results of a filtered and an unfiltered aggregation simultaneously, without having to issue multiple queries, and use both results as part of post-aggregations.

*Limitations:* The filtered aggregator currently supports only `selector` filters and `not` filters that wrap a single selector, i.e. matching (or not matching) a dimension against a single value.

*Note:* If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.

```json
{
  "type" : "filtered",
  "name" : "aggMatching",
  "filter" : {
    "type" : "selector",
    "dimension" : <dimension>,
    "value" : <dimension value>
  },
  "aggregator" : <aggregation>
}
```
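The idea of computing a filtered and an unfiltered aggregate in the same pass can be sketched in plain JavaScript as follows. The rows, the `country` dimension, and the `clicks` metric are hypothetical, and the loop only models the semantics; it is not how Druid executes the query.

```javascript
// Hypothetical rows with one dimension (country) and one metric (clicks).
const events = [
  { country: "US", clicks: 3 },
  { country: "DE", clicks: 5 },
  { country: "US", clicks: 2 },
];

// Selector-style filter: match a dimension against a single value.
const selector = { dimension: "country", value: "US" };
const matchesSelector = (row) => row[selector.dimension] === selector.value;

let totalClicks = 0;    // unfiltered sum over all rows
let filteredClicks = 0; // sum over rows that match the selector only

// A single scan feeds both aggregates.
for (const row of events) {
  totalClicks += row.clicks;
  if (matchesSelector(row)) {
    filteredClicks += row.clicks;
  }
}

// Both results are available together, e.g. for a ratio computed afterwards.
const filteredShare = filteredClicks / totalClicks;
console.log(totalClicks, filteredClicks, filteredShare);
```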