---
layout: doc_page
---
# Aggregations
Aggregations are specifications of processing over metrics available in Druid.
Available aggregations are:
### Count aggregator
`count` computes the number of rows that match the filters.
```json
{ "type" : "count", "name" : <output_name> }
```
### Sum aggregators
#### `longSum` aggregator
Computes the sum of values as a 64-bit signed integer.
```json
{ "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
```
* `name`: output name for the summed value
* `fieldName`: name of the metric column to sum over
#### `doubleSum` aggregator
Computes the sum of values as a 64-bit floating-point value. It takes the same parameters as `longSum`.
```json
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
```
### Min / Max aggregators
#### `min` aggregator
`min` computes the minimum metric value.
```json
{ "type" : "min", "name" : <output_name>, "fieldName" : <metric_name> }
```
#### `max` aggregator
`max` computes the maximum metric value.
```json
{ "type" : "max", "name" : <output_name>, "fieldName" : <metric_name> }
```
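As a rough illustration of what the simple aggregators above compute, the following Python sketch applies the same operations to a handful of in-memory rows. The row layout and the metric name `latency` are made up for this example, and the code only mirrors the documented semantics; it is not how Druid evaluates a query.

```python
# Hypothetical rows that have already passed the query filters.
rows = [
    {"latency": 12},
    {"latency": 7},
    {"latency": 30},
]

values = [row["latency"] for row in rows]

# Note: Python integers are unbounded, so the 64-bit range of longSum
# is not modelled here.
results = {
    "count": len(rows),                          # count: number of matching rows
    "longSum": sum(int(v) for v in values),      # longSum: integer sum
    "doubleSum": sum(float(v) for v in values),  # doubleSum: floating-point sum
    "min": min(values),                          # min: smallest metric value
    "max": max(values),                          # max: largest metric value
}

print(results)
```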
### JavaScript aggregator
Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions).
All JavaScript functions must return numerical values.
```json
{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames" : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine" : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset" : "function() { return <initial value>; }"
}
```
**Example**
```json
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }",
  "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset" : "function() { return 10; }"
}
```
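To make the roles of the three functions easier to follow, here is a minimal Python sketch that mirrors the example above outside of Druid. The helper `fold` and the sample rows are assumptions made for illustration; when and how often Druid invokes `fnCombine` on partial aggregates is not modelled.

```python
import math


def fn_reset():
    # Mirrors fnReset: the initial value of a partial aggregate.
    return 10.0


def fn_aggregate(current, a, b):
    # Mirrors fnAggregate: fold one row (a = x, b = y) into the partial aggregate.
    return current + math.log(a) * b


def fn_combine(partial_a, partial_b):
    # Mirrors fnCombine: merge two partial aggregates.
    return partial_a + partial_b


def fold(rows):
    # Hypothetical driver: start from the reset value, then apply
    # fn_aggregate once per row.
    current = fn_reset()
    for x, y in rows:
        current = fn_aggregate(current, x, y)
    return current


# Hypothetical (x, y) rows.
rows = [(1.0, 5.0), (math.e, 2.0), (math.e ** 2, 3.0)]
total = fold(rows)
print(total)
```

With these rows the fold adds `log(x) * y` for each row to the initial value returned by `fn_reset`.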
### Cardinality aggregator
Computes the cardinality of a set of Druid dimensions, using HyperLogLog to estimate the cardinality.
```json
{
  "type": "cardinality",
  "name": "<output_name>",
  "fieldNames": [ <dimension1>, <dimension2>, ... ],
  "byRow": <false | true> # (optional, defaults to false)
}
```
#### Cardinality by value
When `byRow` is set to `false` (the default), it computes the cardinality of the set composed of the union of all dimension values for all the given dimensions.
* For a single dimension, this is equivalent to
```sql
SELECT COUNT(DISTINCT(dimension)) FROM <datasource>
```
* For multiple dimensions, this is roughly equivalent to
```sql
SELECT COUNT(DISTINCT(value)) FROM (
  SELECT dim_1 as value FROM <datasource>
  UNION
  SELECT dim_2 as value FROM <datasource>
  UNION
  SELECT dim_3 as value FROM <datasource>
)
```
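The by-value semantics can also be sketched in Python with an exact set. The rows and dimension names here are hypothetical, and the result is an exact count, whereas the aggregator itself returns a HyperLogLog estimate.

```python
# Hypothetical rows with three dimensions.
rows = [
    {"dim_1": "a", "dim_2": "b", "dim_3": "c"},
    {"dim_1": "b", "dim_2": "b", "dim_3": "d"},
    {"dim_1": "a", "dim_2": "c", "dim_3": "c"},
]
dimensions = ["dim_1", "dim_2", "dim_3"]

# Union of every value seen in any of the listed dimensions.
distinct_values = {row[dim] for row in rows for dim in dimensions}
cardinality_by_value = len(distinct_values)

print(cardinality_by_value)
```

Which dimension a value came from does not matter here: `"b"` appears in both `dim_1` and `dim_2` but is counted once.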
#### Cardinality by row
When `byRow` is set to `true`, it computes the cardinality by row, i.e. the cardinality of distinct dimension combinations.
This is roughly equivalent to
```sql
SELECT COUNT(*) FROM ( SELECT DIM1, DIM2, DIM3 FROM <datasource> GROUP BY DIM1, DIM2, DIM3 )
```
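A comparable Python sketch of the by-row semantics counts distinct tuples instead of distinct individual values. Again, the rows are hypothetical and the count is exact rather than a HyperLogLog estimate.

```python
# Hypothetical rows; the first two share the same dimension combination.
rows = [
    {"DIM1": "a", "DIM2": "b", "DIM3": "c"},
    {"DIM1": "a", "DIM2": "b", "DIM3": "c"},
    {"DIM1": "a", "DIM2": "c", "DIM3": "c"},
]
dimensions = ["DIM1", "DIM2", "DIM3"]

# Each row contributes one tuple of its dimension values.
distinct_combinations = {tuple(row[dim] for dim in dimensions) for row in rows}
cardinality_by_row = len(distinct_combinations)

print(cardinality_by_row)
```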
**Example**
Determine the number of distinct categories items are assigned to.
```json
{
  "type": "cardinality",
  "name": "distinct_values",
  "fieldNames": [ "main_category", "secondary_category" ]
}
```
Determine the number of distinct combinations of category values that items are assigned to.
```json
{
"type": "cardinality",
"name": "distinct_values",
"fieldNames": [ "", "secondary_category" ],
"byRow" : true
}
```
## Complex Aggregations
### HyperUnique aggregator
Uses [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf) to compute the estimated cardinality of a dimension that has been aggregated as a hyperUnique metric at indexing time.
```json
{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }
```