---
layout: doc_page
---
# Aggregations

Aggregations are specifications of processing over metrics available in Druid.
Available aggregations are:

### Count aggregator

`count` computes the number of rows that match the filters.

```json
{ "type" : "count", "name" : <output_name> }
```

### Sum aggregators

#### `longSum` aggregator

Computes the sum of values as a 64-bit, signed integer.

```json
{ "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
```

* `name` – output name for the summed value
* `fieldName` – name of the metric column to sum over

#### `doubleSum` aggregator

Computes the sum of values as a 64-bit floating point value. Similar to `longSum`.

```json
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
```
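
As a rough illustration, the two sum specs can be assembled programmatically and serialized to JSON before being placed in a query. The helper `sum_aggregator` and the column names `added` and `delta` are hypothetical and not part of Druid; this is only a sketch of how the templates above might be filled in.

```python
import json


def sum_aggregator(kind, output_name, metric_name):
    """Build a longSum or doubleSum spec as a plain dict (hypothetical helper)."""
    if kind not in ("longSum", "doubleSum"):
        raise ValueError("kind must be 'longSum' or 'doubleSum'")
    return {"type": kind, "name": output_name, "fieldName": metric_name}


# Assumed metric columns: "added" holds integers, "delta" holds decimals.
aggregations = [
    sum_aggregator("longSum", "total_added", "added"),
    sum_aggregator("doubleSum", "total_delta", "delta"),
]

aggregations_json = json.dumps(aggregations, indent=2)
print(aggregations_json)
```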

### Min / Max aggregators

#### `doubleMin` aggregator

`doubleMin` computes the minimum of all metric values and `Double.POSITIVE_INFINITY`.

```json
{ "type" : "doubleMin", "name" : <output_name>, "fieldName" : <metric_name> }
```

#### `doubleMax` aggregator

`doubleMax` computes the maximum of all metric values and `Double.NEGATIVE_INFINITY`.

```json
{ "type" : "doubleMax", "name" : <output_name>, "fieldName" : <metric_name> }
```

#### `longMin` aggregator

`longMin` computes the minimum of all metric values and `Long.MAX_VALUE`.

```json
{ "type" : "longMin", "name" : <output_name>, "fieldName" : <metric_name> }
```

#### `longMax` aggregator

`longMax` computes the maximum of all metric values and `Long.MIN_VALUE`.

```json
{ "type" : "longMax", "name" : <output_name>, "fieldName" : <metric_name> }
```
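
One way to read "the minimum of all metric values and `Double.POSITIVE_INFINITY`" is that the constant acts as a starting value that any real metric value replaces. Under that reading, an aggregation over no rows would return the constant itself. The Python sketch below models this interpretation only; the function names are illustrative and are not Druid APIs.

```python
import math

LONG_MAX = 2**63 - 1   # value of Java's Long.MAX_VALUE
LONG_MIN = -(2**63)    # value of Java's Long.MIN_VALUE


def double_min(values):
    # Minimum of the values and +infinity.
    return min([math.inf, *values])


def double_max(values):
    # Maximum of the values and -infinity.
    return max([-math.inf, *values])


def long_min(values):
    # Minimum of the values and Long.MAX_VALUE.
    return min([LONG_MAX, *values])


def long_max(values):
    # Maximum of the values and Long.MIN_VALUE.
    return max([LONG_MIN, *values])


print(double_min([3.5, -1.0, 2.0]))
print(long_max([7, 4]))
print(double_min([]))  # no rows: the starting value is returned
```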

### JavaScript aggregator

Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions).

All JavaScript functions must return numerical values.

```json
{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames" : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine" : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset" : "function() { return <initial value>; }"
}
```

**Example**

```json
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }",
  "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset" : "function() { return 10; }"
}
```
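
The roles of the three functions can be sketched in Python. This is a simplified model, not Druid's implementation: it assumes each partial aggregate starts from the value returned by `fnReset`, is updated once per row by `fnAggregate`, and that partials from separate chunks of data are merged with `fnCombine`. The initial value is set to 0 here for simplicity, and the two row lists stand in for hypothetical chunks of data.

```python
import math


def fn_reset():
    # Initial value of a partial aggregate (0 in this sketch).
    return 0.0


def fn_aggregate(current, a, b):
    # Fold one row's column values into the partial aggregate.
    return current + math.log(a) * b


def fn_combine(partial_a, partial_b):
    # Merge two partial aggregates.
    return partial_a + partial_b


def aggregate_rows(rows):
    current = fn_reset()
    for x, y in rows:
        current = fn_aggregate(current, x, y)
    return current


# Two hypothetical chunks of (x, y) rows, aggregated separately.
rows_a = [(1.0, 2.0), (math.e, 3.0)]
rows_b = [(math.e ** 2, 0.5)]

combined = fn_combine(aggregate_rows(rows_a), aggregate_rows(rows_b))
single_pass = aggregate_rows(rows_a + rows_b)
print(combined, single_pass)
```

With an initial value of 0, combining the partials gives the same result as aggregating all rows in one pass, which is the property `fnCombine` needs to preserve in this model.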

### Cardinality aggregator

Computes the cardinality of a set of Druid dimensions, using HyperLogLog to estimate the cardinality.

```json
{
  "type": "cardinality",
  "name": "<output_name>",
  "fieldNames": [ <dimension1>, <dimension2>, ... ],
  "byRow": <false | true> # (optional, defaults to false)
}
```

#### Cardinality by value

When `byRow` is set to `false` (the default), the aggregator computes the cardinality of the set formed by the union of all dimension values across the given dimensions.

* For a single dimension, this is equivalent to

```sql
SELECT COUNT(DISTINCT(dimension)) FROM <datasource>
```

* For multiple dimensions, this is equivalent to something akin to

```sql
SELECT COUNT(DISTINCT(value)) FROM (
  SELECT dim_1 as value FROM <datasource>
  UNION
  SELECT dim_2 as value FROM <datasource>
  UNION
  SELECT dim_3 as value FROM <datasource>
)
```
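
The by-value semantics can be sketched in Python with an exact set. Note the assumption: Druid estimates the count with HyperLogLog, whereas this sketch counts exactly, so it only illustrates which values are pooled together. The rows and the function name are made up for illustration.

```python
# Hypothetical rows with two dimensions.
rows = [
    {"dim_1": "a", "dim_2": "b"},
    {"dim_1": "b", "dim_2": "c"},
    {"dim_1": "a", "dim_2": "c"},
]


def cardinality_by_value(rows, field_names):
    # Pool the values of every listed dimension into one set, then count it.
    values = set()
    for row in rows:
        for field in field_names:
            values.add(row[field])
    return len(values)


print(cardinality_by_value(rows, ["dim_1"]))
print(cardinality_by_value(rows, ["dim_1", "dim_2"]))
```

Here `dim_1` alone contains the values `a` and `b`, while the union over both dimensions contains `a`, `b` and `c`.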

#### Cardinality by row

When `byRow` is set to `true`, the aggregator computes the cardinality by row, i.e. the number of distinct dimension combinations.
This is equivalent to something akin to

```sql
SELECT COUNT(*) FROM ( SELECT DIM1, DIM2, DIM3 FROM <datasource> GROUP BY DIM1, DIM2, DIM3 )
```
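
A minimal Python sketch of the by-row semantics follows, again using an exact set where Druid would use a HyperLogLog estimate. The rows and the function name are hypothetical.

```python
# Hypothetical rows; the first and third share the same combination.
rows = [
    {"dim1": "a", "dim2": "b"},
    {"dim1": "b", "dim2": "c"},
    {"dim1": "a", "dim2": "b"},
    {"dim1": "a", "dim2": "c"},
]


def cardinality_by_row(rows, field_names):
    # Each row contributes one tuple of its dimension values;
    # the result is the number of distinct tuples.
    combinations = {tuple(row[field] for field in field_names) for row in rows}
    return len(combinations)


print(cardinality_by_row(rows, ["dim1", "dim2"]))
```

The four rows contain three distinct `(dim1, dim2)` combinations, so the duplicate row is counted once.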

**Example**

Determine the number of distinct categories items are assigned to.

```json
{
  "type": "cardinality",
  "name": "distinct_values",
  "fieldNames": [ "main_category", "secondary_category" ]
}
```

Determine the number of distinct combinations of categories items are assigned to.

```json
{
  "type": "cardinality",
  "name": "distinct_values",
  "fieldNames": [ "main_category", "secondary_category" ],
  "byRow" : true
}
```

## Complex Aggregations

### HyperUnique aggregator

Uses [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf) to compute the estimated cardinality of a dimension that has been aggregated as a "hyperUnique" metric at indexing time.

```json
{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }
```

## Miscellaneous Aggregations

### Filtered Aggregator

A filtered aggregator wraps any given aggregator, but only aggregates the values for which the given dimension filter matches.

This makes it possible to compute the results of a filtered and an unfiltered aggregation simultaneously, without having to issue multiple queries, and use both results as part of post-aggregations.

*Limitations:* The filtered aggregator currently only supports 'or', 'and', 'selector' and 'not' filters, i.e. matching one or multiple dimensions against a single value.

*Note:* If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.

```json
{
  "type" : "filtered",
  "filter" : {
    "type" : "selector",
    "dimension" : <dimension>,
    "value" : <dimension value>
  },
  "aggregator" : <aggregation>
}
```
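
The idea of computing a filtered and an unfiltered aggregation in a single pass can be sketched in Python as follows. This is a toy model under stated assumptions, not Druid's implementation: the `country` dimension, the `added` metric, and the helper functions are all hypothetical.

```python
# Hypothetical rows with one dimension and one metric.
rows = [
    {"country": "US", "added": 10},
    {"country": "DE", "added": 5},
    {"country": "US", "added": 7},
]


def selector(dimension, value):
    # Models a 'selector' filter: match rows whose dimension equals value.
    def matches(row):
        return row.get(dimension) == value
    return matches


def long_sum(field_name):
    # Models a longSum aggregator as a fold step over rows.
    def aggregate(total, row):
        return total + row[field_name]
    return aggregate


def filtered(matches, aggregate):
    # Wraps an aggregator so that it only sees rows the filter matches.
    def aggregate_if_matching(total, row):
        return aggregate(total, row) if matches(row) else total
    return aggregate_if_matching


aggregators = {
    "added_all": long_sum("added"),
    "added_us": filtered(selector("country", "US"), long_sum("added")),
}

# One pass over the data feeds both aggregators.
results = {name: 0 for name in aggregators}
for row in rows:
    for name, aggregate in aggregators.items():
        results[name] = aggregate(results[name], row)

print(results)
```

Both totals come out of the same scan, which is why the two results can then be used together, for example as a ratio in a post-aggregation.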