druid/docs/querying/post-aggregations.md

10 KiB
Raw Blame History

id title
post-aggregations Post-aggregations

:::info Apache Druid supports two query languages: Druid SQL and native queries. This document describes the native language. For information about functions available in SQL, refer to the SQL documentation. :::

Post-aggregations are specifications of processing that should happen on aggregated values as they come out of Apache Druid. If you include a post aggregation as part of a query, make sure to include all aggregators the post-aggregator requires.

There are several post-aggregators available.

Arithmetic post-aggregator

The arithmetic post-aggregator applies the provided function to the given fields from left to right. The fields can be aggregators or other post aggregators.

Property Description Required
type Must be "arithmetic". Yes
name Output name of the post-aggregation Yes
fn Supported functions are +, -, *, /, pow and quotient Yes
fields List of post-aggregator specs which define inputs to the fn Yes
ordering If no ordering (or null) is specified, the default floating point ordering is used. numericFirst ordering always returns finite values first, followed by NaN, and infinite values last. No

Note:

  • / division always returns 0 if dividing by0, regardless of the numerator.
  • quotient division behaves like regular floating point division
  • Arithmetic post-aggregators always use floating point arithmetic.

Example:

{
  "type"  : "arithmetic",
  "name"  : "mult",
  "fn"    : "*",
  "fields": [
    {"type": "fieldAccess", "fieldName":  "someAgg"},
    {"type": "fieldAccess", "fieldName":  "someOtherAgg"}
  ]
}

Field accessor post-aggregators

These post-aggregators return the value produced by the specified dimension or aggregator.

Property Description Required
type Must be "fieldAccess" or "finalizingFieldAccess". Use type "fieldAccess" to return the raw aggregation object, or use type "finalizingFieldAccess" to return a finalized value, such as an estimated cardinality. Yes
name Output name of the post-aggregation Yes if defined as a standalone post-aggregation, but may be omitted if used inline to some other post-aggregator in a fields list
fieldName The output name of the dimension or aggregator to reference Yes

Example:

{ "type" : "fieldAccess", "name": "someField", "fieldName" : "someAggregator" }

or

{ "type" : "finalizingFieldAccess", "name": "someFinalizedField", "fieldName" : "someAggregator" }

Constant post-aggregator

The constant post-aggregator always returns the specified value.

Property Description Required
type Must be "constant" Yes
name Output name of the post-aggregation Yes
value The constant value Yes

Example:

{ "type"  : "constant", "name"  : "someConstant", "value" : 1234 }

Expression post-aggregator

The expression post-aggregator is defined using a Druid expression.

Property Description Required
type Must be "expression" Yes
name Output name of the post-aggregation Yes
expression Native Druid expression to compute, may refer to any dimension or aggregator output names Yes
ordering If no ordering (or null) is specified, the "natural" ordering is used. numericFirst ordering always returns finite values first, followed by NaN, and infinite values last. If the expression produces array or complex types, specify ordering as null and use outputType instead to use the correct type native ordering. No
outputType Output type is optional, and can be any native Druid type: LONG, FLOAT, DOUBLE, STRING, ARRAY types (e.g. ARRAY<LONG>), or COMPLEX types (e.g. COMPLEX<json>). If not specified, the output type will be inferred from the expression. If specified and ordering is null, the type native ordering will be used for sorting values. If the expression produces array or complex types, this value must be non-null to ensure the correct ordering is used. If outputType does not match the actual output type of the expression, the value will be attempted to coerced to the specified type, possibly failing if coercion is not possible. No

Example:

{
  "type": "expression",
  "name": "someExpression",
  "expression": "someAgg + someOtherAgg",
  "ordering": null,
  "outputType": "LONG" 
}

Greatest / Least post-aggregators

doubleGreatest and longGreatest computes the maximum of all fields and Double.NEGATIVE_INFINITY. doubleLeast and longLeast computes the minimum of all fields and Double.POSITIVE_INFINITY.

Property Description Required
type Must be "doubleGreatest", "doubleLeast", "longGreatest", or "longLeast". Yes
name Output name of the post-aggregation Yes
fields List of post-aggregator specs which define inputs to the greatest or least function Yes

The difference between the doubleMax aggregator and the doubleGreatest post-aggregator is that doubleMax returns the highest value of all rows for one specific column while doubleGreatest returns the highest value of multiple columns in one row. These are similar to the SQL MAX and GREATEST functions.

Example:

{
  "type"  : "doubleGreatest",
  "name"  : "theGreatest",
  "fields": [
   { "type": "fieldAccess", "fieldName": "someAgg" },
   { "type": "fieldAccess", "fieldName": "someOtherAgg" }
  ]
}

JavaScript post-aggregator

Applies the provided JavaScript function to the given fields. Fields are passed as arguments to the JavaScript function in the given order.

Property Description Required
type Must be "javascript" Yes
name Output name of the post-aggregation Yes
fieldNames List of input dimension or aggregator output names Yes
function String javascript function which accepts fieldNames as arguments Yes

Example:

{
  "type": "javascript",
  "name": "someJavascript",
  "fieldNames" : ["someAgg", "someOtherAgg"],
  "function": "function(someAgg, someOtherAgg) { return 100 * Math.abs(someAgg) / someOtherAgg;"
}

:::info JavaScript-based functionality is disabled by default. Please refer to the Druid JavaScript programming guide for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it. :::

HyperUnique Cardinality post-aggregator

The hyperUniqueCardinality post aggregator is used to wrap a hyperUnique object such that it can be used in post aggregations.

Property Description Required
type Must be "hyperUniqueCardinality" Yes
name Output name of the post-aggregation Yes
fieldName The output name of a hyperUnique aggregator Yes
{
  "type"  : "hyperUniqueCardinality",
  "name": "someCardinality",
  "fieldName"  : "someHyperunique"
}

It can be used in a sample calculation as so:

{
  ...
  "aggregations" : [{
    {"type" : "count", "name" : "rows"},
    {"type" : "hyperUnique", "name" : "unique_users", "fieldName" : "uniques"}
  }],
  "postAggregations" : [{
    "type"   : "arithmetic",
    "name"   : "average_users_per_row",
    "fn"     : "/",
    "fields" : [
      { "type" : "hyperUniqueCardinality", "fieldName" : "unique_users" },
      { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
    ]
  }]
  ...

This post-aggregator will inherit the rounding behavior of the aggregator it references. Note that this inheritance is only effective if you directly reference an aggregator. Going through another post-aggregator, for example, will cause the user-specified rounding behavior to get lost and default to "no rounding".

Example Usage

In this example, lets calculate a simple percentage using post aggregators. Lets imagine our data set has a metric called "total".

The format of the query JSON is as follows:

{
  ...
  "aggregations" : [
    { "type" : "count", "name" : "rows" },
    { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" }
  ],
  "postAggregations" : [{
    "type"   : "arithmetic",
    "name"   : "average",
    "fn"     : "/",
    "fields" : [
           { "type" : "fieldAccess", "name" : "tot", "fieldName" : "tot" },
           { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
         ]
  }]
  ...
}
{
  ...
  "aggregations" : [
    { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" },
    { "type" : "doubleSum", "name" : "part", "fieldName" : "part" }
  ],
  "postAggregations" : [{
    "type"   : "arithmetic",
    "name"   : "part_percentage",
    "fn"     : "*",
    "fields" : [
       { "type"   : "arithmetic",
         "name"   : "ratio",
         "fn"     : "/",
         "fields" : [
           { "type" : "fieldAccess", "name" : "part", "fieldName" : "part" },
           { "type" : "fieldAccess", "name" : "tot", "fieldName" : "tot" }
         ]
       },
       { "type" : "constant", "name": "const", "value" : 100 }
    ]
  }]
  ...
}

The same could be computed using an expression post-aggregator:

{
  ...
  "aggregations" : [
    { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" },
    { "type" : "doubleSum", "name" : "part", "fieldName" : "part" }
  ],
  "postAggregations" : [{
    "type"       : "expression",
    "name"       : "part_percentage",
    "expression" : "100 * (part / tot)"
  }]
  ...
}