[role="xpack"] [testenv="basic"] [[sql-functions-aggs]] === Aggregate Functions Functions for computing a _single_ result from a set of input values. {es-sql} supports aggregate functions only alongside <> (implicit or explicit). [[sql-functions-aggs-general]] [float] === General Purpose [[sql-functions-aggs-avg]] ==== `AVG` .Synopsis: [source, sql] -------------------------------------------------- AVG(numeric_field) <1> -------------------------------------------------- *Input*: <1> numeric field *Output*: `double` numeric value *Description*: Returns the https://en.wikipedia.org/wiki/Arithmetic_mean[Average] (arithmetic mean) of input values. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggAvg] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggAvgScalars] -------------------------------------------------- [[sql-functions-aggs-count]] ==== `COUNT` .Synopsis: [source, sql] -------------------------------------------------- COUNT(expression) <1> -------------------------------------------------- *Input*: <1> a field name, wildcard (`*`) or any numeric value *Output*: numeric value *Description*: Returns the total number (count) of input values. In case of `COUNT(*)` or `COUNT()`, _all_ values are considered (including `null` or missing ones). In case of `COUNT()` `null` values are not considered. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountStar] -------------------------------------------------- [[sql-functions-aggs-count-all]] ==== `COUNT(ALL)` .Synopsis: [source, sql] -------------------------------------------------- COUNT(ALL field_name) <1> -------------------------------------------------- *Input*: <1> a field name *Output*: numeric value *Description*: Returns the total number (count) of all _non-null_ input values. `COUNT()` and `COUNT(ALL )` are equivalent. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountAll] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountAllScalars] -------------------------------------------------- [[sql-functions-aggs-count-distinct]] ==== `COUNT(DISTINCT)` .Synopsis: [source, sql] -------------------------------------------------- COUNT(DISTINCT field_name) <1> -------------------------------------------------- *Input*: <1> a field name *Output*: numeric value *Description*: Returns the total number of _distinct non-null_ values in input values. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountDistinct] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountDistinctScalars] -------------------------------------------------- [[sql-functions-aggs-first]] ==== `FIRST/FIRST_VALUE` .Synopsis: [source, sql] ---------------------------------------------- FIRST( field_name <1> [, ordering_field_name]) <2> ---------------------------------------------- *Input*: <1> target field for the aggregation <2> optional field used for ordering *Output*: same type as the input *Description*: Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted by the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name` column is used for the sorting. E.g.: [cols="<,<"] |=== s| a | b | 100 | 1 | 200 | 1 | 1 | 2 | 2 | 2 | 10 | null | 20 | null | null | null |=== [source, sql] ---------------------- SELECT FIRST(a) FROM t ---------------------- will result in: [cols="<"] |=== s| FIRST(a) | 1 |=== and [source, sql] ------------------------- SELECT FIRST(a, b) FROM t ------------------------- will result in: [cols="<"] |=== s| FIRST(a, b) | 100 |=== ["source","sql",subs="attributes,macros"] ----------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithOneArg] ----------------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithOneArgAndGroupBy] -------------------------------------------------------------------- ["source","sql",subs="attributes,macros"] ----------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithTwoArgs] ----------------------------------------------------------- ["source","sql",subs="attributes,macros"] --------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithTwoArgsAndGroupBy] --------------------------------------------------------------------- `FIRST_VALUE` is a name alias and can be used instead of `FIRST`, e.g.: ["source","sql",subs="attributes,macros"] -------------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstValueWithTwoArgsAndGroupBy] -------------------------------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[firstValueWithTwoArgsAndGroupByScalars] -------------------------------------------------------------------------- [NOTE] `FIRST` cannot be used in a HAVING clause. [NOTE] `FIRST` cannot be used with columns of type <> unless the field is also <>. [[sql-functions-aggs-last]] ==== `LAST/LAST_VALUE` .Synopsis: [source, sql] -------------------------------------------------- LAST( field_name <1> [, ordering_field_name]) <2> -------------------------------------------------- *Input*: <1> target field for the aggregation <2> optional field used for ordering *Output*: same type as the input *Description*: It's the inverse of <>. Returns the last **non-NULL** value (if such exists) of the `field_name` input column sorted descending by the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name` column is used for the sorting. E.g.: [cols="<,<"] |=== s| a | b | 10 | 1 | 20 | 1 | 1 | 2 | 2 | 2 | 100 | null | 200 | null | null | null |=== [source, sql] ------------------------ SELECT LAST(a) FROM t ------------------------ will result in: [cols="<"] |=== s| LAST(a) | 200 |=== and [source, sql] ------------------------ SELECT LAST(a, b) FROM t ------------------------ will result in: [cols="<"] |=== s| LAST(a, b) | 2 |=== ["source","sql",subs="attributes,macros"] ----------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithOneArg] ----------------------------------------------------------- ["source","sql",subs="attributes,macros"] ------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithOneArgAndGroupBy] ------------------------------------------------------------------- ["source","sql",subs="attributes,macros"] ----------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithTwoArgs] ----------------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithTwoArgsAndGroupBy] -------------------------------------------------------------------- `LAST_VALUE` is a name alias and can be used instead of `LAST`, e.g.: ["source","sql",subs="attributes,macros"] ------------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastValueWithTwoArgsAndGroupBy] ------------------------------------------------------------------------- ["source","sql",subs="attributes,macros"] ------------------------------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[lastValueWithTwoArgsAndGroupByScalars] ------------------------------------------------------------------------- [NOTE] `LAST` cannot be used in `HAVING` clause. [NOTE] `LAST` cannot be used with columns of type <> unless the field is also <>. [[sql-functions-aggs-max]] ==== `MAX` .Synopsis: [source, sql] -------------------------------------------------- MAX(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: same type as the input *Description*: Returns the maximum value across input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggMax] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggMaxScalars] -------------------------------------------------- [NOTE] `MAX` on a field of type <> or <> is translated into <> and therefore, it cannot be used in `HAVING` clause. [[sql-functions-aggs-min]] ==== `MIN` .Synopsis: [source, sql] -------------------------------------------------- MIN(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: same type as the input *Description*: Returns the minimum value across input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggMin] -------------------------------------------------- [NOTE] `MIN` on a field of type <> or <> is translated into <> and therefore, it cannot be used in `HAVING` clause. [[sql-functions-aggs-sum]] ==== `SUM` .Synopsis: [source, sql] -------------------------------------------------- SUM(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `bigint` for integer input, `double` for floating points *Description*: Returns the sum of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggSum] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumScalars] -------------------------------------------------- [[sql-functions-aggs-statistics]] [float] === Statistics [[sql-functions-aggs-kurtosis]] ==== `KURTOSIS` .Synopsis: [source, sql] -------------------------------------------------- KURTOSIS(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: https://en.wikipedia.org/wiki/Kurtosis[Quantify] the shape of the distribution of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggKurtosis] -------------------------------------------------- [NOTE] ==== `KURTOSIS` cannot be used on top of scalar functions or operators but only directly on a field. So, for example, the following is not allowed and an error is returned: [source, sql] --------------------------------------- SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY gender --------------------------------------- ==== [[sql-functions-aggs-mad]] ==== `MAD` .Synopsis: [source, sql] -------------------------------------------------- MAD(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: https://en.wikipedia.org/wiki/Median_absolute_deviation[Measure] the variability of the input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggMad] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggMadScalars] -------------------------------------------------- [[sql-functions-aggs-percentile]] ==== `PERCENTILE` .Synopsis: [source, sql] -------------------------------------------------- PERCENTILE( field_name, <1> numeric_exp) <2> -------------------------------------------------- *Input*: <1> a numeric field <2> a numeric expression (must be a constant and not based on a field) *Output*: `double` numeric value *Description*: Returns the nth https://en.wikipedia.org/wiki/Percentile[percentile] (represented by `numeric_exp` parameter) of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentile] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileScalars] -------------------------------------------------- [[sql-functions-aggs-percentile-rank]] ==== `PERCENTILE_RANK` .Synopsis: [source, sql] -------------------------------------------------- PERCENTILE_RANK( field_name, <1> numeric_exp) <2> -------------------------------------------------- *Input*: <1> a numeric field <2> a numeric expression (must be a constant and not based on a field) *Output*: `double` numeric value *Description*: Returns the nth https://en.wikipedia.org/wiki/Percentile_rank[percentile rank] (represented by `numeric_exp` parameter) of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileRank] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileRankScalars] -------------------------------------------------- [[sql-functions-aggs-skewness]] ==== `SKEWNESS` .Synopsis: [source, sql] -------------------------------------------------- SKEWNESS(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: https://en.wikipedia.org/wiki/Skewness[Quantify] the asymmetric distribution of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggSkewness] -------------------------------------------------- [NOTE] ==== `SKEWNESS` cannot be used on top of scalar functions but only directly on a field. So, for example, the following is not allowed and an error is returned: [source, sql] --------------------------------------- SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY gender --------------------------------------- ==== [[sql-functions-aggs-stddev-pop]] ==== `STDDEV_POP` .Synopsis: [source, sql] -------------------------------------------------- STDDEV_POP(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: Returns the https://en.wikipedia.org/wiki/Standard_deviations[population standard deviation] of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggStddevPop] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggStddevPopScalars] -------------------------------------------------- [[sql-functions-aggs-sum-squares]] ==== `SUM_OF_SQUARES` .Synopsis: [source, sql] -------------------------------------------------- SUM_OF_SQUARES(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: Returns the https://en.wikipedia.org/wiki/Total_sum_of_squares[sum of squares] of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumOfSquares] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumOfSquaresScalars] -------------------------------------------------- [[sql-functions-aggs-var-pop]] ==== `VAR_POP` .Synopsis: [source, sql] -------------------------------------------------- VAR_POP(field_name) <1> -------------------------------------------------- *Input*: <1> a numeric field *Output*: `double` numeric value *Description*: Returns the https://en.wikipedia.org/wiki/Variance[population variance] of input values in the field `field_name`. ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggVarPop] -------------------------------------------------- ["source","sql",subs="attributes,macros"] -------------------------------------------------- include-tagged::{sql-specs}/docs/docs.csv-spec[aggVarPopScalars] --------------------------------------------------