Costin Leau 61f49af497 SQL: Spec tests now use classpath discovery (#40388)
To avoid having to specify each spec by hand (which can miss specs to be
added), the test infrastructure now performs classpath discovery so that
each spec added, is automatically considered.

Relates #40358

(cherry picked from commit d0f60b4425c731509aa8ca765d55f563f866ef90)
2019-03-25 15:22:52 +02:00

581 lines
15 KiB
Plaintext

[role="xpack"]
[testenv="basic"]
[[sql-functions-aggs]]
=== Aggregate Functions
Functions for computing a _single_ result from a set of input values.
{es-sql} supports aggregate functions only alongside <<sql-syntax-group-by,grouping>> (implicit or explicit).
==== General Purpose
[[sql-functions-aggs-avg]]
===== `AVG`
.Synopsis:
[source, sql]
--------------------------------------------------
AVG(numeric_field<1>)
--------------------------------------------------
*Input*:
<1> numeric field
*Output*: `double` numeric value
.Description:
Returns the https://en.wikipedia.org/wiki/Arithmetic_mean[Average] (arithmetic mean) of input values.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggAvg]
--------------------------------------------------
[[sql-functions-aggs-count]]
===== `COUNT`
.Synopsis:
[source, sql]
--------------------------------------------------
COUNT(expression<1>)
--------------------------------------------------
*Input*:
<1> a field name, wildcard (`*`) or any numeric value
*Output*: numeric value
.Description:
Returns the total number (count) of input values.
In case of `COUNT(*)` or `COUNT(<literal>)`, _all_ values are considered (including `null` or missing ones).
In case of `COUNT(<field_name>)` `null` values are not considered.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountStar]
--------------------------------------------------
[[sql-functions-aggs-count-all]]
===== `COUNT(ALL)`
.Synopsis:
[source, sql]
--------------------------------------------------
COUNT(ALL field_name<1>)
--------------------------------------------------
*Input*:
<1> a field name
*Output*: numeric value
.Description:
Returns the total number (count) of all _non-null_ input values. `COUNT(<field_name>)` and `COUNT(ALL <field_name>)` are equivalent.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountAll]
--------------------------------------------------
[[sql-functions-aggs-count-distinct]]
===== `COUNT(DISTINCT)`
.Synopsis:
[source, sql]
--------------------------------------------------
COUNT(DISTINCT field_name<1>)
--------------------------------------------------
*Input*:
<1> a field name
*Output*: numeric value
.Description:
Returns the total number of _distinct non-null_ values in input values.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountDistinct]
--------------------------------------------------
[[sql-functions-aggs-first]]
===== `FIRST/FIRST_VALUE`
.Synopsis:
[source, sql]
----------------------------------------------
FIRST(field_name<1>[, ordering_field_name]<2>)
----------------------------------------------
*Input*:
<1> target field for the aggregation
<2> optional field used for ordering
*Output*: same type as the input
.Description:
Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted by
the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name`
column is used for the sorting. E.g.:
[cols="<,<"]
|===
s| a | b
| 100 | 1
| 200 | 1
| 1 | 2
| 2 | 2
| 10 | null
| 20 | null
| null | null
|===
[source, sql]
----------------------
SELECT FIRST(a) FROM t
----------------------
will result in:
[cols="<"]
|===
s| FIRST(a)
| 1
|===
and
[source, sql]
-------------------------
SELECT FIRST(a, b) FROM t
-------------------------
will result in:
[cols="<"]
|===
s| FIRST(a, b)
| 100
|===
["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithOneArg]
-----------------------------------------------------------
["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithOneArgAndGroupBy]
--------------------------------------------------------------------
["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithTwoArgs]
-----------------------------------------------------------
["source","sql",subs="attributes,macros"]
---------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithTwoArgsAndGroupBy]
---------------------------------------------------------------------
`FIRST_VALUE` is a name alias and can be used instead of `FIRST`, e.g.:
["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[firstValueWithTwoArgsAndGroupBy]
--------------------------------------------------------------------------
[NOTE]
`FIRST` cannot be used in a HAVING clause.
[NOTE]
`FIRST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,saved as a keyword>>.
[[sql-functions-aggs-last]]
===== `LAST/LAST_VALUE`
.Synopsis:
[source, sql]
--------------------------------------------------
LAST(field_name<1>[, ordering_field_name]<2>)
--------------------------------------------------
*Input*:
<1> target field for the aggregation
<2> optional field used for ordering
*Output*: same type as the input
.Description:
It's the inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if such exists) of the
`field_name`input column sorted descending by the `ordering_field_name` column. If `ordering_field_name` is not
provided, only the `field_name` column is used for the sorting. E.g.:
[cols="<,<"]
|===
s| a | b
| 10 | 1
| 20 | 1
| 1 | 2
| 2 | 2
| 100 | null
| 200 | null
| null | null
|===
[source, sql]
------------------------
SELECT LAST(a) FROM t
------------------------
will result in:
[cols="<"]
|===
s| LAST(a)
| 200
|===
and
[source, sql]
------------------------
SELECT LAST(a, b) FROM t
------------------------
will result in:
[cols="<"]
|===
s| LAST(a, b)
| 2
|===
["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithOneArg]
-----------------------------------------------------------
["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithOneArgAndGroupBy]
-------------------------------------------------------------------
["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithTwoArgs]
-----------------------------------------------------------
["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithTwoArgsAndGroupBy]
--------------------------------------------------------------------
`LAST_VALUE` is a name alias and can be used instead of `LAST`, e.g.:
["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[lastValueWithTwoArgsAndGroupBy]
-------------------------------------------------------------------------
[NOTE]
`LAST` cannot be used in `HAVING` clause.
[NOTE]
`LAST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,`saved as a keyword`>>.
[[sql-functions-aggs-max]]
===== `MAX`
.Synopsis:
[source, sql]
--------------------------------------------------
MAX(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: same type as the input
.Description:
Returns the maximum value across input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMax]
--------------------------------------------------
[NOTE]
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-last>> and therefore, it cannot be used in `HAVING` clause.
[[sql-functions-aggs-min]]
===== `MIN`
.Synopsis:
[source, sql]
--------------------------------------------------
MIN(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: same type as the input
.Description:
Returns the minimum value across input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMin]
--------------------------------------------------
[NOTE]
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-first>> and therefore, it cannot be used in `HAVING` clause.
[[sql-functions-aggs-sum]]
===== `SUM`
.Synopsis:
[source, sql]
--------------------------------------------------
SUM(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `bigint` for integer input, `double` for floating points
.Description:
Returns the sum of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSum]
--------------------------------------------------
==== Statistics
[[sql-functions-aggs-kurtosis]]
===== `KURTOSIS`
.Synopsis:
[source, sql]
--------------------------------------------------
KURTOSIS(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
https://en.wikipedia.org/wiki/Kurtosis[Quantify] the shape of the distribution of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggKurtosis]
--------------------------------------------------
[[sql-functions-aggs-mad]]
===== `MAD`
.Synopsis:
[source, sql]
--------------------------------------------------
MAD(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
https://en.wikipedia.org/wiki/Median_absolute_deviation[Measure] the variability of the input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMad]
--------------------------------------------------
[[sql-functions-aggs-percentile]]
===== `PERCENTILE`
.Synopsis:
[source, sql]
--------------------------------------------------
PERCENTILE(field_name<1>, numeric_exp<2>)
--------------------------------------------------
*Input*:
<1> a numeric field
<2> a numeric expression (must be a constant and not based on a field)
*Output*: `double` numeric value
.Description:
Returns the nth https://en.wikipedia.org/wiki/Percentile[percentile] (represented by `numeric_exp` parameter)
of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentile]
--------------------------------------------------
[[sql-functions-aggs-percentile-rank]]
===== `PERCENTILE_RANK`
.Synopsis:
[source, sql]
--------------------------------------------------
PERCENTILE_RANK(field_name<1>, numeric_exp<2>)
--------------------------------------------------
*Input*:
<1> a numeric field
<2> a numeric expression (must be a constant and not based on a field)
*Output*: `double` numeric value
.Description:
Returns the nth https://en.wikipedia.org/wiki/Percentile_rank[percentile rank] (represented by `numeric_exp` parameter)
of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileRank]
--------------------------------------------------
[[sql-functions-aggs-skewness]]
===== `SKEWNESS`
.Synopsis:
[source, sql]
--------------------------------------------------
SKEWNESS(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
https://en.wikipedia.org/wiki/Skewness[Quantify] the asymmetric distribution of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSkewness]
--------------------------------------------------
[[sql-functions-aggs-stddev-pop]]
===== `STDDEV_POP`
.Synopsis:
[source, sql]
--------------------------------------------------
STDDEV_POP(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
Returns the https://en.wikipedia.org/wiki/Standard_deviations[population standard deviation] of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggStddevPop]
--------------------------------------------------
[[sql-functions-aggs-sum-squares]]
===== `SUM_OF_SQUARES`
.Synopsis:
[source, sql]
--------------------------------------------------
SUM_OF_SQUARES(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
Returns the https://en.wikipedia.org/wiki/Total_sum_of_squares[sum of squares] of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumOfSquares]
--------------------------------------------------
[[sql-functions-aggs-var-pop]]
===== `VAR_POP`
.Synopsis:
[source, sql]
--------------------------------------------------
VAR_POP(field_name<1>)
--------------------------------------------------
*Input*:
<1> a numeric field
*Output*: `double` numeric value
.Description:
Returns the https://en.wikipedia.org/wiki/Variance[population variance] of input values in the field `field_name`.
["source","sql",subs="attributes,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs/docs.csv-spec[aggVarPop]
--------------------------------------------------