
[role="xpack"]
[testenv="basic"]
[[sql-functions-grouping]]
=== Grouping Functions

Functions for creating special __grouping__s (also known as _bucketing_). They must be used
as part of a <<sql-syntax-group-by, grouping>>.

[[sql-functions-grouping-histogram]]
==== `HISTOGRAM`

.Synopsis:
[source, sql]
----
HISTOGRAM(numeric_exp<1>, numeric_interval<2>)
HISTOGRAM(date_exp<3>, date_time_interval<4>)
----

*Input*:

<1> numeric expression (typically a field)
<2> numeric interval
<3> date/time expression (typically a field)
<4> date/time <<sql-functions-datetime-interval, interval>>

*Output*: non-empty buckets or groups of the given expression divided according to the given interval

.Description

The histogram function takes all matching values and divides them into fixed-size buckets matching the given interval, using (roughly) the following formula:

[source, sql]
----
bucket_key = Math.floor(value / interval) * interval
----
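
As a rough illustration of the formula above, the following Python sketch computes bucket keys for a handful of numeric values. The `bucket_key` helper and the sample values are hypothetical and only mirror the formula as written; they are not taken from the SQL engine itself.

```python
import math

def bucket_key(value, interval):
    """Return the lower bound of the fixed-size bucket that contains value."""
    # Same shape as the formula above: floor(value / interval) * interval
    return math.floor(value / interval) * interval

# Hypothetical sample values, bucketed with an interval of 50
values = [3, 49, 50, 120, -1]
keys = [bucket_key(v, 50) for v in values]
print(keys)  # [0, 0, 50, 100, -50]
```

Because the formula rounds down, a value equal to a bucket boundary (`50`) opens a new bucket, and in this sketch negative values fall into the bucket below zero.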

NOTE:: The histogram in SQL does *NOT* return empty buckets for missing intervals, unlike the traditional <<search-aggregations-bucket-histogram-aggregation, histogram>> and <<search-aggregations-bucket-datehistogram-aggregation, date histogram>> aggregations. Such behavior does not fit conceptually in SQL, which treats all missing values as `NULL`; as such, the histogram places all missing values in the `NULL` group.
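
The two properties in the note can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual implementation: the `histogram` function and its input data are hypothetical, and `None` stands in for SQL `NULL`.

```python
import math
from collections import Counter

def histogram(values, interval):
    """Count values per bucket key; missing values (None) share the None key."""
    counts = Counter()
    for value in values:
        if value is None:
            key = None  # all missing values land in a single NULL-like group
        else:
            key = math.floor(value / interval) * interval
        counts[key] += 1
    return dict(counts)

# Hypothetical data: nothing falls into the [50, 100) bucket
result = histogram([10, 20, None, 130, None], 50)
print(result)  # {0: 2, None: 2, 100: 1}
```

Only keys that received at least one value appear in the result, so the empty `50` bucket is simply absent rather than reported with a count of zero.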

`HISTOGRAM` can be applied to either numeric fields:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs/docs.csv-spec[histogramNumeric]
----

or date/time fields:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs/docs.csv-spec[histogramDateTime]
----

Expressions inside the histogram are also supported as long as the
return type is numeric:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs/docs.csv-spec[histogramNumericExpression]
----

Note that histograms (and grouping functions in general) allow custom expressions but cannot have any functions applied to them in the `GROUP BY`. In other words, the following statement is *NOT* allowed:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs/docs.csv-spec[expressionOnHistogramNotAllowed]
----

as it requires two groupings (one for histogram followed by a second for applying the function on top of the histogram groups).

Instead, rewrite the query to move the expression applied to the histogram _inside_ of it:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs/docs.csv-spec[histogramDateTimeExpression]
----

[IMPORTANT]
When the histogram in SQL is applied to the **DATE** type instead of **DATETIME**, the specified interval is truncated to
a multiple of a day. E.g.: for `HISTOGRAM(CAST(birth_date AS DATE), INTERVAL '2 3:04' DAY TO MINUTE)` the interval
actually used will be `INTERVAL '2' DAY`. If the specified interval is less than 1 day, e.g.:
`HISTOGRAM(CAST(birth_date AS DATE), INTERVAL '20' HOUR)`, then the interval used will be `INTERVAL '1' DAY`.
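
The truncation rule described above can be modelled with a small Python sketch. The `effective_date_interval` helper is hypothetical and uses `datetime.timedelta` as a stand-in for SQL intervals; it reproduces only the two examples given here and is not the engine's code.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)

def effective_date_interval(interval):
    """Truncate an interval to whole days, with a minimum of one day."""
    whole_days = interval // ONE_DAY  # timedelta // timedelta yields an int
    return timedelta(days=max(whole_days, 1))

# INTERVAL '2 3:04' DAY TO MINUTE is truncated to 2 days
print(effective_date_interval(timedelta(days=2, hours=3, minutes=4)))  # 2 days, 0:00:00
# INTERVAL '20' HOUR is shorter than a day, so 1 day is used
print(effective_date_interval(timedelta(hours=20)))  # 1 day, 0:00:00
```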

[IMPORTANT]
Histogram in SQL cannot be applied to the **TIME** type.
E.g.: `HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES)` is currently not supported.