diff --git a/docs/reference/sql/language/syntax/commands/select.asciidoc b/docs/reference/sql/language/syntax/commands/select.asciidoc index eddbdd89c8e..0bfc81ee99a 100644 --- a/docs/reference/sql/language/syntax/commands/select.asciidoc +++ b/docs/reference/sql/language/syntax/commands/select.asciidoc @@ -13,6 +13,7 @@ SELECT select_expr [, ...] [ HAVING condition] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT [ count ] ] +[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ] ---- .Description @@ -391,3 +392,54 @@ To return ---- include-tagged::{sql-specs}/docs/docs.csv-spec[limitBasic] ---- + +[[sql-syntax-pivot]] +==== PIVOT + +The `PIVOT` clause performs a cross tabulation on the results of the query: it aggregates the results and rotates rows into columns. The rotation is done by turning unique values from one column in the expression - the pivoting column - into multiple columns in the output. The column values are aggregations on the remaining columns specified in the expression. + +The clause can be broken down in three parts: the aggregation, the `FOR`- and the `IN`-subclause. + +The `aggregation_expr` subclause specifies an expression containing an <> to be applied on one of the source columns. Only one aggregation can be provided, currently. + +The `FOR`-subclause specifies the pivoting column: the distinct values of this column will become the candidate set of values to be rotated. + +The `IN`-subclause defines a filter: the intersection between the set provided here and the candidate set from the `FOR`-subclause will be rotated to become the headers of the columns appended to the end result. The filter can not be a subquery, one must provide here literal values, obtained in advance. + +The pivoting operation will perform an implicit <> on all source columns not specified in the `PIVOT` clause, along with the values filtered through the `IN`-clause. Consider the following statement: + +[source, sql] +---- +include-tagged::{sql-specs}/pivot.csv-spec[sumWithoutSubquery] +---- + +The query execution could logically be broken down in the following steps: + +. a <> on the column in the `FOR`-clause: `languages`; +. the resulting values are filtered through the set provided in the `IN`-clause; +. the now filtered column is pivoted to form the headers of the two additional columns appended to the result: `1` and `2`; +. a <> on all columns of the source table `test_emp`, except `salary` (part of the aggregation subclause) and `languages` (part of the `FOR`-clause); +. the values in these appended columns are the `SUM` aggregations of `salary`, grouped by the respective language. + + +The table-value expression to cross-tabulate can also be the result of a subquery: + +[source, sql] +---- +include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValue] +---- + +The pivoted columns can be aliased (and quoting is required to accommodate white spaces), with or without a supporting `AS` token: + +[source, sql] +---- +include-tagged::{sql-specs}/pivot.csv-spec[averageWithTwoValuesAndAlias] +---- + +The resulting cross tabulation can further have the <> and <> clauses applied: + +[source, sql] +---- +include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValueAndOrder] +---- + diff --git a/docs/reference/sql/limitations.asciidoc b/docs/reference/sql/limitations.asciidoc index 5a97c656105..07bd3ec9142 100644 --- a/docs/reference/sql/limitations.asciidoc +++ b/docs/reference/sql/limitations.asciidoc @@ -176,6 +176,30 @@ from `docvalue_fields`. [[fields-from-docvalues]] === Retrieving from `docvalue_fields` -When the number of columns retrieveable from `docvalue_fields` is greater than the configured <> +When the number of columns retrievable from `docvalue_fields` is greater than the configured <> the query will fail with `IllegalArgumentException: Trying to retrieve too many docvalue_fields` error. Either the mentioned {es} -setting needs to be adjusted or fewer columns retrieveable from `docvalue_fields` need to be selected. +setting needs to be adjusted or fewer columns retrievable from `docvalue_fields` need to be selected. + +[float] +[[aggs-in-pivot]] +=== Aggregations in the <> clause + +The aggregation expression in <> will currently accept only one aggregation. It is thus not possible to obtain multiple aggregations for any one pivoted column. + +[float] +[[subquery-in-pivot]] +=== Using a subquery in <>'s `IN`-subclause + +The values that the <> query could pivot must be provided in the query as a list of literals; providing a subquery instead to build this list is not currently supported. For example, in this query: + +[source, sql] +------------------------------------------------------------- +SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2)) +------------------------------------------------------------- + +the `languages` of interest must be listed explicitly: `IN (1, 2)`. On the other hand, this example would **not work**: + +[source, sql] +------------------------------------------------------------- +SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (SELECT languages FROM test_emp WHERE languages <=2 GROUP BY languages)) +------------------------------------------------------------- diff --git a/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec b/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec index 7b23953299b..c280f58491f 100644 --- a/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec +++ b/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec @@ -2,6 +2,7 @@ averageWithOneValue schema::languages:bt|'F':d +// tag::averageWithOneValue SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F')); languages | 'F' @@ -12,6 +13,8 @@ null |62140.666666666664 3 |53660.0 4 |49291.5 5 |46705.555555555555 + +// end::averageWithOneValue ; averageWithAliasAndOneValue @@ -58,6 +61,7 @@ null |48396.28571428572|62140.666666666664 averageWithTwoValuesAndAlias schema::languages:bt|XY:d|XX:d +// tag::averageWithTwoValuesAndAlias SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('M' AS "XY", 'F' "XX")); languages | XY | XX @@ -68,6 +72,8 @@ null |48396.28571428572|62140.666666666664 3 |51741.90909090909|53660.0 4 |47058.90909090909|49291.5 5 |39052.875 |46705.555555555555 + +// end::averageWithTwoValuesAndAlias ; averageWithThreeValuesIncludingNull @@ -129,6 +135,7 @@ null |48396.28571428572|62140.666666666664 averageWithOneValueAndOrder schema::languages:bt|'F':d +// tag::averageWithOneValueAndOrder SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F')) ORDER BY languages DESC LIMIT 4; languages | 'F' ---------------+------------------ @@ -136,6 +143,8 @@ SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary 4 |49291.5 3 |53660.0 2 |50684.4 + +// end::averageWithOneValueAndOrder ; averageWithTwoValuesAndOrderDesc @@ -178,6 +187,7 @@ null |48396.28571428572|62140.666666666664 sumWithoutSubquery schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:i|2:i +// tag::sumWithoutSubquery SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2)) LIMIT 5; birth_date | emp_no | first_name | gender | hire_date | last_name | 1 | 2 @@ -187,6 +197,8 @@ null |10043 |Yishay |M |1990-10-20 null |10044 |Mingsen |F |1994-05-21 00:00:00.0|Casley |39728 |null 1952-04-19 00:00:00.0|10009 |Sumant |F |1985-02-18 00:00:00.0|Peac |66174 |null 1953-01-07 00:00:00.0|10067 |Claudi |M |1987-03-04 00:00:00.0|Stavenow |null |52044 + +// end::sumWithoutSubquery ; averageWithOneValueAndMath