SQL:Docs: add the PIVOT clause to SELECT section (#49129)

The PR adds the documentation on the PIVOT clause.

(cherry picked from commit a55b36065e6496c44b6e3191296931d477a8e5f5)
This commit is contained in:
Bogdan Pintea 2019-11-20 16:53:16 +01:00 committed by Bogdan Pintea
parent a27e0fe10d
commit 8c2ab8bb72
3 changed files with 90 additions and 2 deletions

View File

@ -13,6 +13,7 @@ SELECT select_expr [, ...]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
----
.Description
@ -391,3 +392,54 @@ To return
----
include-tagged::{sql-specs}/docs/docs.csv-spec[limitBasic]
----
[[sql-syntax-pivot]]
==== PIVOT
The `PIVOT` clause performs a cross tabulation on the results of the query: it aggregates the results and rotates rows into columns. The rotation is done by turning unique values from one column in the expression - the pivoting column - into multiple columns in the output. The column values are aggregations on the remaining columns specified in the expression.
The clause can be broken down in three parts: the aggregation, the `FOR`- and the `IN`-subclause.
The `aggregation_expr` subclause specifies an expression containing an <<sql-functions-aggs, aggregation function>> to be applied on one of the source columns. Only one aggregation can be provided, currently.
The `FOR`-subclause specifies the pivoting column: the distinct values of this column will become the candidate set of values to be rotated.
The `IN`-subclause defines a filter: the intersection between the set provided here and the candidate set from the `FOR`-subclause will be rotated to become the headers of the columns appended to the end result. The filter can not be a subquery, one must provide here literal values, obtained in advance.
The pivoting operation will perform an implicit <<sql-syntax-group-by, GROUP BY>> on all source columns not specified in the `PIVOT` clause, along with the values filtered through the `IN`-clause. Consider the following statement:
[source, sql]
----
include-tagged::{sql-specs}/pivot.csv-spec[sumWithoutSubquery]
----
The query execution could logically be broken down in the following steps:
. a <<sql-syntax-group-by, GROUP BY>> on the column in the `FOR`-clause: `languages`;
. the resulting values are filtered through the set provided in the `IN`-clause;
. the now filtered column is pivoted to form the headers of the two additional columns appended to the result: `1` and `2`;
. a <<sql-syntax-group-by, GROUP BY>> on all columns of the source table `test_emp`, except `salary` (part of the aggregation subclause) and `languages` (part of the `FOR`-clause);
. the values in these appended columns are the `SUM` aggregations of `salary`, grouped by the respective language.
The table-value expression to cross-tabulate can also be the result of a subquery:
[source, sql]
----
include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValue]
----
The pivoted columns can be aliased (and quoting is required to accommodate white spaces), with or without a supporting `AS` token:
[source, sql]
----
include-tagged::{sql-specs}/pivot.csv-spec[averageWithTwoValuesAndAlias]
----
The resulting cross tabulation can further have the <<sql-syntax-order-by,ORDER BY>> and <<sql-syntax-limit, LIMIT>> clauses applied:
[source, sql]
----
include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValueAndOrder]
----

View File

@ -176,6 +176,30 @@ from `docvalue_fields`.
[[fields-from-docvalues]]
=== Retrieving from `docvalue_fields`
When the number of columns retrieveable from `docvalue_fields` is greater than the configured <<dynamic-index-settings,`index.max_docvalue_fields_search` setting>>
When the number of columns retrievable from `docvalue_fields` is greater than the configured <<dynamic-index-settings,`index.max_docvalue_fields_search` setting>>
the query will fail with `IllegalArgumentException: Trying to retrieve too many docvalue_fields` error. Either the mentioned {es}
setting needs to be adjusted or fewer columns retrieveable from `docvalue_fields` need to be selected.
setting needs to be adjusted or fewer columns retrievable from `docvalue_fields` need to be selected.
[float]
[[aggs-in-pivot]]
=== Aggregations in the <<sql-syntax-pivot, `PIVOT`>> clause
The aggregation expression in <<sql-syntax-pivot, `PIVOT`>> will currently accept only one aggregation. It is thus not possible to obtain multiple aggregations for any one pivoted column.
[float]
[[subquery-in-pivot]]
=== Using a subquery in <<sql-syntax-pivot, `PIVOT`>>'s `IN`-subclause
The values that the <<sql-syntax-pivot, `PIVOT`>> query could pivot must be provided in the query as a list of literals; providing a subquery instead to build this list is not currently supported. For example, in this query:
[source, sql]
-------------------------------------------------------------
SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2))
-------------------------------------------------------------
the `languages` of interest must be listed explicitly: `IN (1, 2)`. On the other hand, this example would **not work**:
[source, sql]
-------------------------------------------------------------
SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (SELECT languages FROM test_emp WHERE languages <=2 GROUP BY languages))
-------------------------------------------------------------

View File

@ -2,6 +2,7 @@
averageWithOneValue
schema::languages:bt|'F':d
// tag::averageWithOneValue
SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F'));
languages | 'F'
@ -12,6 +13,8 @@ null |62140.666666666664
3 |53660.0
4 |49291.5
5 |46705.555555555555
// end::averageWithOneValue
;
averageWithAliasAndOneValue
@ -58,6 +61,7 @@ null |48396.28571428572|62140.666666666664
averageWithTwoValuesAndAlias
schema::languages:bt|XY:d|XX:d
// tag::averageWithTwoValuesAndAlias
SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('M' AS "XY", 'F' "XX"));
languages | XY | XX
@ -68,6 +72,8 @@ null |48396.28571428572|62140.666666666664
3 |51741.90909090909|53660.0
4 |47058.90909090909|49291.5
5 |39052.875 |46705.555555555555
// end::averageWithTwoValuesAndAlias
;
averageWithThreeValuesIncludingNull
@ -129,6 +135,7 @@ null |48396.28571428572|62140.666666666664
averageWithOneValueAndOrder
schema::languages:bt|'F':d
// tag::averageWithOneValueAndOrder
SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F')) ORDER BY languages DESC LIMIT 4;
languages | 'F'
---------------+------------------
@ -136,6 +143,8 @@ SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary
4 |49291.5
3 |53660.0
2 |50684.4
// end::averageWithOneValueAndOrder
;
averageWithTwoValuesAndOrderDesc
@ -178,6 +187,7 @@ null |48396.28571428572|62140.666666666664
sumWithoutSubquery
schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:i|2:i
// tag::sumWithoutSubquery
SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2)) LIMIT 5;
birth_date | emp_no | first_name | gender | hire_date | last_name | 1 | 2
@ -187,6 +197,8 @@ null |10043 |Yishay |M |1990-10-20
null |10044 |Mingsen |F |1994-05-21 00:00:00.0|Casley |39728 |null
1952-04-19 00:00:00.0|10009 |Sumant |F |1985-02-18 00:00:00.0|Peac |66174 |null
1953-01-07 00:00:00.0|10067 |Claudi |M |1987-03-04 00:00:00.0|Stavenow |null |52044
// end::sumWithoutSubquery
;
averageWithOneValueAndMath