SQL:Docs: add the PIVOT clause to SELECT section (#49129)

The PR adds the documentation on the PIVOT clause. (cherry picked from commit a55b36065e6496c44b6e3191296931d477a8e5f5)
2019-11-20 16:53:16 +01:00 · 2019-11-20 16:53:16 +01:00 · 8c2ab8bb72
parent a27e0fe10d
commit 8c2ab8bb72
3 changed files with 90 additions and 2 deletions
--- a/docs/reference/sql/language/syntax/commands/select.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/select.asciidoc
@ -13,6 +13,7 @@ SELECT select_expr [, ...]
 [ HAVING condition]
 [ ORDER BY expression [ ASC | DESC ] [, ...] ]
 [ LIMIT [ count ] ]
+[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
 ----

 .Description
@ -391,3 +392,54 @@ To return
 ----
 include-tagged::{sql-specs}/docs/docs.csv-spec[limitBasic]
 ----
+
+[[sql-syntax-pivot]]
+==== PIVOT
+
+The `PIVOT` clause performs a cross tabulation on the results of the query: it aggregates the results and rotates rows into columns. The rotation is done by turning unique values from one column in the expression - the pivoting column - into multiple columns in the output. The column values are aggregations on the remaining columns specified in the expression.
+
+The clause can be broken down in three parts: the aggregation, the `FOR`- and the `IN`-subclause.
+
+The `aggregation_expr` subclause specifies an expression containing an <<sql-functions-aggs, aggregation function>> to be applied on one of the source columns. Only one aggregation can be provided, currently.
+
+The `FOR`-subclause specifies the pivoting column: the distinct values of this column will become the candidate set of values to be rotated.
+
+The `IN`-subclause defines a filter: the intersection between the set provided here and the candidate set from the `FOR`-subclause will be rotated to become the headers of the columns appended to the end result. The filter can not be a subquery, one must provide here literal values, obtained in advance.
+
+The pivoting operation will perform an implicit <<sql-syntax-group-by, GROUP BY>> on all source columns not specified in the `PIVOT` clause, along with the values filtered through the `IN`-clause. Consider the following statement:
+
+[source, sql]
+----
+include-tagged::{sql-specs}/pivot.csv-spec[sumWithoutSubquery]
+----
+
+The query execution could logically be broken down in the following steps:
+
+. a <<sql-syntax-group-by, GROUP BY>> on the column in the `FOR`-clause: `languages`;
+. the resulting values are filtered through the set provided in the `IN`-clause;
+. the now filtered column is pivoted to form the headers of the two additional columns appended to the result: `1` and `2`;
+. a <<sql-syntax-group-by, GROUP BY>> on all columns of the source table `test_emp`, except `salary` (part of the aggregation subclause) and `languages` (part of the `FOR`-clause);
+. the values in these appended columns are the `SUM` aggregations of `salary`, grouped by the respective language.
+
+
+The table-value expression to cross-tabulate can also be the result of a subquery:
+
+[source, sql]
+----
+include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValue]
+----
+
+The pivoted columns can be aliased (and quoting is required to accommodate white spaces), with or without a supporting `AS` token:
+
+[source, sql]
+----
+include-tagged::{sql-specs}/pivot.csv-spec[averageWithTwoValuesAndAlias]
+----
+
+The resulting cross tabulation can further have the <<sql-syntax-order-by,ORDER BY>> and <<sql-syntax-limit, LIMIT>> clauses applied:
+
+[source, sql]
+----
+include-tagged::{sql-specs}/pivot.csv-spec[averageWithOneValueAndOrder]
+----
+
--- a/docs/reference/sql/limitations.asciidoc
+++ b/docs/reference/sql/limitations.asciidoc
@ -176,6 +176,30 @@ from `docvalue_fields`.
 [[fields-from-docvalues]]
 === Retrieving from `docvalue_fields`

-When the number of columns retrieveable from `docvalue_fields` is greater than the configured <<dynamic-index-settings,`index.max_docvalue_fields_search` setting>>
+When the number of columns retrievable from `docvalue_fields` is greater than the configured <<dynamic-index-settings,`index.max_docvalue_fields_search` setting>>
 the query will fail with `IllegalArgumentException: Trying to retrieve too many docvalue_fields` error. Either the mentioned {es}
-setting needs to be adjusted or fewer columns retrieveable from `docvalue_fields` need to be selected.
+setting needs to be adjusted or fewer columns retrievable from `docvalue_fields` need to be selected.
+
+[float]
+[[aggs-in-pivot]]
+=== Aggregations in the <<sql-syntax-pivot, `PIVOT`>> clause
+
+The aggregation expression in <<sql-syntax-pivot, `PIVOT`>> will currently accept only one aggregation. It is thus not possible to obtain multiple aggregations for any one pivoted column.
+
+[float]
+[[subquery-in-pivot]]
+=== Using a subquery in <<sql-syntax-pivot, `PIVOT`>>'s `IN`-subclause
+
+The values that the <<sql-syntax-pivot, `PIVOT`>> query could pivot must be provided in the query as a list of literals; providing a subquery instead to build this list is not currently supported. For example, in this query:
+
+[source, sql]
+-------------------------------------------------------------
+SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2))
+-------------------------------------------------------------
+
+the `languages` of interest must be listed explicitly: `IN (1, 2)`. On the other hand, this example would **not work**:
+
+[source, sql]
+-------------------------------------------------------------
+SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (SELECT languages FROM test_emp WHERE languages <=2 GROUP BY languages))
+-------------------------------------------------------------
--- a/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec
+++ b/x-pack/plugin/sql/qa/src/main/resources/pivot.csv-spec
@ -2,6 +2,7 @@

 averageWithOneValue
 schema::languages:bt|'F':d
+// tag::averageWithOneValue
 SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F'));

   languages   |       'F'        
@ -12,6 +13,8 @@ null           |62140.666666666664
 3              |53660.0           
 4              |49291.5           
 5              |46705.555555555555
+
+// end::averageWithOneValue
 ;

 averageWithAliasAndOneValue
@ -58,6 +61,7 @@ null           |48396.28571428572|62140.666666666664

 averageWithTwoValuesAndAlias
 schema::languages:bt|XY:d|XX:d
+// tag::averageWithTwoValuesAndAlias
 SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('M' AS "XY", 'F' "XX"));

   languages   |        XY       |        XX        
@ -68,6 +72,8 @@ null           |48396.28571428572|62140.666666666664
 3              |51741.90909090909|53660.0           
 4              |47058.90909090909|49291.5           
 5              |39052.875        |46705.555555555555
+
+// end::averageWithTwoValuesAndAlias
 ;

 averageWithThreeValuesIncludingNull
@ -129,6 +135,7 @@ null           |48396.28571428572|62140.666666666664

 averageWithOneValueAndOrder
 schema::languages:bt|'F':d
+// tag::averageWithOneValueAndOrder
 SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary) FOR gender IN ('F')) ORDER BY languages DESC LIMIT 4;
   languages   |       'F'
 ---------------+------------------
@ -136,6 +143,8 @@ SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary
 4              |49291.5
 3              |53660.0
 2              |50684.4
+
+// end::averageWithOneValueAndOrder
 ;

 averageWithTwoValuesAndOrderDesc
@ -178,6 +187,7 @@ null           |48396.28571428572|62140.666666666664

 sumWithoutSubquery
 schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:i|2:i
+// tag::sumWithoutSubquery
 SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2)) LIMIT 5;

     birth_date      |    emp_no     |  first_name   |    gender     |      hire_date      |   last_name   |       1       |       2
@ -187,6 +197,8 @@ null                 |10043          |Yishay         |M              |1990-10-20
 null                 |10044          |Mingsen        |F              |1994-05-21 00:00:00.0|Casley         |39728          |null
 1952-04-19 00:00:00.0|10009          |Sumant         |F              |1985-02-18 00:00:00.0|Peac           |66174          |null
 1953-01-07 00:00:00.0|10067          |Claudi         |M              |1987-03-04 00:00:00.0|Stavenow       |null           |52044
+
+// end::sumWithoutSubquery
 ;

 averageWithOneValueAndMath