mirror of https://github.com/apache/druid.git
update window functions doc (#15902)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
This commit is contained in:
parent 5871b81a78
commit 3caacba8c5

@ -480,6 +480,14 @@ Calculates the trigonometric cotangent of an angle expressed in radians.

Counts the number of rows.

## CUME_DIST

`CUME_DIST()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`. The return value ranges between `1/number of rows` and 1.
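
For illustration only, a minimal sketch of `CUME_DIST` in a query (not part of the original entry; it assumes the `wikipedia` example datasource and the `channel` and `delta` columns used in the window function documentation, with `enableWindowing: true` set in the query context):

```sql
SELECT
    channel,
    SUM(delta) AS net_changes,
    -- cumulative distribution of each channel, ordered by its total delta
    CUME_DIST() OVER (ORDER BY SUM(delta)) AS channel_cume_dist
FROM "wikipedia"
GROUP BY channel
```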

## CURRENT_DATE

`CURRENT_DATE`

@ -529,6 +537,14 @@ Decodes a Base64-encoded string into a UTF-8 encoded string.

Converts an angle from radians to degrees.

## DENSE_RANK

`DENSE_RANK()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the rank for a row within a window without gaps. For example, if two rows tie for a rank of 1, the subsequent row is ranked 2.
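
As a hedged sketch (not part of the original entry), assuming the `wikipedia` example datasource and its `channel`, `user`, and `delta` columns:

```sql
SELECT
    channel,
    user,
    SUM(delta) AS net_user_changes,
    -- ties share a rank; the next distinct value gets the immediately following rank
    DENSE_RANK() OVER (PARTITION BY channel ORDER BY SUM(delta) DESC) AS rank_no_gaps
FROM "wikipedia"
GROUP BY channel, user
```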

## DIV

`DIV(x, y)`

@ -684,6 +700,14 @@ Calculates _e_ raised to the power of the numeric expression.

Extracts the value of some unit of the timestamp, optionally from a certain time zone, and returns the number.

## FIRST_VALUE

`FIRST_VALUE(expr)`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated for the expression for the first row within the window.
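
A brief sketch for illustration (not from the original entry), again assuming the `wikipedia` example datasource; the first row of each window here is the most active user in the channel:

```sql
SELECT
    channel,
    user,
    SUM(delta) AS net_user_changes,
    -- total for the most active user in the same channel
    FIRST_VALUE(SUM(delta)) OVER (PARTITION BY channel ORDER BY SUM(delta) DESC) AS top_user_changes
FROM "wikipedia"
GROUP BY channel, user
```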

## FLOOR (date and time)

`FLOOR(<TIMESTAMP> TO <TIME_UNIT>)`

@ -860,6 +884,22 @@ Extracts an `ARRAY<COMPLEX<json>>` value from `expr` at the specified `path`. If

Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc) the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`.

## LAG

`LAG(expr[, offset])`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

If you do not supply an `offset`, returns the value evaluated at the row preceding the current row. Specify an offset number `n` to return the value evaluated at `n` rows preceding the current one.
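
For example, a sketch that compares each hour to the previous hour (illustrative only; assumes the `wikipedia` example datasource and hourly grouping as used in the window function documentation):

```sql
SELECT
    channel,
    TIME_FLOOR(__time, 'PT1H') AS time_hour,
    SUM(delta) AS hourly_changes,
    -- the preceding hour's total within the same channel
    LAG(SUM(delta), 1) OVER (PARTITION BY channel ORDER BY TIME_FLOOR(__time, 'PT1H')) AS previous_hour_changes
FROM "wikipedia"
GROUP BY channel, TIME_FLOOR(__time, 'PT1H')
```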

## LAST_VALUE

`LAST_VALUE(expr)`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated for the expression for the last row within the window.

## LATEST

`LATEST(expr, [maxBytesPerValue])`

@ -876,6 +916,14 @@ Returns the value of a numeric or string expression corresponding to the latest

Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`.

## LEAD

`LEAD(expr[, offset])`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

If you do not supply an `offset`, returns the value evaluated at the row following the current row. Specify an offset number `n` to return the value evaluated at `n` rows following the current one; if there is no such row, returns the given default value.
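
A mirror image of the LAG sketch (illustrative only; assumes the `wikipedia` example datasource and hourly grouping):

```sql
SELECT
    channel,
    TIME_FLOOR(__time, 'PT1H') AS time_hour,
    SUM(delta) AS hourly_changes,
    -- the following hour's total within the same channel
    LEAD(SUM(delta), 1) OVER (PARTITION BY channel ORDER BY TIME_FLOOR(__time, 'PT1H')) AS next_hour_changes
FROM "wikipedia"
GROUP BY channel, TIME_FLOOR(__time, 'PT1H')
```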

## LEAST

`LEAST([expr1, ...])`

@ -1092,6 +1140,14 @@ Returns a slice of the array from the zero-based start and end indexes.

Joins all elements of the array together by the given delimiter.

## NTILE

`NTILE(tiles)`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into.
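
For instance, a sketch that buckets users into quartiles per channel (not part of the original entry; assumes the `wikipedia` example datasource):

```sql
SELECT
    channel,
    user,
    SUM(delta) AS net_user_changes,
    -- assigns each user to one of 4 tiles per channel, from most to least active
    NTILE(4) OVER (PARTITION BY channel ORDER BY SUM(delta) DESC) AS activity_quartile
FROM "wikipedia"
GROUP BY channel, user
```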

## NULLIF

`NULLIF(value1, value2)`

@ -1124,6 +1180,14 @@ Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON val

Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified.

## PERCENT_RANK

`PERCENT_RANK()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the relative rank of the row calculated as a percentage according to the formula: `RANK() OVER (window) / COUNT(1) OVER (window)`.
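
A minimal sketch for illustration (not from the original entry; assumes the `wikipedia` example datasource):

```sql
SELECT
    channel,
    user,
    SUM(delta) AS net_user_changes,
    -- relative rank of each user within the channel, expressed as a percentage
    PERCENT_RANK() OVER (PARTITION BY channel ORDER BY SUM(delta)) AS user_pct_rank
FROM "wikipedia"
GROUP BY channel, user
```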

## POSITION

`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`

@ -1148,6 +1212,14 @@ Calculates a numerical expression raised to the specified power.

Converts an angle from degrees to radians.

## RANK

`RANK()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the rank with gaps for a row within a window. For example, if two rows tie for rank 1, the next rank is 3.
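
A short sketch to contrast with DENSE_RANK (illustrative only; assumes the `wikipedia` example datasource):

```sql
SELECT
    channel,
    user,
    SUM(delta) AS net_user_changes,
    -- ties share a rank; the rank after a tie is skipped
    RANK() OVER (PARTITION BY channel ORDER BY SUM(delta) DESC) AS user_rank
FROM "wikipedia"
GROUP BY channel, user
```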

## REGEXP_EXTRACT

`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`

@ -1213,6 +1285,14 @@ Returns the rightmost number of characters from an expression.

Calculates the rounded value for a numerical expression.

## ROW_NUMBER

`ROW_NUMBER()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the number of the row within the window starting from 1.

## RPAD

`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`

@ -822,7 +822,6 @@ Druid does not support all SQL features. In particular, the following features a

- JOIN conditions that are not an equality between expressions from the left- and right-hand sides.
- JOIN conditions containing a constant value inside the condition.
- JOIN conditions on a column which contains a multi-value dimension.
- OVER clauses, and analytic functions such as `LAG` and `LEAD`.
- ORDER BY for a non-aggregating query, except for `ORDER BY __time` or `ORDER BY __time DESC`, which are supported.
  This restriction only applies to non-aggregating queries; you can ORDER BY any column in an aggregating query.
- DDL and DML.


@ -1,6 +1,7 @@

---
id: sql-window-functions
title: Window functions
description: Reference for window functions
---

<!--

@ -38,10 +39,10 @@ Set the context parameter `enableWindowing: true` to use window functions.

Window functions in Apache Druid produce values based upon the relationship of one row within a window of rows to the other rows within the same window. A window is a group of related rows within a result set. For example, rows with the same value for a specific dimension.

The following example organizes results with the same `channel` value into windows. For each window, the query returns the rank of each row in ascending order based upon its `delta` value.

Window functions in Druid require a GROUP BY statement. Druid performs the row-level aggregations for the GROUP BY before performing the window function calculations.

The following example organizes results with the same `channel` value into windows. For each window, the query returns the rank of each row in ascending order based upon its `changed` value.

```sql
SELECT FLOOR(__time TO DAY) AS event_time,
    channel,

@ -88,19 +89,36 @@ You can use the OVER clause to treat other Druid aggregation functions as window

Window functions support aliasing.

## Define a window with the OVER clause
## Window function syntax

You can write a window function in Druid using either syntax below.
The second syntax defines a window alias so that you can reference and reuse a window definition.

```sql
window_function() OVER (
    [PARTITION BY partitioning expression]
    [ORDER BY order expression]
    [[ROWS, RANGE] BETWEEN range start AND range end])
FROM table
GROUP BY dimensions
```

```sql
window_function() OVER w
FROM table
WINDOW w AS ([PARTITION BY partitioning expression] [ORDER BY order expression]
    [[ROWS, RANGE] BETWEEN range start AND range end])
GROUP BY dimensions
```

The OVER clause defines the query windows for window functions as follows:
- PARTITION BY indicates the dimension that defines the rows within the window
- ORDER BY specifies the order of the rows within the windows.
- PARTITION BY indicates the dimension that defines window boundaries
- ORDER BY specifies the order of the rows within the windows

:::note
An empty OVER clause or the absence of a PARTITION BY clause indicates that all data belongs to a single window. See the sketch after this note for an example.

Sometimes windows are called partitions. However, the partitioning for window functions is a shuffle (partition) of the result set created at query time and is not to be confused with Druid's segment partitioning feature, which partitions data at ingest time.

:::
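
For instance, a minimal sketch of an empty OVER clause (assuming the same `wikipedia` dataset used in the examples below); every row receives the same grand total because all rows fall into a single window:

```sql
SELECT
    channel,
    SUM(delta) AS channel_changes,
    -- empty OVER clause: the whole result set is one window
    SUM(SUM(delta)) OVER () AS total_changes
FROM "wikipedia"
GROUP BY channel
```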

The following OVER clause example sets the window dimension to `channel` and orders the results by the absolute value of `delta` ascending:
In the following example, the OVER clause sets the window dimension to `channel` and orders the results by the absolute value of `delta` ascending:

```sql
...

@ -108,20 +126,145 @@ RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC)
...
```

Window frames, set in ROWS and RANGE expressions, limit the set of rows used for the windowed aggregation.

ROWS and RANGE accept the following values for `range start` and `range end`:
- UNBOUNDED PRECEDING: from the beginning of the window as ordered by the order expression
- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order expression
- CURRENT ROW: the current row
- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order expression
- UNBOUNDED FOLLOWING: to the end of the window as ordered by the order expression

See [Example with window frames](#example-with-window-frames) for more detail.

Druid applies the GROUP BY dimensions and calculates all non-window aggregation functions first. Then it applies the window functions over the aggregated results.

:::note

Sometimes windows are called partitions. However, the partitioning for window functions is a shuffle (partition) of the result set created at query time and is not to be confused with Druid's segment partitioning feature, which partitions data at ingest time.

:::

### ORDER BY windows

When the window definition only specifies ORDER BY and not PARTITION BY, it sorts the aggregate data set and applies the function in that order.

The following query uses `ORDER BY SUM(delta) DESC` to rank user hourly activity from the most changed to the least changed within an hour:

```sql
SELECT
    TIME_FLOOR(__time, 'PT1H') as time_hour,
    channel,
    user,
    SUM(delta) net_user_changes,
    RANK() OVER (ORDER BY SUM(delta) DESC) AS editing_rank
FROM "wikipedia"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
    AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
ORDER BY 5
```

<details>
<summary> View results </summary>

| `time_hour` | `channel` | `user` | `net_user_changes` | `editing_rank` |
| --- | --- | --- | --- | --- |
| `2016-06-27T15:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 6900 | 1 |
| `2016-06-27T19:00:00.000Z` | `#lt.wikipedia` | `77.221.66.41` | 4358 | 2 |
| `2016-06-27T09:00:00.000Z` | `#kk.wikipedia` | `Салиха` | 2702 | 3 |
| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 2440 | 4 |
| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 894 | 5 |
| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `178.11.203.212` | 447 | 6 |
| `2016-06-27T11:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 126 | 7 |
| `2016-06-27T06:00:00.000Z` | `#kk.wikipedia` | `Шокай` | 91 | 8 |
| `2016-06-27T11:00:00.000Z` | `#lt.wikipedia` | `MaryroseB54` | 59 | 9 |
| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 56 | 10 |
| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Karoliuk` | 53 | 11 |
| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 28 | 12 |
| `2016-06-27T07:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 13 | 13 |
| `2016-06-27T10:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 1 | 14 |
| `2016-06-27T07:00:00.000Z` | `#kk.wikipedia` | `Салиха` | -1 | 15 |
| `2016-06-27T06:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | -2 | 16 |
</details>

### PARTITION BY windows

When a window only specifies a PARTITION BY expression, Druid calculates the aggregate window function over all the rows that share a value for that expression within the selected dataset.

The following example demonstrates a query that uses two different windows—`PARTITION BY channel` and `PARTITION BY user`—to calculate the total activity in the channel and total activity by the user so that they can be compared to individual hourly activity:

```sql
SELECT
    TIME_FLOOR(__time, 'PT1H') as time_hour,
    channel,
    user,
    SUM(delta) AS hourly_user_changes,
    SUM(SUM(delta)) OVER (PARTITION BY user) AS total_user_changes,
    SUM(SUM(delta)) OVER (PARTITION BY channel) AS total_channel_changes
FROM "wikipedia"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
    AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'), 2, 3
ORDER BY channel, TIME_FLOOR(__time, 'PT1H'), user
```

<details>
<summary> View results </summary>

| `time_hour` | `channel` | `user` | `hourly_user_changes` | `total_user_changes` | `total_channel_changes` |
| --- | --- | --- | --- | --- | --- |
| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 2440 | 9340 | 12314 |
| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 56 | 182 | 12314 |
| `2016-06-27T06:00:00.000Z` | `#kk.wikipedia` | `Шокай` | 91 | 91 | 12314 |
| `2016-06-27T07:00:00.000Z` | `#kk.wikipedia` | `Салиха` | -1 | 2701 | 12314 |
| `2016-06-27T09:00:00.000Z` | `#kk.wikipedia` | `Салиха` | 2702 | 2701 | 12314 |
| `2016-06-27T11:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 126 | 182 | 12314 |
| `2016-06-27T15:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 6900 | 9340 | 12314 |
| `2016-06-27T06:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | -2 | 39 | 5851 |
| `2016-06-27T07:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 13 | 39 | 5851 |
| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `178.11.203.212` | 447 | 447 | 5851 |
| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 894 | 895 | 5851 |
| `2016-06-27T10:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 1 | 895 | 5851 |
| `2016-06-27T11:00:00.000Z` | `#lt.wikipedia` | `MaryroseB54` | 59 | 59 | 5851 |
| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Karoliuk` | 53 | 53 | 5851 |
| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 28 | 39 | 5851 |
| `2016-06-27T19:00:00.000Z` | `#lt.wikipedia` | `77.221.66.41` | 4358 | 4358 | 5851 |

</details>

In this example, the dataset is filtered for a single day. Therefore the window function results represent the total activity for the day, for the `user` and for the `channel` dimensions respectively.

This type of result helps you analyze the impact of an individual user's hourly activity (see the sketch after this list):
- the impact on the channel by comparing `hourly_user_changes` to `total_channel_changes`
- the impact of each user on the channel by comparing `total_user_changes` to `total_channel_changes`
- the progress of each user's individual activity by comparing `hourly_user_changes` to `total_user_changes`
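
For example, a sketch (not part of the original walkthrough) that extends the query above to express the first comparison as a percentage; it assumes that, as in standard SQL, you can combine a window result with a regular aggregate arithmetically in the projection:

```sql
SELECT
    TIME_FLOOR(__time, 'PT1H') AS time_hour,
    channel,
    user,
    SUM(delta) AS hourly_user_changes,
    SUM(SUM(delta)) OVER (PARTITION BY channel) AS total_channel_changes,
    -- share of the channel's total activity contributed by this user in this hour
    100.0 * SUM(delta) / SUM(SUM(delta)) OVER (PARTITION BY channel) AS pct_of_channel
FROM "wikipedia"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
    AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
ORDER BY channel, TIME_FLOOR(__time, 'PT1H'), user
```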

#### Window frame guardrails

Druid has guardrail logic to prevent you from executing window function queries with window frame expressions that might return unexpected results.

For example:
- You cannot set expressions as bounds for window frames.
- You cannot use two FOLLOWING expressions in the window frame. For example: `ROWS BETWEEN 2 FOLLOWING AND 3 FOLLOWING`.
- You can only use RANGE frames when both endpoints are unbounded or current row.

If you write a query that violates one of these conditions, Druid throws an error: "The query contains a window frame which may return incorrect results. To disregard this warning, set `windowingStrictValidation` to false in the query context."
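
For instance, a hedged sketch of a frame that the guardrail accepts; a frame such as `ROWS BETWEEN 2 FOLLOWING AND 3 FOLLOWING` in the same position would instead trigger the error above:

```sql
SELECT
    channel,
    TIME_FLOOR(__time, 'PT1H') AS time_hour,
    SUM(delta) AS hourly_channel_changes,
    -- allowed: a RANGE frame with both endpoints unbounded
    SUM(SUM(delta)) OVER (
        PARTITION BY channel
        ORDER BY TIME_FLOOR(__time, 'PT1H')
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS total_channel_changes
FROM "wikipedia"
WHERE channel = '#en.wikipedia'
GROUP BY channel, TIME_FLOOR(__time, 'PT1H')
```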

## Window function reference

|Function|Notes|
|--------|-----|
| `ROW_NUMBER()` | Returns the number of the row within the window |
| `RANK()` | Returns the rank for a row within a window |
| `DENSE_RANK()` | Returns the rank for a row within a window without gaps. For example, if two rows tie for rank of 1, the subsequent row is ranked 2. |
| `PERCENT_RANK()` | Returns the rank of the row calculated as a percentage according to the formula: `(rank - 1) / (total window rows - 1)` |
| `CUME_DIST()` | Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows` |
| `NTILE(tiles)` | Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into |
| `LAG(expr[, offset])` | Returns the value evaluated at the row that precedes the current row by the offset number within the window. `offset` defaults to 1 if not provided |
| `LEAD(expr[, offset])` | Returns the value evaluated at the row that follows the current row by the offset number within the window; if there is no such row, returns the given default value. `offset` defaults to 1 if not provided |
| `FIRST_VALUE(expr)` | Returns the value for the expression for the first row within the window |
| `LAST_VALUE(expr)` | Returns the value for the expression for the last row within the window |
| `ROW_NUMBER()` | Returns the number of the row within the window starting from 1 |
| `RANK()` | Returns the rank with gaps for a row within a window. For example, if two rows tie for rank 1, the next rank is 3 |
| `DENSE_RANK()` | Returns the rank for a row within a window without gaps. For example, if two rows tie for rank of 1, the subsequent row is ranked 2. |
| `PERCENT_RANK()` | Returns the relative rank of the row calculated as a percentage according to the formula: `RANK() OVER (window) / COUNT(1) OVER (window)` |
| `CUME_DIST()` | Returns the cumulative distribution of the current row within the window calculated as number of window rows at the same rank or higher than current row divided by total window rows. The return value ranges between `1/number of rows` and 1 |
| `NTILE(tiles)` | Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into |
| `LAG(expr[, offset])` | If you do not supply an `offset`, returns the value evaluated at the row preceding the current row. Specify an offset number, `n`, to return the value evaluated at `n` rows preceding the current one |
| `LEAD(expr[, offset])` | If you do not supply an `offset`, returns the value evaluated at the row following the current row. Specify an offset number `n` to return the value evaluated at `n` rows following the current one; if there is no such row, returns the given default value |
| `FIRST_VALUE(expr)` | Returns the value evaluated for the expression for the first row within the window |
| `LAST_VALUE(expr)` | Returns the value evaluated for the expression for the last row within the window |

## Examples

@ -214,6 +357,75 @@ GROUP BY channel, __time, delta

</details>

### Example with window frames

The following query uses a few different window frames to calculate overall activity by channel:

```sql
SELECT
    channel,
    TIME_FLOOR(__time, 'PT1H') AS time_hour,
    SUM(delta) AS hourly_channel_changes,
    SUM(SUM(delta)) OVER cumulative AS cumulative_activity_in_channel,
    SUM(SUM(delta)) OVER moving5 AS csum5,
    COUNT(1) OVER moving5 AS count5
FROM "wikipedia"
WHERE channel = '#en.wikipedia'
    AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY 1, TIME_FLOOR(__time, 'PT1H')
WINDOW cumulative AS (
    PARTITION BY channel
    ORDER BY TIME_FLOOR(__time, 'PT1H')
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
),
moving5 AS (
    PARTITION BY channel
    ORDER BY TIME_FLOOR(__time, 'PT1H')
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
```

<details>
<summary> View results </summary>

| `channel` | `time_hour` | `hourly_channel_changes` | `cumulative_activity_in_channel` | `csum5` | `count5` |
| --- | --- | --- | --- | --- | --- |
| `#en.wikipedia` | `2016-06-27T00:00:00.000Z` | 74996 | 74996 | 74996 | 1 |
| `#en.wikipedia` | `2016-06-27T01:00:00.000Z` | 24150 | 99146 | 99146 | 2 |
| `#en.wikipedia` | `2016-06-27T02:00:00.000Z` | 102372 | 201518 | 201518 | 3 |
| `#en.wikipedia` | `2016-06-27T03:00:00.000Z` | 61362 | 262880 | 262880 | 4 |
| `#en.wikipedia` | `2016-06-27T04:00:00.000Z` | 61666 | 324546 | 324546 | 5 |
| `#en.wikipedia` | `2016-06-27T05:00:00.000Z` | 144199 | 468745 | 393749 | 5 |
| `#en.wikipedia` | `2016-06-27T06:00:00.000Z` | 33414 | 502159 | 403013 | 5 |
| `#en.wikipedia` | `2016-06-27T07:00:00.000Z` | 79397 | 581556 | 380038 | 5 |
| `#en.wikipedia` | `2016-06-27T08:00:00.000Z` | 104436 | 685992 | 423112 | 5 |
| `#en.wikipedia` | `2016-06-27T09:00:00.000Z` | 58020 | 744012 | 419466 | 5 |
| `#en.wikipedia` | `2016-06-27T10:00:00.000Z` | 93904 | 837916 | 369171 | 5 |
| `#en.wikipedia` | `2016-06-27T11:00:00.000Z` | 74436 | 912352 | 410193 | 5 |
| `#en.wikipedia` | `2016-06-27T12:00:00.000Z` | 83491 | 995843 | 414287 | 5 |
| `#en.wikipedia` | `2016-06-27T13:00:00.000Z` | 103051 | 1098894 | 412902 | 5 |
| `#en.wikipedia` | `2016-06-27T14:00:00.000Z` | 211411 | 1310305 | 566293 | 5 |
| `#en.wikipedia` | `2016-06-27T15:00:00.000Z` | 101247 | 1411552 | 573636 | 5 |
| `#en.wikipedia` | `2016-06-27T16:00:00.000Z` | 189765 | 1601317 | 688965 | 5 |
| `#en.wikipedia` | `2016-06-27T17:00:00.000Z` | 74404 | 1675721 | 679878 | 5 |
| `#en.wikipedia` | `2016-06-27T18:00:00.000Z` | 104824 | 1780545 | 681651 | 5 |
| `#en.wikipedia` | `2016-06-27T19:00:00.000Z` | 71268 | 1851813 | 541508 | 5 |
| `#en.wikipedia` | `2016-06-27T20:00:00.000Z` | 88185 | 1939998 | 528446 | 5 |
| `#en.wikipedia` | `2016-06-27T21:00:00.000Z` | 42584 | 1982582 | 381265 | 5 |

</details>

The example defines multiple window specifications in the WINDOW clause that you can use for various window function calculations.

The query uses two windows:
- `cumulative` is partitioned by channel and includes all rows from the beginning of the partition up to the current row, as ordered by `__time`, to enable cumulative aggregation
- `moving5` is also partitioned by channel but only includes up to the last four rows and the current row, as ordered by time

The number of rows considered for the `moving5` window for the `count5` column:
- starts at a single row because there are no rows before the current one
- grows up to five rows as defined by `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW`

## Known issues

The following are known issues with window functions:

@ -57,7 +57,9 @@ CloudWatch

ColumnDescriptor
Corretto
CLI
CUME_DIST
DDL
DENSE_RANK
DML
DNS
DRUIDVERSION

@ -94,6 +96,7 @@ ETL

Elasticsearch
Enums
FirehoseFactory
FIRST_VALUE
FlattenSpec
Float.NEGATIVE_INFINITY
Float.NEGATIVE_INFINITY.

@ -416,6 +419,7 @@ nonfinalized

non-null
non-nullable
noop
NTILE
numerics
numShards
parameterize

@ -427,6 +431,7 @@ partitioner

partitionFunction
partitionsSpec
pathParts
PERCENT_RANK
performant
plaintext
pluggable

@ -489,6 +494,7 @@ repo

requireSSL
rollup
rollups
ROW_NUMBER
rsync
runtime
schemas

@ -122,6 +122,7 @@

"label": "Druid SQL",
"items": [
  "querying/sql",
  "querying/sql-functions",
  "querying/tips-good-queries",
  "querying/query-deep-storage",
  "querying/sql-data-types",

@ -132,7 +133,6 @@

  "querying/sql-array-functions",
  "querying/sql-multivalue-string-functions",
  "querying/sql-json-functions",
  "querying/sql-functions",
  "querying/sql-query-context",
  "querying/sql-metadata-tables",
  "querying/sql-translation"
