Skip to content

Latest commit

 

History

History
311 lines (251 loc) · 9.64 KB

window.rst

File metadata and controls

311 lines (251 loc) · 9.64 KB

Window Functions

Table of contents

A window function consists of 2 pieces: a function and a window definition. A window definition defines a window of data rows with a given scope around the current row. The window is also called window frame sometimes. In a window frame, the function performs a calculation across the set of data and generate a result value for each row.

There are three categories of common window functions:

  1. Aggregate Functions: COUNT(), MIN(), MAX(), AVG(), SUM(), STDDEV_POP, STDDEV_SAMP, VAR_POP and VAR_SAMP.
  2. Ranking Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK() and NTILE().
  3. Analytic Functions: CUME_DIST(), LAG() and LEAD().

The syntax of a window function is as follows in which both PARTITION BY and ORDER BY clause are optional:

function_name (expression [, expression...])
OVER (
  PARTITION BY expression [, expression...]
  ORDER BY expression [ASC | DESC] [NULLS {FIRST | LAST}] [, ...]
)

Aggregate functions are window functions that operates on a cumulative window frame to calculate an aggregated result. How cumulative data in the window frame being aggregated is exactly same as how regular aggregate functions work. So aggregate window functions can be used to perform running calculation easily, for example running average or running sum. Note that if PARTITION BY clause present and specified column value(s) changed, the state of aggregate function will be reset.

Here is an example for COUNT function:

os> SELECT
...   gender, balance,
...   COUNT(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS cnt
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-----+
| gender | balance | cnt |
|--------+---------+-----|
| F      | 32838   | 1   |
| M      | 4180    | 1   |
| M      | 5686    | 2   |
| M      | 39225   | 3   |
+--------+---------+-----+

Here is an example for MIN function:

os> SELECT
...   gender, balance,
...   MIN(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS cnt
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-------+
| gender | balance | cnt   |
|--------+---------+-------|
| F      | 32838   | 32838 |
| M      | 4180    | 4180  |
| M      | 5686    | 4180  |
| M      | 39225   | 4180  |
+--------+---------+-------+

Here is an example for MAX function:

os> SELECT
...   gender, balance,
...   MAX(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS cnt
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-------+
| gender | balance | cnt   |
|--------+---------+-------|
| F      | 32838   | 32838 |
| M      | 4180    | 4180  |
| M      | 5686    | 5686  |
| M      | 39225   | 39225 |
+--------+---------+-------+

Here is an example for AVG function:

os> SELECT
...   gender, balance,
...   AVG(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS cnt
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+--------------------+
| gender | balance | cnt                |
|--------+---------+--------------------|
| F      | 32838   | 32838.0            |
| M      | 4180    | 4180.0             |
| M      | 5686    | 4933.0             |
| M      | 39225   | 16363.666666666666 |
+--------+---------+--------------------+

Here is an example for SUM function:

os> SELECT
...   gender, balance,
...   SUM(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS cnt
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-------+
| gender | balance | cnt   |
|--------+---------+-------|
| F      | 32838   | 32838 |
| M      | 4180    | 4180  |
| M      | 5686    | 9866  |
| M      | 39225   | 49091 |
+--------+---------+-------+

Here is an example for STDDEV_POP function:

os> SELECT
...   gender, balance,
...   STDDEV_POP(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS val
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+--------------------+
| gender | balance | val                |
|--------+---------+--------------------|
| F      | 32838   | 0.0                |
| M      | 4180    | 0.0                |
| M      | 5686    | 753.0              |
| M      | 39225   | 16177.091422406222 |
+--------+---------+--------------------+

Here is an example for STDDEV_SAMP function:

os> SELECT
...   gender, balance,
...   STDDEV_SAMP(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS val
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+--------------------+
| gender | balance | val                |
|--------+---------+--------------------|
| F      | 32838   | 0.0                |
| M      | 4180    | 0.0                |
| M      | 5686    | 1064.9028124669405 |
| M      | 39225   | 19812.809753624886 |
+--------+---------+--------------------+

Here is an example for SUM function:

os> SELECT
...   gender, balance,
...   VAR_POP(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS val
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+--------------------+
| gender | balance | val                |
|--------+---------+--------------------|
| F      | 32838   | 0.0                |
| M      | 4180    | 0.0                |
| M      | 5686    | 567009.0           |
| M      | 39225   | 261698286.88888893 |
+--------+---------+--------------------+

Here is an example for SUM function:

os> SELECT
...   gender, balance,
...   VAR_SAMP(balance) OVER(
...     PARTITION BY gender ORDER BY balance
... ) AS val
... FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-------------------+
| gender | balance | val               |
|--------+---------+-------------------|
| F      | 32838   | 0.0               |
| M      | 4180    | 0.0               |
| M      | 5686    | 1134018.0         |
| M      | 39225   | 392547430.3333334 |
+--------+---------+-------------------+

Ranking functions are window functions that assign an incremental rank to each row in the window. How the rank number gets increased is up to ranking function implementation, though the rank is mostly determined by field values in ORDER BY list. If PARTITION BY clause present, the state of ranking functions (incremental rank number maintained) will be reset.

Note that normally ranking functions are supposed to be used with window definition that defines the order of data rows in the window. Otherwise the result is undetermined. In this case, ROW_NUMBER assigns row number to data rows in random order. RANK and DENSE_RANK always assigns rank 1 to each row.

ROW_NUMBER function assigns a row number to each row. As a special case, the row number is always increased by one regardless of the fields specified in ORDER BY list. Here is an example for ROW_NUMBER function:

os> SELECT gender, balance, ROW_NUMBER() OVER(PARTITION BY gender ORDER BY balance) AS num FROM accounts;
fetched rows / total rows = 4/4
+--------+---------+-----+
| gender | balance | num |
|--------+---------+-----|
| F      | 32838   | 1   |
| M      | 4180    | 1   |
| M      | 5686    | 2   |
| M      | 39225   | 3   |
+--------+---------+-----+

Similarly as regular ORDER BY clause, you can specify null ordering by NULLS FIRST or NULLS LAST which has exactly same behavior:

os> SELECT
...  employer,
...  ROW_NUMBER() OVER(
...   ORDER BY employer NULLS LAST
... ) AS num
... FROM accounts
... ORDER BY employer NULLS LAST;
fetched rows / total rows = 4/4
+----------+-----+
| employer | num |
|----------+-----|
| Netagy   | 1   |
| Pyrami   | 2   |
| Quility  | 3   |
| null     | 4   |
+----------+-----+

RANK function assigns a rank to each row. For rows that have same values for fields specified in ORDER BY list, same rank is assigned. If this is the case, the next few ranks will be skipped depending on how many ties. Here is an example for RANK function:

os> SELECT gender, RANK() OVER(ORDER BY gender DESC) AS rnk FROM accounts;
fetched rows / total rows = 4/4
+--------+-----+
| gender | rnk |
|--------+-----|
| M      | 1   |
| M      | 1   |
| M      | 1   |
| F      | 4   |
+--------+-----+

Similarly as RANK, DENSE_RANK function also assigns a rank to each row. The difference is there is no gap between ranks. Here is an example for DENSE_RANK function:

os> SELECT gender, DENSE_RANK() OVER(ORDER BY gender DESC) AS rnk FROM accounts;
fetched rows / total rows = 4/4
+--------+-----+
| gender | rnk |
|--------+-----|
| M      | 1   |
| M      | 1   |
| M      | 1   |
| F      | 2   |
+--------+-----+