Dialect | As aggregate function | As window function |
---|---|---|

MariaDB 10.3.3 | No | Yes |

Oracle 18c | Yes | Yes |

PostgreSQL 11 | Yes | No |

SQL Server 2017 | No | Yes |

Teradata 16 | Yes | No |

- Aggregate function:
`PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x)`

- Window function:
`PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x) OVER (PARTITION BY y)`

### Workarounds if the feature is unavailable

Luckily, as soon as an RDBMS supports window functions, we can easily emulate`PERCENTILE_DISC`

using `PERCENT_RANK`

and `FIRST_VALUE`

as follows. We’re using the Sakila database in this example.
**Emulating window functions**Let’s emulate these first, as it requires a bit less SQL transformations. This query works out of the box in Oracle:

```
SELECT DISTINCT
rating,
percentile_disc(0.5)
WITHIN GROUP (ORDER BY length)
OVER() x1,
percentile_disc(0.5)
WITHIN GROUP (ORDER BY length)
OVER (PARTITION BY rating) x2
FROM film
ORDER BY rating;
```

RATING X1 X2 ------------------- G 114 107 NC-17 114 112 PG 114 113 PG-13 114 125 R 114 115What we can read from this is that the median length of all films is 114 minutes, and the median lengths of films

*per rating*range from 107 minutes to 125 minutes. I’ve used

`DISTINCT`

because we don’t care about visualising these values on a per-row basis in this case. This also works in SQL Server.
Now, let’s assume we’re using PostgreSQL, which doesn’t support inverse distribution window functions, or MySQL, which doesn’t support inverse distribution functions at all, but both support `PERCENT_RANK`

and `FIRST_VALUE`

. Here’s the complete query:
```
SELECT DISTINCT
rating,
first_value(length) OVER (
ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1,
first_value(length) OVER (
PARTITION BY rating
ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
FROM (
SELECT
rating,
length,
percent_rank() OVER (ORDER BY length) p1,
percent_rank() OVER (PARTITION BY rating ORDER BY length) p2
FROM film
) t
ORDER BY rating;
```

`PERCENT_RANK`

: In a derived table, we’re calculating the`PERCENT_RANK`

value, which attributes a rank to each row ordered by length, going from 0 to 1. This makes sense. When looking for the median value, we’re really looking for the value whose`PERCENT_RANK`

is 0.5 or less. When looking for the 90% percentile, we’re looking for the value whose`PERCENT_RANK`

is 0.9 or less`FIRST_VALUE`

: Once we’ve found the`PERCENT_RANK`

, we’re not quite done yet. We need to find the*last*row whose`PERCENT_RANK`

is less or equal to the percentile we’re interested in. I could have used`LAST_VALUE`

, but then I would have needed to resort to using the quite verbose range clause of window functions. Instead, I when ordering the rows by`PERCENT_RANK`

(p1 or p2), I translated all ranks higher than the percentile I’m looking for into`NULL`

using a`CASE`

expression, and then I made sure using`NULLS LAST`

that the percentile I’m looking for will be the first row in the`FIRST_VALUE`

function’s window specification. Easy!

```
SELECT
length,
CASE WHEN p1 <= 0.5 THEN p1 END::numeric(3,2) p1,
first_value(length) OVER (
ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1
FROM (
SELECT
length,
percent_rank() OVER (ORDER BY length) p1
FROM film
) t
ORDER BY length;
```

length |p1 |x1 | -------|-----|----| 46 |0.00 |114 | 46 |0.00 |114 | 46 |0.00 |114 | 46 |0.00 |114 | 46 |0.00 |114 | 47 |0.01 |114 | ... 113 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | 114 |0.49 |114 | <-- Last row whose PERCENT_RANK is <= 0.5 115 | |114 | 115 | |114 | 115 | |114 | 115 | |114 | 115 | |114 | 115 | |114 | ... 185 | |114 | 185 | |114 | 185 | |114 |So the

`FIRST_VALUE`

function just searches for that first row (descendingly, i.e. bottom up) whose p1 value is non-null.
The same for p2:
```
SELECT
length,
rating,
CASE WHEN p2 <= 0.5 THEN p2 END::numeric(3,2) p2,
first_value(length) OVER (
PARTITION BY rating
ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
FROM (
SELECT
rating,
length,
percent_rank() OVER (PARTITION BY rating ORDER BY length) p2
FROM film
) t
ORDER BY rating, length;
```

length |rating |p2 |x2 | -------|-------|-----|----| 47 |G |0.00 |107 | 47 |G |0.00 |107 | 48 |G |0.01 |107 | 48 |G |0.01 |107 | ... 105 |G |0.47 |107 | 106 |G |0.49 |107 | 107 |G |0.49 |107 | 107 |G |0.49 |107 | <-- Last row in G partition whose 108 |G | |107 | PERCENT_RANK is <= 0.5 108 |G | |107 | 109 |G | |107 | ... 185 |G | |107 | 185 |G | |107 | 46 |PG |0.00 |113 | 47 |PG |0.01 |113 | 47 |PG |0.01 |113 | ... 111 |PG |0.49 |113 | 113 |PG |0.49 |113 | 113 |PG |0.49 |113 | <-- Last row in PG partition whose 114 |PG | |113 | PERCENT_RANK is <= 0.5 114 |PG | |113 | ...Perfect! Notice if your RDBMS doesn’t support the

`NULLS LAST`

clause in your `ORDER BY`

clause (e.g. MySQL), you might either hope that it defaults to sorting `NULLS LAST`

(MySQL does), or you can emulate it as such:
```
-- This
ORDER BY x NULLS LAST
-- Is the same as this
ORDER BY
CASE WHEN x IS NULL THEN 1 ELSE 0 END,
x
```

**Emulating aggregate functions**If you’re using SQL Server and want aggregate function behaviour, I recommend using the window function instead and emulate aggregation using

`DISTINCT`

. It will probably be easier than the emulation below. Do check for performance though!
When you’re using e.g. MySQL, which doesn’t have inverse distribution function support at all, then this chapter is for you.
Here’s how to use the aggregate function version in Oracle:
```
-- Without GROUP BY
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY length) x1
FROM film;
-- With GROUP BY
SELECT
rating,
percentile_disc(0.5) WITHIN GROUP (ORDER BY length) x2
FROM film
GROUP BY rating
ORDER BY rating;
```

X1 --- 114 RATING X2 ----------- G 107 NC-17 112 PG 113 PG-13 125 R 115Now, let’s emulate these on e.g. MySQL, using window functions.

```
-- Without GROUP BY
SELECT
MAX(x1) x1
FROM (
SELECT first_value(length) OVER (
ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1
FROM (
SELECT
length,
percent_rank() OVER (ORDER BY length) p1
FROM film
) t
) t;
```

`MAX()`

. This is the same as what I did before with `DISTINCT`

, for illustration purposes.
```
-- With GROUP BY
SELECT
rating,
MAX(x2) x2
FROM (
SELECT
rating,
first_value(length) OVER (
PARTITION BY rating
ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
FROM (
SELECT
rating,
length,
percent_rank() OVER (
PARTITION BY rating
ORDER BY length) p2
FROM film
) t
) t
GROUP BY rating
ORDER BY rating;
```

`GROUP BY`

expression to a `PARTITION BY`

expression in the window function, and then redo the previous exercise.
### Conclusion

Window functions are extremely powerful. They can be used and combined to calculate a variety of other aggregations. With the above approach, we can calculate the`PERCENTILE_DISC`

inverse distribution function, which is not readily available in most RDBMS using a more verbose but equally powerful approach that uses `PERCENT_RANK`

and `FIRST_VALUE`

in all RDBMS that support window functions. A similar exercise could be made with `PERCENTILE_CONT`

with a slightly more tricky approach to finding that `FIRST_VALUE`

, which I’ll leave as an exercise to the reader.
A future jOOQ version might emulate this for you, automatically.
Liked this article? You may also like 10 SQL Tricks That You Didn’t Think Were Possible.