窗口函数详解
窗口函数是SQL高级分析的核心工具,支持在数据集的子集上执行计算,同时保留原始行的详细信息。
📊 支持情况
| 数据库 | 支持版本 | 备注 |
|---|---|---|
| MySQL | 8.0+ | 完整支持 |
| Oracle | 8i+ | 完整支持 |
| ClickHouse | all | 完整支持 |
| Hologres | all | 完整支持 |
| MaxCompute | all | 完整支持 |
| Hive | 2.1+ | 完整支持 |
1. ROW_NUMBER – 唯一排名
为每行生成唯一的序号,1,2,3…不重复
MySQL / Hive / MaxCompute / Hologres
SELECT
user_id,
order_amount,
ROW_NUMBER() OVER (
PARTITION BY DATE(order_date)
ORDER BY order_amount DESC
) AS rn
FROM orders
WHERE order_date >= '2024-01-01';Oracle
SELECT
user_id,
order_amount,
ROW_NUMBER() OVER (
PARTITION BY TRUNC(order_date)
ORDER BY order_amount DESC
) AS rn
FROM orders
WHERE order_date >= DATE '2024-01-01';ClickHouse
SELECT
user_id,
order_amount,
ROW_NUMBER() OVER (
PARTITION BY toDate(order_date)
ORDER BY order_amount DESC
) AS rn
FROM orders
WHERE order_date >= '2024-01-01';2. RANK – 跳跃排名
排名相同时会跳过相同值,如:1,2,2,4…
通用模板
SELECT
user_id,
score,
RANK() OVER (ORDER BY score DESC) AS rank_rank
FROM exam_scores;示例输出:
| user_id | score | rank_rank |
|---|---|---|
| 001 | 100 | 1 |
| 002 | 95 | 2 |
| 003 | 95 | 2 |
| 004 | 90 | 4 |
3. DENSE_RANK – 连续排名
排名相同时不跳过,如:1,2,2,3…
通用模板
SELECT
user_id,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM exam_scores;示例输出:
| user_id | score | dense_rank |
|---|---|---|
| 001 | 100 | 1 |
| 002 | 95 | 2 |
| 003 | 95 | 2 |
| 004 | 90 | 3 |
4. NTILE – 分桶排名
将数据分成N个桶,返回桶编号
应用场景:取前20%用户
-- MySQL / Hive / MaxCompute / Hologres
SELECT
user_id,
amount,
NTILE(5) OVER (ORDER BY amount DESC) AS bucket
FROM sales
WHERE bucket = 1; -- Top 20%Oracle版本
SELECT *
FROM (
SELECT
user_id,
amount,
NTILE(5) OVER (ORDER BY amount DESC) AS bucket
FROM sales
)
WHERE bucket = 1;5. 组合使用示例
场景:查找每个部门薪资最高的前3名员工
-- MySQL 8.0+
WITH ranked_emp AS (
SELECT
emp_id,
dept_id,
salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT *
FROM ranked_emp
WHERE rn <= 3;ClickHouse
SELECT
emp_id,
dept_id,
salary,
row_number OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees
ARRAY JOIN range(1, 4) AS rn_pos
WHERE rn_pos <= 3;6. 常见问题与注意事项
Q1: ROW_NUMBER vs RANK vs DENSE_RANK 区别
-- 测试数据
WITH data AS (
SELECT 'A' AS name, 100 AS score UNION ALL
SELECT 'B' AS name, 100 AS score UNION ALL
SELECT 'C' AS name, 90 AS score
)
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
RANK() OVER (ORDER BY score DESC) AS rk,
DENSE_RANK() OVER (ORDER BY score DESC) AS dr
FROM data;输出对比:
| name | score | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|---|
| A | 100 | 1 | 1 | 1 |
| B | 100 | 2 | 1 | 1 |
| C | 90 | 3 | 3 | 2 |
Q2: PARTITION BY 为空
-- 全局排名(不分区)
SELECT
user_id,
amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) AS global_rank
FROM orders;Q3: 窗口函数不能出现在WHERE子句中
-- ❌ 错误写法
SELECT user_id, ROW_NUMBER() OVER () AS rn
FROM orders
WHERE rn <= 10;
-- ✅ 正确写法:使用子查询或CTE
SELECT *
FROM (
SELECT
user_id,
ROW_NUMBER() OVER (ORDER BY user_id) AS rn
FROM orders
) t
WHERE rn <= 10;📝 实践练习
练习1:销售榜单
-- 建表
CREATE TABLE sales (
id INT,
region STRING,
salesperson STRING,
amount DECIMAL(10,2),
sale_date DATE
);
-- 插入数据
INSERT INTO sales VALUES
(1, '华东', '张三', 15000, '2024-01-15'),
(2, '华东', '李四', 18000, '2024-01-15'),
(3, '华东', '王五', 12000, '2024-01-15'),
(4, '华北', '赵六', 20000, '2024-01-15'),
(5, '华北', '钱七', 16000, '2024-01-15');练习目标:查询每个区域销售额排名第一的销售员
-- 答案
WITH ranked AS (
SELECT
region,
salesperson,
amount,
ROW_NUMBER() OVER (
PARTITION BY region
ORDER BY amount DESC
) AS rn
FROM sales
)
SELECT region, salesperson, amount
FROM ranked
WHERE rn = 1;⚠️ 数据库限制
| 数据库 | 限制说明 |
|---|---|
| MySQL < 8.0 | 不支持窗口函数 |
| Hive | 需要开启配置才能使用部分窗口函数 |
| MaxCompute | 窗口函数不能嵌套 |
| ClickHouse | 窗口函数语法略有不同 |
順子の杂货铺


