顺子の杂货铺
生命不息,折腾不止,且行且珍惜~

01-窗口函数详解

DMIT VPS

窗口函数详解

窗口函数是SQL高级分析的核心工具,支持在数据集的子集上执行计算,同时保留原始行的详细信息。

📊 支持情况

数据库支持版本备注
MySQL8.0+完整支持
Oracle8i+完整支持
ClickHouseall完整支持
Hologresall完整支持
MaxComputeall完整支持
Hive2.1+完整支持

1. ROW_NUMBER – 唯一排名

为每行生成唯一的序号,1,2,3…不重复

MySQL / Hive / MaxCompute / Hologres

SELECT
    user_id,
    order_amount,
    ROW_NUMBER() OVER (
        PARTITION BY DATE(order_date)
        ORDER BY order_amount DESC
    ) AS rn
FROM orders
WHERE order_date >= '2024-01-01';

Oracle

SELECT
    user_id,
    order_amount,
    ROW_NUMBER() OVER (
        PARTITION BY TRUNC(order_date)
        ORDER BY order_amount DESC
    ) AS rn
FROM orders
WHERE order_date >= DATE '2024-01-01';

ClickHouse

SELECT
    user_id,
    order_amount,
    ROW_NUMBER() OVER (
        PARTITION BY toDate(order_date)
        ORDER BY order_amount DESC
    ) AS rn
FROM orders
WHERE order_date >= '2024-01-01';

2. RANK – 跳跃排名

排名相同时会跳过相同值,如:1,2,2,4…

通用模板

SELECT
    user_id,
    score,
    RANK() OVER (ORDER BY score DESC) AS rank_rank
FROM exam_scores;

示例输出:

user_idscorerank_rank
0011001
002952
003952
004904

3. DENSE_RANK – 连续排名

排名相同时不跳过,如:1,2,2,3…

通用模板

SELECT
    user_id,
    score,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM exam_scores;

示例输出:

user_idscoredense_rank
0011001
002952
003952
004903

4. NTILE – 分桶排名

将数据分成N个桶,返回桶编号

应用场景:取前20%用户

-- MySQL / Hive / MaxCompute / Hologres
SELECT
    user_id,
    amount,
    NTILE(5) OVER (ORDER BY amount DESC) AS bucket
FROM sales
WHERE bucket = 1;  -- Top 20%

Oracle版本

SELECT *
FROM (
    SELECT
        user_id,
        amount,
        NTILE(5) OVER (ORDER BY amount DESC) AS bucket
    FROM sales
)
WHERE bucket = 1;

5. 组合使用示例

场景:查找每个部门薪资最高的前3名员工

-- MySQL 8.0+
WITH ranked_emp AS (
    SELECT
        emp_id,
        dept_id,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY dept_id
            ORDER BY salary DESC
        ) AS rn
    FROM employees
)
SELECT *
FROM ranked_emp
WHERE rn <= 3;

ClickHouse

SELECT
    emp_id,
    dept_id,
    salary,
    row_number OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees
ARRAY JOIN range(1, 4) AS rn_pos
WHERE rn_pos <= 3;

6. 常见问题与注意事项

Q1: ROW_NUMBER vs RANK vs DENSE_RANK 区别

-- 测试数据
WITH data AS (
    SELECT 'A' AS name, 100 AS score UNION ALL
    SELECT 'B' AS name, 100 AS score UNION ALL
    SELECT 'C' AS name, 90 AS score
)
SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
    RANK() OVER (ORDER BY score DESC) AS rk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dr
FROM data;

输出对比:

namescoreROW_NUMBERRANKDENSE_RANK
A100111
B100211
C90332

Q2: PARTITION BY 为空

-- 全局排名(不分区)
SELECT
    user_id,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS global_rank
FROM orders;

Q3: 窗口函数不能出现在WHERE子句中

-- ❌ 错误写法
SELECT user_id, ROW_NUMBER() OVER () AS rn
FROM orders
WHERE rn <= 10;

-- ✅ 正确写法:使用子查询或CTE
SELECT *
FROM (
    SELECT
        user_id,
        ROW_NUMBER() OVER (ORDER BY user_id) AS rn
    FROM orders
) t
WHERE rn <= 10;

📝 实践练习

练习1:销售榜单

-- 建表
CREATE TABLE sales (
    id INT,
    region STRING,
    salesperson STRING,
    amount DECIMAL(10,2),
    sale_date DATE
);

-- 插入数据
INSERT INTO sales VALUES
(1, '华东', '张三', 15000, '2024-01-15'),
(2, '华东', '李四', 18000, '2024-01-15'),
(3, '华东', '王五', 12000, '2024-01-15'),
(4, '华北', '赵六', 20000, '2024-01-15'),
(5, '华北', '钱七', 16000, '2024-01-15');

练习目标:查询每个区域销售额排名第一的销售员

-- 答案
WITH ranked AS (
    SELECT
        region,
        salesperson,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY amount DESC
        ) AS rn
    FROM sales
)
SELECT region, salesperson, amount
FROM ranked
WHERE rn = 1;

⚠️ 数据库限制

数据库限制说明
MySQL < 8.0不支持窗口函数
Hive需要开启配置才能使用部分窗口函数
MaxCompute窗口函数不能嵌套
ClickHouse窗口函数语法略有不同
赞(0)
未经允许不得转载:順子の杂货铺 » 01-窗口函数详解
搬瓦工VPS

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址

分享创造快乐

联系我们联系我们