顺子の杂货铺
生命不息,折腾不止,且行且珍惜~

02-累计计算详解

DMIT VPS

累计计算详解

累计计算(Cumulative Calculation)用于计算截至当前行的累计值,是数据分析中最常用的计算模式之一。

📊 支持情况

数据库支持情况备注
MySQL✅ 完整支持8.0+支持窗口帧
Oracle✅ 完整支持使用 ROWS BETWEEN
ClickHouse✅ 完整支持语法略有不同
Hologres✅ 完整支持兼容PostgreSQL
MaxCompute✅ 完整支持支持窗口帧
Hive✅ 完整支持2.1+支持窗口帧

1. 累计求和 (Cumulative Sum)

通用语法

SUM(column) OVER (
    PARTITION BY partition_col
    ORDER BY order_col
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

MySQL 8.0+

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) AS cum_amount
FROM sales;

Oracle

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cum_amount
FROM sales;

ClickHouse

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cum_amount
FROM sales;

Hive / MaxCompute / Hologres

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cum_amount
FROM sales;

2. 累计均值 (Cumulative Average)

语法

AVG(column) OVER (
    PARTITION BY partition_col
    ORDER BY order_col
    ROWS UNBOUNDED PRECEDING
)

示例

-- MySQL / Hive / MaxCompute / Hologres
SELECT
    day,
    revenue,
    ROUND(AVG(revenue) OVER (
        ORDER BY day
        ROWS UNBOUNDED PRECEDING
    ), 2) AS cum_avg
FROM daily_revenue;

输出示例:

dayrevenuecum_avg
2024-01-0110001000.00
2024-01-0215001250.00
2024-01-0312001233.33

3. 累计计数 (Cumulative Count)

MySQL

SELECT
    order_date,
    COUNT(*) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS UNBOUNDED PRECEDING
    ) AS order_count
FROM orders;

Oracle

SELECT
    order_date,
    COUNT(*) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS order_count
FROM orders;

4. 移动平均 (Moving Average)

场景:计算最近7天的平均销售额

MySQL / Hive / MaxCompute

SELECT
    sale_date,
    amount,
    ROUND(AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_7d
FROM daily_sales;

ClickHouse

SELECT
    sale_date,
    amount,
    avgSimple(amount) OVER (
        ORDER BY sale_date
        ROWS 6 PRECEDING
    ) AS moving_avg_7d
FROM daily_sales;

Oracle

SELECT
    sale_date,
    amount,
    ROUND(AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_7d
FROM daily_sales;

5. 累计最大值/最小值

累计最大值

-- 截至当前行的历史最高值
SELECT
    day,
    price,
    MAX(price) OVER (
        ORDER BY day
        ROWS UNBOUNDED PRECEDING
    ) AS running_max
FROM stock_prices;

累计最小值

SELECT
    day,
    price,
    MIN(price) OVER (
        ORDER BY day
        ROWS UNBOUNDED PRECEDING
    ) AS running_min
FROM stock_prices;

6. 不同窗口帧详解

窗口帧语法

ROWS BETWEEN [frame_start] AND [frame_end]

常用帧定义

帧定义含义
ROWS UNBOUNDED PRECEDING从分区第一行到当前行
ROWS n PRECEDING往前n行到当前行
ROWS n FOLLOWING当前行往后n行
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW范围(值)
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING当前行到分区最后一行

实战:计算占比

-- 计算当日金额占累计金额的比例
SELECT
    sale_date,
    amount,
    ROUND(
        amount * 100.0 / SUM(amount) OVER (
            PARTITION BY region
            ORDER BY sale_date
            ROWS UNBOUNDED PRECEDING
        ),
        2
    ) AS pct_of_cum
FROM sales;

7. 跨年累计

MySQL / Hive

SELECT
    year,
    month,
    amount,
    SUM(amount) OVER (
        PARTITION BY year
        ORDER BY month
        ROWS UNBOUNDED PRECEDING
    ) AS ytd_amount
FROM monthly_sales;

Oracle

SELECT
    TO_CHAR(sale_date, 'YYYY') AS year,
    TO_CHAR(sale_date, 'MM') AS month,
    amount,
    SUM(amount) OVER (
        PARTITION BY TO_CHAR(sale_date, 'YYYY')
        ORDER BY TO_CHAR(sale_date, 'MM')
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS ytd_amount
FROM monthly_sales;

8. 组合示例

场景:销售报表汇总

WITH sales_data AS (
    SELECT
        region,
        sale_date,
        amount
    FROM sales
)
SELECT
    region,
    sale_date,
    amount,
    -- 累计金额
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) AS cum_amount,
    -- 累计占比
    ROUND(
        100.0 * SUM(amount) OVER (
            PARTITION BY region
            ORDER BY sale_date
            ROWS UNBOUNDED PRECEDING
        ) / SUM(amount) OVER (PARTITION BY region),
        2
    ) AS cum_pct,
    -- 移动平均(3天)
    ROUND(AVG(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS ma_3d
FROM sales_data;

⚠️ 注意事项

1. NULL值处理

-- NULL会被跳过,不影响累计计算
SELECT
    day,
    amount,
    SUM(amount) OVER (ORDER BY day) AS cum_amount
FROM (
    SELECT '2024-01-01' AS day, 100 AS amount
    UNION ALL SELECT '2024-01-02', NULL
    UNION ALL SELECT '2024-01-03', 150
) t;

输出:

dayamountcum_amount
2024-01-01100100
2024-01-02NULL100
2024-01-03150250

2. 默认帧行为

-- 当指定 ORDER BY 时,默认帧是 ROWS UNBOUNDED PRECEDING AND CURRENT ROW
-- 当未指定 ORDER BY 时,默认帧是整个分区

-- 指定ORDER BY
SUM(x) OVER (ORDER BY day)  -- = ROWS UNBOUNDED PRECEDING AND CURRENT ROW

-- 未指定ORDER BY
SUM(x) OVER ()  -- = ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

3. 大数据量性能

-- 优化建议:使用RANGE代替ROWS(当有重复键时)
-- Hive/MaxCompute 建议
SELECT
    day,
    amount,
    SUM(amount) OVER (
        ORDER BY day
        RANGE BETWEEN INTERVAL '6' DAY PRECEDING AND CURRENT ROW
    ) AS sum_7d
FROM daily_data;

📝 练习题

题目:计算用户消费历程

-- 建表
CREATE TABLE user_orders (
    order_id INT,
    user_id INT,
    order_date DATE,
    amount DECIMAL(10,2)
);

-- 插入数据
INSERT INTO user_orders VALUES
(1, 101, '2024-01-01', 500),
(2, 101, '2024-01-05', 300),
(3, 101, '2024-01-10', 800),
(4, 102, '2024-01-02', 200),
(5, 102, '2024-01-08', 450);

要求

  1. 按用户分组,计算每笔订单的累计消费金额
  2. 计算每次消费占该用户总消费的比例
  3. 计算相邻两次消费的间隔天数

参考SQL

SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS UNBOUNDED PRECEDING
    ) AS cum_amount,
    ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 2) AS order_pct,
    DATEDIFF(order_date, LAG(order_date) OVER (
        PARTITION BY user_id
        ORDER BY order_date
    )) AS days_since_last
FROM user_orders;
赞(0)
未经允许不得转载:順子の杂货铺 » 02-累计计算详解
搬瓦工VPS

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址

分享创造快乐

联系我们联系我们