累计计算详解
累计计算(Cumulative Calculation)用于计算截至当前行的累计值,是数据分析中最常用的计算模式之一。
📊 支持情况
| 数据库 | 支持情况 | 备注 |
|---|---|---|
| MySQL | ✅ 完整支持 | 8.0+支持窗口帧 |
| Oracle | ✅ 完整支持 | 使用 ROWS BETWEEN |
| ClickHouse | ✅ 完整支持 | 语法略有不同 |
| Hologres | ✅ 完整支持 | 兼容PostgreSQL |
| MaxCompute | ✅ 完整支持 | 支持窗口帧 |
| Hive | ✅ 完整支持 | 2.1+支持窗口帧 |
1. 累计求和 (Cumulative Sum)
通用语法
SUM(column) OVER (
PARTITION BY partition_col
ORDER BY order_col
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)MySQL 8.0+
SELECT
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
) AS cum_amount
FROM sales;Oracle
SELECT
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_amount
FROM sales;ClickHouse
SELECT
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_amount
FROM sales;Hive / MaxCompute / Hologres
SELECT
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_amount
FROM sales;2. 累计均值 (Cumulative Average)
语法
AVG(column) OVER (
PARTITION BY partition_col
ORDER BY order_col
ROWS UNBOUNDED PRECEDING
)示例
-- MySQL / Hive / MaxCompute / Hologres
SELECT
day,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY day
ROWS UNBOUNDED PRECEDING
), 2) AS cum_avg
FROM daily_revenue;输出示例:
| day | revenue | cum_avg |
|---|---|---|
| 2024-01-01 | 1000 | 1000.00 |
| 2024-01-02 | 1500 | 1250.00 |
| 2024-01-03 | 1200 | 1233.33 |
3. 累计计数 (Cumulative Count)
MySQL
SELECT
order_date,
COUNT(*) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS UNBOUNDED PRECEDING
) AS order_count
FROM orders;Oracle
SELECT
order_date,
COUNT(*) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS order_count
FROM orders;4. 移动平均 (Moving Average)
场景:计算最近7天的平均销售额
MySQL / Hive / MaxCompute
SELECT
sale_date,
amount,
ROUND(AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7d
FROM daily_sales;ClickHouse
SELECT
sale_date,
amount,
avgSimple(amount) OVER (
ORDER BY sale_date
ROWS 6 PRECEDING
) AS moving_avg_7d
FROM daily_sales;Oracle
SELECT
sale_date,
amount,
ROUND(AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7d
FROM daily_sales;5. 累计最大值/最小值
累计最大值
-- 截至当前行的历史最高值
SELECT
day,
price,
MAX(price) OVER (
ORDER BY day
ROWS UNBOUNDED PRECEDING
) AS running_max
FROM stock_prices;累计最小值
SELECT
day,
price,
MIN(price) OVER (
ORDER BY day
ROWS UNBOUNDED PRECEDING
) AS running_min
FROM stock_prices;6. 不同窗口帧详解
窗口帧语法
ROWS BETWEEN [frame_start] AND [frame_end]常用帧定义
| 帧定义 | 含义 |
|---|---|
ROWS UNBOUNDED PRECEDING | 从分区第一行到当前行 |
ROWS n PRECEDING | 往前n行到当前行 |
ROWS n FOLLOWING | 当前行往后n行 |
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | 范围(值) |
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | 当前行到分区最后一行 |
实战:计算占比
-- 计算当日金额占累计金额的比例
SELECT
sale_date,
amount,
ROUND(
amount * 100.0 / SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
),
2
) AS pct_of_cum
FROM sales;7. 跨年累计
MySQL / Hive
SELECT
year,
month,
amount,
SUM(amount) OVER (
PARTITION BY year
ORDER BY month
ROWS UNBOUNDED PRECEDING
) AS ytd_amount
FROM monthly_sales;Oracle
SELECT
TO_CHAR(sale_date, 'YYYY') AS year,
TO_CHAR(sale_date, 'MM') AS month,
amount,
SUM(amount) OVER (
PARTITION BY TO_CHAR(sale_date, 'YYYY')
ORDER BY TO_CHAR(sale_date, 'MM')
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS ytd_amount
FROM monthly_sales;8. 组合示例
场景:销售报表汇总
WITH sales_data AS (
SELECT
region,
sale_date,
amount
FROM sales
)
SELECT
region,
sale_date,
amount,
-- 累计金额
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
) AS cum_amount,
-- 累计占比
ROUND(
100.0 * SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
) / SUM(amount) OVER (PARTITION BY region),
2
) AS cum_pct,
-- 移动平均(3天)
ROUND(AVG(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS ma_3d
FROM sales_data;⚠️ 注意事项
1. NULL值处理
-- NULL会被跳过,不影响累计计算
SELECT
day,
amount,
SUM(amount) OVER (ORDER BY day) AS cum_amount
FROM (
SELECT '2024-01-01' AS day, 100 AS amount
UNION ALL SELECT '2024-01-02', NULL
UNION ALL SELECT '2024-01-03', 150
) t;输出:
| day | amount | cum_amount |
|---|---|---|
| 2024-01-01 | 100 | 100 |
| 2024-01-02 | NULL | 100 |
| 2024-01-03 | 150 | 250 |
2. 默认帧行为
-- 当指定 ORDER BY 时,默认帧是 ROWS UNBOUNDED PRECEDING AND CURRENT ROW
-- 当未指定 ORDER BY 时,默认帧是整个分区
-- 指定ORDER BY
SUM(x) OVER (ORDER BY day) -- = ROWS UNBOUNDED PRECEDING AND CURRENT ROW
-- 未指定ORDER BY
SUM(x) OVER () -- = ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING3. 大数据量性能
-- 优化建议:使用RANGE代替ROWS(当有重复键时)
-- Hive/MaxCompute 建议
SELECT
day,
amount,
SUM(amount) OVER (
ORDER BY day
RANGE BETWEEN INTERVAL '6' DAY PRECEDING AND CURRENT ROW
) AS sum_7d
FROM daily_data;📝 练习题
题目:计算用户消费历程
-- 建表
CREATE TABLE user_orders (
order_id INT,
user_id INT,
order_date DATE,
amount DECIMAL(10,2)
);
-- 插入数据
INSERT INTO user_orders VALUES
(1, 101, '2024-01-01', 500),
(2, 101, '2024-01-05', 300),
(3, 101, '2024-01-10', 800),
(4, 102, '2024-01-02', 200),
(5, 102, '2024-01-08', 450);要求:
- 按用户分组,计算每笔订单的累计消费金额
- 计算每次消费占该用户总消费的比例
- 计算相邻两次消费的间隔天数
参考SQL:
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS UNBOUNDED PRECEDING
) AS cum_amount,
ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 2) AS order_pct,
DATEDIFF(order_date, LAG(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
)) AS days_since_last
FROM user_orders;
順子の杂货铺

