The World’s Greatest Rolling Tutorial
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = "svg"
samples_per_day = 6
base_date_index = pd.date_range("2021-01-01", "2021-01-31", freq="D")
base_date_index
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
'2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
'2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
'2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24',
'2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28',
'2021-01-29', '2021-01-30', '2021-01-31'],
dtype='datetime64[ns]', freq='D')
date_index = base_date_index.tolist() * samples_per_day
date_index.sort()
date_index[:11]
[Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D')]
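As an aside, the tolist-and-sort round trip can be replaced with a single call: `Index.repeat` emits each date `samples_per_day` times, already in order. A minimal sketch:

```python
import pandas as pd

samples_per_day = 6
base_date_index = pd.date_range("2021-01-01", "2021-01-31", freq="D")

# Each date repeated samples_per_day times, already sorted
repeated = base_date_index.repeat(samples_per_day)

# Same sequence as sorting samples_per_day concatenated copies
manual = sorted(base_date_index.tolist() * samples_per_day)
assert list(repeated) == manual
```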
categories = ["A", "A", "A", "B", "B", "C"] * 31
data = pd.DataFrame([1] * samples_per_day * 31, columns=["metric"]).assign(
categories=categories,
date=date_index,
)
print(data.shape)
data.head(11)
(186, 3)

    metric categories       date
0        1          A 2021-01-01
1        1          A 2021-01-01
2        1          A 2021-01-01
3        1          B 2021-01-01
4        1          B 2021-01-01
5        1          C 2021-01-01
6        1          A 2021-01-02
7        1          A 2021-01-02
8        1          A 2021-01-02
9        1          B 2021-01-02
10       1          B 2021-01-02
_ = data.groupby("categories", as_index=False).rolling("7D", on="date").sum()
_.head()
  categories  metric       date
0          A     1.0 2021-01-01
1          A     2.0 2021-01-01
2          A     3.0 2021-01-01
6          A     4.0 2021-01-02
7          A     5.0 2021-01-02
For each group produced by the groupby, and for each window defined by rolling, pandas sums every value inside that window:

2021-01-01 : 2021-01-07 is one window,
2021-01-02 : 2021-01-08 is another window…

However, rolling does not aggregate rows that share a date; it keeps the original index and emits one output row per input row. Since our dates are duplicated, each row only sees the window up to itself, and we have a problem.
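A tiny, self-contained reproduction of the issue (plain pandas, independent of the data above): with duplicated timestamps, each output row is the partial sum up to that row, not one value per day.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01"] * 3 + ["2021-01-02"] * 3),
    "metric": [1] * 6,
})

# One output row per input row: duplicates within a day accumulate
out = df.rolling("2D", on="date").sum()
print(out["metric"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```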
px.line(_, x="date", y="metric", color="categories")
That’s not what we wanted…

%xmode
The %xmode magic controls the verbosity level for exception tracebacks.
Exception reporting mode: Minimal
(
    data.groupby(["categories", "date"], as_index=False)
    .sum()
    .groupby("categories", as_index=False)
    .rolling("7D", on="date")
    .sum()
    .head()
)
ValueError: date must be monotonic
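The error comes from the intermediate frame: after the first groupby-sum, rows come back ordered by (categories, date), so the date column restarts at each new category and is not monotonic overall. A small sketch of the failing condition, using a hypothetical toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    "categories": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "metric": [1, 1, 1, 1],
})

daily = df.groupby(["categories", "date"], as_index=False).sum()
# Ordered by (categories, date): dates restart at each category
print(daily["date"].is_monotonic_increasing)  # False

# Sorting by date alone restores the monotonicity rolling("7D") requires
print(daily.sort_values("date")["date"].is_monotonic_increasing)  # True
```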
_ = (
    data.groupby(["categories", "date"], as_index=False)
    .sum()
    .sort_values("date")
    .groupby("categories", as_index=False)
    .rolling("7D", on="date")
    .sum()
)
_.head()
  categories       date  metric
0          A 2021-01-01     3.0
1          A 2021-01-02     6.0
2          A 2021-01-03     9.0
3          A 2021-01-04    12.0
4          A 2021-01-05    15.0
px.line(_, x="date", y="metric", color="categories")
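As a sanity check on the final result (a sketch that rebuilds the same data as above): category A contributes 3 rows per day, so once the 7-day window fills up, its rolling sum should plateau at 3 × 7 = 21.

```python
import pandas as pd

samples_per_day = 6
data = pd.DataFrame({
    "metric": [1] * (samples_per_day * 31),
    "categories": ["A", "A", "A", "B", "B", "C"] * 31,
    "date": pd.date_range("2021-01-01", "2021-01-31", freq="D").repeat(samples_per_day),
})

# Aggregate to one row per day for category A, then roll a 7-day window
daily_a = (
    data[data["categories"] == "A"]
    .groupby("date", as_index=False)["metric"]
    .sum()
)
rolled_a = daily_a.rolling("7D", on="date")["metric"].sum()

print(rolled_a.iloc[:7].tolist())  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
print(rolled_a.max())              # 21.0
```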
