The World’s Greatest Rolling Tutorial
import pandas as pd
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = "svg"
samples_per_day = 6
base_date_index = pd.date_range("2021-01-01", "2021-01-31", freq="D")
base_date_index
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
               '2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
               '2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
               '2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24',
               '2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28',
               '2021-01-29', '2021-01-30', '2021-01-31'],
              dtype='datetime64[ns]', freq='D')
date_index = base_date_index.tolist() * samples_per_day  # the whole month, repeated six times
date_index.sort()  # sort so the copies of each day sit next to each other
date_index[:11]
[Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-01 00:00:00', freq='D'),
 Timestamp('2021-01-02 00:00:00', freq='D'),
 Timestamp('2021-01-02 00:00:00', freq='D'),
 Timestamp('2021-01-02 00:00:00', freq='D'),
 Timestamp('2021-01-02 00:00:00', freq='D'),
 Timestamp('2021-01-02 00:00:00', freq='D')]
categories = ["A", "A", "A", "B", "B", "C"] * 31  # 3 "A", 2 "B" and 1 "C" sample per day
# every sample carries a metric of 1, so the rolling sums are easy to check by hand
data = pd.DataFrame([1] * samples_per_day * 31, columns=["metric"]).assign(
    categories=categories,
    date=date_index,
)
print(data.shape)
data.head(11)
(186, 3)
metric categories date
0 1 A 2021-01-01
1 1 A 2021-01-01
2 1 A 2021-01-01
3 1 B 2021-01-01
4 1 B 2021-01-01
5 1 C 2021-01-01
6 1 A 2021-01-02
7 1 A 2021-01-02
8 1 A 2021-01-02
9 1 B 2021-01-02
10 1 B 2021-01-02
# first attempt: a 7-day rolling sum of the metric within each category
_ = data.groupby("categories", as_index=False).rolling("7D", on="date").sum()
_.head()
categories metric date
0 A 1.0 2021-01-01
1 A 2.0 2021-01-01
2 A 3.0 2021-01-01
6 A 4.0 2021-01-02
7 A 5.0 2021-01-02

For each group produced by the groupby, and for each 7-day window defined by rolling, pandas sums every value that falls inside the window and reports the result on the row that closes the window:

2021-01-01 : 2021-01-07 is one window,

2021-01-02 : 2021-01-08 is another window…
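
We can check one of those windows by hand (a small sketch added here, reusing the data frame built above); for category "A", the 2021-01-01 : 2021-01-07 window should sum to 3 samples per day × 7 days = 21:

# hand-computed sum for category "A" over the first 7-day window
first_window = data[
    (data["categories"] == "A")
    & (data["date"] >= "2021-01-01")
    & (data["date"] <= "2021-01-07")
]
first_window["metric"].sum()  # 3 samples per day * 7 days = 21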

However, the result is aligned to the original row index, not to the dates: rolling emits one output row per input row. Because our dates are duplicated, we end up with several partial sums for the same category and date.
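
To see the duplication directly (another small check added here), count how many output rows share each category/date pair:

# each (category, date) pair appears once per original sample,
# e.g. ("A", 2021-01-01) shows up three times, with the partial sums 1, 2, 3
_.groupby(["categories", "date"]).size().head()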

px.line(_, x="date", y="metric", color="categories")

That’s not what we wanted…

%xmode controls the verbosity level for exceptions.

%xmode minimal
Exception reporting mode: Minimal

Let’s collapse the duplicated dates first, by summing the metric per category and date, and then roll again:

data.groupby(["categories", "date"], as_index=False).sum().groupby(
    "categories", as_index=False
).rolling("7D", on="date").sum().head()
ValueError: date must be monotonic
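
The first groupby sorts its result by category and then by date, so the date column starts over at 2021-01-01 for every category and is not monotonic as a whole, which is what rolling is complaining about here. A quick check (a sketch added here; agg is just a throwaway name):

agg = data.groupby(["categories", "date"], as_index=False).sum()
agg["date"].is_monotonic_increasing  # False: the dates restart for each category

Sorting by date before grouping again fixes it:
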
_ = data.groupby(["categories", "date"], as_index=False).sum().sort_values("date").groupby(
    "categories", as_index=False
).rolling("7D", on="date").sum()
_.head()
categories date metric
0 A 2021-01-01 3.0
1 A 2021-01-02 6.0
2 A 2021-01-03 9.0
3 A 2021-01-04 12.0
4 A 2021-01-05 15.0
px.line(_, x="date", y="metric", color="categories")
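
As a final sanity check (again a sketch, not part of the original notebook), once the first full week has passed the rolling sums should plateau at samples-per-day-per-category × 7 days:

# expected plateaus: A -> 3 * 7 = 21, B -> 2 * 7 = 14, C -> 1 * 7 = 7
_.groupby("categories")["metric"].max()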