The World’s Greatest Rolling Tutorial
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = "svg"
samples_per_day = 6
base_date_index = pd.date_range("2021-01-01", "2021-01-31", freq="D")
base_date_index
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
'2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
'2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
'2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24',
'2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28',
'2021-01-29', '2021-01-30', '2021-01-31'],
dtype='datetime64[ns]', freq='D')
date_index = base_date_index.tolist() * samples_per_day
date_index.sort()
date_index[:11]
[Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-01 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D'),
Timestamp('2021-01-02 00:00:00', freq='D')]
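As an aside, the tolist-and-sort round trip can be replaced with a single call: `Index.repeat` emits each date `samples_per_day` times, already in order. A minimal sketch:

```python
import pandas as pd

samples_per_day = 6
base_date_index = pd.date_range("2021-01-01", "2021-01-31", freq="D")

# Each date repeated samples_per_day times, already sorted
repeated = base_date_index.repeat(samples_per_day)

# Same sequence as sorting samples_per_day concatenated copies
manual = sorted(base_date_index.tolist() * samples_per_day)
assert list(repeated) == manual
```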
categories = ["A", "A", "A", "B", "B", "C"] * 31
data = pd.DataFrame([1] * samples_per_day * 31, columns=["metric"]).assign(
categories=categories,
date=date_index,
)
print(data.shape)
data.head(11)
(186, 3)

    metric categories       date
0        1          A 2021-01-01
1        1          A 2021-01-01
2        1          A 2021-01-01
3        1          B 2021-01-01
4        1          B 2021-01-01
5        1          C 2021-01-01
6        1          A 2021-01-02
7        1          A 2021-01-02
8        1          A 2021-01-02
9        1          B 2021-01-02
10       1          B 2021-01-02
_ = data.groupby("categories", as_index=False).rolling("7D", on="date").sum()
_.head()
  categories  metric       date
0          A     1.0 2021-01-01
1          A     2.0 2021-01-01
2          A     3.0 2021-01-01
6          A     4.0 2021-01-02
7          A     5.0 2021-01-02
For each group produced by the groupby, and for each window defined by rolling, pandas sums every value inside that window:

2021-01-01 : 2021-01-07 is one window,
2021-01-02 : 2021-01-08 is another window…

However, rolling does not aggregate rows that share a date; it keeps the original index and emits one output row per input row. Since our dates are duplicated, each row only sees the window up to itself, and we have a problem.
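A tiny, self-contained reproduction of the issue (plain pandas, independent of the data above): with duplicated timestamps, each output row is the partial sum up to that row, not one value per day.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01"] * 3 + ["2021-01-02"] * 3),
    "metric": [1] * 6,
})

# One output row per input row: duplicates within a day accumulate
out = df.rolling("2D", on="date").sum()
print(out["metric"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```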
px.line(_, x="date", y="metric", color="categories")
That’s not what we wanted…

%xmode
The %xmode magic controls the verbosity level for exception tracebacks.
Exception reporting mode: Minimal
(
    data.groupby(["categories", "date"], as_index=False)
    .sum()
    .groupby("categories", as_index=False)
    .rolling("7D", on="date")
    .sum()
    .head()
)
ValueError: date must be monotonic
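The error comes from the intermediate frame: after the first groupby-sum, rows come back ordered by (categories, date), so the date column restarts at each new category and is not monotonic overall. A small sketch of the failing condition, using a hypothetical toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    "categories": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "metric": [1, 1, 1, 1],
})

daily = df.groupby(["categories", "date"], as_index=False).sum()
# Ordered by (categories, date): dates restart at each category
print(daily["date"].is_monotonic_increasing)  # False

# Sorting by date alone restores the monotonicity rolling("7D") requires
print(daily.sort_values("date")["date"].is_monotonic_increasing)  # True
```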
_ = (
    data.groupby(["categories", "date"], as_index=False)
    .sum()
    .sort_values("date")
    .groupby("categories", as_index=False)
    .rolling("7D", on="date")
    .sum()
)
_.head()
  categories       date  metric
0          A 2021-01-01     3.0
1          A 2021-01-02     6.0
2          A 2021-01-03     9.0
3          A 2021-01-04    12.0
4          A 2021-01-05    15.0
px.line(_, x="date", y="metric", color="categories")
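As a sanity check on the final result (a sketch that rebuilds the same data as above): category A contributes 3 rows per day, so once the 7-day window fills up, its rolling sum should plateau at 3 × 7 = 21.

```python
import pandas as pd

samples_per_day = 6
data = pd.DataFrame({
    "metric": [1] * (samples_per_day * 31),
    "categories": ["A", "A", "A", "B", "B", "C"] * 31,
    "date": pd.date_range("2021-01-01", "2021-01-31", freq="D").repeat(samples_per_day),
})

# Aggregate to one row per day for category A, then roll a 7-day window
daily_a = (
    data[data["categories"] == "A"]
    .groupby("date", as_index=False)["metric"]
    .sum()
)
rolled_a = daily_a.rolling("7D", on="date")["metric"].sum()

print(rolled_a.iloc[:7].tolist())  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
print(rolled_a.max())              # 21.0
```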
