What I learned about group_by_dynamic in Polars while working on a client's project
In the last client project I worked on, I learned something about the group_by_dynamic function in Polars. While what I learned was surprising, the fact that I learned it during the project was not. This aligns with the philosophy of "let the work be the practice" that Cal Newport advocates, and one I'm proud to say I follow. Most people spend time learning a technology before using it in a project; Cal Newport's philosophy suggests combining learning with doing, so that the work itself becomes the practice that builds mastery. By the end of this project, my proficiency in Polars had increased dramatically.
A brief overview of the project
The client had 2025 forecast data for electronic gadgets in monthly buckets, but she wanted it converted into weekly buckets. Every date in the data she shared fell at the beginning of a month; for instance, the first row contained the date January 1, 2025. Below is the forecast data from the client.

Date | Gadget | Forecast |
---|---|---|
date | str | i16 |
2025-01-01 | "Headphones" | 3439 |
2025-01-01 | "Keyboard" | 1652 |
2025-01-01 | "Monitor" | 311 |
2025-01-01 | "Mouse" | 1139 |
2025-01-01 | "Printer" | 123 |
… | … | … |
2025-12-01 | "Mouse" | 1385 |
2025-12-01 | "Printer" | 166 |
2025-12-01 | "Smartwatch" | 678 |
2025-12-01 | "Tablet" | 496 |
2025-12-01 | "Webcam" | 512 |
How we solved the problem
At first, I thought I could solve the problem by simply dividing each forecast value by 7, the number of days in a week. However, a colleague quickly reminded me that not all months are created equal: some have more days than others. So I abandoned that approach and searched for an alternative.
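To make my colleague's point concrete, here is a minimal sketch (not from the original project, just the standard library) that prints how many days each month of 2025 contains. Any fixed divisor ignores the spread from 28 to 31 days.

from calendar import monthrange

# Print the number of days in each month of 2025
for month in range(1, 13):
    days_in_month = monthrange(2025, month)[1]
    print(f"2025-{month:02d}: {days_in_month} days")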
The solution that worked involved creating, from scratch, a single-column dataframe containing all the days of the year 2025, from January to December. We joined this dataframe with the client's forecast data and then applied group_by_dynamic. It worked like a charm, but it also exposed something I hadn't been fully aware of regarding group_by_dynamic.
Creating the date dataframe
Using datetime_range, I created a dataframe containing timeseries values for the entire year of 2025. The interval was set to 1 day, ensuring that every single date in the year is included. I then extracted the month values from the dates to create a new Month column. Below are the code and the resulting timeseries dataframe.
from datetime import datetime

import polars as pl

date_df = pl.DataFrame(
    {
        "Date": pl.datetime_range(
            start=datetime(2025, 1, 1),
            end=datetime(2025, 12, 31),
            interval="1d",
            eager=True,
        )
    }
).with_columns(
    pl.col('Date').dt.date(),
    Month=pl.col('Date').dt.month(),
)
date_df
Date | Month |
---|---|
date | i8 |
2025-01-01 | 1 |
2025-01-02 | 1 |
2025-01-03 | 1 |
2025-01-04 | 1 |
2025-01-05 | 1 |
… | … |
2025-12-27 | 12 |
2025-12-28 | 12 |
2025-12-29 | 12 |
2025-12-30 | 12 |
2025-12-31 | 12 |
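As a side note (not part of the original project code), Polars also provides pl.date_range, which yields a Date-typed column directly, so the .dt.date() cast is not needed. A minimal equivalent sketch, assuming pl.date_range with eager=True is available in your Polars version:

from datetime import date

import polars as pl

# Build the same daily calendar for 2025 using date_range instead of datetime_range
date_df = pl.DataFrame(
    {"Date": pl.date_range(date(2025, 1, 1), date(2025, 12, 31), interval="1d", eager=True)}
).with_columns(Month=pl.col("Date").dt.month())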
Joining the two dataframes
Before joining the two dataframes, I converted the date values in the forecast dataset to month-only values. This gave both dataframes a common key, enabling the join to proceed. The dataframe below shows the forecast data after the date values were replaced with month numbers.
# data is the client's monthly forecast dataframe shown earlier
df = (data
    .with_columns(pl.col('Date').dt.month())
)
df
Date | Gadget | Forecast |
---|---|---|
i8 | str | i16 |
1 | "Headphones" | 3439 |
1 | "Keyboard" | 1652 |
1 | "Monitor" | 311 |
1 | "Mouse" | 1139 |
1 | "Printer" | 123 |
… | … | … |
12 | "Mouse" | 1385 |
12 | "Printer" | 166 |
12 | "Smartwatch" | 678 |
12 | "Tablet" | 496 |
12 | "Webcam" | 512 |
The two dataframes were joined using a left join. I also decided to divide the forecast values by 4, since every month has at least four full weeks. Here's an example of the resulting dataframe for Headphones, showing the monthly and weekly forecast.
(date_df
    .join(df, left_on='Month', right_on='Date', how='left')
    .drop('Month')
    .with_columns(Weekly_Forecast=pl.col('Forecast').truediv(4).round(0).cast(pl.Int16))
    .sort('Date')
    .filter(pl.col('Gadget') == "Headphones")
)
Date | Gadget | Forecast | Weekly_Forecast |
---|---|---|---|
date | str | i16 | i16 |
2025-01-01 | "Headphones" | 3439 | 860 |
2025-01-02 | "Headphones" | 3439 | 860 |
2025-01-03 | "Headphones" | 3439 | 860 |
2025-01-04 | "Headphones" | 3439 | 860 |
2025-01-05 | "Headphones" | 3439 | 860 |
… | … | … | … |
2025-12-27 | "Headphones" | 2992 | 748 |
2025-12-28 | "Headphones" | 2992 | 748 |
2025-12-29 | "Headphones" | 2992 | 748 |
2025-12-30 | "Headphones" | 2992 | 748 |
2025-12-31 | "Headphones" | 2992 | 748 |
Converting to weekly buckets
The dates are still daily, but the client wants weekly buckets, so I used group_by_dynamic with an interval of 7 days to convert them. Below is the resulting dataframe; the great_tables styling highlights in yellow any rows whose year is 2024.
from great_tables import loc, style

processed_df = (date_df
    .join(df, left_on='Month', right_on='Date', how='left')
    .drop('Month')
    .with_columns(Weekly_Forecast=pl.col('Forecast').truediv(4).round(0).cast(pl.Int16))
    .sort('Date')
    .filter(pl.col('Gadget') == "Headphones")
    .group_by_dynamic('Date', every='7d')
    .agg(pl.first('Gadget', 'Forecast', 'Weekly_Forecast'))
)

processed_df[0:10].style.tab_style(
    style.fill("yellow"),
    loc.body(
        rows=pl.col("Date").dt.year() == 2024,
    ),
)
Date | Gadget | Forecast | Weekly_Forecast |
---|---|---|---|
2024-12-26 | Headphones | 3439 | 860 |
2025-01-02 | Headphones | 3439 | 860 |
2025-01-09 | Headphones | 3439 | 860 |
2025-01-16 | Headphones | 3439 | 860 |
2025-01-23 | Headphones | 3439 | 860 |
2025-01-30 | Headphones | 3439 | 860 |
2025-02-06 | Headphones | 2620 | 655 |
2025-02-13 | Headphones | 2620 | 655 |
2025-02-20 | Headphones | 2620 | 655 |
2025-02-27 | Headphones | 2620 | 655 |
Do you notice the peculiarity that group_by_dynamic introduces? None of our original dataframes contained the year 2024, yet after using group_by_dynamic, we now see 2024. What's going on here? I was initially unaware of this behavior. It turns out that, by default (start_by='window'), group_by_dynamic does not start its windows at the first datapoint: it takes the earliest timestamp and truncates it by the interval given in the every parameter to find the first window boundary. With a 7-day interval, 2025-01-01 gets truncated back to 2024-12-26, so the label of the first weekly bucket lands in 2024.
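A minimal way to see where that boundary comes from (my own check, not part of the project code, and assuming Polars aligns 7-day windows the same way dt.truncate does):

from datetime import date

import polars as pl

# Truncating the first datapoint by the 7-day window length should reproduce
# the 2024-12-26 label that appeared in the highlighted row above
print(pl.select(pl.lit(date(2025, 1, 1)).dt.truncate('7d')))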
But wait: the client specifically needs forecast data for 2025. How can we address this? Thankfully, the developers of Polars anticipated this issue and provided a solution. As outlined in the documentation, adding the start_by parameter with the value "datapoint" to group_by_dynamic resolves the problem, because each window then starts at the first datapoint rather than at a truncated boundary. With this adjustment, the year 2024 disappeared entirely. Below, the same pipeline is run for the Monitor gadget with start_by='datapoint'.
(date_df
    .join(df, left_on='Month', right_on='Date', how='left')
    .drop('Month')
    .filter(pl.col('Gadget') == 'Monitor')
    .with_columns(Weekly_Forecast=pl.col('Forecast').truediv(4).round(0).cast(pl.Int16))
    .sort('Date')
    .group_by_dynamic('Date', every='7d', start_by='datapoint')
    .agg(pl.first('Gadget', 'Forecast', 'Weekly_Forecast'))
)
Date | Gadget | Forecast | Weekly_Forecast |
---|---|---|---|
date | str | i16 | i16 |
2025-01-01 | "Monitor" | 311 | 78 |
2025-01-08 | "Monitor" | 311 | 78 |
2025-01-15 | "Monitor" | 311 | 78 |
2025-01-22 | "Monitor" | 311 | 78 |
2025-01-29 | "Monitor" | 311 | 78 |
… | … | … | … |
2025-12-03 | "Monitor" | 325 | 81 |
2025-12-10 | "Monitor" | 325 | 81 |
2025-12-17 | "Monitor" | 325 | 81 |
2025-12-24 | "Monitor" | 325 | 81 |
2025-12-31 | "Monitor" | 325 | 81 |
With the date issue resolved, we can now build the final dataset for the client: loop over every gadget, apply the same pipeline, and concatenate the results.
gadget_list = df['Gadget'].unique().to_list()

bucket_dfs = []
for gadget in gadget_list:
    bucket_df = (date_df
        .join(df, left_on='Month', right_on='Date', how='left')
        .drop('Month')
        .filter(pl.col('Gadget') == gadget)
        .with_columns(Weekly_Forecast=pl.col('Forecast').truediv(4).round(0).cast(pl.Int16))
        .sort('Date')
        .group_by_dynamic('Date', every='7d', start_by='datapoint')
        .agg(pl.first('Gadget', 'Forecast', 'Weekly_Forecast'))
    )
    bucket_dfs.append(bucket_df)

all_bucket_df = pl.concat(bucket_dfs).drop('Forecast')
all_bucket_df
Date | Gadget | Weekly_Forecast |
---|---|---|
date | str | i16 |
2025-01-01 | "Mouse" | 285 |
2025-01-08 | "Mouse" | 285 |
2025-01-15 | "Mouse" | 285 |
2025-01-22 | "Mouse" | 285 |
2025-01-29 | "Mouse" | 285 |
… | … | … |
2025-12-03 | "Printer" | 42 |
2025-12-10 | "Printer" | 42 |
2025-12-17 | "Printer" | 42 |
2025-12-24 | "Printer" | 42 |
2025-12-31 | "Printer" | 42 |
The lesson to remember is that, by default, group_by_dynamic aligns its windows by truncating the earliest date with the interval you set in the every parameter, which can push the first window label back before your data even starts. If you want the windows to begin at the dates actually present in your dataset, add the start_by parameter and set it to "datapoint".
Reach out if you need help with your data problems. Also, take a look at our Polars course to improve your data analysis skills using this fast Python library.