Windowing Operations in Pandas
In this section, you will be learning more about Dataframes, how to load data into one and how to perform operations.
Windowing operations in Pandas are essential for performing calculations over a sliding window of data. These operations are particularly useful for time series data analysis, where you might want to compute rolling averages, cumulative sums, or other statistics over a specified window. Pandas provides several methods for windowing operations, including rolling
, expanding
, and ewm
(exponentially weighted windows).
Rolling Window
The rolling
method in Pandas allows you to perform operations over a fixed-size sliding window.
Basic Rolling Window
import pandas as pd
import numpy as np
# Creating a DataFrame with a datetime index
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)
# Applying a rolling window of size 3 and calculating the mean
rolling_mean = df['A'].rolling(window=3).mean()
print(rolling_mean)
Output:
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 2.0
2023-01-04 3.0
2023-01-05 4.0
2023-01-06 5.0
Freq: D, Name: A, dtype: float64
Rolling Window with Different Functions
You can use various functions with the rolling
method, such as sum
, std
, min
, max
, etc.
# Rolling sum
rolling_sum = df['A'].rolling(window=3).sum()
print(rolling_sum)
# Rolling standard deviation
rolling_std = df['A'].rolling(window=3).std()
print(rolling_std)
Output:
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 6.0
2023-01-04 9.0
2023-01-05 12.0
2023-01-06 15.0
Freq: D, Name: A, dtype: float64
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 1.000000
2023-01-04 1.000000
2023-01-05 1.000000
2023-01-06 1.000000
Freq: D, Name: A, dtype: float64
Centering the Rolling Window
You can center the rolling window by setting the center
parameter to True
.
# Centered rolling mean
rolling_mean_centered = df['A'].rolling(window=3, center=True).mean()
print(rolling_mean_centered)
Output:
2023-01-01 NaN
2023-01-02 2.0
2023-01-03 3.0
2023-01-04 4.0
2023-01-05 5.0
2023-01-06 NaN
Freq: D, Name: A, dtype: float64
Expanding Window
The expanding
method calculates statistics over an expanding window, which starts from the beginning of the data and includes all prior data points.
Basic Expanding Window
# Expanding mean
expanding_mean = df['A'].expanding().mean()
print(expanding_mean)
Output:
2023-01-01 1.0
2023-01-02 1.5
2023-01-03 2.0
2023-01-04 2.5
2023-01-05 3.0
2023-01-06 3.5
Freq: D, Name: A, dtype: float64
Expanding Window with Different Functions
You can use various functions with the expanding
method, such as sum
, std
, min
, max
, etc.
# Expanding sum
expanding_sum = df['A'].expanding().sum()
print(expanding_sum)
# Expanding standard deviation
expanding_std = df['A'].expanding().std()
print(expanding_std)
Output:
2023-01-01 1
2023-01-02 3
2023-01-03 6
2023-01-04 10
2023-01-05 15
2023-01-06 21
Freq: D, Name: A, dtype: int64
2023-01-01 NaN
2023-01-02 0.707107
2023-01-03 1.000000
2023-01-04 1.290994
2023-01-05 1.581139
2023-01-06 1.870829
Freq: D, Name: A, dtype: float64
Exponentially Weighted Window
The ewm
method calculates statistics using exponentially weighted functions, which give more weight to recent data points.
Basic Exponentially Weighted Window
# Exponentially weighted mean
ewm_mean = df['A'].ewm(span=3).mean()
print(ewm_mean)
Output:
2023-01-01 1.000000
2023-01-02 1.666667
2023-01-03 2.428571
2023-01-04 3.266667
2023-01-05 4.153846
2023-01-06 5.081081
Freq: D, Name: A, dtype: float64
Exponentially Weighted Window with Different Functions
You can use various functions with the ewm
method, such as sum
, std
, etc.
# Exponentially weighted standard deviation
ewm_std = df['A'].ewm(span=3).std()
print(ewm_std)
Output:
2023-01-01 NaN
2023-01-02 0.707107
2023-01-03 0.955813
2023-01-04 1.130388
2023-01-05 1.250972
2023-01-06 1.333333
Freq: D, Name: A, dtype: float64
Combining Windowing Operations
You can combine rolling, expanding, and exponentially weighted windows with other Pandas operations for more complex calculations.
# Combining rolling mean and expanding sum
combined = df['A'].rolling(window=2).mean().expanding().sum()
print(combined)
Output:
2023-01-01 NaN
2023-01-02 1.5
2023-01-03 3.5
2023-01-04 6.5
2023-01-05 10.5
2023-01-06 15.5
Freq: D, Name: A, dtype: float64
Conclusion
Windowing operations in Pandas provide powerful tools for calculating statistics over a sliding or expanding window of data. Whether you need rolling windows, expanding windows, or exponentially weighted windows, Pandas offers flexible and efficient methods to perform these operations. By mastering these techniques, you can enhance your data analysis capabilities and gain deeper insights from your data.
Learn How To Build AI Projects
Now, if you are interested in upskilling in 2024 with AI development, check out this 6 AI advanced projects with Golang where you will learn about building with AI and getting the best knowledge there is currently. Here’s the link.
Last updated 17 Aug 2024, 12:31 +0200 .