Windowing operations in Pandas are essential for performing calculations over a sliding window of data. These operations are particularly useful for time series data analysis, where you might want to compute rolling averages, cumulative sums, or other statistics over a specified window. Pandas provides several methods for windowing operations, including rolling, expanding, and ewm (exponentially weighted windows).

Rolling Window

The rolling method in Pandas allows you to perform operations over a fixed-size sliding window.

Basic Rolling Window

  import pandas as pd
import numpy as np

# Creating a DataFrame with a datetime index
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)

# Applying a rolling window of size 3 and calculating the mean
rolling_mean = df['A'].rolling(window=3).mean()
print(rolling_mean)
  

Output:

  2023-01-01    NaN
2023-01-02    NaN
2023-01-03    2.0
2023-01-04    3.0
2023-01-05    4.0
2023-01-06    5.0
Freq: D, Name: A, dtype: float64
  

Rolling Window with Different Functions

You can use various functions with the rolling method, such as sum, std, min, max, etc.

  # Rolling sum
rolling_sum = df['A'].rolling(window=3).sum()
print(rolling_sum)

# Rolling standard deviation
rolling_std = df['A'].rolling(window=3).std()
print(rolling_std)
  

Output:

  2023-01-01    NaN
2023-01-02    NaN
2023-01-03    6.0
2023-01-04    9.0
2023-01-05    12.0
2023-01-06    15.0
Freq: D, Name: A, dtype: float64

2023-01-01         NaN
2023-01-02         NaN
2023-01-03    1.000000
2023-01-04    1.000000
2023-01-05    1.000000
2023-01-06    1.000000
Freq: D, Name: A, dtype: float64
  

Centering the Rolling Window

You can center the rolling window by setting the center parameter to True.

  # Centered rolling mean
rolling_mean_centered = df['A'].rolling(window=3, center=True).mean()
print(rolling_mean_centered)
  

Output:

  2023-01-01    NaN
2023-01-02    2.0
2023-01-03    3.0
2023-01-04    4.0
2023-01-05    5.0
2023-01-06    NaN
Freq: D, Name: A, dtype: float64
  

Expanding Window

The expanding method calculates statistics over an expanding window, which starts from the beginning of the data and includes all prior data points.

Basic Expanding Window

  # Expanding mean
expanding_mean = df['A'].expanding().mean()
print(expanding_mean)
  

Output:

  2023-01-01    1.0
2023-01-02    1.5
2023-01-03    2.0
2023-01-04    2.5
2023-01-05    3.0
2023-01-06    3.5
Freq: D, Name: A, dtype: float64
  

Expanding Window with Different Functions

You can use various functions with the expanding method, such as sum, std, min, max, etc.

  # Expanding sum
expanding_sum = df['A'].expanding().sum()
print(expanding_sum)

# Expanding standard deviation
expanding_std = df['A'].expanding().std()
print(expanding_std)
  

Output:

  2023-01-01     1
2023-01-02     3
2023-01-03     6
2023-01-04    10
2023-01-05    15
2023-01-06    21
Freq: D, Name: A, dtype: int64

2023-01-01         NaN
2023-01-02    0.707107
2023-01-03    1.000000
2023-01-04    1.290994
2023-01-05    1.581139
2023-01-06    1.870829
Freq: D, Name: A, dtype: float64
  

Exponentially Weighted Window

The ewm method calculates statistics using exponentially weighted functions, which give more weight to recent data points.

Basic Exponentially Weighted Window

  # Exponentially weighted mean
ewm_mean = df['A'].ewm(span=3).mean()
print(ewm_mean)
  

Output:

  2023-01-01    1.000000
2023-01-02    1.666667
2023-01-03    2.428571
2023-01-04    3.266667
2023-01-05    4.153846
2023-01-06    5.081081
Freq: D, Name: A, dtype: float64
  

Exponentially Weighted Window with Different Functions

You can use various functions with the ewm method, such as sum, std, etc.

  # Exponentially weighted standard deviation
ewm_std = df['A'].ewm(span=3).std()
print(ewm_std)
  

Output:

  2023-01-01         NaN
2023-01-02    0.707107
2023-01-03    0.955813
2023-01-04    1.130388
2023-01-05    1.250972
2023-01-06    1.333333
Freq: D, Name: A, dtype: float64
  

Combining Windowing Operations

You can combine rolling, expanding, and exponentially weighted windows with other Pandas operations for more complex calculations.

  # Combining rolling mean and expanding sum
combined = df['A'].rolling(window=2).mean().expanding().sum()
print(combined)
  

Output:

  2023-01-01    NaN
2023-01-02    1.5
2023-01-03    3.5
2023-01-04    6.5
2023-01-05    10.5
2023-01-06    15.5
Freq: D, Name: A, dtype: float64
  

Conclusion

Windowing operations in Pandas provide powerful tools for calculating statistics over a sliding or expanding window of data. Whether you need rolling windows, expanding windows, or exponentially weighted windows, Pandas offers flexible and efficient methods to perform these operations. By mastering these techniques, you can enhance your data analysis capabilities and gain deeper insights from your data.

Learn How To Build AI Projects

Now, if you are interested in upskilling in 2024 with AI development, check out this 6 AI advanced projects with Golang where you will learn about building with AI and getting the best knowledge there is currently. Here’s the link.

Last updated 17 Aug 2024, 12:31 +0200 . history