Module 06

The Invisible Forces

Feature Engineering Part III: Control Variables. Separating "Sales that just happened" from "Sales we drove."

In Modules 4 and 5, we mathematically transformed our Media variables (Spend, Impressions). But if you run a model using only media to predict sales, the results will be laughable: the model will credit media with sales that would have happened anyway.

Why? Because Media is not the only thing that drives sales. In fact, for most mature brands, Media drives only 10% to 20% of sales. The other 80% to 90% is the "Baseline."

We need to create variables for the invisible forces: Seasonality, Holidays, and Macroeconomics.

1. Modeling Seasonality (The Fourier Series)

If you sell winter coats, sales go up in November. That is not because your ads are genius; it is because it is cold. If you don't control for this, your model will think your November ads are 5x more effective than your July ads.
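To see why, here is a small simulation (every number in it is made up for illustration). Weekly spend and baseline demand both follow the same seasonal wave, and the true effect of spend is fixed at 2.0. A regression on spend alone absorbs the seasonal demand into the media coefficient; adding the seasonal term as a control recovers a value close to 2.0. This is a sketch using plain NumPy least squares, not a full MMM.

```python
import numpy as np

# Hypothetical data: 3 years of weekly rows where media spend and baseline
# demand both peak in the same season (think winter coats).
rng = np.random.default_rng(0)
weeks = np.arange(156)
season = np.cos(2 * np.pi * weeks / 52.18)  # +1 at week 0, about -1 half a year later

# Spend is planned around the season; the TRUE effect of spend is fixed at 2.0
true_media_effect = 2.0
spend = 50 + 30 * season + rng.normal(0, 5, size=weeks.size)
sales = 1000 + 300 * season + true_media_effect * spend + rng.normal(0, 10, size=weeks.size)

def ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

media_only = ols(spend, sales)                              # [intercept, spend]
with_season = ols(np.column_stack([spend, season]), sales)  # [intercept, spend, season]

print(f"spend coefficient, media only:       {media_only[1]:.2f}")
print(f"spend coefficient, with seasonality: {with_season[1]:.2f}")
```

Because spend rises exactly when demand rises, the media-only model has no way to tell the two apart and hands the seasonal lift to the ads.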

Instead of creating 12 dummy variables (one for each month), many MMM practitioners use Fourier terms (sine and cosine waves) to create a smooth seasonality curve.

import numpy as np

# Assumes df is your weekly ABT with an integer 'week_index' column (0, 1, 2, ...)
# Create sine and cosine waves to mimic yearly cycles
# period = 52.18 weeks (365.25 / 7) for weekly data
df['sin_year'] = np.sin(2 * np.pi * df['week_index'] / 52.18)
df['cos_year'] = np.cos(2 * np.pi * df['week_index'] / 52.18)

Why Smooth Waves?
Monthly dummies change abruptly from Jan 31 to Feb 1. Real seasonality is smooth. Fourier terms allow the model to learn a gentle rise and fall over the year.
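A single sine/cosine pair can only produce one smooth peak and one trough per year. A common extension is to add higher harmonics (2, 3, ... cycles per year) so the curve can take more complex shapes. The helper below is a minimal sketch of that idea; the function name, the column names, and the default `order=2` are illustrative choices, not a standard API.

```python
import numpy as np
import pandas as pd

def add_fourier_terms(df, time_col="week_index", period=52.18, order=2):
    """Add sin/cos pairs for harmonics k = 1..order of a yearly cycle.

    order=1 reproduces the single sin_year/cos_year pair; higher orders let the
    seasonal curve take on more complex shapes (e.g., two peaks per year).
    Columns are added to df in place; df is also returned for chaining.
    """
    t = df[time_col].to_numpy()
    for k in range(1, order + 1):
        df[f"sin_year_{k}"] = np.sin(2 * np.pi * k * t / period)
        df[f"cos_year_{k}"] = np.cos(2 * np.pi * k * t / period)
    return df

# Usage: 3 years of weekly rows, 3 harmonics -> 6 seasonal columns
df = pd.DataFrame({"week_index": np.arange(156)})
df = add_fourier_terms(df, order=3)
print(df.filter(like="_year_").shape)
```

Each extra harmonic adds two columns to estimate, so with only a few years of weekly data a small order is a sensible place to start.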

2. Hard Spikes: Holidays & Events

Holidays are "shocks" to the system: Black Friday, Christmas, Singles Day. These need to be modeled as binary flags (dummy variables).

The Trap: Don't just flag the day of the holiday. Flag the lead-up. People buy gifts before Christmas, not on Christmas Day.

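# Assumes df.date is a datetime64 column (convert with pd.to_datetime if needed)

# 1. Create a binary flag
# With weekly data, match the date that labels the week containing the holiday
df['is_black_friday'] = 0
df.loc[df.date == '2023-11-24', 'is_black_friday'] = 1

# 2. Add a 'Lead Up' flag if needed
# Flags every December day before the 25th to capture the pre-Christmas shopping frenzy
df['is_pre_xmas'] = (
    (df.date.dt.month == 12) & (df.date.dt.day < 25)
).astype(int)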
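Hard-coding one date per holiday gets brittle across several years, and with weekly data the exact holiday date rarely matches the date that labels its row. The sketch below assumes each weekly row is labeled by its Monday week-start date; it snaps every holiday to the Monday of its week and optionally flags a number of lead-up weeks. `flag_holiday_weeks` and its parameters are hypothetical names for illustration.

```python
import pandas as pd

def flag_holiday_weeks(df, holiday_dates, name, weeks_before=0, date_col="date"):
    """Flag the week containing each holiday, plus `weeks_before` lead-up weeks.

    Assumes df[date_col] is datetime64 and holds the Monday that starts each week.
    """
    holidays = pd.to_datetime(pd.Series(holiday_dates))
    # Snap each holiday back to the Monday of its week (Monday = weekday 0)
    week_starts = holidays - pd.to_timedelta(holidays.dt.weekday, unit="D")
    flagged = set()
    for start in week_starts:
        for lag in range(weeks_before + 1):
            flagged.add(start - pd.Timedelta(weeks=lag))
    df[name] = df[date_col].isin(flagged).astype(int)
    return df

# Usage: weekly rows labeled by Monday, Oct 2023 to Jan 2024
df = pd.DataFrame({"date": pd.date_range("2023-10-02", periods=16, freq="W-MON")})
df = flag_holiday_weeks(df, ["2022-11-25", "2023-11-24"], "is_black_friday_week")
df = flag_holiday_weeks(df, ["2023-12-25"], "is_xmas_window", weeks_before=3)
print(df[df.is_black_friday_week.eq(1) | df.is_xmas_window.eq(1)])
```

Whether the holiday week itself belongs in the lead-up flag is a modeling choice; keeping separate flags lets the model estimate the pre-holiday and holiday effects independently.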

3. The Macro Context

Sometimes sales drop because the economy is bad, or a competitor launched a huge promo. If you have the data, add these as control columns to your ABT (for example, an economic indicator or a competitor-promotion flag).
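Macro indicators are often published monthly or quarterly, while the ABT is weekly. One way to line them up is `pandas.merge_asof`, which attaches to each week the most recent macro value dated on or before it. The column name `consumer_confidence` and its values below are purely illustrative.

```python
import pandas as pd

# Hypothetical inputs: weekly ABT rows and a MONTHLY macro series
weekly = pd.DataFrame({"date": pd.date_range("2023-01-02", periods=10, freq="W-MON")})
macro = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    "consumer_confidence": [101.2, 99.8, 97.5],  # illustrative values only
})

# For each week, take the most recent macro reading dated on or before it.
# merge_asof requires both frames to be sorted on the key.
abt = pd.merge_asof(
    weekly.sort_values("date"),
    macro.sort_values("date"),
    on="date",
    direction="backward",
)
print(abt)
```

This matches on the reference month, not the release date. If an indicator is published with a delay and you want the model to see only what was known at the time, shift the macro dates forward by the release lag before merging.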

4. Feature Engineering Checklist

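Before we move to the Modeling Phase (Module 7), your DataFrame should now have:

- Media variables (Spend, Impressions) transformed as in Modules 4 and 5
- Seasonality terms (sin_year, cos_year)
- Holiday and event flags, including lead-up windows
- Macro and competitor controls, if you have the data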

We have mathematically represented reality. Now, we are ready to find the coefficients.
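Before fitting, a quick programmatic check can catch the most common ABT problems: a control you meant to add is missing, it contains gaps, or it never varies in the sample (for example, a holiday flag whose holiday falls outside your date range), in which case its coefficient cannot be estimated. The helper and column names below are hypothetical; adapt the list to your own ABT.

```python
import pandas as pd

def check_abt(df, required_cols):
    """Return a list of problems that would block modeling (empty list means OK)."""
    problems = []
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    present = [c for c in required_cols if c in df.columns]
    # NaNs will either drop rows or raise an error, depending on the modeling library
    nan_cols = [c for c in present if df[c].isna().any()]
    if nan_cols:
        problems.append(f"NaNs in: {nan_cols}")
    # A column with a single value carries no information to estimate a coefficient
    constant = [c for c in present if df[c].nunique(dropna=True) <= 1]
    if constant:
        problems.append(f"constant (no variation): {constant}")
    return problems

# Usage with a tiny hypothetical ABT
abt = pd.DataFrame({
    "sales": [100.0, 120.0, 90.0],
    "tv_transformed": [0.1, 0.4, 0.2],
    "sin_year": [0.0, 0.12, 0.24],
    "is_black_friday": [0, 0, 0],
})
issues = check_abt(abt, ["sales", "tv_transformed", "sin_year", "cos_year", "is_black_friday"])
print(issues)
```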
