You have built a perfect dataset. You have engineered features for Adstock and Saturation. You are ready to run `model.fit()`.
If you use standard Ordinary Least Squares (OLS) regression, you will likely get a result that says: "Facebook has a coefficient of -0.5."
-0.5? That implies that for every dollar you spend on Facebook, you lose 50 cents in revenue. Unless your ads are actively offensive, this is not plausible. It happens because OLS is "unbiased": it minimizes error on the training data with no constraint on the size or sign of the coefficients, even if the result makes no business sense.
Marketing data is messy. TV spend and Search spend often move together (multicollinearity), and OLS struggles to separate their effects. It might assign a huge positive number to TV (+5.0) and a negative number to Search (-2.0) just to balance the equation mathematically.
This is called Overfitting. The model has low "Bias" (it fits the training data well) but high "Variance" (its predictions degrade on new data and its coefficients fail basic sanity checks).
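To make this concrete, here is a small synthetic sketch (the channel names and numbers are invented for illustration): two spend series that move almost in lockstep are fed into plain OLS, and with this much collinearity the fitted coefficients can land far from the truth, sometimes flipping one channel negative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_weeks = 104

# Two channels that move almost in lockstep (hypothetical weekly spend).
tv_spend = rng.uniform(50, 150, n_weeks)
search_spend = 0.8 * tv_spend + rng.normal(0, 2, n_weeks)   # nearly a copy of TV

# In this simulated world, both channels genuinely help revenue.
revenue = 2.0 * tv_spend + 1.5 * search_spend + rng.normal(0, 80, n_weeks)

X = np.column_stack([tv_spend, search_spend])
ols = LinearRegression().fit(X, revenue)

# With near-duplicate predictors the estimates are unstable: rerun with a
# different seed and the split between TV and Search changes dramatically.
print(f"TV coefficient:     {ols.coef_[0]:.2f}   (true value: 2.0)")
print(f"Search coefficient: {ols.coef_[1]:.2f}   (true value: 1.5)")
```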
To fix this, we need to introduce a "Penalty." We tell the model: "I want you to fit the data, BUT I will punish you if you use coefficients that are too large."
This is Ridge Regression (L2 Regularization). It changes the goal of the model from minimizing the sum of squared errors:

$$\min_{\beta} \; \sum_{i=1}^{n} \left(y_i - X_i \beta\right)^2$$

To minimizing the squared errors plus a penalty on the size of the coefficients:

$$\min_{\beta} \; \sum_{i=1}^{n} \left(y_i - X_i \beta\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
Lambda (λ) is the penalty strength.
- If λ is 0, it behaves like OLS.
- If λ is high, it shrinks all coefficients towards zero, reducing the wild swings caused by multicollinearity.
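Here is a quick sketch of that shrinkage on the same kind of hypothetical collinear TV/Search data as above (the features are standardised first so a single alpha applies comparably to both channels):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tv = rng.uniform(50, 150, 104)                      # hypothetical weekly TV spend
search = 0.8 * tv + rng.normal(0, 2, 104)           # near-duplicate Search spend
revenue = 2.0 * tv + 1.5 * search + rng.normal(0, 80, 104)

X = StandardScaler().fit_transform(np.column_stack([tv, search]))

# As alpha (the lambda penalty) grows, the coefficients are pulled toward zero
# and toward each other instead of swinging to extreme positive/negative values.
for alpha in [0.01, 1, 100, 10_000]:
    coefs = Ridge(alpha=alpha).fit(X, revenue).coef_
    print(f"alpha={alpha:>8}: TV={coefs[0]:8.2f}  Search={coefs[1]:8.2f}")
```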
We use scikit-learn to implement Ridge Regression. We also need to enforce Positive Coefficients (because marketing impact should almost always be positive).
```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Define the model with a positivity constraint
# (Note: standard Ridge in sklearn doesn't force positivity easily,
# so we often use libraries like CVXPY or specialized args in newer versions)
model = Ridge(alpha=1.0)  # alpha is the Lambda parameter

# We usually run a Grid Search to find the best Alpha/Lambda
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

print(f"Best Penalty Strength: {grid_search.best_params_}")
```
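As a concrete option for the positivity constraint: scikit-learn 1.0 and later expose a `positive=True` flag on Ridge (it switches to the 'lbfgs' solver internally), which avoids reaching for CVXPY. A minimal sketch, assuming the same X_train and y_train as above:

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# positive=True constrains every coefficient to be >= 0
# (available in scikit-learn >= 1.0; uses the 'lbfgs' solver under the hood).
model = Ridge(alpha=1.0, positive=True)

# Tune the penalty strength exactly as before.
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

print(f"Best Penalty Strength: {grid_search.best_params_}")
print(f"Any negative coefficients? {(grid_search.best_estimator_.coef_ < 0).any()}")
```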
By using Ridge Regression, we are accepting a small amount of Bias (our model is slightly "damped") in exchange for a massive reduction in Variance.
The result is a model that is more stable, more predictive on future data, and less likely to give you crazy results like "-$50 ROI on Search."
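One way to check that trade-off numerically is to simulate the same kind of collinear dataset many times and track how far each model's Search coefficient wanders. The sketch below uses purely invented numbers, and the alpha value is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

def simulate_dataset():
    """Hypothetical MMM data: two near-duplicate channels plus noisy revenue."""
    tv = rng.uniform(50, 150, 104)
    search = 0.8 * tv + rng.normal(0, 2, 104)
    revenue = 2.0 * tv + 1.5 * search + rng.normal(0, 80, 104)
    return StandardScaler().fit_transform(np.column_stack([tv, search])), revenue

# Refit both models on many simulated datasets and compare coefficient stability.
ols_search, ridge_search = [], []
for _ in range(200):
    X, y = simulate_dataset()
    ols_search.append(LinearRegression().fit(X, y).coef_[1])
    ridge_search.append(Ridge(alpha=10).fit(X, y).coef_[1])

# Ridge shifts the average estimate a little (bias) but cuts the spread sharply (variance).
print(f"OLS   Search coef: mean={np.mean(ols_search):7.2f}  std={np.std(ols_search):6.2f}")
print(f"Ridge Search coef: mean={np.mean(ridge_search):7.2f}  std={np.std(ridge_search):6.2f}")
```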
But Ridge is still purely mathematical. It doesn't know that "Brand Search" should have a higher ROI than "Display Ads." For that, we need to inject human knowledge. That brings us to Bayesian methods.