You have run your Bayesian sampling (Module 8) or Ridge Regression (Module 7). You don't just get one model; you often run thousands of iterations with different Adstock and Saturation parameters.
Which one is "True"? To decide, we need to act like a Referee. We judge models on three criteria: Accuracy, Stability, and Calibration.
R² tells you how well your model's line fits the historical data dots.
An R² of 0.95 means you explained 95% of the variance. Amazing, right?
Wrong. A high R² is often a sign of overfitting. With enough variables (holiday flags, 10 media channels), a model can memorize the past almost perfectly and still fail completely at predicting next week's sales. R² is useful, but never trust it alone.
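To make the gap between fitting the past and predicting the future concrete, here is a minimal sketch that computes R² by hand as 1 minus the ratio of residual to total sum of squares. The `r_squared` helper and all the numbers are made up for illustration; they do not come from a real model.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical weekly sales, for illustration only.
train_actual = [100, 120, 130, 150]
train_fitted = [101, 119, 131, 149]    # the model hugs the historical data

holdout_actual = [160, 140, 170]
holdout_predicted = [130, 165, 145]    # the same model on weeks it never saw

r2_train = r_squared(train_actual, train_fitted)
r2_holdout = r_squared(holdout_actual, holdout_predicted)

print(f"In-sample R²: {r2_train:.3f}")
print(f"Holdout R²:   {r2_holdout:.3f}")
```

In this toy case the in-sample R² is above 0.99 while the holdout R² is negative, meaning the model predicts unseen weeks worse than simply guessing their average.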
Mean Absolute Percentage Error (MAPE) is what you tell your CFO. It answers: "On average, by what percentage is our prediction wrong?"
    from sklearn.metrics import mean_absolute_percentage_error

    y_true = [100, 120, 130]
    y_pred = [105, 115, 140]

    # scikit-learn returns a fraction, not a percentage
    mape = mean_absolute_percentage_error(y_true, y_pred)
    # Result: about 0.056 (5.6% error)
In standard Machine Learning, you split data randomly into Train/Test. You cannot do this with Time Series because the order matters. You cannot use next week's data to predict last week's sales.
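We use a Rolling Window approach: train on a block of consecutive weeks, test on the weeks that immediately follow, then slide the window forward and repeat.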
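A rolling window split can be sketched in a few lines of plain Python. The function name `rolling_window_splits` and its parameters are hypothetical and chosen for illustration. The key property is that every test index comes strictly after every training index.

```python
def rolling_window_splits(n_obs, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs in chronological order.

    The training window has a fixed length and slides forward by `step`
    observations each time (defaults to `test_size`).
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += step


# Example: 10 weeks of data, train on 6 weeks, test on the next 2.
splits = list(rolling_window_splits(n_obs=10, train_size=6, test_size=2))

for train_idx, test_idx in splits:
    print("train weeks:", train_idx, "-> test weeks:", test_idx)
```

With these settings the sketch yields two folds: weeks 0 to 5 predicting weeks 6 and 7, then weeks 2 to 7 predicting weeks 8 and 9. You would compute MAPE on each test block and average the results.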
Calibration against real-world experiments is what separates "Junior" Data Scientists from "Senior" Econometricians.
Imagine your model says Facebook has an ROI of 4.0.
But last month, you ran a Geo-Lift Test (turning off Facebook ads in Ohio) and the result was an incremental ROI of 1.5.
Your model is wrong. It doesn't matter what the R² is.
Calibration involves filtering out any model iteration that deviates too far from ground-truth experiments. We calculate a "Calibration Error" metric:
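    def calc_calibration_error(model_roi, experimental_roi):
        # Relative distance between the model's ROI and the experiment's ROI
        return abs(model_roi - experimental_roi) / experimental_roi

    # Logic inside your selection loop
    # (keep_model and discard_model are placeholders for your own bookkeeping):
    calibration_error = calc_calibration_error(model_roi, experimental_roi)
    if calibration_error < 0.20:
        keep_model()
    else:
        discard_model()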
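Putting the pieces together, a selection loop might look like the sketch below. It rests on assumptions: the candidate dictionaries, the `select_champion` helper, and the choice to rank calibrated models by holdout MAPE are illustrative, not a prescribed procedure. The experimental ROI of 1.5 and the 0.20 threshold are taken from the example above.

```python
def calc_calibration_error(model_roi, experimental_roi):
    # Relative distance between the model's ROI and the experiment's ROI
    return abs(model_roi - experimental_roi) / experimental_roi


def select_champion(candidates, experimental_roi, max_calibration_error=0.20):
    """Drop uncalibrated iterations, then pick the lowest holdout MAPE.

    Returns None if no candidate passes the calibration filter.
    """
    calibrated = [
        c for c in candidates
        if calc_calibration_error(c["roi"], experimental_roi) < max_calibration_error
    ]
    if not calibrated:
        return None
    return min(calibrated, key=lambda c: c["holdout_mape"])


# Hypothetical model iterations, for illustration only.
candidates = [
    {"name": "iter_001", "roi": 4.0, "holdout_mape": 0.04},  # accurate but uncalibrated
    {"name": "iter_002", "roi": 1.6, "holdout_mape": 0.07},
    {"name": "iter_003", "roi": 1.4, "holdout_mape": 0.06},
]

champion = select_champion(candidates, experimental_roi=1.5)
print("Champion:", champion["name"] if champion else "none passed calibration")
```

Note that `iter_001` has the best MAPE yet is discarded, because its ROI of 4.0 is far from the experimental 1.5. Filtering on calibration first and ranking on accuracy second keeps a well-fitting but implausible model from winning.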
Once we have selected the single best model (The "Champion" Model), we are ready to interpret the results and decompose the sales.