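In Module 7, we saw that Ridge Regression can reduce noise, but it can still produce results that defy logic. If your TV spend was flat for two years while sales grew, Ridge Regression might say: "TV contributed $0." A flat spend series has no variation for the regression to learn from, so the model cannot tell TV's effect apart from the baseline.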
But you know TV works. You have seen the brand lift studies. You have 30 years of marketing theory that says it works.
This is where Bayesian statistics wins. Instead of asking the data to speak for itself (the frequentist approach), we combine the data with our "Prior Beliefs."
Bayes' Theorem can be summarized for MMM as:
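In the theorem's standard form, written here with assumed notation (θ for the model's parameters, such as the media coefficients, and D for the observed sales and spend data):

```latex
P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
\qquad\Longrightarrow\qquad
\underbrace{P(\theta \mid D)}_{\text{Posterior}}
\;\propto\;
\underbrace{P(D \mid \theta)}_{\text{Likelihood}}
\times
\underbrace{P(\theta)}_{\text{Prior}}
```

In words: Posterior ∝ Likelihood × Prior. What we believe after seeing the data is proportional to how well the parameters explain the data, weighted by how plausible those parameters were beforehand.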
A "Prior" is a distribution. It is how you tell the model what is possible before it sees a single row of data.
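To see how a prior and the data combine, here is a minimal sketch for the simplest textbook case: a Normal prior on a single coefficient and a Normal estimate from the data with known uncertainty. The function name and all numbers are hypothetical, chosen only to mimic the flat-TV situation above. A real MMM has many parameters and is fitted by sampling rather than by this closed-form formula.

```python
def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a Normal prior with a Normal data estimate (conjugate update).

    The posterior mean is a precision-weighted average of the two,
    where precision = 1 / variance.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return post_mean, post_var ** 0.5


# Hypothetical inputs: lift studies suggest a TV effect of about 1.2 (sd 0.3).
# The data alone points to 0.0, but with a large sd of 2.0 because TV spend
# barely varied, so the data says very little.
post_mean, post_sd = normal_update(prior_mean=1.2, prior_sd=0.3,
                                   data_mean=0.0, data_sd=2.0)
print(f"posterior mean={post_mean:.2f}, posterior sd={post_sd:.2f}")
```

Because the data estimate is so uncertain, the posterior stays close to the prior. If the data were informative (a small data_sd), the posterior would move toward the data instead.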
In Bayesian MMM, we almost always force Positive Priors for media (using a Half-Normal or Gamma distribution), because spending money on ads should not destroy sales.
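As a quick illustration of why these two distributions act as positive-only priors, the sketch below draws from a Half-Normal and a Gamma using only Python's standard library and compares them with an unconstrained Normal. The parameter values (sigma=1, shape=2, scale=0.5) are arbitrary assumptions for the demo, not recommended settings.

```python
import random

rng = random.Random(0)  # fixed seed so the demo is repeatable
n = 10_000

# Half-Normal(sigma=1): the absolute value of a zero-mean Normal draw
half_normal_draws = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]

# Gamma(shape=2, scale=0.5): gammavariate takes (shape, scale)
gamma_draws = [rng.gammavariate(2.0, 0.5) for _ in range(n)]

# Unconstrained Normal(0, 1) for contrast: about half the draws are negative
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
share_negative = sum(d < 0 for d in normal_draws) / n

print("min Half-Normal draw:", min(half_normal_draws))
print("min Gamma draw:", min(gamma_draws))
print("share of Normal draws below zero:", share_negative)
```

With a plain Normal prior, roughly half of the prior mass says that ads reduce sales. The Half-Normal and Gamma rule that out before the model sees any data.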
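We use probabilistic programming libraries such as PyMC to build these models. HelloFresh has publicly described a PyMC-based MMM, and Google's open-source MMM tools take the same Bayesian approach, although they are built on other libraries. The code looks different from scikit-learn.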
    import pymc as pm

    # fb_spend_data, tv_spend_data, and sales_data are assumed to be
    # pre-loaded, equal-length numeric arrays, scaled so that the prior
    # widths below are sensible.

    # Define the "Probabilistic Context"
    with pm.Model() as mmm_model:
        # 1. Define Priors (The Beliefs)
        # We use HalfNormal to force positive coefficients for media
        beta_fb = pm.HalfNormal('beta_fb', sigma=1)
        beta_tv = pm.HalfNormal('beta_tv', sigma=2)  # Wider prior: we are less certain about TV's effect
        intercept = pm.Normal('alpha', mu=0, sigma=10)

        # 2. Define the Linear Relationship
        # Sales = Intercept + Beta * Spend
        mu = intercept + (beta_fb * fb_spend_data) + (beta_tv * tv_spend_data)

        # 3. Define the Likelihood (The Data Fit)
        # Assuming sales are normally distributed around the prediction
        sigma = pm.HalfNormal('sigma', sigma=1)
        y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=sales_data)

        # 4. Hit the "Magic Button" (MCMC Sampling)
        # The sampler draws thousands of samples from the Posterior
        # (must run inside the model context)
        trace = pm.sample(draws=2000, chains=4)
When you run model.predict() in scikit-learn, you get a single number.
When you run a Bayesian model, you get a Distribution.
It won't say: "ROI is 1.5."
It will say: "There is a 95% probability that ROI is between 1.3 and 1.7."
This is incredibly powerful for executive presentations. It allows you to talk about credible intervals (the Bayesian counterpart of confidence intervals) and risk. "We are confident that Facebook spend yields an ROI of at least 1.3."
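The sketch below shows how statements like these are read off a set of posterior draws. The draws are a synthetic stand-in, generated from a Normal(1.5, 0.1) purely for illustration. In a real project they would come from the fitted model's trace, and the ROI would be computed from the coefficients and spend. The percentile helper is a hypothetical convenience function, not part of any library.

```python
import random

rng = random.Random(42)

# Synthetic stand-in for posterior ROI draws (illustration only)
roi_draws = sorted(rng.gauss(1.5, 0.1) for _ in range(8000))


def percentile(sorted_values, q):
    """Return the value at quantile q (0..1) of an already sorted list."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


# Central 95% interval: cut 2.5% of the draws from each tail
lower = percentile(roi_draws, 0.025)
upper = percentile(roi_draws, 0.975)

# Probability statement: the share of draws at or above a threshold
prob_at_least_1_3 = sum(d >= 1.3 for d in roi_draws) / len(roi_draws)

print(f"95% interval for ROI: [{lower:.2f}, {upper:.2f}]")
print(f"P(ROI >= 1.3) = {prob_at_least_1_3:.3f}")
```

The interval and the threshold probability are both simple counts over the draws, which is why any business question ("how likely is ROI above 1.0?") can be answered from the same posterior without refitting the model.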