Before we can apply any advanced mathematics, we must structure our data. In digital attribution (MTA), we deal with event logs that are millions of rows long. In MMM, we must compress this universe into a neat, wide-format time-series dataset known as the Analytical Base Table (ABT).
This module focuses on the granularity, the variable selection, and the SQL aggregation logic required to build this table.
The first decision you must make is the time resolution of your model.
Your final table must contain three distinct categories of columns. If you miss one category, your model will fail (Omitted Variable Bias).
What are you trying to predict? Usually Revenue, but it could be New Conversions or App Installs. Ensure this is "Net" Revenue (after returns) if possible.
For every channel (Facebook, TV, Search, TikTok), you need a metric of intensity.
Tip: Use Impressions or GRPs where possible, as they represent the "eyeballs" seeing the ad. Use Spend if impressions are unreliable. Do not use Clicks (clicks are an outcome, not an input).
These explain the variations in sales that marketing didn't cause.
Here is how you transform raw transactional logs and marketing spend tables into the ABT format using SQL.
-- Step 1: Aggregate Sales to Weekly Level WITH weekly_sales AS ( SELECT DATE_TRUNC('week', transaction_date) AS week_start, SUM(revenue_amount) AS total_revenue, COUNT(DISTINCT order_id) AS total_orders FROM raw_orders GROUP BY 1 ), -- Step 2: Aggregate Media Spend to Weekly Level weekly_media AS ( SELECT DATE_TRUNC('week', date) AS week_start, SUM(CASE WHEN channel = 'Facebook' THEN spend ELSE 0 END) AS fb_spend, SUM(CASE WHEN channel = 'TV' THEN spend ELSE 0 END) AS tv_spend, SUM(CASE WHEN channel = 'Google_Search' THEN impressions ELSE 0 END) AS search_imps FROM marketing_spend_table GROUP BY 1 ) -- Step 3: Join to create the ABT SELECT s.week_start, s.total_revenue, m.fb_spend, m.tv_spend, m.search_imps, -- Adding a dummy variable for Black Friday CASE WHEN EXTRACT(MONTH FROM s.week_start) = 11 AND ... THEN 1 ELSE 0 END AS is_black_friday FROM weekly_sales s LEFT JOIN weekly_media m ON s.week_start = m.week_start ORDER BY 1 ASC;
Your resulting DataFrame in Python should look exactly like this. This is the dataset we will feed into our models in the upcoming modules.
| Week_Start | Revenue (Y) | FB_Spend | TV_Spend | Search_Imps | Price_Index |
|---|---|---|---|---|---|
| 2023-01-01 | $145,000 | $12,500 | $50,000 | 1.2M | 1.0 |
| 2023-01-08 | $132,000 | $10,000 | $45,000 | 1.1M | 1.0 |
| 2023-01-15 | $160,000 | $15,000 | $60,000 | 1.4M | 0.9 |
| ... | ... | ... | ... | ... | ... |
Before proceeding, ask yourself: "Are there zeros?" If you turned off Facebook ads for a week, that row should show 0, not NULL. Models hate nulls. Fill your nulls with zero.