← Back to Curriculum
Module 02

The Data Landscape

Garbage in, garbage out. Constructing the Analytical Base Table (ABT) is 80% of the work.

Before we can apply any advanced mathematics, we must structure our data. In digital attribution (MTA), we deal with event logs that are millions of rows long. In MMM, we must compress this universe into a neat, wide-format time-series dataset known as the Analytical Base Table (ABT).

This module focuses on the granularity, the variable selection, and the SQL aggregation logic required to build this table.

1. The Granularity Debate: Daily vs. Weekly

The first decision you must make is the time resolution of your model.

2. Anatomy of the ABT

Your final table must contain three distinct categories of columns. If you miss one category, your model will fail (Omitted Variable Bias).

A. The Dependent Variable (Y)

What are you trying to predict? Usually Revenue, but it could be New Conversions or App Installs. Ensure this is "Net" Revenue (after returns) if possible.

B. Media Variables (The X's)

For every channel (Facebook, TV, Search, TikTok), you need a metric of intensity.
Tip: Use Impressions or GRPs where possible, as they represent the "eyeballs" seeing the ad. Use Spend if impressions are unreliable. Do not use Clicks (clicks are an outcome, not an input).

C. Context & Controls (The Z's)

These explain the variations in sales that marketing didn't cause.

3. The SQL Aggregation

Here is how you transform raw transactional logs and marketing spend tables into the ABT format using SQL.

-- Step 1: Aggregate Sales to Weekly Level
WITH weekly_sales AS (
    SELECT 
        DATE_TRUNC('week', transaction_date) AS week_start,
        SUM(revenue_amount) AS total_revenue,
        COUNT(DISTINCT order_id) AS total_orders
    FROM raw_orders
    GROUP BY 1
),

-- Step 2: Aggregate Media Spend to Weekly Level
weekly_media AS (
    SELECT 
        DATE_TRUNC('week', date) AS week_start,
        SUM(CASE WHEN channel = 'Facebook' THEN spend ELSE 0 END) AS fb_spend,
        SUM(CASE WHEN channel = 'TV' THEN spend ELSE 0 END) AS tv_spend,
        SUM(CASE WHEN channel = 'Google_Search' THEN impressions ELSE 0 END) AS search_imps
    FROM marketing_spend_table
    GROUP BY 1
)

-- Step 3: Join to create the ABT
SELECT
    s.week_start,
    s.total_revenue,
    m.fb_spend,
    m.tv_spend,
    m.search_imps,
    -- Adding a dummy variable for Black Friday
    CASE WHEN EXTRACT(MONTH FROM s.week_start) = 11 AND ... THEN 1 ELSE 0 END AS is_black_friday
FROM weekly_sales s
LEFT JOIN weekly_media m ON s.week_start = m.week_start
ORDER BY 1 ASC;

4. The Final Output

Your resulting DataFrame in Python should look exactly like this. This is the dataset we will feed into our models in the upcoming modules.

Week_Start Revenue (Y) FB_Spend TV_Spend Search_Imps Price_Index
2023-01-01 $145,000 $12,500 $50,000 1.2M 1.0
2023-01-08 $132,000 $10,000 $45,000 1.1M 1.0
2023-01-15 $160,000 $15,000 $60,000 1.4M 0.9
... ... ... ... ... ...

Critical Logic Check

Before proceeding, ask yourself: "Are there zeros?" If you turned off Facebook ads for a week, that row should show 0, not NULL. Models hate nulls. Fill your nulls with zero.

Previous Module ← The Econometric Mindset