Bonus Chapter: Automating the Engine | Module 12

1. The Hook

The champagne corks were popping at Ujvi Candles. We had done it.

Over the last 11 weeks, we had taken them from "Excel Hell" to a sophisticated Marketing Mix Model. The CEO was happy. The CFO was optimizing budgets. The Marketing team finally stopped fighting.

Then, Monday morning hit.

"Hey," the CEO slacked me. "Where is the attribution report for last week?"

I froze. To get that report, I had to:

  1. Download CSVs from Facebook, Google, and Shopify.
  2. Clean them on my laptop (Module 2).
  3. Run the Python Markov script (Module 6).
  4. Copy-paste the results into Tableau.

It took me four hours.

"This isn't a solution," I realized. "This is a science experiment."

We had built a Ferrari, but we were pushing it down the highway by hand. If I got hit by a bus tomorrow, Ujvi's marketing intelligence would die with me.

We needed to turn this manual process into an automated Data Product. We needed to build the factory.

2. The Concept (The Assembly Line)

Here is what I tell my clients: Data Science is useless without Data Engineering.

Think of Modules 1-11 as building a prototype car in a garage. We used hand tools (Pandas on a laptop) and manual parts (CSVs).

Now, we need to build the Assembly Line that manufactures this insight every single day, automatically, while we sleep.

We call this the Modern Data Stack (MDS). It has four stations:

The Architecture Diagram

[ Sources ]       [ Ingestion ]      [ Storage ]       [ Modeling ]      [ Activation ]
( Facebook )  ➔   [ Fivetran ]   ➔   ( Snowflake )  ➔  [ dbt ]       ➔   ( Tableau )
( Google )        (Trucks)           (Raw Vault)       (SQL Logic)       (Dashboard)
( Shopify )
                                          ⬇
                                [ Python / Airflow ]
                                (Markov / MMM Logic)
                                          ⬇
                                   [ Hightouch ]    ➔  ( Facebook )
                                   (Reverse ETL)       (Bid Optimization)

3. The Technical Solution (The Workflow)

"We are retiring the CSVs," I told Ujvi's team. "Here is how the new engine runs."

Step 1: Ingestion (Fivetran/Airbyte)
Instead of me downloading files, we set up Fivetran. It connects to the Facebook Ads API and the Shopify API. Every night at 12:00 AM, it wakes up, grabs the new data, and dumps it into Snowflake (our Cloud Data Warehouse).

Step 2: Transformation (dbt)
Remember those complex SQL queries from Module 5 (The Pathing Logic)? We don't run those manually anymore. We put them into dbt (data build tool).

dbt runs the SQL inside Snowflake and handles dependencies between models. It knows it can't build the tables that feed the "Markov Model" until the "Session Table" is finished.

SQL: dbt Incremental Model
-- models/attribution/marketing_touches.sql
-- This runs automatically every night

{{ config(materialized='incremental') }}

SELECT
    user_id,
    timestamp,
    channel,
    -- dbt handles the complex logic we wrote in Module 2
    CASE
        WHEN source = 'ig' THEN 'Social'
        WHEN source = 'google' THEN 'Search'
    END AS channel_group
FROM {{ source('raw', 'clicks') }}

{% if is_incremental() %}
-- On incremental runs, only pull rows newer than what is already loaded.
-- The guard matters because the target table does not exist on the first run.
WHERE timestamp > (SELECT MAX(timestamp) FROM {{ this }})
{% endif %}
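
If the incremental idea feels abstract, here is a minimal Python sketch of the same "high-water mark" logic, outside of dbt. The function name and the sample rows are hypothetical and exist only for illustration. The empty-target branch plays the role that dbt's is_incremental() check plays on a first run.

```python
from datetime import datetime


def incremental_rows(source_rows, target_rows):
    """Return only the source rows newer than the latest row already in the target."""
    if not target_rows:
        # First run: nothing has been loaded yet, so take everything.
        return list(source_rows)
    high_water_mark = max(row["timestamp"] for row in target_rows)
    return [row for row in source_rows if row["timestamp"] > high_water_mark]


# Hypothetical click rows, invented purely for illustration.
clicks = [
    {"user_id": 1, "timestamp": datetime(2024, 1, 1), "source": "ig"},
    {"user_id": 2, "timestamp": datetime(2024, 1, 2), "source": "google"},
    {"user_id": 3, "timestamp": datetime(2024, 1, 3), "source": "ig"},
]
already_loaded = clicks[:2]  # pretend the first two rows were loaded last night

new_rows = incremental_rows(clicks, already_loaded)
```

Only the third row comes back, which is the point: each nightly run touches the new data instead of reprocessing the full history.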

Step 3: The Python Handoff (Airflow/Dagster)
SQL is great, but it can't run the Markov Chain (Module 6) or the Logistic Regression (Module 8). For that, we need Python.

We use an Orchestrator like Airflow.
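
To make "orchestrator" concrete, here is a rough sketch of the core job it does: run tasks in dependency order and stop if something upstream fails. This is not Airflow code. It uses Python's standard-library graphlib, and the task names and placeholder functions are assumptions I made up for illustration. A real Airflow DAG adds scheduling, retries, and alerting on top of this idea.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
PIPELINE = {
    "fivetran_sync": set(),
    "dbt_run": {"fivetran_sync"},
    "markov_attribution": {"dbt_run"},
    "logistic_regression": {"dbt_run"},
    "refresh_dashboard": {"markov_attribution", "logistic_regression"},
}


def run_pipeline(pipeline, tasks):
    """Run each task only after all of its upstream tasks have finished."""
    completed = []
    for name in TopologicalSorter(pipeline).static_order():
        tasks[name]()  # an exception here halts everything downstream
        completed.append(name)
    return completed


# Placeholder callables standing in for the real work (sync, SQL, Python models).
tasks = {name: (lambda: None) for name in PIPELINE}
order = run_pipeline(PIPELINE, tasks)
```

The Python models only start once the dbt step has finished, and the dashboard refresh waits for both models.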

Step 4: Reverse ETL (Hightouch)
This is the secret weapon. We don't just show the data to humans; we show it to the robots.

Using Hightouch, we push the "Predicted High-Value Users" (from Module 8) back into Facebook Ads as a "Custom Audience."
Now, Facebook isn't just targeting random people; it's targeting people our model said were likely to buy.
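
Under the hood, the audience sync boils down to "filter by model score, then hand over hashed identifiers." Here is a minimal sketch of that step, assuming a hypothetical list of scored users and an arbitrary 0.8 threshold. Ad platforms typically match on normalized, SHA-256 hashed emails, and a tool like Hightouch handles this for you, so treat this as an illustration and check the platform's current requirements.

```python
import hashlib


def normalize_and_hash(email):
    """Trim, lowercase, and SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_audience(scored_users, threshold=0.8):
    """Keep users whose predicted purchase probability meets the threshold."""
    return [
        normalize_and_hash(user["email"])
        for user in scored_users
        if user["purchase_probability"] >= threshold
    ]


# Hypothetical model output; field names and scores are invented for illustration.
scored_users = [
    {"email": " Ana@Example.com ", "purchase_probability": 0.91},
    {"email": "ben@example.com", "purchase_probability": 0.35},
]
audience = build_audience(scored_users)
```

Only the high-scoring user makes it into the audience, and the raw email never leaves the warehouse in plain text.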

4. The Real Data Scenario

"Does this really matter?" the CEO asked. "It sounds expensive."

"It's Black Friday," I said.

[Figure: Black Friday Automated Dashboard]

The Manual Way (Old Ujvi):
Traffic spikes 10x. My laptop crashes trying to process 5 million rows in Pandas. The dashboard is frozen. You have no idea which ad is working until Monday. By then, the sale is over.

The Automated Way (New Ujvi):

He makes the decision from his phone while eating Thanksgiving dinner. That is the power of the Modern Data Stack.

5. The Reality Check (Build vs. Buy)

This stack (Snowflake + dbt + Airflow + Hightouch) is powerful. It is also complex.

"Should we build all this?" Ujvi asked. "Or just buy a tool like Northbeam, Triple Whale, or Rockerbox?"

Here is my Consultant's Verdict:

Option A: Buy (The Black Box)
Pros: Setup takes 1 day. Dashboards are pretty. No engineering needed.
Cons: You don't own the logic. If they say "TikTok is 3.0x ROAS," you have to trust them. You can't customize the lookback window or the Markov weights.
Who it's for: Companies spending <$1M/year with no data team.

Option B: Build (The Owned Stack)
Pros: You own the data. You own the logic. You can tweak the model for your specific business (e.g., accounting for returns, margins, or offline stores).
Cons: You need an Analytics Engineer. It breaks if you don't maintain it.
Who it's for: Companies spending >$5M/year or those with unique business models (Subscription, B2B, Omni-channel).

For Ujvi:
Since they are growing fast and launching Retail (Offline), we went with a Hybrid. We bought Fivetran and Snowflake (infrastructure), but we built the Logic (dbt) ourselves. We own the brain; we rent the servers.

6. Engagement

The Series Wrap-Up

We have traveled a long road together.
We started by realizing that "Marketing Attribution is Broken."
We learned that Last Click is a lie and Linear is a guess.
We built Markov Chains to map the network and Game Theory to reward the team.
We zoomed out to MMM to measure the invisible and verified it all with Incrementality Tests.
And finally, today, we automated it all so you can actually take a vacation.

You are no longer just a Marketer or an Analyst. You are an Attribution Architect.

Over to you: Which chapter was your favorite? The SQL basics? The Python models? Or the strategy wars? Drop a comment below, and let's connect on LinkedIn!

(Download the full code repository for this series in the link below)

End of Series