← Back to Series Home

Mapping Journeys | Module 05

1. The Hook

I was sitting at a dual-monitor setup at Datagrad HQ, watching Ujvi Candles’ new Junior Analyst try to force a square peg into a round hole.

"The model keeps crashing," he groaned. "I'm trying to feed the marketing_touches table into a Logistic Regression model to predict the next purchase, but it says the dimensions are wrong."

I looked at his screen. He was feeding in the raw logs—one row per click.

"That's because you're feeding the model Ingredients," I said, "when it's asking for a Cake."

"Algorithms don't think in rows," I explained. "They think in Journeys."

Right now, Ujvi's data is transactional. It tells us that a click happened. It doesn't tell us the story of how that click relates to the one before it. If we want to move from "Reporting" (looking back) to "Data Science" (looking forward), we have to fundamentally change the shape of our data.

2. The "Aha!" Moment

Here is what I tell my clients: We need to map the DNA of the Sale.

Data Bricks Visual
⚠️ Image Not Found
Please ensure the file is named exactly: blog 5.1.png

Imagine our data is a pile of loose Lego bricks.

If you hand a pile of bricks to a machine, it just sees a pile. It doesn't know if you are building a castle or a car.

We need to assemble those bricks into a single structure. We need to transpose the data from Vertical (Time) to Horizontal (Story).

Current State (Vertical):
Row 1: SEO
Row 2: Social

Target State (Horizontal):
"SEO > Social"

Once we have this "Path String," we can treat the entire journey as a single data point. This is the atomic unit of Marketing Data Science.

3. The Solution (The Code)

Here is the SQL architecture I built for Ujvi today. We are using a technique called String Aggregation.

In MySQL, the function is GROUP_CONCAT. (If you are on Postgres/Redshift, use STRING_AGG or LISTAGG).

SQL: Journey Flattening
/* Author: Datagrad Lead Architect
   Goal: Flatten user journeys into single strings
*/
SELECT 
    TRANS_ID,
    ORDER_ID,
    -- The Magic Function
    GROUP_CONCAT(CHANNEL ORDER BY TOUCH_SEQUENCE SEPARATOR ' > ') as journey_path,
    
    -- Keep the conversion value
    MAX(ORDER_VALUE) as revenue
FROM marketing_touches
GROUP BY TRANS_ID, ORDER_ID;

Why this schema works:

4. The Real Data Scenario

"Show me," the Junior Analyst said.

We pulled Order #40808323 again. This is our "Problem Child" with 10 touchpoints.

Vertical to Horizontal Transformation
⚠️ Image Not Found
Please ensure the file is named exactly: blog 5.2.png

The Before: Transactional Rows (Vertical)

This is what the database stores.

SEQUENCE CHANNEL TIME 1 SEO 2025-12-04 2 Direct 2025-12-11 3 Direct 2025-12-18 ... 10 Direct 2025-12-27

The Transformation
We run the GROUP_CONCAT query...
⚙️ PROCESSING...

The After: The Journey String (Horizontal)

This is what the algorithm sees.

ORDER_ID JOURNEY_PATH REVENUE 40808323 "SEO > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct" $25.00

The Narrative:
Now, instead of 10 fragmented row entries, we have 1 unique "Story" for this customer. We have captured the Repetitive Nature of their behavior in a single text string.

5. The Reality Check

"Okay," the Analyst said. "But that string is ugly. It's just 'Direct' over and over again."

"Exactly," I said. "And that's a pattern."

By flattening the data, we can now count the unique paths. We might find that Ujvi has 5,000 orders that look exactly like "SEO > Direct > Direct".

If we know that this specific "DNA String" converts at 5%, but the string "Social > Direct" only converts at 1%, we have unlocked the Holy Grail of marketing efficiency.

We can finally answer the hardest question in attribution: "What happens if we remove the SEO step?"

To answer that, we need to stop looking at strings and start doing math. We need Markov Chains.

6. Next Steps & Interaction

We have prepared the ingredients. The cake is ready to be baked. In the next post, we will feed these strings into a probabilistic model to see which channels are actually doing the heavy lifting.

Over to you: Have you ever tried to run a model on raw logs and failed? What's your favorite SQL function for text manipulation: GROUP_CONCAT or STRING_AGG? Vote below.

Continue The Series