Module 09

Novelty & Primacy Effects

Time-based biases. Why the first week of data is almost always lying to you.

In Module 08, we learned not to peek. But even if you wait until you hit your target sample size, the data itself might be skewed by user psychology.

Users are creatures of habit. When you change an interface, they don't react neutrally. They react emotionally.

1. The Novelty Effect (The "Shiny Object" Syndrome)

If you change a grey button to a bright orange button, clicks will almost certainly go up in the first week. Is it because orange is better? No.

It's because regular users noticed something changed. Their curiosity drove the click. Once the novelty wears off (usually after 1-2 weeks), their behavior reverts to baseline.

The Danger: If you run a 1-week test, you will declare the Orange Button a winner. You will roll it out, and within a month the metric will drop back to baseline.

2. The Primacy Effect (Change Aversion)

This is the opposite problem. If you redesign a complex dashboard (like Salesforce or Gmail), productivity will drop immediately.

Users have "muscle memory." They click where the button used to be. When you move it, they get frustrated. The new design might be objectively better, but the initial data will show a massive loss.

Novelty Effect
- What it looks like: Huge positive lift early on, then a slow decline toward zero.
- Common in: Retail, media, and simple UI changes.

Primacy Effect
- What it looks like: Huge negative drop early on, followed by a slow recovery and climb.
- Common in: SaaS, B2B, and workflow tools.

3. Visualizing the Stabilization

To detect these effects, you should plot the Cumulative Lift over Time.

If the line is zig-zagging or sloping heavily downward after 7 days, your test has not stabilized. You cannot call a winner yet.
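Here is a minimal sketch of that plot in Python (pandas + matplotlib). The data is simulated: the treatment's true lift decays over two weeks to mimic a novelty effect, and all names and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical daily data: the treatment's true relative lift decays
# from ~+16% toward 0 over two weeks, mimicking a novelty effect.
days = np.arange(1, 29)
base_rate = 0.10
novelty = 0.20 * np.exp(-days / 5)   # decaying relative lift
visitors = 2000                      # per variant per day

records = []
for d, boost in zip(days, novelty):
    records.append(("control", d, visitors,
                    rng.binomial(visitors, base_rate)))
    records.append(("treatment", d, visitors,
                    rng.binomial(visitors, base_rate * (1 + boost))))
df = pd.DataFrame(records, columns=["variant", "day", "visitors", "conversions"])

# Cumulative lift: running treatment rate vs. running control rate.
daily = df.pivot_table(index="day", columns="variant",
                       values=["visitors", "conversions"], aggfunc="sum")
cum = daily.cumsum()
rate_t = cum[("conversions", "treatment")] / cum[("visitors", "treatment")]
rate_c = cum[("conversions", "control")] / cum[("visitors", "control")]
lift = (rate_t - rate_c) / rate_c

lift.plot(xlabel="Day", ylabel="Cumulative relative lift")
plt.axhline(0, color="grey", linewidth=0.8)
plt.title("Cumulative lift drifting down as novelty wears off")
plt.show()
```

The curve starts high and slopes steadily downward without flattening, which is exactly the "not yet stabilized" signature described above.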

4. How to Mitigate These Effects

A. Cohort Analysis (New vs. Returning)

New Users do not have muscle memory. They have never seen the old site. They are immune to Primacy and Novelty effects.

Strategy: Segment your test results, as in the sketch below.
- If New Users love the design (positive lift) but Returning Users hate it (negative lift), it is likely a Primacy Effect. The Returning Users will eventually learn the new layout.
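A minimal sketch of that segmentation, assuming a results table with an is_new_user flag; all column names and numbers here are hypothetical.

```python
import pandas as pd

# Made-up aggregated results, one row per (cohort, variant).
df = pd.DataFrame({
    "variant":     ["control", "treatment"] * 2,
    "is_new_user": [True, True, False, False],
    "visitors":    [5000, 5000, 20000, 20000],
    "conversions": [550, 600, 2200, 2000],
})

def lift_by_cohort(df: pd.DataFrame) -> pd.Series:
    # Sum up each cohort/variant cell, then compare conversion rates
    # within each cohort.
    grouped = df.groupby(["is_new_user", "variant"])[["visitors", "conversions"]].sum()
    rate = grouped["conversions"] / grouped["visitors"]
    # Relative lift of treatment over control, per cohort.
    return (rate.xs("treatment", level="variant")
            / rate.xs("control", level="variant") - 1)

print(lift_by_cohort(df))
```

In this made-up data, new users convert better on the treatment while returning users convert worse: the classic Primacy signature.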

B. Run Longer Tests

Always run tests for at least two full business cycles (usually 2 weeks). This gives the novelty time to wear off and ensures every day of the week is represented in both variants.
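If you already know how many days you need to reach your sample size (Module 08), you can round that up to whole cycles. A tiny sketch, assuming a 7-day cycle; the helper name and inputs are invented:

```python
import math

def test_duration_days(days_needed_for_sample: int,
                       cycle_days: int = 7,
                       min_cycles: int = 2) -> int:
    # Never run shorter than min_cycles full cycles, and always end on a
    # cycle boundary so every weekday is represented equally.
    floor = min_cycles * cycle_days
    days = max(days_needed_for_sample, floor)
    return math.ceil(days / cycle_days) * cycle_days

print(test_duration_days(10))  # -> 14 (padded up to two full weeks)
print(test_duration_days(17))  # -> 21 (rounded up to a whole week)
```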

C. The "Burn-In" Period

Some advanced teams ignore the first 3-5 days of data entirely. They treat it as a "warm-up" period and only calculate significance on data collected from Day 6 onwards. (Decide the cut-off before launch; choosing it after seeing the data is just another form of peeking.)
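A minimal sketch of the burn-in approach, assuming daily per-variant totals and using statsmodels for a two-proportion z-test; the table layout, numbers, and cut-off are illustrative.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

BURN_IN_DAYS = 5  # treat days 1-5 as warm-up and discard them

def post_burn_in_pvalue(df: pd.DataFrame) -> float:
    # Keep only data collected after the burn-in window.
    stable = df[df["day"] > BURN_IN_DAYS]
    totals = stable.groupby("variant")[["visitors", "conversions"]].sum()
    counts = totals.loc[["treatment", "control"], "conversions"].to_numpy()
    nobs = totals.loc[["treatment", "control"], "visitors"].to_numpy()
    _, p_value = proportions_ztest(count=counts, nobs=nobs)
    return p_value

# Example with made-up daily totals; day 1's treatment count is
# novelty-inflated and gets excluded by the burn-in filter.
df = pd.DataFrame({
    "day":         [1, 1, 6, 6, 7, 7],
    "variant":     ["control", "treatment"] * 3,
    "visitors":    [1000] * 6,
    "conversions": [100, 160, 101, 112, 99, 108],
})
print(post_burn_in_pvalue(df))
```

The trade-off is obvious: you throw away data, so the burn-in window has to be paid for with a longer total run.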

We have now covered the core Frequentist approach to A/B testing. But there is another way—a way that allows us to speak in probabilities rather than P-Values. This brings us to Bayesian A/B Testing.
