You want to run a test. Your Product Manager asks: "How long will it take?"
To answer that, you need to ask them a question back: "How big of a win are you expecting?"
This is the Minimum Detectable Effect (MDE). It is the smallest lift (e.g., 5%, 10%) that you care about detecting. If the true lift is smaller than your MDE, your test will likely fail to see it (it will look like "noise").
The relationship between MDE and Sample Size is inverse-square: halve the lift you want to detect, and you need roughly four times the users.
- Detecting a Big Lift (20%) is easy. You can see it with a few hundred users.
- Detecting a Tiny Lift (1%) is hard. You need a microscope (millions of users).
| Desired MDE | Sample Required (Per Variant) | Duration (@ 1k users/day) |
|---|---|---|
| 20% | 3,000 | 3 Days |
| 10% | 12,000 | 12 Days |
| 5% | 48,000 | 48 Days |
| 1% | 1,200,000 | 3.3 Years (Impossible) |
*Assuming 5% Baseline Conversion, 80% Power, 95% Confidence.
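The table's figures are illustrative, but you can reproduce the scaling with the standard two-proportion sample-size approximation. This is a minimal sketch assuming a two-sided 95% confidence level and 80% power (z ≈ 1.96 and 0.84); the exact numbers it prints differ somewhat from the rounded table above, but the inverse-square pattern is the same.

```python
import math

def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per variant to detect a relative lift.

    Standard two-proportion approximation:
    n ~ 2 * (z_alpha + z_beta)^2 * p(1-p) / delta^2,
    where delta is the absolute lift (baseline * relative_mde).
    """
    delta = baseline * relative_mde
    p_bar = baseline + delta / 2          # average of control and variant rates
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return math.ceil(n)

for mde in (0.20, 0.10, 0.05, 0.01):
    n = sample_size_per_variant(baseline=0.05, relative_mde=mde)
    days = n / 1_000                      # duration at 1k users/day per variant
    print(f"MDE {mde:>4.0%}: {n:>9,} per variant (~{days:,.0f} days)")
```

Notice that each halving of the MDE roughly quadruples the required sample, which is why the 1% row is measured in years, not days.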
Look at the table above. If your team insists on validating a 1% improvement, you will be running that test for 3 years.
This is why big swings are better than small tweaks. A radical redesign might yield a 20% lift (detectable in 3 days). Changing a button color might yield a 0.5% lift (detectable in never).
Strategy Rule: If you don't have millions of visitors (like Amazon or Google), you cannot afford to hunt for small wins. You must swing for big MDEs (10%+).
How do you choose the MDE number? It shouldn't be a guess. It should be based on Business Value.
Ask: "What is the smallest lift that justifies the cost of building this feature?"
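One way to turn that question into a number: divide the feature's build cost by the annual value generated at the baseline conversion rate. The helper name and all the figures below are hypothetical, for illustration only.

```python
def breakeven_relative_mde(build_cost, annual_visitors, baseline_cr, value_per_conversion):
    """Smallest relative lift whose first-year incremental value covers the build cost."""
    baseline_annual_value = annual_visitors * baseline_cr * value_per_conversion
    return build_cost / baseline_annual_value

# Hypothetical inputs: $50k feature, 1M visitors/year, 5% baseline, $20 per conversion.
mde = breakeven_relative_mde(50_000, 1_000_000, 0.05, 20)
print(f"Smallest lift worth testing for: {mde:.0%}")   # 5% relative lift
```

If the break-even lift comes out tiny (say, under 1%), the table above tells you the test itself may be unaffordable even though the feature is worth building.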
Be careful with language.
- Absolute Lift: Conversion goes from 5% to 6% (a "1% absolute increase", i.e., one percentage point).
- Relative Lift: Conversion goes from 5% to 6% (a "20% relative increase", because 1 point is 20% of the 5% baseline).
Most A/B testing tools (and this Masterclass) use Relative MDE. When you input "5%", you mean a 5% improvement over the baseline (e.g., 5.0% -> 5.25%).
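A minimal sketch of the conversion between the two conventions, using the example baseline above:

```python
baseline = 0.05            # 5.0% baseline conversion
relative_mde = 0.05        # entering "5%" into the tool means 5% relative

target_rate = baseline * (1 + relative_mde)   # 0.0525 -> 5.25%
absolute_lift = target_rate - baseline        # 0.0025 -> 0.25 percentage points

print(f"{baseline:.2%} -> {target_rate:.2%} (+{absolute_lift:.2%} absolute)")
```

Mixing up the two conventions changes your required sample size by orders of magnitude, so always confirm which one your tool expects before running the power calculation.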