In the previous modules, we defined what we want to test and how big of a win we expect (MDE). Now, we must calculate exactly how much data we need to prove it.
This process is called Power Analysis. It serves one purpose: to prevent you from running an "Underpowered Test", a test with so little data that you have almost no realistic chance of detecting a win, even if one exists.
Calculating sample size is not magic. It is a mathematical relationship between four variables: sample size, effect size (MDE), significance level ($\alpha$), and statistical power ($1 - \beta$). If you change one, the others must shift.
In statistics, we are never 100% sure. We deal in probabilities. There are two ways we can be wrong:
| Reality \ Decision | We Say "Test Won" | We Say "Test Failed" |
|---|---|---|
| Test Actually Won | True Positive (Power) | False Negative (Type II Error / $\beta$) |
| Test Actually Failed | False Positive (Type I Error / $\alpha$) | True Negative |
A Type I Error ($\alpha$) is the risk of saying "We won!" when, in reality, nothing changed. In business, this is bad because you roll out a feature that does nothing (or actively hurts you).
Standard Industry Practice: 5% ($\alpha = 0.05$). This corresponds to 95% Confidence.
A Type II Error ($\beta$) is the risk of saying "No impact" when the test actually did work. We missed it because we didn't have enough data to see through the noise.
Standard Industry Practice: 20% ($\beta = 0.20$). This means we have 80% Power ($1 - \beta$).
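These two conventions are exactly what feed the z-scores in the formula below. As a quick illustration (SciPy is an assumed dependency here, not something this module requires), the familiar 1.96 and 0.84 fall straight out of the normal distribution:

```python
from scipy.stats import norm

alpha = 0.05   # Type I error rate (two-sided)
power = 0.80   # 1 - beta, with beta = 0.20

z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96: critical z for 95% confidence
z_beta = norm.ppf(power)           # ~0.84: z-score corresponding to 80% power

print(round(z_alpha, 2), round(z_beta, 2))  # 1.96 0.84
```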
The standard sample size formula for comparing two proportions is the engine behind every A/B testing calculator. It tells you $n$, the sample size required per variation:

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \cdot \left(p_1(1-p_1) + p_2(1-p_2)\right)}{(p_2 - p_1)^2}$$

Where:

- $p_1$ is the baseline conversion rate (control)
- $p_2$ is the target conversion rate ($p_1$ plus the MDE)
- $z_{1-\alpha/2}$ is the z-score for your significance level (1.96 for $\alpha = 0.05$, two-sided)
- $z_{1-\beta}$ is the z-score for your power (0.84 for 80% power)
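Here is a minimal Python sketch of that formula; the function name and the 10% baseline / 12% target example are my own illustration, not numbers from this module:

```python
import math

from scipy.stats import norm


def required_sample_size(p1: float, p2: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per variation for a two-sided test comparing two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)               # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)                        # 0.84 for 80% power
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)   # numerator variance term
    n = (z_alpha + z_beta) ** 2 * variance_term / (p2 - p1) ** 2
    return math.ceil(n)


# Example: baseline conversion of 10%, MDE of 2 percentage points (target 12%)
print(required_sample_size(p1=0.10, p2=0.12))  # about 3,839 users per variation
```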
Why does this formula matter? Because the term $(p_2 - p_1)^2$ is in the denominator. This mathematically proves why small MDEs require massive samples.
If you halve the MDE (e.g., from 2% to 1%), the denominator shrinks by a factor of 4 ($0.5^2 = 0.25$), so your required sample size roughly quadruples.
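To see this scaling with off-the-shelf tooling, here is a sketch using statsmodels (an assumed dependency). Note that statsmodels works with Cohen's h, an arcsine-transformed effect size, rather than the raw difference, so the ratio comes out close to, but not exactly, 4:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()

# Baseline conversion of 10%; compare a 2 pp MDE (target 12%) with a 1 pp MDE (target 11%)
for target in (0.12, 0.11):
    effect = proportion_effectsize(0.10, target)  # Cohen's h for the two proportions
    n = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.80,
                             alternative='two-sided')
    print(f"MDE of {target - 0.10:.0%}: about {n:,.0f} users per variation")
```

Running this shows the 1% MDE needs roughly four times the traffic of the 2% MDE, matching the $(p_2 - p_1)^2$ intuition above.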
Next Step: Once we have collected this sample size, we calculate the P-Value. But relying solely on P-Values is dangerous. We explore why in the next module.