Module 04

Power Analysis

The first law of experimentation: You must define your risk tolerance before you roll the dice.

In the previous modules, we defined what we want to test and how big a win we expect (MDE). Now, we must calculate how much data we need to detect it reliably.

This process is called Power Analysis. It serves one purpose: to prevent you from running an "Underpowered Test", a test with too little data to have a realistic chance of detecting a win, even if one exists.

1. The Four Variables of Power

Calculating sample size is not magic. The required sample size is a mathematical function of four variables. If you change any one of them, the sample size must shift.

Baseline Conversion Rate ($p_1$): How does the Control version perform today?
Minimum Detectable Effect (MDE): The lift you want to detect (from Module 03).
Significance Level ($\alpha$): Your tolerance for False Positives.
Statistical Power ($1 - \beta$): The probability of detecting a real effect. Its complement, $\beta$, is your tolerance for False Negatives.

2. Understanding Errors: Alpha vs. Beta

In statistics, we are never 100% sure. We deal in probabilities. There are two ways we can be wrong:

Reality \ Decision	We Say "Test Won"	We Say "Test Failed"
Test Actually Won	True Positive (Power)	False Negative (Type II Error / $\beta$)
Test Actually Failed	False Positive (Type I Error / $\alpha$)	True Negative

Alpha ($\alpha$): The False Alarm

This is the risk of saying "We won!" when actually, nothing changed. In business, this is bad because you roll out a feature that does nothing (or hurts you).
Standard Industry Practice: 5% ($\alpha = 0.05$). This corresponds to 95% Confidence.

Beta ($\beta$): The Missed Opportunity

This is the risk of saying "No impact" when actually, the test did work. We missed it because we didn't have enough data to see through the noise.
Standard Industry Practice: 20% ($\beta = 0.20$). This means we have 80% Power ($1 - \beta$).
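These two risk levels enter a sample size calculation as quantiles of the standard normal distribution. The sketch below shows one way to convert them, using only Python's standard library. The function name `z_scores` is hypothetical, and a two-sided test is assumed, which is why $\alpha$ is split in half.

```python
from statistics import NormalDist

def z_scores(alpha=0.05, beta=0.20):
    """Convert error tolerances into standard normal quantiles.

    Assumes a two-sided test, so alpha is split across both tails.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # quantile for confidence
    z_beta = std_normal.inv_cdf(1 - beta)        # quantile for power
    return z_alpha, z_beta

z_alpha, z_beta = z_scores()
print(round(z_alpha, 2), round(z_beta, 2))  # 1.96 0.84
```

Tightening either tolerance (a smaller $\alpha$ or a smaller $\beta$) raises the corresponding quantile, which in turn raises the amount of data required.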

3. The Sample Size Formula

This is the engine behind every A/B testing calculator. It tells you $n$, the sample size required per variation.

$$ n = \frac{2 \cdot (Z_{1-\alpha/2} + Z_{1-\beta})^2 \cdot p(1-p)}{(p_2 - p_1)^2} $$

Where:

$Z_{1-\alpha/2}$ = Z-score for confidence (1.96 for 95%).
$Z_{1-\beta}$ = Z-score for power (0.84 for 80%).
$p$ = Pooled conversion rate, $(p_1 + p_2)/2$ (average of Control and Test).
$p_2 - p_1$ = The absolute difference you want to detect (MDE), where $p_2$ is the conversion rate you expect from the Test version.
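The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a vetted calculator: the function name `sample_size_per_variation` is hypothetical, the 10% and 12% conversion rates are made-up inputs, and a two-sided test with equal traffic per variation is assumed.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Required sample size per variation for comparing two proportions.

    p1: baseline (Control) conversion rate
    p2: expected Test conversion rate, so p2 - p1 is the absolute MDE
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ; the MDE cannot be zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p = (p1 + p2) / 2                              # pooled conversion rate
    n = 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / (p2 - p1) ** 2
    return math.ceil(n)  # round up: you cannot sample a fraction of a user

# Illustrative inputs: 10% baseline, hoping to detect a lift to 12%.
n = sample_size_per_variation(0.10, 0.12)
print(n)  # a little over 3,800 users per variation
```

Note that $n$ is per variation, so a two-arm test needs roughly twice that many users in total.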

4. Practical Application

Why does this formula matter? Because the term $(p_2 - p_1)^2$ is in the denominator. This mathematically proves why small MDEs require massive samples.

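If you halve the MDE (e.g., from 2% to 1%), the denominator shrinks by a factor of 4 ($0.5^2 = 0.25$). This means your required sample size roughly quadruples. The multiplier is not exactly 4, because the pooled rate $p$ in the numerator shifts slightly as well.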
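To see the scaling in numbers, the sketch below evaluates the formula at two MDEs using the rounded Z-scores from Section 3 (1.96 and 0.84). The 10% baseline and the helper name `required_n` are assumptions chosen for illustration.

```python
def required_n(p1, mde, z_alpha=1.96, z_beta=0.84):
    """Sample size per variation for an absolute MDE, using rounded Z-scores."""
    p2 = p1 + mde        # expected Test conversion rate
    p = (p1 + p2) / 2    # pooled conversion rate
    return 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / mde ** 2

baseline = 0.10  # hypothetical Control conversion rate

n_large_mde = required_n(baseline, 0.02)  # detect a 2-point absolute lift
n_small_mde = required_n(baseline, 0.01)  # detect a 1-point absolute lift
ratio = n_small_mde / n_large_mde

print(round(n_large_mde), round(n_small_mde), round(ratio, 2))
```

With these inputs the ratio comes out just under 4, since the pooled rate drops a little when the expected lift is smaller. The practical message is unchanged: chasing a lift half as large costs about four times the traffic.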

Next Step: Once we have collected this sample size, we calculate the P-Value. But relying solely on P-Values is dangerous. We explore why in the next module.
