Power Analysis & A/B Testing Execution Framework
Modern A/B testing is reliable only when experiments are designed with clear statistical guardrails. This framework covers the complete flow, from power analysis through to final experiment inference.
Part 1: Power Analysis
Power analysis is the first foundational step before running an A/B test. It defines the acceptable error levels and determines the minimum sample size required to make statistically valid decisions.
How Experiment Errors Are Defined
| Reality \ Decision | Detect Effect | No Effect Detected |
|---|---|---|
| Effect Exists | True Positive | False Negative (β) |
| No Effect Exists | False Positive (α) | True Negative |
- α (alpha): probability of a false positive (Type I error)
- β (beta): probability of a false negative (Type II error)
- Power = $1 - \beta$
Step 1: Define Error Thresholds
Before running the experiment, we decide how much uncertainty we are willing to tolerate:
- α controls the risk of concluding the test works when it does not (typically set to 0.05).
- β controls the risk of missing a real improvement (typically set to 0.20, giving 80% Power).
Step 2: Estimate Expected Conversion Rates
Let:
- $p_1$ = expected conversion rate of control
- $p_2$ = expected conversion rate of test
Average conversion rate:

$$\bar{p} = \frac{p_1 + p_2}{2}$$
Step 3: Calculate Minimum Sample Size
Using the predefined values of α, β, and the expected lift, the minimum sample size required per variant is:

$$n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \cdot 2\,\bar{p}(1-\bar{p})}{(p_2 - p_1)^2}$$
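The sample-size calculation above can be sketched in Python using only the standard library; the function name and default arguments here are illustrative, not part of the source framework:

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Minimum sample size per variant for a two-sided two-proportion
    z-test, using the average conversion rate for the variance term."""
    p_bar = (p1 + p2) / 2                           # average conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ≈ 0.84 for 80% power
    n = ((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)) / (p2 - p1) ** 2
    return ceil(n)

# e.g. detecting a lift from 10% to 12% conversion:
# min_sample_size(0.10, 0.12) → 3843 users per variant
```

Note that a smaller expected lift ($p_2 - p_1$) appears squared in the denominator, so halving the detectable lift roughly quadruples the required sample size.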
Part 2: A/B Testing Execution (BITS)
Once power analysis is complete, the next step is to execute the A/B test correctly using a statistically valid comparison between control and test groups.
Step 1: Validate Sample Size
- $n_1$ = sample size of control
- $n_2$ = sample size of test
Both $n_1$ and $n_2$ must be greater than or equal to the required sample size $n$.
Step 2: Define the Outcome Metric
- $x_1$ = number of conversions in control
- $x_2$ = number of conversions in test
Step 3: Calculate Conversion Rates

$$p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2}$$
Step 4: Define the Test Objective
The goal is to determine whether the observed difference ($p_2 - p_1$) is statistically significant or due to random variation.
Step 5: Calculate the Pooled Conversion Rate

$$p_{\text{pool}} = \frac{x_1 + x_2}{n_1 + n_2}$$
Step 6: Calculate the Standard Error

$$SE = \sqrt{p_{\text{pool}}\,(1 - p_{\text{pool}})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Step 7: Calculate the Z-Score

$$z = \frac{p_2 - p_1}{SE}$$
Step 8: Statistical Decision
- $|z| \ge 1.96$ → statistically significant at α = 0.05, two-tailed (reject the null hypothesis)
- $|z| < 1.96$ → difference is likely due to chance (fail to reject the null hypothesis)
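Steps 5 through 8 can be combined into one small helper; this is a sketch using the standard pooled two-proportion z-test, with an illustrative function name:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Pooled two-proportion z-test: control (x1/n1) vs test (x2/n2)."""
    p1, p2 = x1 / n1, x2 / n2                             # Step 3: conversion rates
    p_pool = (x1 + x2) / (n1 + n2)                        # Step 5: pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # Step 6: standard error
    z = (p2 - p1) / se                                    # Step 7: z-score
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # 1.96 at alpha = 0.05
    return z, abs(z) >= z_crit                            # Step 8: decision
```

For example, 500 conversions out of 10,000 in control versus 580 out of 10,000 in test gives z ≈ 2.50, which clears the 1.96 threshold.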
Step 9: Calculate Lift Percentage

$$\text{Lift \%} = \frac{p_2 - p_1}{p_1} \times 100$$
Step 10: Final Inference
Using statistical significance, direction of impact, and lift percentage, we determine whether the test variant should be adopted, rejected, or iterated further.
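The final inference can be sketched as a small decision function. The adopt/reject/iterate labels follow the text above; the exact decision rules (e.g. treating any non-significant result as "iterate") are an illustrative assumption:

```python
from math import sqrt
from statistics import NormalDist

def final_inference(x1, n1, x2, n2, alpha=0.05):
    """Combine significance, direction of impact, and lift (Step 10)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    significant = abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)
    lift_pct = (p2 - p1) / p1 * 100
    if not significant:
        return "iterate"                    # inconclusive: redesign or collect more data
    return "adopt" if lift_pct > 0 else "reject"
```

A significant positive lift recommends adopting the variant; a significant negative lift recommends rejecting it; anything non-significant calls for iteration.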