
A/B Testing Case Study: Real-Life End-to-End Example

A marketing team wants to test whether a new checkout design improves conversion rate compared to the existing design. This case study applies the Power Analysis Framework to a real-world scenario.

Part 1: Power Analysis

Before launching the test, we must determine the sample size required to trust the results.

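Assumptions

  • Baseline conversion rate (existing design): $p_1 = 0.05$
  • Expected conversion rate (new design): $p_2 = 0.06$
  • Significance level: $\alpha = 0.05$ (95% Confidence, two-sided)
  • Power: $1 - \beta = 0.80$ (80% Power)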

Step 1: Average Conversion Rate

$$p = \frac{p_1 + p_2}{2}$$
$$p = \frac{0.05 + 0.06}{2} = 0.055$$

Step 2: Minimum Sample Size per Variant

$$ n = \frac{ \left[ Z_{1-\alpha/2} + Z_{1-\beta} \right]^2 \times p(1-p) } {(p_2 - p_1)^2} $$

Using standard Z values:

  • $Z_{1-\alpha/2} = 1.96$ (for 95% Confidence)
  • $Z_{1-\beta} = 0.84$ (for 80% Power)

$$ n = \frac{ (1.96 + 0.84)^2 \times 0.055 \times 0.945 } {(0.06 - 0.05)^2} $$
$$ n = \frac{ 7.84 \times 0.051975 } {0.0001} \approx 4075 $$

Conclusion: We need a minimum of ~4,100 users per variant to reliably detect a lift of 1 percentage point (from 5% to 6%).
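The sample size calculation can be sketched in Python as a quick check. This is a minimal sketch that follows the formula exactly as written in Step 2; the function name `sample_size_per_variant` and its parameters are illustrative assumptions, and the z-values come from the standard library's `statistics.NormalDist` rather than the rounded table values.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Sample size per variant, using the Step 2 formula as stated in the text.

    p1, p2 : baseline and expected conversion rates (assumed inputs)
    alpha  : two-sided significance level
    power  : desired power (1 - beta)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # Step 1: average conversion rate
    n = (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return math.ceil(n)                            # round up to whole users


n = sample_size_per_variant(0.05, 0.06)
print(f"Users needed per variant: {n}")
```

Because it uses unrounded z-values, the result lands a few users above the hand calculation. Many textbook versions of the two-sample formula multiply the numerator by 2, which would roughly double the requirement, so it is worth confirming which form the framework intends.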

Part 2: A/B Test Execution

The test ran for 2 weeks. Here is the actual data we collected.

Observed Experiment Data

| Group | Users (n) | Conversions (x) |
|---|---|---|
| Control | 4,200 | 210 |
| Test | 4,300 | 275 |

Step 1: Conversion Rates

$$p_1 = \frac{x_1}{n_1} \quad\quad p_2 = \frac{x_2}{n_2}$$
$$p_1 = \frac{210}{4200} = 0.050$$
$$p_2 = \frac{275}{4300} \approx 0.064$$

Step 2: Pooled Conversion Rate

$$p = \frac{x_1 + x_2}{n_1 + n_2}$$
$$p = \frac{210 + 275}{4200 + 4300} = \frac{485}{8500} = 0.057$$

Step 3: Standard Error

$$se = \sqrt{ p(1-p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }$$
$$ se = \sqrt{ 0.057 \times 0.943 \left( \frac{1}{4200} + \frac{1}{4300} \right) } \approx 0.00503 $$

Step 4: Z-Score

$$z = \frac{p_2 - p_1}{se}$$
$$z = \frac{0.064 - 0.050}{0.00503} \approx 2.78$$

Step 5: Statistical Decision

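Since $|z| > 1.96$ (the two-sided critical value at $\alpha = 0.05$), we reject the null hypothesis that both designs convert at the same rate: the result is statistically significant.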
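Steps 1 to 5 can be reproduced with a short Python sketch. It assumes the pooled two-proportion z-test described above; the function name `two_proportion_z_test` and the returned dictionary keys are hypothetical, and the two-sided p-value is an extra convenience not computed in the text.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Pooled two-proportion z-test (control = group 1, test = group 2)."""
    p1 = x1 / n1                                   # Step 1: conversion rates
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                 # Step 2: pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # Step 3
    z = (p2 - p1) / se                             # Step 4: z-score
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return {
        "p1": p1,
        "p2": p2,
        "pooled": pooled,
        "se": se,
        "z": z,
        "p_value": p_value,
        "significant": abs(z) > z_crit,            # Step 5: decision
    }


result = two_proportion_z_test(x1=210, n1=4200, x2=275, n2=4300)
for name, value in result.items():
    print(f"{name}: {value}")
```

The sketch carries full floating-point precision through every step, so its z-score (about 2.77) can differ in the second decimal from a hand calculation that rounds intermediate values.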

Step 6: Lift Percentage

$$\text{lift \%} = \frac{p_2 - p_1}{p_1} \times 100\%$$
$$\text{lift \%} = \frac{0.064 - 0.050}{0.050} \times 100\% = 28\%$$
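Since "lift" can mean either an absolute or a relative change, a small sketch that reports both may help avoid ambiguity. The helper name `lift` is an illustrative assumption; the inputs are the observed rates from the experiment data.

```python
def lift(control_rate, test_rate):
    """Return (absolute, relative) lift of the test rate over the control rate."""
    absolute = test_rate - control_rate   # difference in conversion rates
    relative = absolute / control_rate    # fraction of the control rate
    return absolute, relative


absolute_lift, relative_lift = lift(210 / 4200, 275 / 4300)
print(f"Absolute lift: {absolute_lift * 100:.2f} percentage points")
print(f"Relative lift: {relative_lift:.1%}")
```

The lift percentage in Step 6 is the relative figure; the absolute difference is about 1.4 percentage points.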

Final Inference

The new checkout design raised the conversion rate from 5.0% to 6.4%, a relative lift of about 28% (1.4 percentage points). The improvement is statistically significant at the 95% confidence level. Recommendation: roll out the test variant.
Kamal Kumar