Module 06

A/A Testing

The Sanity Check. Before you test a new feature, you must test your own testing platform.

In Experimentation, we obsess over A/B tests (Control vs. Variant). But the most important test you will ever run is an A/A Test.

An A/A Test splits traffic exactly like an A/B test (50/50), but there is zero difference between the two groups. Everyone sees the Control experience.

Why would you waste traffic on this? Because if your tool reports a "Statistically Significant Winner" in an A/A test, you know your tool is broken.
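To make this concrete, here is a minimal simulation of a single A/A test (the rates, sample sizes, and function names below are illustrative, not from any particular platform): both arms are drawn from the same true conversion rate, and a two-proportion z-test looks for a difference that, by construction, does not exist.

```python
import math
import random

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
TRUE_RATE = 0.10   # both arms share the exact same conversion rate
N = 10_000         # visitors per arm

conv_a1 = sum(random.random() < TRUE_RATE for _ in range(N))
conv_a2 = sum(random.random() < TRUE_RATE for _ in range(N))

p = two_proportion_z_test(conv_a1, N, conv_a2, N)
print(f"A1: {conv_a1/N:.2%}  A2: {conv_a2/N:.2%}  p-value: {p:.3f}")
```

Most of the time this prints a p-value above 0.05, as it should: any "winner" here would be pure noise.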

1. Why Run an A/A Test?

A/A tests are diagnostic tools. They validate three critical components of your infrastructure:

  1. Randomization: Users are assigned to the two buckets truly at random, with no systematic bias toward either group.
  2. Data Collection: Events are logged identically for both groups, so the pipeline itself is not creating phantom differences.
  3. Statistics Engine: The tool reports false positives at roughly the advertised rate (about 5% when $\alpha = 0.05$), not more.

2. Interpreting the Results

When you run an A/A test, you are testing a Null Hypothesis ($H_0$) that you know is true: both groups receive the identical experience, so any measured difference is noise.

Scenario 1: Expected

You run the test. The result shows "No Significant Difference" (p > 0.05). The conversion rates are nearly identical.

Result: PASS ✅
Scenario 2: The Failure

You run the test. The tool says "Variant A2 is the Winner!" with 99% confidence.

Result: FAIL ❌ (Bias Detected)
The 5% Rule: Even in a perfect A/A test, you expect to see a False Positive about 5% of the time (if $\alpha = 0.05$), i.e. roughly 1 run in 20. However, if you run 20 A/A tests and 10 of them come back "significant," your platform is inflating the false positive rate.
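You can check the 5% Rule yourself with a quick simulation (the rate, sample size, and run count below are illustrative): run many A/A tests on identical arms and count how often the test declares significance. The share of "significant" results should hover near $\alpha$.

```python
import math
import random

def aa_p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test p-value for equal-sized arms."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    if se == 0:
        return 1.0
    z = (conv_a - conv_b) / (n * se)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
RATE, N, ALPHA, RUNS = 0.10, 2_000, 0.05, 500  # illustrative settings

false_positives = 0
for _ in range(RUNS):
    c1 = sum(random.random() < RATE for _ in range(N))  # arm A1
    c2 = sum(random.random() < RATE for _ in range(N))  # arm A2, same rate
    if aa_p_value(c1, c2, N) < ALPHA:
        false_positives += 1

fp_rate = false_positives / RUNS
print(f"Significant A/A results: {false_positives}/{RUNS} ({fp_rate:.1%})")
```

If your platform's number lands far above $\alpha$ in a comparable exercise, suspect the bucketing or the stats engine before trusting any A/B result.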

3. When to Run A/A Tests?

You don't need to run an A/A test every week. That is a waste of traffic. You should run them at specific milestones:

  1. Implementation: When you first install a new tool (e.g., Optimizely, VWO, or a home-grown Python script).
  2. Integration: When you change your analytics provider (e.g., moving from GA4 to Mixpanel).
  3. Debugging: If you see a result that looks impossible (e.g., "Changing button color increased revenue by 500%"), run an A/A test to check for technical glitches.

4. The Hidden Benefit: Baseline Measurement

A/A tests are also great for estimating the Baseline Variance ($\sigma^2$) of your metric. Knowing exactly how much your metric naturally bounces around helps you calculate Sample Size (Module 04) more accurately.
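As a sketch of that calculation (assuming a Bernoulli conversion metric, so $\sigma^2 = p(1-p)$, and using the standard z-values for $\alpha = 0.05$ and 80% power; the baseline rate and effect size below are hypothetical):

```python
import math

# Baseline measured in the A/A test (hypothetical value)
baseline_rate = 0.10
sigma_sq = baseline_rate * (1 - baseline_rate)  # Bernoulli variance p(1 - p)

# Standard sample-size formula for two equal arms:
#   n per arm = 2 * (z_alpha/2 + z_beta)^2 * sigma^2 / delta^2
z_alpha = 1.96   # two-sided alpha = 0.05
z_beta = 0.84    # 80% power
delta = 0.01     # minimum detectable effect: 1 percentage point

n_per_arm = math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma_sq / delta ** 2)
print(f"Visitors needed per arm: {n_per_arm:,}")
```

The tighter your A/A-measured baseline, the less traffic you burn on future experiments.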

Now that our engine is calibrated, we are ready to run real tests. But running the test is only half the battle. We must ensure the traffic stays balanced. This leads us to the most common error in execution: Sample Ratio Mismatch (SRM).
