In Module 04, we calculated sample sizes to avoid False Negatives (Beta). Now we need to talk about False Positives (Alpha), and the number everyone obsesses over: The P-Value.
Most marketers think: "P = 0.05 means there is a 95% chance my new design is better."
This is wrong. This misunderstanding leads companies to launch "winning" tests that actually have zero impact on revenue. Let's fix the definition.
The P-Value is not about the hypothesis. It is about the data.
Imagine you flip a coin 10 times and get 10 heads.
Is the coin rigged? Maybe.
Or did you just get super lucky with a fair coin? The P-Value measures exactly that: the probability of seeing a result at least this extreme if the coin is actually fair.
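Here is that calculation as a minimal Python sketch, using SciPy's binomial test; the only inputs are the coin numbers above, and the code is purely illustrative.

```python
from scipy.stats import binomtest

# Null hypothesis: the coin is fair (probability of heads = 0.5).
# Question: if that is true, how likely is a run of 10 heads in 10 flips?
result = binomtest(k=10, n=10, p=0.5, alternative="greater")

print(f"P-Value: {result.pvalue:.4f}")  # ~0.0010, i.e. about 1 in 1,024 (0.5 ** 10)
```

A fair coin does this about once in 1,024 tries, so the data is very surprising if the "fair coin" assumption is true.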
"P = 0.05 means there is a 95% probability the test version is the winner."
"P = 0.05 means there is only a 5% probability we would see this data if the test version was actually useless."
"P = 0.05 means the result is important."
"P-Value measures Surprise, not Size. A tiny lift (0.01%) can be statistically significant if you have 10 million users."
This brings us to the most expensive mistake in experimentation.
With enough traffic (Sample Size), any difference becomes statistically significant. Test a new shade of blue on Google's links and you might see a P-Value of 0.001, yet the lift might be a mere 0.0001%.
The Rule: Never launch a feature just because P < 0.05. Launch it only if P < 0.05 AND the Lift > MDE (Minimum Detectable Effect).
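Here is that rule sketched as a launch check, built on a standard two-proportion z-test. The hypothetical launch_decision helper, the traffic and conversion numbers (chosen to echo the "huge sample, tiny lift" trap above), and the 2% relative MDE are all just illustrative assumptions.

```python
from math import sqrt
from scipy.stats import norm

def launch_decision(conv_a, n_a, conv_b, n_b, mde=0.02, alpha=0.05):
    """Launch only if the result is BOTH statistically significant AND big enough to matter."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))     # two-sided p-value
    relative_lift = (p_b - p_a) / p_a

    significant = p_value < alpha            # the "surprise" test
    big_enough = relative_lift > mde         # the "does it matter" test
    return p_value, relative_lift, significant and big_enough

# 10 million users per arm, converting at 10.00% vs 10.05%: a 0.5% relative lift.
p_value, lift, launch = launch_decision(1_000_000, 10_000_000, 1_005_000, 10_000_000)
print(f"P-Value: {p_value:.4f} | Lift: {lift:.2%} | Launch: {launch}")
# Highly significant (P ~ 0.0002), but the lift is below the 2% MDE -> do not launch.
```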
Because P-values fluctuate wildly at the beginning of a test (the Law of Small Numbers), checking your dashboard every day and acting the moment you see significance is dangerous. If you check 10 times, your chance of seeing at least one False Positive roughly quadruples, from 5% to around 20%.
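A quick simulation sketch of why this happens, assuming an A/A-style test where both versions are truly identical and you run a two-sided z-test at 10 evenly spaced check-ins; the traffic numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def any_peek_significant(n_total=10_000, rate=0.10, n_peeks=10, alpha=0.05):
    """Both arms are identical (A/A); return True if ANY peek looks 'significant'."""
    a = rng.random(n_total) < rate
    b = rng.random(n_total) < rate
    for checkpoint in np.linspace(n_total // n_peeks, n_total, n_peeks, dtype=int):
        p_a, p_b = a[:checkpoint].mean(), b[:checkpoint].mean()
        pooled = (p_a + p_b) / 2
        se = np.sqrt(pooled * (1 - pooled) * 2 / checkpoint)
        z = (p_b - p_a) / se
        if 2 * (1 - norm.cdf(abs(z))) < alpha:   # this peek crossed P < 0.05
            return True
    return False

trials = 2_000
false_positive_rate = np.mean([any_peek_significant() for _ in range(trials)])
print(f"False positive rate with 10 peeks: {false_positive_rate:.1%}")  # well above the nominal 5%
```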
We will cover this extensively in Module 08: The Peeking Problem.