You launch a test on Monday.
On Tuesday, you check the results. "Variant B is up 15%! It's significant!"
On Wednesday, it drops to 2%. "Not significant anymore."
On Thursday, it's up 8%. "Significant again! Let's stop the test and declare a winner."
This is called Peeking (or "Continuous Monitoring"), and it is statistical malpractice. If you stop a test the moment it becomes significant, you are cherry-picking the data.
Every time you calculate a p-value, you are rolling the dice. When there is no real difference between the variants, each check still has a 5% chance of producing a false positive (alpha = 0.05).
If you check the test once at the end, your error rate is 5%.
If you check the test every day, you are rolling the dice over and over again. Your cumulative probability of finding a "fake winner" explodes.
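$$P(\text{at least one false positive}) = 1 - (1 - \alpha)^k$$

(Where $\alpha$ is the significance level and $k$ is the number of times you peek)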
| Number of Peeks | False Positive Probability |
|---|---|
| 1 (Fixed Horizon) | 5.0% |
| 2 | 9.8% |
| 5 | 22.6% |
| 10 | 40.1% |
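The numbers in the table can be reproduced with a few lines of Python. This is a minimal sketch that treats every peek as an independent test at alpha = 0.05; the function name `cumulative_false_positive_rate` is made up for this example.

```python
def cumulative_false_positive_rate(alpha: float, peeks: int) -> float:
    """Chance of at least one false positive across `peeks` independent checks."""
    # (1 - alpha) is the chance a single check stays quiet;
    # raising it to the power `peeks` is the chance that all of them stay quiet.
    return 1 - (1 - alpha) ** peeks


for k in (1, 2, 5, 10):
    rate = cumulative_false_positive_rate(0.05, k)
    print(f"{k:>2} peeks -> {rate:.1%}")
```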
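If you check your dashboard 10 times during a test, then (treating each peek as an independent roll of the dice) there is roughly a 40% chance you will see a significant result at some point even if the test is actually flat. If you stop the test then, you have shipped a "winner" that does nothing.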
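You can watch this happen in a simulated A/A test, where both variants are identical and every "winner" is fake. The sketch below is an idealization, not a model of any particular tool: it assumes equal traffic every day and draws the daily difference between variants directly from a standard normal distribution, so the Z-score at each look is the running sum divided by the square root of the number of days. The function name and parameters are hypothetical.

```python
import random
from math import sqrt


def peeking_false_positive_rate(peeks, n_sims=20_000, z_crit=1.96, seed=0):
    """Share of simulated A/A tests that look 'significant' at any peek."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, peeks + 1):
            # One more day of data: a pure-noise difference between variants.
            running_sum += rng.gauss(0.0, 1.0)
            # Z-score of everything collected so far.
            z = running_sum / sqrt(look)
            if abs(z) > z_crit:
                # The peeker stops here and declares a winner.
                false_positives += 1
                break
    return false_positives / n_sims


for k in (1, 2, 5, 10):
    print(f"{k:>2} peeks -> {peeking_false_positive_rate(k):.1%}")
```

Because each look reuses the data from earlier looks, the peeks are correlated, so the simulated rates come out lower than the independent-dice formula suggests. They are still several times the 5% you signed up for.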
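The simplest solution requires discipline: run a fixed-horizon test. Decide how long the test will run before you launch it, and do not act on the results until that date.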
Pros: Statistically valid. Easy to explain.
Cons: Painful. If a test is a huge winner (lift +50%), you still have to wait 2 weeks to "prove" it.
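A fixed horizon only works if you pick the horizon before launch, which usually means estimating the sample size you need. Here is a minimal sketch using the standard normal-approximation formula for comparing two conversion rates. The 10% baseline, 10% relative lift, and 80% power are illustrative assumptions, and `sample_size_per_variant` is a name invented for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test of two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed scenario: 10% baseline conversion, hoping to detect a 10% relative lift.
n = sample_size_per_variant(baseline_rate=0.10, relative_lift=0.10)
print(f"Users needed per variant: {n}")
```

Divide that number by your daily traffic per variant to get the number of days you have to sit on your hands.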
What if you need to peek (e.g., to stop bad tests early)? You can use sequential testing, a family of methods that includes the Sequential Probability Ratio Test (SPRT).
Modern tools like Optimizely and Eppo offer methods from this family. Instead of keeping the significance bar flat at 95% (Z = 1.96), a sequential test sets the bar much higher at the start of the test and lowers it over time.
This "Moving Goalpost" (formalized as an Alpha Spending Function) allows you to check the results every day without inflating your overall error rate. It essentially "spends" a little bit of your 5% error budget each time you peek.
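To make the "error budget" idea concrete, here is a sketch of one well-known spending function, the O'Brien-Fleming-type function of Lan and DeMets. It is chosen for illustration only and is not necessarily what any particular tool uses. The five evenly spaced looks are an assumption, and the function name is hypothetical. The code shows how much of the 5% budget has been spent by each look; turning those amounts into exact Z thresholds takes extra numerical work that is left out here.

```python
from math import sqrt
from statistics import NormalDist


def obrien_fleming_spend(information_fraction, alpha=0.05):
    """Cumulative alpha spent once `information_fraction` of the data is in (0 < t <= 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Early in the test, z / sqrt(t) is huge, so almost nothing is spent.
    return 2 * (1 - NormalDist().cdf(z / sqrt(information_fraction)))


looks = [0.2, 0.4, 0.6, 0.8, 1.0]  # assumed schedule: peek after each 20% of the data
spent_so_far = 0.0
for t in looks:
    cumulative = obrien_fleming_spend(t)
    print(f"t={t:.1f}  cumulative alpha={cumulative:.5f}  "
          f"spent at this look={cumulative - spent_so_far:.5f}")
    spent_so_far = cumulative
```

The early looks spend almost nothing, which is why only a huge effect can stop the test on day one, and the full 0.05 is only used up at the final look.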
You ran the test without peeking. You found a winner! But... is it a real winner, or is it just the Novelty Effect?
In the next module, we discuss why users click on bright shiny new things, and why those clicks often disappear after 2 weeks.