In most companies, A/B testing is treated like a slot machine. Marketing teams throw random ideas at the website—"Make the button green!", "Change the headline!", "Add a carousel!"—and hope one of them hits the jackpot.
This is not experimentation; this is gambling. And just like gambling, the house (random variance) usually wins.
To run a mature experimentation program, you must stop testing ideas and start testing hypotheses. An idea is a suggestion. A hypothesis is a falsifiable statement rooted in data.
Before any test is approved for the roadmap, it must pass the structure test: "Because we observed [data/insight], we believe that [change] will cause [measurable impact]." If you cannot fill in these three blanks, you are not ready to launch.
Why this works:
Statistically, we never prove that the new version (B) is better. We only try to disprove that it performs the same as the control (A). That framing gives us two competing hypotheses:
The Null Hypothesis (H0): This is the default state of the universe. It assumes your brilliant new design has zero effect.
The Alternative Hypothesis (H1): This is what we hope to find. It states that there is a statistically significant difference between the two variations.
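In symbols, writing p_A and p_B for the true conversion rates of control and variant (notation introduced here for illustration, not from any particular tool), the pair looks like this:

```latex
H_0 : p_B - p_A = 0      % Null: the new design has zero effect
H_1 : p_B - p_A \neq 0   % Alternative: some effect, in either direction
```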
In A/B testing, the "Innocent until proven Guilty" principle applies. We assume the Null Hypothesis is true until the data screams otherwise. We require a p-value < 0.05 to reject the Null, meaning that if the change truly had no effect, a difference this large would show up by luck less than 5% of the time.
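To make the decision rule concrete, here is a minimal sketch of a two-proportion z-test in Python. The traffic and conversion counts are invented for illustration, and the statistic is computed by hand rather than with any particular A/B testing tool:

```python
import math
from scipy.stats import norm

# Hypothetical results: 10,000 users per arm (numbers invented for illustration)
conv_a, n_a = 1_000, 10_000   # control A: 10.0% conversion
conv_b, n_b = 1_080, 10_000   # variant B: 10.8% conversion

# Pooled rate under the Null Hypothesis: A and B share one true conversion rate
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# z-statistic: how many standard errors the observed lift sits from zero
z = (conv_b / n_b - conv_a / n_a) / se
p_value = 2 * norm.sf(abs(z))  # two-sided: deviations in either direction count

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
# Here p is roughly 0.064 > 0.05, so we fail to reject the Null: the
# 0.8-point lift could plausibly be luck, and the "innocent" Null stands.
```

Note the verdict is "fail to reject", not "the Null is proven": a non-significant result is an absence of evidence, not evidence that the change did nothing.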
When configuring your test engine, you will often be asked: "Is this a one-tailed or two-tailed test?"
My Recommendation: Always use two-tailed (two-sided) tests. In business, knowing you broke something (a negative lift) is just as valuable as knowing you improved it.
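To see what that configuration switch actually changes, here is the same invented data from above run both ways. This sketch assumes the statsmodels library is available:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

counts = np.array([1_080, 1_000])   # conversions: variant B first, control A second
nobs = np.array([10_000, 10_000])   # users per arm

# Two-sided: H1 is "B differs from A", so it catches lifts and regressions alike
_, p_two = proportions_ztest(counts, nobs, alternative="two-sided")

# One-sided: H1 is "B is larger than A", blind to anything B made worse
_, p_one = proportions_ztest(counts, nobs, alternative="larger")

print(f"two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```

The one-sided p-value is half the two-sided one, so it crosses the 0.05 threshold more easily; with these invented numbers, the one-sided test would declare a winner while the two-sided test would not. That is exactly how an optimistic test setup manufactures false wins.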
Now that we have a hypothesis ("Conversion will rise by 2%"), we face a measurement problem. Which conversion rate? Clicks? Purchases? Revenue per user? This leads us to Metric Selection.