For the last 9 modules, we have been using Frequentist Statistics (P-Values, Power, Confidence Intervals). This is the industry standard, but it has a major flaw: it is unintuitive.
When you tell a CEO "P = 0.04," they translate it to "96% chance of winning." As we learned in Module 05, that translation is mathematically wrong.
Bayesian Statistics fixes this. It answers the question everyone actually wants to ask.
| Frequentist (Traditional) | Bayesian (Modern) |
|---|---|
| The Question: "Assuming there is zero difference, how weird is this data?" | The Question: "Given this data, what is the probability B is better than A?" |
| The Output: A P-Value, thresholded into a binary ship/don't-ship decision | The Output: A posterior probability (e.g., "92% chance of winning") |
| Peeking: Illegal. (Inflates the false-positive rate.) | Peeking: Allowed. (The posterior simply updates as data arrives.) |
In Bayesian testing, we don't start with a blank slate. We start with a Prior: usually a weak, uninformative assumption (such as a uniform Beta(1, 1)) that says "we know nothing."
As data comes in (Successes vs. Failures), we update that belief. The result is a Posterior Distribution for both A and B.
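To make this concrete, here is a minimal sketch of the update step in Python, using the standard Beta-Bernoulli conjugate pair. The visitor and conversion counts are hypothetical, and Beta(1, 1) stands in for the "we know nothing" prior.

```python
import numpy as np
from scipy import stats

# Hypothetical test results (made up for illustration):
a_successes, a_failures = 120, 880   # variant A: 120 conversions / 1000 visitors
b_successes, b_failures = 140, 860   # variant B: 140 conversions / 1000 visitors

# Prior: Beta(1, 1) is the uniform "we know nothing" prior.
prior_alpha, prior_beta = 1, 1

# Conjugate update: with Bernoulli data, the posterior is simply
# Beta(alpha + successes, beta + failures). No simulation needed yet.
posterior_a = stats.beta(prior_alpha + a_successes, prior_beta + a_failures)
posterior_b = stats.beta(prior_alpha + b_successes, prior_beta + b_failures)

print(f"A posterior mean: {posterior_a.mean():.4f}")  # ~0.121
print(f"B posterior mean: {posterior_b.mean():.4f}")  # ~0.141
```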
We then run a Monte Carlo simulation to compare these distributions: draw thousands of samples from each posterior and count how often B beats A. If B wins in 92% of the draws, we say: "Probability to Be Best = 92%."
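A minimal sketch of that comparison step, continuing with the same hypothetical counts from above:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Draw from each variant's posterior (Beta(1, 1) prior folded into the counts).
samples_a = rng.beta(1 + 120, 1 + 880, n_sims)
samples_b = rng.beta(1 + 140, 1 + 860, n_sims)

# Probability to Be Best: the fraction of draws in which B beats A.
p_b_best = np.mean(samples_b > samples_a)
print(f"Probability B beats A: {p_b_best:.1%}")
```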
Bayesian methods give us something Frequentist methods cannot: Expected Loss.
It tells you: "If you are wrong and B is actually worse, how much conversion do you expect to give up, on average?"
Decision Rule: Stop the test when Probability to Be Best > 95% AND Expected Loss < a threshold you set in advance (how much loss you are willing to risk).
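Here is a sketch that ties both numbers together, again using the hypothetical counts from above. The 95% probability bar comes from the rule in the text; the loss threshold of 0.1% (in absolute conversion rate) is an illustrative choice, not a standard.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

samples_a = rng.beta(1 + 120, 1 + 880, n_sims)
samples_b = rng.beta(1 + 140, 1 + 860, n_sims)

# Expected Loss of shipping B: the conversion rate you give up, averaged
# over the posterior scenarios where A is actually better (zero elsewhere).
expected_loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
p_b_best = np.mean(samples_b > samples_a)

THRESHOLD = 0.001  # illustrative: tolerate at most 0.1% absolute loss

# The decision rule from the text.
if p_b_best > 0.95 and expected_loss_b < THRESHOLD:
    print("Stop the test and ship B.")
else:
    print("Keep collecting data.")
```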
The catch: Bayesian testing is computationally heavier. Calculating a P-value is instant algebra. Comparing Bayesian posteriors typically means drawing on the order of 100,000 Monte Carlo samples per comparison, every time the results refresh.
However, modern tools (VWO, Google Optimize, Eppo) have adopted Bayesian engines anyway, because the outputs let Product Managers make faster decisions without having to interpret P-values and confidence intervals.
Bayesian testing leads naturally into the next evolution of experimentation: Multi-Armed Bandits, where we don't just measure; we optimize in real-time.