For the last 9 modules, we have been using Frequentist Statistics (P-Values, Power, Confidence Intervals). This is the industry standard, but it has a major flaw: it is unintuitive.
When you tell a CEO "P = 0.04," they translate it to "96% chance of winning." As we learned in Module 05, that translation is mathematically wrong.
Bayesian Statistics fixes this. It answers the question everyone actually wants to ask.
| Frequentist (Traditional) | Bayesian (Modern) |
|---|---|
| The Question: "Assuming there is zero difference, how weird is this data?" | The Question: "Given this data, what is the probability B is better than A?" |
| The Output: A P-Value, thresholded into a binary ship/don't-ship decision | The Output: A posterior probability (e.g., "92% chance of winning") |
| Peeking: Illegal. (Inflates the false-positive rate.) | Peeking: Allowed. (The posterior simply updates as data arrives.) |
In Bayesian testing, we don't start with a blank slate. We start with a Prior: usually a weak, uninformative assumption (such as a uniform Beta(1, 1)) that says "we know nothing."
As data comes in (Successes vs. Failures), we update that belief. The result is a Posterior Distribution for both A and B.
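To make this concrete, here is a minimal sketch of the update step in Python, using the standard Beta-Bernoulli conjugate pair. The visitor and conversion counts are hypothetical, and Beta(1, 1) stands in for the "we know nothing" prior.

```python
import numpy as np
from scipy import stats

# Hypothetical test results (made up for illustration):
a_successes, a_failures = 120, 880   # variant A: 120 conversions / 1000 visitors
b_successes, b_failures = 140, 860   # variant B: 140 conversions / 1000 visitors

# Prior: Beta(1, 1) is the uniform "we know nothing" prior.
prior_alpha, prior_beta = 1, 1

# Conjugate update: with Bernoulli data, the posterior is simply
# Beta(alpha + successes, beta + failures). No simulation needed yet.
posterior_a = stats.beta(prior_alpha + a_successes, prior_beta + a_failures)
posterior_b = stats.beta(prior_alpha + b_successes, prior_beta + b_failures)

print(f"A posterior mean: {posterior_a.mean():.4f}")  # ~0.121
print(f"B posterior mean: {posterior_b.mean():.4f}")  # ~0.141
```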
We then run a Monte Carlo simulation to compare these distributions: draw thousands of samples from each posterior and count how often B beats A. If B wins in 92% of the draws, we say: "Probability to Be Best = 92%."
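A minimal sketch of that comparison step, continuing with the same hypothetical counts from above:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Draw from each variant's posterior (Beta(1, 1) prior folded into the counts).
samples_a = rng.beta(1 + 120, 1 + 880, n_sims)
samples_b = rng.beta(1 + 140, 1 + 860, n_sims)

# Probability to Be Best: the fraction of draws in which B beats A.
p_b_best = np.mean(samples_b > samples_a)
print(f"Probability B beats A: {p_b_best:.1%}")
```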
Bayesian methods give us something Frequentist methods cannot: Expected Loss.
It tells you: "If you are wrong and B is actually worse, how much conversion do you expect to give up, on average?"
Decision Rule: Stop the test when Probability to Be Best > 95% AND Expected Loss < a threshold you set in advance (how much loss you are willing to risk).
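Here is a sketch that ties both numbers together, again using the hypothetical counts from above. The 95% probability bar comes from the rule in the text; the loss threshold of 0.1% (in absolute conversion rate) is an illustrative choice, not a standard.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

samples_a = rng.beta(1 + 120, 1 + 880, n_sims)
samples_b = rng.beta(1 + 140, 1 + 860, n_sims)

# Expected Loss of shipping B: the conversion rate you give up, averaged
# over the posterior scenarios where A is actually better (zero elsewhere).
expected_loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
p_b_best = np.mean(samples_b > samples_a)

THRESHOLD = 0.001  # illustrative: tolerate at most 0.1% absolute loss

# The decision rule from the text.
if p_b_best > 0.95 and expected_loss_b < THRESHOLD:
    print("Stop the test and ship B.")
else:
    print("Keep collecting data.")
```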
The catch: Bayesian testing is computationally heavier. Calculating a P-value is instant algebra. Comparing Bayesian posteriors typically means drawing on the order of 100,000 Monte Carlo samples per comparison, every time the results refresh.
However, modern tools (VWO, Google Optimize, Eppo) have adopted Bayesian engines anyway, because the outputs let Product Managers make faster decisions without having to interpret P-values and confidence intervals.
Bayesian testing leads naturally into the next evolution of experimentation: Multi-Armed Bandits, where we don't just measure; we optimize in real-time.