In a standard A/B test, you send 50% of your traffic to the loser for the entire duration of the test. That is a lot of lost revenue (or "Regret").
What if you could dynamically shift traffic? What if, as soon as Variant B started looking good, the system automatically sent 60%, then 70%, then 90% of traffic to it?
This is the Multi-Armed Bandit (MAB) problem. It is named after slot machines ("one-armed bandits"). If you have 10 slot machines, and one pays out more often, how fast can you figure it out and switch all your coins to that one machine?
Every algorithm faces this fundamental dilemma:
- Explore: Pulling a lever I haven't tried much, just to gather information. I might lose money, but I gain knowledge.
- Exploit: Pulling the lever I currently think is best. I maximize immediate revenue, but I learn nothing new.
A/B Testing is 100% Explore for 2 weeks, then 100% Exploit forever.
Bandit Testing blends them together.
The simplest approach is Epsilon-Greedy: you flip a weighted coin.
- 90% of the time, choose the current winner (Exploit).
- 10% of the time, choose a random option (Explore).
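Here is a minimal sketch of that rule in Python. The class name, variant labels, and reward bookkeeping are illustrative, not from any particular library:

```python
import random

class EpsilonGreedy:
    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon                      # fraction of traffic used to explore
        self.counts = {v: 0 for v in variants}      # times each variant was shown
        self.rewards = {v: 0.0 for v in variants}   # total reward (e.g. conversions) per variant

    def choose(self):
        # Explore: with probability epsilon, pick a variant at random
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        # Exploit: otherwise pick the variant with the best observed rate so far
        return max(self.counts,
                   key=lambda v: self.rewards[v] / self.counts[v] if self.counts[v] else 0.0)

    def update(self, variant, reward):
        self.counts[variant] += 1
        self.rewards[variant] += reward
```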
Thompson Sampling uses the probability distributions we learned in Module 10. The algorithm samples a random value from each variant's posterior distribution and picks the highest one.
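A sketch of that sampling step, assuming binary (converted / didn't convert) rewards and a Beta posterior for each variant's conversion rate:

```python
import random

class ThompsonSampling:
    def __init__(self, variants):
        # Beta(1, 1) prior: one imaginary success and one imaginary failure per variant
        self.successes = {v: 1 for v in variants}
        self.failures = {v: 1 for v in variants}

    def choose(self):
        # Draw one random conversion rate from each variant's posterior,
        # then serve whichever variant drew the highest value.
        samples = {v: random.betavariate(self.successes[v], self.failures[v])
                   for v in self.successes}
        return max(samples, key=samples.get)

    def update(self, variant, converted):
        if converted:
            self.successes[variant] += 1
        else:
            self.failures[variant] += 1
```

The more data a variant accumulates, the narrower its posterior becomes, so exploration fades out naturally as the winner becomes clear.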
Bandits sound perfect. Why do we still use A/B tests? Because Bandits optimize, but they don't teach: the algorithm quickly converges on a winner, but because traffic keeps shifting mid-flight, you end up with less clean data about how much better the winner actually was, or why.
Standard Bandits treat everyone the same. Contextual Bandits use machine learning to personalize.
"For User X (iOS, Evening), Button A is best. For User Y (Android, Morning), Button B is best." This is the technology behind Netflix recommendations and TikTok feeds.