
Yes, I think it's a huge commercial opportunity. I could use such a product too, and I sincerely wish you luck. However, the bandit approach is almost certainly not the right way to do it, and most of the value will come from the tools and integration, not a magical statistical model.

I do believe Bayesian network construction will work, though I don't know how much better it would do than NHST.



Woah, we've been talking past each other in a big way. Let me try to clarify:

1. The setup in the bandit problem is identical to the setup in standard A/B testing as applied to web content optimisation. The only difference is that in the bandit problem you are allowed to make decisions as data arrives; in A/B testing you have to wait until your experiment completes (otherwise, see "early stopping", which is in fact how the bandit problem came to be). Algorithms for the bandit problem are strictly superior to A/B testing in this setup (there's a Thompson sampling sketch below).

2. The case you seem to be interested in is where you have n possible items to display and you display k <= n simultaneously. In A/B testing land this is known as multivariate testing. The difficulty comes from dependencies between the items; otherwise it just reduces to k bandit problems (the per-slot bandit sketch below shows that no-interaction case). Typical MVT setups assume linear relationships between items. You can do the same in a bandit setup, and this is what (I think from a quick read) the arxiv paper I linked above does.

3. NHST (null hypothesis significance testing, right?) is not more powerful than a bandit algorithm. Consider this: in your hypothesis test you have some probability of making a mistake (determined by the significance level and the probability of a type II error, which you only indirectly control). If you make the decision once and stick with it forever, the expected regret is Pr(error) * Cost(error) * (number of future decisions), and that last factor is unbounded, so the expected regret is unbounded too. If you decide instead to keep making decisions as more data comes in, the probability of making an error rises rapidly; if you control for that, you're reinventing sequential design of experiments, i.e. the bandit problem (the regret arithmetic is sketched below).

4. I blogged about the bandit problem because it's the direct analogue of A/B testing. That doesn't mean there aren't more powerful algorithms available in the field of decision theory. If you display your k items in sequence, you're doing reinforcement learning, for which there are algorithms with optimal regret bounds. I've discussed displaying k items simultaneously above. No doubt this is a hard problem. The key idea to take away is that you have to control for your uncertainty about the correct action, something that hypothesis testing doesn't do.
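
To make point 1 concrete, here's a minimal sketch contrasting a fixed-horizon A/B test with a bandit on the same two-variant setup. The conversion rates, the horizon, and the choice of Thompson sampling as the bandit algorithm are all just illustrative assumptions on my part, not anything from your setup:

  import random

  random.seed(0)
  TRUE_RATES = [0.10, 0.12]   # assumed "true" conversion rates, unknown to both procedures
  HORIZON = 10_000            # total number of visitors

  def ab_test():
      """Classic A/B test: split traffic 50/50 for the whole horizon,
      then commit to whichever arm looks better."""
      successes = [0, 0]
      trials = [0, 0]
      for t in range(HORIZON):
          arm = t % 2                                   # fixed allocation, no early decisions
          trials[arm] += 1
          successes[arm] += random.random() < TRUE_RATES[arm]
      return max(range(2), key=lambda a: successes[a] / trials[a])

  def thompson_sampling():
      """Beta-Bernoulli Thompson sampling: re-allocate traffic as data arrives."""
      alpha = [1, 1]   # successes + 1 per arm
      beta  = [1, 1]   # failures + 1 per arm
      conversions = 0
      for _ in range(HORIZON):
          draws = [random.betavariate(alpha[a], beta[a]) for a in range(2)]
          arm = max(range(2), key=lambda a: draws[a])   # play the arm that looks best right now
          if random.random() < TRUE_RATES[arm]:
              alpha[arm] += 1
              conversions += 1
          else:
              beta[arm] += 1
      return conversions

  print("A/B test would deploy arm", ab_test())
  print("Bandit conversions during the experiment:", thompson_sampling())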
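
For point 2, here's a rough sketch of the no-interaction case only: with k display slots and no dependencies between items, you can just run k independent bandits, one per slot. The slot and item counts and the click rates are made up, and for simplicity this version will happily show the same item in several slots; it is not what the arxiv paper does with linear interactions:

  import random

  random.seed(0)
  K_SLOTS, N_ITEMS, ROUNDS = 3, 5, 20_000
  # assumed click rate for each (item, slot) pair, unknown to the algorithm
  RATES = [[random.uniform(0.01, 0.10) for _ in range(K_SLOTS)] for _ in range(N_ITEMS)]

  # one Beta-Bernoulli posterior per (item, slot) pair
  alpha = [[1] * K_SLOTS for _ in range(N_ITEMS)]
  beta  = [[1] * K_SLOTS for _ in range(N_ITEMS)]

  for _ in range(ROUNDS):
      for slot in range(K_SLOTS):
          # Thompson sampling within this slot, ignoring the other slots entirely
          draws = [random.betavariate(alpha[i][slot], beta[i][slot]) for i in range(N_ITEMS)]
          item = max(range(N_ITEMS), key=lambda i: draws[i])
          if random.random() < RATES[item][slot]:
              alpha[item][slot] += 1
          else:
              beta[item][slot] += 1

  best = [max(range(N_ITEMS), key=lambda i: alpha[i][s] / (alpha[i][s] + beta[i][s]))
          for s in range(K_SLOTS)]
  print("Item chosen for each slot:", best)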
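
And for the regret arithmetic in point 3, a back-of-the-envelope sketch. The error probability and per-decision cost are made-up numbers, and the logarithmic term for the bandit is schematic (UCB-style algorithms have expected regret of order log T, but the constant here is not a real bound); the only point is that linear growth eventually dominates logarithmic growth, whatever the constants:

  import math

  # made-up numbers for illustration
  p_err = 0.05    # probability the one-shot hypothesis test picked the wrong variant
  delta = 0.02    # conversion rate lost on every future visitor if you picked wrong

  def regret_commit_forever(T):
      """Commit to the test's answer and never revisit it: regret grows linearly in T."""
      return p_err * delta * T

  def regret_bandit_bound(T, n_arms=2):
      """Schematic log-regret curve for a UCB-style bandit (constant of order n_arms / delta)."""
      return (n_arms / delta) * math.log(T)

  for T in (10**4, 10**6, 10**8):
      print(f"T={T:>10}  commit-forever regret={regret_commit_forever(T):>10.0f}"
            f"  bandit bound ~{regret_bandit_bound(T):>8.1f}")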

That was long; I hope it sheds some light. Oh, and drop me an email -- I'd love to at least ask you more questions about the kind of product you'd use.



