Sequentially selecting arms for A/B/n testing

A/B/n Testing with Control in the Presence of Subpopulations

We consider a finite set of distributions (called \emph{arms}), one of which is treated as a \emph{control}. We assume that the population is stratified into homogeneous subpopulations. At every time step, a subpopulation is sampled and an arm is chosen: the resulting observation is an independent draw from the arm conditioned on the subpopulation. The quality of each arm is assessed through a weighted combination of its subpopulation means. We propose a strategy for sequentially choosing one arm per time step so as to discover as fast as possible which arms, if any, have higher weighted expectation than the control. This strategy is shown to be asymptotically optimal in the following sense: if $\tau_\delta$ is the first time when the strategy ensures that it is able to output the correct answer with probability at least $1-\delta$, then $\mathbb{E}[\tau_\delta]$ grows linearly with $\log(1/\delta)$ at the exact optimal rate. This rate is identified in the paper in three different settings: (1) when the experimenter does not observe the subpopulation information, (2) when the subpopulation of each sample is observed but not chosen, and (3) when the experimenter can select the subpopulation from which each response is sampled.
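To make the target of the procedure concrete, here is a minimal Python sketch of the quantity being compared: the weighted combination of subpopulation means for each arm, and the set of arms whose weighted expectation exceeds that of the control. It assumes the true means are known, which is never the case in the sequential problem, so it illustrates the correct answer rather than the sampling strategy. The function names, the convention that arm 0 is the control, and the numerical values are all hypothetical.

```python
def weighted_means(sub_means, weights):
    """Return one weighted mean per arm.

    sub_means[a][i] is the (assumed known) mean of arm a in subpopulation i;
    weights[i] is the weight given to subpopulation i.
    """
    return [sum(w * m for w, m in zip(weights, row)) for row in sub_means]


def arms_better_than_control(sub_means, weights, control=0):
    """Indices of arms whose weighted mean is strictly above the control's."""
    mu = weighted_means(sub_means, weights)
    return [a for a in range(len(mu)) if a != control and mu[a] > mu[control]]


# Illustrative values only: three arms, two subpopulations.
sub_means = [
    [0.50, 0.50],  # arm 0, used as the control (an assumption of this sketch)
    [0.75, 0.25],  # arm 1: better in one subpopulation, worse in the other
    [0.50, 1.00],  # arm 2: at least as good everywhere
]
weights = [0.5, 0.5]

mu = weighted_means(sub_means, weights)
better = arms_better_than_control(sub_means, weights)
```

With these values, arm 1 ties with the control once the subpopulations are weighted, so only arm 2 belongs to the answer. A sequential strategy has to reach that same conclusion from noisy samples with error probability at most $\delta$.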