BLISS Seminar: Learning with Low Approximate Regret with Partial Feedback

Seminar | May 7 | 3-4 p.m. | 540 Cory Hall

 Eva Tardos

 Electrical Engineering and Computer Sciences (EECS)

We consider the adversarial multi-armed bandit problem with partial feedback, minimizing a non-negative loss function using the graph based feedback framework introduced by Mannor and Shamir in 2011. We offer algorithms that attain small loss bounds, as well as low approximate regret against a shifting comparator.

Classical learning algorithms add a low level of uniform noise to the algorithm’s choice to limit the variance of the loss estimator used in importance sampling, which also helps the algorithm to shift to a new arm fast enough when the comparator changes. However, such an approach poses significant hurdles to proving small-loss or low approximate regret bounds. We show that a different general technique of freezing arms, rather than adding random noise, does much better in this setting. The idea of freezing arms was proposed by Allenberg et al. in 2006 in the context of bandit learning with multiplicative weights. We show the broad applicability of this technique by extending it to partial information feedback (via a novel dual freezing thresholding technique), and to shifting comparators. This is joint work with Thodoris Lykouris and Karthik Sridharan.