Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=913dd449-6682-429a-9251-b3b700f4d41f
Title: The statistical price of few updates: efficiency and adaptivity in batched contextual bandits
Sequential decision-making is central to modern statistics, with applications ranging from clinical trials to online recommendation systems. Classical theory typically assumes that policies can be updated after every observation. In many real-world experiments, however, updates are restricted to a small number of discrete time points, leading to batching constraints.
This raises a fundamental question: how much statistical efficiency is lost when updates are rare, and how many batches are needed to achieve optimal learning? In this talk, we address this question within the framework of contextual bandits with smooth reward functions. We begin with a success story: when the margin parameter is known, we show that only $\log\log T$ batches suffice to attain the minimax regret rates of the fully online setting. This result demonstrates that surprisingly limited adaptivity can yield optimal performance. We then turn to the more subtle and practically relevant case where the margin parameter is unknown. In contrast to the fully online regime, where adaptation is free, we show that batching incurs a provable statistical price, even under adaptive batching schedules. We conclude by describing recent results that sharply characterize this adaptation cost.
3:30pm - Pre-talk meet and greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area. For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.