Speaker
Dylan Foster, MIT Institute for Foundations of Data Science
The contextual bandit is a sequential decision-making problem in which a learner repeatedly selects an action (e.g., a news article to display) in response to a context (e.g., a user’s profile) and receives a reward, but only for the selected action. Beyond the classic explore-exploit tradeoff, a fundamental challenge in contextual bandits is to develop algorithms that can leverage flexible function approximation to model similarity between contexts, yet have computational requirements comparable to classical supervised learning tasks such as classification and regression. To this end, we provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any algorithm for regression with a given function class into an algorithm for contextual bandits, with no overhead in runtime or memory requirements. Conceptually, our results show that it is possible to provably decouple estimation and decision making into separate algorithmic building blocks, and that this can be effective both in theory and in practice. Time permitting, I will discuss extensions of these techniques to more challenging reinforcement learning problems.
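To make the reduction concrete, here is a minimal sketch in the spirit of inverse-gap weighting (as in Foster and Rakhlin's SquareCB algorithm): an online regression oracle predicts rewards, and a simple weighting rule converts those predictions into an exploration distribution over actions. The toy linear oracle, the simulated environment, and all parameter values below are illustrative assumptions, not the talk's exact construction.

```python
import numpy as np

# Sketch of a regression-oracle reduction for contextual bandits via
# inverse-gap weighting. Everything here (the linear oracle, the simulated
# rewards, the parameter choices) is an illustrative assumption.

class OnlineLinearRegressor:
    """Toy online regression oracle: one linear predictor per action,
    updated with a squared-loss gradient step."""
    def __init__(self, num_actions, dim, lr=0.1):
        self.w = np.zeros((num_actions, dim))
        self.lr = lr

    def predict(self, context):
        return self.w @ context  # predicted reward for each action

    def update(self, context, action, reward):
        err = self.w[action] @ context - reward
        self.w[action] -= self.lr * err * context

def inverse_gap_weighting(preds, gamma):
    """Map reward predictions to an action distribution: the larger an
    action's gap to the predicted-best action, the less mass it gets."""
    best = int(np.argmax(preds))
    K = len(preds)
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - p.sum()  # remaining mass goes to the greedy action
    return p

rng = np.random.default_rng(0)
K, d, T, gamma = 5, 10, 2000, 50.0   # gamma controls exploration strength
oracle = OnlineLinearRegressor(K, d)
theta = rng.normal(size=(K, d))      # hidden reward model (simulation only)

for t in range(T):
    context = rng.normal(size=d)
    probs = inverse_gap_weighting(oracle.predict(context), gamma)
    action = rng.choice(K, p=probs)
    reward = theta[action] @ context + 0.1 * rng.normal()
    oracle.update(context, action, reward)  # bandit feedback: chosen action only
```

Note how the sketch reflects the separation the abstract describes: estimation lives entirely in the oracle, decision making entirely in the weighting rule, so the regression model can be swapped for a richer function class without touching the decision layer.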
Dylan Foster’s website
You are invited to a scheduled Zoom meeting. Zoom is Yale’s audio and visual conferencing platform.
- Join the Zoom
- Or telephone: 203-432-9666 (2-ZOOM if on campus) or 646-568-7788
- Password: 24
- Meeting ID: 958 6320 8758
- International numbers available
- H.323 and SIP information for video conferencing units