Speaker
Dylan Foster, MIT Institute for Foundations of Data Science
The contextual bandit is a sequential decision-making problem in which a learner repeatedly selects an action (e.g., a news article to display) in response to a context (e.g., a user’s profile) and receives a reward, but only for the selected action. Beyond the classic explore-exploit tradeoff, a fundamental challenge in contextual bandits is to develop algorithms that can leverage flexible function approximation to model similarity between contexts, yet have computational requirements comparable to classical supervised learning tasks such as classification and regression. To this end, we provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any algorithm for regression with a given function class into an algorithm for contextual bandits, with no overhead in runtime or memory requirements. Conceptually, our results show that it is possible to provably decouple estimation and decision making into separate algorithmic building blocks, and that this can be effective both in theory and in practice. Time permitting, I will discuss extensions of these techniques to more challenging reinforcement learning problems.
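To make the reduction concrete, here is a minimal sketch in the spirit of inverse-gap weighting (as in Foster and Rakhlin's SquareCB algorithm): an online regression oracle predicts rewards, and a simple weighting rule converts those predictions into an exploration distribution over actions. The toy linear oracle, the simulated environment, and all parameter values below are illustrative assumptions, not the talk's exact construction.

```python
import numpy as np

# Sketch of a regression-oracle reduction for contextual bandits via
# inverse-gap weighting. Everything here (the linear oracle, the simulated
# rewards, the parameter choices) is an illustrative assumption.

class OnlineLinearRegressor:
    """Toy online regression oracle: one linear predictor per action,
    updated with a squared-loss gradient step."""
    def __init__(self, num_actions, dim, lr=0.1):
        self.w = np.zeros((num_actions, dim))
        self.lr = lr

    def predict(self, context):
        return self.w @ context  # predicted reward for each action

    def update(self, context, action, reward):
        err = self.w[action] @ context - reward
        self.w[action] -= self.lr * err * context

def inverse_gap_weighting(preds, gamma):
    """Map reward predictions to an action distribution: the larger an
    action's gap to the predicted-best action, the less mass it gets."""
    best = int(np.argmax(preds))
    K = len(preds)
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - p.sum()  # remaining mass goes to the greedy action
    return p

rng = np.random.default_rng(0)
K, d, T, gamma = 5, 10, 2000, 50.0   # gamma controls exploration strength
oracle = OnlineLinearRegressor(K, d)
theta = rng.normal(size=(K, d))      # hidden reward model (simulation only)

for t in range(T):
    context = rng.normal(size=d)
    probs = inverse_gap_weighting(oracle.predict(context), gamma)
    action = rng.choice(K, p=probs)
    reward = theta[action] @ context + 0.1 * rng.normal()
    oracle.update(context, action, reward)  # bandit feedback: chosen action only
```

Note how the sketch reflects the separation the abstract describes: estimation lives entirely in the oracle, decision making entirely in the weighting rule, so the regression model can be swapped for a richer function class without touching the decision layer.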
Dylan Foster’s website
You are invited to a scheduled Zoom meeting. Zoom is Yale’s audio and visual conferencing platform.
- Join the Zoom
- Or telephone: 203-432-9666 (2-ZOOM if on campus) or 646-568-7788
- Password: 24
- Meeting ID: 958 6320 8758
- International numbers available
- H.323 and SIP information for video conferencing units