Alekh Agarwal, Microsoft Research AI

The design of learning agents that observe, interact with, and manipulate their environment to optimize desirable behaviors is a long-standing goal in machine learning, with roots in artificial intelligence, adaptive experimental design, and adaptive feedback control. In machine learning, these questions are typically studied in the area of reinforcement learning (RL), which has seen a recent surge of interest, both due to potential applications, from robotics and autonomous systems to healthcare and recommendations, and due to popular successes such as superhuman performance in games and dexterous manipulation of a Rubik's cube.
On the theoretical front, foundations for sample-efficient learning were laid in the early 2000s, for problems where the agent perceives the environment through relatively simple observations. However, these settings fail to capture the complex sensorimotor observation streams that most application domains naturally contain. In this talk, I will describe a research program aimed at addressing this critical gap in our theoretical understanding.
I will begin by highlighting the impossibility of the most naturally posed learning problem in the worst case, and the need to leverage structures that are ubiquitous in real-world applications. I will then discuss three such structures which I have examined in my own research:
a) Contextual bandits capture a subset of RL problems where myopic decision making suffices—a reasonable approximation for many applications. I will describe the algorithmic ideas that result in statistically optimal and computationally practical approaches for this setting, as well as their maturation into an Azure service (https://aka.ms/personalizer) that I helped create at Microsoft.
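To make the setting concrete, here is a minimal epsilon-greedy sketch of a contextual bandit learner built on a least-squares regression oracle. This is an illustrative toy, not the statistically optimal algorithms from the talk; all dimensions and the simulated reward model are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim, epsilon = 3, 5, 0.1

# Per-action online ridge regression: A and b accumulate sufficient
# statistics so that solve(A[a], b[a]) estimates the reward weights.
A = [np.eye(dim) for _ in range(n_actions)]
b = [np.zeros(dim) for _ in range(n_actions)]

def choose(context):
    """Epsilon-greedy: explore uniformly with prob. epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    scores = [context @ np.linalg.solve(A[a], b[a]) for a in range(n_actions)]
    return int(np.argmax(scores))

def update(context, action, reward):
    A[action] += np.outer(context, context)
    b[action] += reward * context

# Simulated interaction: each action's reward is linear in the context
# with hidden weights, plus noise.
true_w = rng.standard_normal((n_actions, dim))
for _ in range(2000):
    x = rng.standard_normal(dim)
    a = choose(x)
    r = true_w[a] @ x + 0.1 * rng.standard_normal()
    update(x, a, r)
```

After enough rounds, the greedy policy derived from the regression estimates tends to agree with the true best action on most fresh contexts.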
b) I will then define a significantly broader structural condition, called Bellman rank, for general RL problems. The condition strictly generalizes contextual bandits, and captures most known settings where sample-efficient RL is possible. I will also describe an algorithm which can provably learn a near-optimal behavior for any RL problem with a small Bellman rank, albeit in a computationally inefficient manner.
c) In the final part, I will examine a subset of problems with low Bellman rank, where the transition dynamics are low-rank, e.g., due to an underlying latent-variable model. I will present a computationally and statistically efficient algorithm for this setting.
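In a low-rank MDP, the transition kernel factors as P(s'|s,a) = φ(s,a)·μ(s') for d-dimensional feature maps φ and μ. A small numpy illustration (with hypothetical dimensions) shows that the induced transition matrix then has rank at most d, regardless of the number of states and actions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sa, n_s, d = 20, 15, 3  # state-action pairs, next states, latent rank

phi = rng.random((n_sa, d))  # features phi(s, a) in R^d
mu = rng.random((d, n_s))    # measures mu(s') in R^d

# Transition matrix P[(s,a), s'] = phi(s,a) . mu(s'); normalizing each
# row (a per-row scalar) keeps the rank bounded by d.
P = phi @ mu
P /= P.sum(axis=1, keepdims=True)

print(np.linalg.matrix_rank(P))  # rank is bounded by d = 3
```

This rank structure is what the efficient algorithms in this part of the talk exploit: the environment can be summarized through d-dimensional features rather than the full state space.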
At a high level, we will see how combining classical ideas from early theoretical work in RL, such as optimism in the face of uncertainty, with statistical learning approaches like classification or regression to build model ensembles provides a favorable algorithmic strategy in all three cases.
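The classical UCB1 rule for a multi-armed bandit is perhaps the simplest instance of optimism in the face of uncertainty; a minimal sketch (illustrative only, not the talk's algorithms, with assumed arm means):

```python
import math
import random

random.seed(0)
means = [0.2, 0.5, 0.8]  # hidden Bernoulli arm means
counts = [0] * 3
sums = [0.0] * 3

for t in range(1, 3001):
    # Optimism: add a confidence bonus to each empirical mean and
    # play the arm with the highest optimistic estimate.
    ucb = [
        sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        if counts[a] > 0 else float("inf")
        for a in range(3)
    ]
    a = max(range(3), key=lambda i: ucb[i])
    reward = 1.0 if random.random() < means[a] else 0.0
    counts[a] += 1
    sums[a] += reward

# Over time, the confidence bonuses shrink for well-explored arms, and
# the best arm (index 2) ends up being pulled most often.
```

The same principle, paired with learned models or value functions instead of per-arm counts, underlies the algorithms in all three parts of the talk.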
The second and third parts of the talk are based on the papers: https://arxiv.org/abs/1610.09512 and https://arxiv.org/abs/2006.10814.
You are invited to a scheduled Zoom meeting. Zoom is Yale’s audio and visual conferencing platform.
- Join from PC, Mac, Linux, iOS or Android: https://yale.zoom.us/j/95863208758
- Password: 24
- Or Telephone: 203-432-9666 (2-ZOOM if on-campus) or 646-568-7788
- Meeting ID: 958 6320 8758
- International numbers available: https://yale.zoom.us/u/acqwvKmSRE
For H.323 and SIP information for video conferencing units please click here: https://yale.service-now.com/it?id=support_article&sys_id=434b72d3db9e8fc83514b1c0ef961924