Zhuoran Yang, Princeton University
The recent empirical successes of deep reinforcement learning (RL) hinge on combining RL with modern function approximators such as deep neural networks. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to account for the statistical errors that arise in modern function-approximation-based learning systems in the design of an RL algorithm. In this talk, we approach these challenges in both the online and offline settings.
Specifically, in the online setting, we aim to learn the optimal policy by actively interacting with the environment, with zero prior knowledge. For this setting, we propose an optimistic modification of the least-squares value iteration algorithm that successfully addresses the exploration-exploitation tradeoff at the core of online RL. We establish polynomial sample complexity for this algorithm when the action-value function is represented by a kernel function or an overparameterized neural network, without additional assumptions on the data-generating model.
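To make the idea concrete, here is a minimal sketch of one optimistic least-squares backup in the simplified linear-feature case (the talk itself treats kernel and neural representations; the function name, the ridge parameter `lam`, and the bonus scale `beta` are illustrative assumptions, not the talk's notation):

```python
import numpy as np

def optimistic_lsvi_q(features, rewards, next_values, lam=1.0, beta=1.0):
    """One backup step of optimistic least-squares value iteration.

    features:    (n, d) array of state-action features phi(s, a)
    rewards:     (n,) observed rewards
    next_values: (n,) estimated values of the next states
    Returns a function q(phi) giving the optimistic Q-value at feature phi.
    """
    n, d = features.shape
    # Ridge-regression estimate of the Q-function parameters.
    Lambda = lam * np.eye(d) + features.T @ features
    w = np.linalg.solve(Lambda, features.T @ (rewards + next_values))
    Lambda_inv = np.linalg.inv(Lambda)

    def q(phi):
        # Exploration bonus: large where the data poorly covers phi.
        bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
        return phi @ w + bonus  # optimism: estimate + bonus

    return q
```

The bonus term `sqrt(phi^T Lambda^{-1} phi)` inflates the value of poorly explored state-action pairs, which is what drives the algorithm to explore; setting `beta = 0` recovers plain least-squares value iteration.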
Furthermore, in the offline setting, we aim to find the optimal policy based on a dataset collected a priori. Because no further interaction with the environment is possible, offline RL suffers from insufficient coverage of the dataset, which eludes most existing theoretical analyses. For this problem, we propose a pessimistic variant of least-squares value iteration. Our analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation that emerges from the “irrelevant” trajectories that are not informative about the optimal policy.
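The pessimistic variant can be sketched the same way: instead of adding a bonus, it subtracts an uncertainty penalty, so that state-action pairs poorly covered by the offline dataset are valued conservatively. Again this is a simplified linear-feature illustration with assumed names (`lam`, `beta`), not the talk's exact construction:

```python
import numpy as np

def pessimistic_lsvi_q(features, rewards, next_values, lam=1.0, beta=1.0):
    """One backup step of pessimistic least-squares value iteration
    on a fixed offline dataset (simplified linear-feature sketch).
    """
    n, d = features.shape
    Lambda = lam * np.eye(d) + features.T @ features
    w = np.linalg.solve(Lambda, features.T @ (rewards + next_values))
    Lambda_inv = np.linalg.inv(Lambda)

    def q(phi):
        # Uncertainty penalty: large where the dataset covers phi poorly.
        penalty = beta * np.sqrt(phi @ Lambda_inv @ phi)
        return phi @ w - penalty  # pessimism: estimate - penalty

    return q
```

Because the penalty grows in directions the dataset never visited, the resulting policy avoids actions whose apparent value rests only on "irrelevant", uninformative trajectories.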
Bio: Zhuoran Yang is a final-year Ph.D. student in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Before attending Princeton, he obtained a Bachelor of Mathematics degree from Tsinghua University. His research interests lie at the interface between machine learning, statistics, and optimization. The primary goal of his research is to design a new generation of machine learning algorithms for large-scale and multi-agent decision-making problems, with both statistical and computational guarantees. He is also interested in the application of learning-based decision-making algorithms to real-world problems that arise in robotics, personalized medicine, and computational social science.