Lihong Li, Google Research
In many real-world applications of reinforcement learning (RL), such as healthcare, dialogue systems, and robotics, running a new policy on humans or robots can be costly or risky. This gives rise to the critical need for off-policy estimation: estimating the average reward of a target policy from data previously collected by another policy. This talk will describe some recent advances in long- or even infinite-horizon off-policy estimation, where standard methods suffer a variance that grows exponentially with the horizon (the “curse of horizon”). The key to these methods is a duality structure in RL, whose use goes beyond off-policy estimation.
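To make the "curse of horizon" concrete, the sketch below implements the standard trajectory-wise importance-sampling (IS) estimator that the abstract contrasts against: the per-trajectory weight is a product of per-step probability ratios, so its variance can blow up exponentially in the horizon. This is an illustrative toy (made-up two-action bandit-style policies, no real MDP dynamics), not the method presented in the talk.

```python
import random

def is_estimate(trajectories, pi_b, pi_t):
    """Trajectory-wise importance-sampling estimate of the target
    policy's average per-step reward, from behavior-policy data.
    Each trajectory is a (actions, rewards) pair."""
    total = 0.0
    for actions, rewards in trajectories:
        weight = 1.0
        for a in actions:
            # Product of per-step ratios: the source of exponential variance.
            weight *= pi_t[a] / pi_b[a]
        total += weight * sum(rewards) / len(rewards)
    return total / len(trajectories)

random.seed(0)
pi_b = {0: 0.5, 1: 0.5}   # behavior policy (generated the logged data)
pi_t = {0: 0.9, 1: 0.1}   # target policy we wish to evaluate
horizon = 20

# Simulate logged data under the behavior policy; action 0 yields reward 1.
data = []
for _ in range(1000):
    actions = [0 if random.random() < pi_b[0] else 1 for _ in range(horizon)]
    rewards = [1.0 if a == 0 else 0.0 for a in actions]
    data.append((actions, rewards))

est = is_estimate(data, pi_b, pi_t)
```

The true average reward under `pi_t` here is 0.9, and the IS estimate is unbiased; but because a single trajectory's weight can be as large as `(0.9/0.5)**20 ≈ 128000`, the estimate is dominated by a few rare trajectories, and the variance worsens rapidly as the horizon grows. The infinite-horizon methods in the talk avoid this by reweighting state visitation distributions rather than whole trajectories.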
You are invited to a scheduled Zoom meeting. Zoom is Yale’s audio and visual conferencing platform.
- Join from PC, Mac, Linux, iOS or Android: https://yale.zoom.us/j/95863208758
- Password: 24
- Or Telephone: 203-432-9666 (2-ZOOM if on-campus) or 646-568-7788
- Meeting ID: 958 6320 8758
- International numbers available: https://yale.zoom.us/u/acqwvKmSRE
For H.323 and SIP information for video conferencing units please click here: https://yale.service-now.com/it?id=support_article&sys_id=434b72d3db9e8fc83514b1c0ef961924