Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6d1b3346-6…
Title: Scaling Inference-Time Compute: From Self-Improvement to Pessimism
Abstract: Language models increasingly rely on scaling inference-time computation to achieve state-of-the-art performance on a growing number of reasoning tasks. A popular paradigm for such computational scaling is Best-of-N (BoN) sampling, where a model generates multiple candidate responses to a given question and selects the one judged most likely to be correct. In this talk, I will present a unified understanding of this approach in several settings, both with and without external verification. We will discuss the extent to which such inference-time computation is necessary, and present a new algorithm that optimally leverages inference-time compute to return better answers in the presence of uncertainty, thereby avoiding common pitfalls of BoN sampling such as reward hacking and over-optimization. Throughout, we will see that model coverage of ‘good’ answers emerges as the critical feature that allows inference-time computation to scale effectively. These results provide a principled foundation for designing inference-time algorithms that scale reliably with compute and highlight coverage as the central bottleneck in aligning language models.
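For readers unfamiliar with the procedure, here is a minimal Python sketch of BoN sampling; the generate and score functions are hypothetical stand-ins for a language model sampler and an external verifier or reward model, not methods from the talk.

    # Minimal sketch of Best-of-N (BoN) sampling. Both generate() and
    # score() are hypothetical placeholders: generate(question) draws one
    # candidate response from a language model, and score(candidate)
    # returns a verifier / reward-model estimate of correctness.

    def best_of_n(question, generate, score, n=8):
        """Sample n candidate responses and return the highest-scoring one."""
        candidates = [generate(question) for _ in range(n)]
        # With an imperfect verifier, the score is only a proxy for
        # correctness, so large n can over-optimize the proxy; this is
        # the reward-hacking failure mode the abstract refers to.
        return max(candidates, key=score)

Note that the procedure can only return an answer the model actually generated, which is why coverage of good answers is the binding constraint on how well BoN scales with n.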
3:30pm - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor; light snacks and beverages will be available in the kitchen area. For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.