Hybrid

Yihong Wu, James A. Attwood Professor of S&DS

Mon Nov 17, 2025 4:00 p.m.—5:00 p.m.

Kline Tower, 13th Floor, Rm. 1327
219 Prospect Street, New Haven, CT 06511

Title: Besting Good-Turing for probability estimation over large domains

Abstract: When faced with a small sample from a large universe of possible outcomes, scientists often turn to the venerable Good-Turing estimator. Despite its pedigree, however, this estimator comes with considerable drawbacks, such as the need to hand-tune smoothing parameters and the lack of a precise optimality guarantee. We introduce a tuning-parameter-free estimator that bests Good-Turing in both theory and practice. Our method marries two classic ideas, namely Robbins’ empirical Bayes and Kiefer-Wolfowitz’s nonparametric maximum likelihood, to learn an implicit prior from data and then convert it into probability estimates. We prove that the resulting estimator attains the optimal instance-wise risk up to logarithmic factors in the competitive framework of Orlitsky and Suresh, and that the Good-Turing estimator is strictly suboptimal in the same framework. Simulations on synthetic data and experiments with English corpora and U.S. Census data show that the proposed estimator consistently outperforms both the Good-Turing estimator and explicit Bayes procedures. 

This is based on joint work with Yanjun Han (NYU), Jonathan Niles-Weed (NYU), and Yandi Shen (CMU), available at https://arxiv.org/abs/2509.07355
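
For readers unfamiliar with the baseline named in the abstract, here is a minimal sketch of the classical, unsmoothed Good-Turing estimator. It is not the speakers' proposed method. The function name `good_turing` and the toy sample are illustrative assumptions. A symbol seen r times in n draws is assigned probability (r + 1) N_{r+1} / (n N_r), where N_r is the number of distinct symbols seen exactly r times, and the total probability of unseen symbols is estimated as N_1 / n.

```python
from collections import Counter


def good_turing(sample):
    """Unsmoothed Good-Turing estimates from a list of observed symbols.

    Returns (probs, missing_mass): a per-symbol probability estimate and
    the estimated total probability of symbols never observed.
    """
    n = len(sample)
    counts = Counter(sample)                 # symbol -> occurrence count r
    freq_of_freq = Counter(counts.values())  # r -> N_r (symbols seen r times)

    # Estimated total mass of unseen symbols: N_1 / n.
    missing_mass = freq_of_freq.get(1, 0) / n

    probs = {}
    for symbol, r in counts.items():
        # (r + 1) * N_{r+1} / (n * N_r); this is zero whenever N_{r+1} = 0.
        probs[symbol] = (r + 1) * freq_of_freq.get(r + 1, 0) / (n * freq_of_freq[r])
    return probs, missing_mass


# Toy sample: 'a' x3, 'b' x2, 'c' x1, 'd' x1, so n = 7.
probs, missing_mass = good_turing(list("aaabbcd"))
```

In this toy sample the most frequent symbol, `a`, receives an estimate of zero because no symbol appears four times. Practical variants avoid this by smoothing the counts N_r, which is the hand-tuning the abstract refers to.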

3:30pm - Pre-talk meet and greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area. For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.