Nati Srebro, Toyota Technological Institute at Chicago
Classical theory, conventional wisdom, and all textbooks tell us to avoid reaching zero training error and overfitting the noise, and instead to balance model fit and complexity. Yet recent empirical and theoretical results suggest that in many cases overfitting is benign, and that even interpolating the training data can lead to good generalization. Can we characterize and understand when overfitting is indeed benign, and when it is catastrophic, as classical theory suggests? And can existing theoretical approaches be used to study and explain benign overfitting and the “double descent” curve? I will discuss interpolation learning in linear (and kernel) methods, as well as learning with the universal “minimum description length” or “shortest program” learning rule.
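As a rough illustration of the interpolation regime mentioned above, the sketch below fits a minimum-L2-norm interpolator in an overparameterized linear regression with a spiked covariance (a few high-variance signal features plus a long tail of low-variance features that absorbs the noise). This is only a minimal numerical sketch under assumed settings; all dimensions, scales, and noise levels are illustrative choices and are not taken from the talk.

```python
# Minimal sketch of benign overfitting with minimum-L2-norm interpolation.
# All sizes and scales are illustrative assumptions, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
n, k, d_tail, sigma = 100, 5, 10_000, 0.5   # samples, signal dims, tail dims, noise std

def sample(m):
    """Features: k signal coordinates with variance 1, plus a low-variance tail."""
    top = rng.standard_normal((m, k))                          # signal directions
    tail = rng.standard_normal((m, d_tail)) / np.sqrt(d_tail)  # total tail variance ~ 1
    return np.hstack([top, tail])

w_star = np.concatenate([np.ones(k), np.zeros(d_tail)])  # true signal lives in the top k

X = sample(n)
y = X @ w_star + sigma * rng.standard_normal(n)          # noisy training labels

w_hat = np.linalg.pinv(X) @ y   # minimum-norm solution among all interpolators

X_te = sample(1000)
train_mse = np.mean((X @ w_hat - y) ** 2)                # ~0: the noise is fit exactly
test_mse = np.mean((X_te @ w_hat - X_te @ w_star) ** 2)  # excess error on fresh points

print(f"train MSE = {train_mse:.2e}, excess test MSE = {test_mse:.3f} "
      f"(signal variance = {k})")
```

In this assumed setup, the interpolator fits the training noise exactly (zero training error), yet its excess test error remains a small fraction of the signal variance, the kind of behavior the abstract refers to as benign overfitting; with a different covariance structure the same rule can instead overfit catastrophically.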