Cyril Zhang, Microsoft Research NYC

Title: On the "chemistry" of deep learning: lessons learned from training 10^8 networks on toy problems

Monday, October 2, 2023, 4:00PM to 5:00PM

Kline Tower

219 Prospect Street, 13th floor, Rm 1327

New Haven, CT 06511

Information and Abstract:

The phenomenology of modern deep learning is far stranger and murkier than our cherished canonical models from statistics, optimization, and theoretical computer science; the practice of training neural nets has been cheekily likened to alchemy. Towards diagnosing limitations and forming a principled new engineering discipline for these beasts, what are the mathematical tools and mental models we should bring?

There’s no satisfactory answer yet, but this talk will revolve around a single microscopic lens: learning parities on the Boolean cube with neural nets. As far as toy experiments go, this one reveals an outsized amount of complexity, out of which many empirical surprises and alchemical algorithm design principles fall naturally. By eschewing a theorems-first viewpoint, towards building a first-principles experimental theory of representation learning, we can use these 10-second experiments to concretize and explore a plethora of weird and wonderful topics in the science of deep learning: emergence, scaling laws, reasoning, hallucinations, lottery tickets, grokking, the edge of stability, and double descent. Along the way, we’ll see where our mathematical understanding of this innocuous experiment precisely ends, generating some conjectures about Boolean halfspaces.
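To make the toy problem concrete: in the sparse-parity setting, inputs are drawn uniformly from the Boolean cube {-1, +1}^n and the label is the product of a small, unknown subset of k coordinates. What follows is a minimal NumPy illustration, not the speakers' code. The architecture (a two-layer ReLU MLP), the hinge loss, online minibatch SGD, every hyperparameter, and all function names are illustrative assumptions.

```python
import numpy as np


def sample_sparse_parity(rng, batch_size, n_bits, support):
    """Uniform x in {-1,+1}^n_bits with label y = prod of x_i over i in support."""
    x = rng.choice([-1.0, 1.0], size=(batch_size, n_bits))
    y = np.prod(x[:, support], axis=1)
    return x, y


def train_sparse_parity(n_bits=20, support=(0, 1, 2), width=128, lr=0.1,
                        steps=2000, batch_size=32, seed=0):
    """Online minibatch SGD on a 2-layer ReLU MLP with hinge loss (assumed setup).

    Fresh samples are drawn at every step, so there is no fixed training set.
    Returns the parameters and the per-step mean hinge loss.
    """
    rng = np.random.default_rng(seed)
    support = list(support)
    # Hidden weights/biases and output weights, with scaled Gaussian init.
    W = rng.normal(0.0, 1.0 / np.sqrt(n_bits), size=(width, n_bits))
    b = rng.normal(0.0, 1.0 / np.sqrt(n_bits), size=width)
    a = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
    losses = []
    for _ in range(steps):
        x, y = sample_sparse_parity(rng, batch_size, n_bits, support)
        pre = x @ W.T + b            # (batch, width) pre-activations
        h = np.maximum(pre, 0.0)     # ReLU
        out = h @ a                  # (batch,) network outputs
        margin = y * out
        losses.append(np.mean(np.maximum(0.0, 1.0 - margin)))
        # Gradient of the mean hinge loss w.r.t. out: -y where margin < 1, else 0.
        g_out = np.where(margin < 1.0, -y, 0.0) / batch_size
        g_a = h.T @ g_out                       # (width,)
        g_h = np.outer(g_out, a) * (pre > 0)    # backprop through ReLU
        g_W = g_h.T @ x                         # (width, n_bits)
        g_b = g_h.sum(axis=0)                   # (width,)
        a -= lr * g_a
        W -= lr * g_W
        b -= lr * g_b
    return (W, b, a), np.array(losses)


def accuracy(params, n_bits, support, rng, n_eval=2000):
    """Fraction of fresh samples whose predicted sign matches the parity label."""
    W, b, a = params
    x, y = sample_sparse_parity(rng, n_eval, n_bits, list(support))
    out = np.maximum(x @ W.T + b, 0.0) @ a
    return float(np.mean(np.sign(out) == y))
```

For example, `params, losses = train_sparse_parity(steps=5000)` followed by `accuracy(params, 20, (0, 1, 2), np.random.default_rng(1))` runs one experiment in seconds on a CPU. Plotting `losses` shows whether and when the loss leaves its initial plateau. Whether this particular configuration succeeds depends on n, k, width, learning rate, and the number of steps, and is not claimed here.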

We’ll conclude with some perspectives on toy models: future opportunities, limitations, and their immediate necessity in the era of neural code synthesis and theorem proving.

Joint work with Benjamin L. Edelman, Bingbin Liu, Jordan T. Ash, Boaz Barak, Surbhi Goel, Sham Kakade, Akshay Krishnamurthy, and Eran Malach.

3:30pm - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor. There will be light snacks and beverages in the kitchen area.

Department of Statistics and Data Science
