Cyril Zhang, Microsoft Research NYC
The phenomenology of modern deep learning is far stranger and murkier than our cherished canonical models from statistics, optimization, and theoretical computer science; the practice of training neural nets has been cheekily likened to alchemy. Towards diagnosing limitations and forming a principled new engineering discipline for these beasts, what are the mathematical tools and mental models we should bring?
There’s no satisfactory answer yet, but this talk will revolve around a single microscopic lens: learning parities on the Boolean cube with neural nets. As far as toy experiments go, this one reveals an outsized amount of complexity, out of which many empirical surprises and alchemical algorithm design principles fall naturally. Setting aside a theorems-first viewpoint and working towards a first-principles experimental theory of representation learning, we can use these 10-second experiments to concretize and explore a plethora of weird and wonderful topics in the science of deep learning: emergence, scaling laws, reasoning, hallucinations, lottery tickets, grokking, the edge of stability, and double descent. Along the way, we’ll see precisely where our mathematical understanding of this innocuous experiment ends, generating some conjectures about Boolean halfspaces.
We’ll conclude with some perspectives on toy models: future opportunities, limitations, and their immediate necessity in the era of neural code synthesis and theorem proving.
Joint work with Benjamin L. Edelman, Bingbin Liu, Jordan T. Ash, Boaz Barak, Surbhi Goel, Sham Kakade, Akshay Krishnamurthy, and Eran Malach.
3:30pm - Pre-talk meet and greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area.