Past Event: Unveiling In-Context Learning: Provable Training Dynamics and Feature Learning in Transformers

Mon, Mar 31, 2025, 4:00–5:00 p.m.

This event has passed.


Speaker

Zhuoran Yang, Yale University

In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its theoretical foundations remain elusive due to the complexity of transformer architectures. In particular, most existing work only explains theoretically how the attention mechanism facilitates ICL under certain data models; it remains unclear how the other building blocks of the transformer contribute to ICL. To address this question, we study how a simple softmax transformer is trained to perform ICL on two synthetic tasks: (multi-task) linear regression and n-gram Markov chains. We show that the transformer successfully learns these tasks in context. More importantly, we interpret the estimator represented by the learned transformer, characterize the gradient-based training dynamics, and show how features emerge during training. Our theory is further validated by experiments.
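For readers unfamiliar with the setup, the sketch below (not the authors' code; the dimensions, prompt format, and single random softmax-attention head are illustrative assumptions) shows how the in-context linear-regression task is typically posed: each prompt is a fresh regression problem, and the model must predict the label of a query point from the in-context examples alone.

    # A minimal sketch of a standard ICL linear-regression setup.
    # All sizes and the untrained attention head are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_examples = 4, 32          # input dimension, in-context examples per prompt

    def make_prompt():
        """Sample one regression task and format it as a token sequence.

        Tokens are (d+1)-dimensional: the first d coordinates hold x, the
        last holds y (zeroed out for the query token).
        """
        w = rng.normal(size=d) / np.sqrt(d)        # task-specific weight vector
        X = rng.normal(size=(n_examples + 1, d))   # examples plus one query
        y = X @ w
        tokens = np.concatenate([X, y[:, None]], axis=1)
        tokens[-1, -1] = 0.0                       # hide the query label
        return tokens, y[-1]                       # sequence, target for the query

    def softmax_attention(tokens, WQ, WK, WV):
        """One softmax attention head; the query token attends to the examples."""
        Q, K, V = tokens @ WQ, tokens @ WK, tokens @ WV
        scores = Q @ K.T / np.sqrt(K.shape[1])
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)
        return attn @ V

    # An untrained random head: its prediction (the y-slot of the query's
    # output) should be poor. Training, as studied in the talk, shapes
    # WQ, WK, WV so that this readout implements a good in-context estimator.
    dim = d + 1
    WQ, WK, WV = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    tokens, target = make_prompt()
    pred = softmax_attention(tokens, WQ, WK, WV)[-1, -1]
    print(f"target={target:+.3f}  untrained prediction={pred:+.3f}")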

This is joint work with Siyu Chen, Jianliang He, Xintian Pan, Heejune Sheen, and Tianhao Wang.

3:30 p.m. - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area.
