Representational strengths and limitations of transformers

Mon Nov 27, 2023, 4:00–5:00 p.m.

Speaker

Daniel J. Hsu, Columbia University

Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no mathematical description of their benefits and deficiencies as compared with other architectures. This talk presents positive and negative results on the representation power of attention layers, with a focus on relevant complexity parameters such as width, depth, and embedding dimension. The main results establish separations between attention layers and other traditional neural network architectures such as recurrent neural networks, as well as separations between different transformer architectures. Based on joint work with Clayton Sanford (Columbia) and Matus Telgarsky (NYU).

3:30 p.m. - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area.