Daniel J. Hsu, Columbia University
Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no precise mathematical description of their benefits and deficiencies compared with other architectures. This talk presents positive and negative results on the representation power of attention layers, with a focus on relevant complexity parameters such as width, depth, and embedding dimension. The main results establish separations between attention layers and traditional neural network architectures such as recurrent neural networks, as well as separations between different transformer architectures. Based on joint work with Clayton Sanford (Columbia) and Matus Telgarsky (NYU).
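To fix notation for the object the abstract studies, a single softmax attention head can be sketched in NumPy. This is a minimal illustrative sketch, not code from the talk; the sequence length, model dimension, and head dimension below are arbitrary choices, and the projection matrices are random placeholders.

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product (softmax) attention.

    X: (n, d_model) input sequence; Wq, Wk, Wv: projection matrices.
    Returns an (n, d_head)-shaped output mixing all positions of X.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    d = Q.shape[-1]                            # head (embedding) dimension
    scores = Q @ K.T / np.sqrt(d)              # scaled dot-product scores
    # Numerically stable row-wise softmax over key positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: n = 4 tokens, d_model = 8, d_head = 8 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The "embedding dimension" mentioned in the abstract corresponds to the width of these projections, one of the complexity parameters the separation results are stated in terms of.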
3:30pm - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor; light snacks and beverages will be available in the kitchen area.