Title: Diffusion for language modeling?
Abstract: Compared to autoregressive models, and even to continuous diffusion models, diffusion language models offer a fundamentally different design space for crafting efficient and flexible generation processes. In this talk I will discuss work along two axes of this design space: parallel decoding and variable-length generation. In the first half, I will give an exact characterization of the optimal inference schedule for masked diffusion models, which depends on a certain “information profile” specific to the data distribution. From this characterization, I will derive simple schedules that enable provably more efficient sampling than autoregressive models for any distribution with bounded correlations. In the second half, I will present FlexMDM, a theoretically principled and empirically lightweight method for equipping diffusion language models with the ability to generate sequences of arbitrary length, while provably preserving their any-order generation capabilities.
3:30pm - Pre-talk meet-and-greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area.
For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.