Hybrid

Claire Donnat, Assistant Professor of Statistics at the University of Chicago

Mon Nov 3, 2025 4:00 p.m.—5:00 p.m.
Kline Tower, 13th Floor, Rm. 1327
219 Prospect Street New Haven, CT 06511

STRUCTURED TOPIC MODELING: LEVERAGING SPARSITY AND GRAPHS FOR IMPROVED INFERENCE

Classical topic models (LDA, pLSI) treat documents as independent, which wastes information when
texts are short or vocabularies are large. I will present two structured alternatives with statistical
guarantees. First, a weakly sparse extension of pLSI that stabilizes estimation in high-vocabulary
settings by shrinking rare terms without enforcing hard zeros. Second, a graph-aligned singular value
decomposition that incorporates known relationships between documents—e.g., spatial proximity
or sample similarity—to improve recovery of document–topic and topic–word matrices. For both
methods we derive non-asymptotic, high-probability error bounds for topic proportions and word
distributions. Applications to spatial proteomics, microbiome profiles, and scientific abstracts show
accuracy and interpretability gains when side information is available. The talk highlights when
structure helps, how to encode it, and what guarantees are achievable.
Keywords: Topic Modeling · Constrained SVD

3:30 p.m. - Pre-talk meet and greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area. For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.