Webcast Option: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=153489ad-83ae-490e-b86d-b361010cd81b
STRUCTURED TOPIC MODELING: LEVERAGING SPARSITY AND GRAPHS FOR IMPROVED INFERENCE
Classical topic models (LDA, pLSI) treat documents as independent, which wastes information when
texts are short or vocabularies are large. I will present two structured alternatives with statistical
guarantees. First, a weakly sparse extension of pLSI that stabilizes estimation in high-vocabulary
settings by shrinking rare terms without enforcing hard zeros. Second, a graph-aligned singular value
decomposition that incorporates known relationships between documents—e.g., spatial proximity
or sample similarity—to improve recovery of document–topic and topic–word matrices. For both
methods we derive non-asymptotic, high-probability error bounds for topic proportions and word
distributions. Applications to spatial proteomics, microbiome profiles, and scientific abstracts show
accuracy and interpretability gains when side information is available. The talk highlights when
structure helps, how to encode it, and what guarantees are achievable.
Keywords: Topic Modeling · Constrained SVD
3:30pm - Pre-talk meet and greet teatime - 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area. For more details and upcoming events, visit our website at https://statistics.yale.edu/calendar.
