Trevor Campbell, MIT
The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. This talk will instead take advantage of data redundancy to shrink the dataset itself as a preprocessing step, forming a “Bayesian coreset.” The coreset can be used in a standard inference algorithm at significantly reduced cost while maintaining theoretical guarantees on posterior approximation quality. The talk will include an intuitive formulation of Bayesian coreset construction as sparse vector sum approximation, an automated coreset construction algorithm that takes advantage of this formulation, strong theoretical guarantees on posterior approximation quality, and applications to a variety of real and simulated datasets.
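To make the "sparse vector sum approximation" view concrete, here is a minimal sketch. It is not the construction algorithm presented in the talk. It assumes each data point is represented by a vector holding its log-likelihood at a handful of parameter values, and that the goal is a nonnegative weighted sum of a few such vectors that stays close to the sum over all of them. The Gaussian toy model, the name `greedy_coreset`, and the greedy matching-pursuit-style selection rule are all illustrative assumptions.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def greedy_coreset(vectors, max_points):
    """Greedy nonnegative matching pursuit (illustrative, hypothetical helper).

    Approximates the sum of all `vectors` by a nonnegative weighted sum
    that uses at most `max_points` distinct vectors. Returns the weights
    and the norm of the remaining residual.
    """
    dim = len(vectors[0])
    target = [sum(v[d] for v in vectors) for d in range(dim)]
    residual = target[:]
    weights = [0.0] * len(vectors)
    norms_sq = [dot(v, v) for v in vectors]

    for _ in range(max_points):
        # Pick the vector whose direction is best aligned with the residual.
        best, best_score = None, 0.0
        for n, v in enumerate(vectors):
            if norms_sq[n] == 0.0:
                continue
            score = dot(residual, v) / math.sqrt(norms_sq[n])
            if score > best_score:
                best, best_score = n, score
        if best is None:
            break  # no vector reduces the residual with a positive step

        # Take the step along that vector that minimizes the residual norm.
        step = dot(residual, vectors[best]) / norms_sq[best]
        weights[best] += step
        residual = [r - step * x for r, x in zip(residual, vectors[best])]

    return weights, math.sqrt(dot(residual, residual))


# Toy setup (assumed for illustration): Gaussian location model with unit
# variance, log-likelihood evaluated at a few hypothetical parameter draws.
random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(200)]
thetas = [random.gauss(1.0, 0.5) for _ in range(20)]
vectors = [[-0.5 * (y - t) ** 2 for t in thetas] for y in data]

weights, err = greedy_coreset(vectors, max_points=10)

full_sum = [sum(v[d] for v in vectors) for d in range(len(thetas))]
full_norm = math.sqrt(dot(full_sum, full_sum))
coreset_size = sum(1 for w in weights if w > 0.0)

print("coreset size:", coreset_size)
print("relative error:", err / full_norm)
```

The nonzero entries of `weights` identify the coreset. Under these assumptions, a standard inference algorithm would then run on those few weighted points in place of the full dataset.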