Qi Lei, Princeton University
A pre-trained model refers to any model that is trained on broad data at scale and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks. The rise of pre-trained models (e.g., BERT, GPT-3, CLIP, Codex, MAE) has transformed applications in various domains, especially those where labeled data is scarce. A pre-trained model first learns a data representation that filters out information irrelevant to the training tasks; it then transfers this representation to downstream tasks, which require only a few labeled samples and slight modifications.
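The pretrain-then-adapt pipeline described above can be illustrated with a minimal sketch. This is not any specific method from the talk; it is a toy linear example (all data, dimensions, and the PCA-based representation are illustrative assumptions): a representation is learned from plentiful unlabeled data, frozen, and a downstream linear head is fit on only a handful of labeled samples.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 5  # ambient dimension, latent (representation) dimension

# Ground truth: data lies near a k-dimensional subspace spanned by B
B = rng.normal(size=(d, k))

# --- Pre-training: abundant unlabeled data; learn the subspace via PCA ---
Z_pre = rng.normal(size=(5000, k))
X_pre = Z_pre @ B.T + 0.1 * rng.normal(size=(5000, d))
U, _, _ = np.linalg.svd(X_pre.T @ X_pre)
W = U[:, :k]  # learned representation, kept frozen downstream

# --- Downstream: only 30 labeled samples; fit a linear head on features ---
w_star = rng.normal(size=k)
Z_dn = rng.normal(size=(30, k))
X_dn = Z_dn @ B.T + 0.1 * rng.normal(size=(30, d))
y_dn = Z_dn @ w_star

feats = X_dn @ W  # k-dim features from the frozen representation
head, *_ = np.linalg.lstsq(feats, y_dn, rcond=None)

# --- Evaluate on fresh test data ---
Z_te = rng.normal(size=(1000, k))
X_te = Z_te @ B.T + 0.1 * rng.normal(size=(1000, d))
y_te = Z_te @ w_star
mse = float(np.mean((X_te @ W @ head - y_te) ** 2))
```

Because the pre-training data reveals the shared low-dimensional structure, the downstream head only needs to estimate k parameters instead of d, which is why a few labeled samples suffice.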
This talk establishes some theoretical understanding of pre-trained models under different settings, ranging from supervised pre-training and meta-learning to self-supervised learning. I will discuss the conditions under which pre-trained models work, based on the statistical relation between the training and downstream tasks. The theoretical analyses partly answer how pre-trained models work and when they fail; they also guide technical decisions for future work and inspire new methods.