Title: Chain-of-Thought Monitorability: A Fragile Opportunity for AI Safety
Abstract: Observability into the decision-making of modern AI systems may be required to safely deploy increasingly capable agents. Monitoring the chain-of-thought (CoT) of today’s reasoning models has proven effective for detecting misbehavior. However, this “monitorability” may be fragile under different training procedures, data sources, or even continued system scaling. This talk will cover OpenAI’s recent work on chain-of-thought monitoring, discuss the importance and fragility of chain-of-thought monitorability, and introduce three evaluation archetypes that we are using to measure and, we hope, maintain monitorability.
Bio: Bowen Baker is a research scientist at OpenAI and leads its Chain-of-Thought Interpretability Team. Bowen has always been interested in self-improving systems, and his first work in machine learning was in deep learning architecture search during his master’s at MIT. He joined OpenAI in 2017, where he has worked on sim-to-real transfer and dexterous manipulation with humanoid robotic hands, multi-agent autocurricula and cooperation, constructing behavioral priors from unsupervised video, LLM reasoning, and most recently AI safety and alignment.
11:30am - Lunch/Pre-talk meet and greet - 219 Prospect Street, 13th floor; there will be light snacks and beverages in the kitchen area.
For more details and upcoming events visit our website at https://statistics.yale.edu/calendar.