
The Major

Students majoring in Statistics and Data Science take courses in both mathematical and practical foundations. They are also encouraged to take courses in areas of application.

The B.A. in Statistics and Data Science is designed to acquaint students with the fundamental techniques in the field. The B.S. should prepare students to participate in research efforts or pursue graduate school in Data Science.
 

Requirements at a Glance

Prerequisite 

  • MATH 1200 (Calculus of Functions of Several Variables)
  • ENAS 1510 (Multivariable Calculus for Engineers)
  • MATH 3020 (Vector Calculus and Integration on Manifolds)
  • or an equivalent course, or a waiver from the DUS

Senior Project

  • S&DS 4910 (Senior Essay, Fall)
  • S&DS 4920 (Senior Essay, Spring)

Discipline Areas

These are essential courses in probability and statistics. Every major should take at least two of these courses, and should probably take more. Students completing the B.S. must take S&DS 2420.

  • S&DS 2380    Probability and Statistics
  • S&DS 2410    Probability Theory
  • S&DS 2420    Theory of Statistics
  • S&DS 3120    Linear Models
  • S&DS 3510    Stochastic Processes

Every student in Data Science should be able to compute with data. While the main purpose of some of these courses is not computing, students who have taken at least two of them should be capable of digesting and processing data. Other courses also require a lot of programming, but these are the essential ones. Every major must take at least two of these courses.

  • One of the following courses:
    • S&DS 2200    Intro Statistics, Intensive
    • S&DS 2300    Data Exploration and Analysis
       
  • One of the following courses:
    • CPSC 1001    Introduction to Programming
    • CPSC 2000    Introduction to Information Systems
    • CPSC 2010    Introduction to Computer Science
    • ENAS 1300    Introduction to Computing for Engineers and Scientists
       
  • S&DS 2620    Computational Tools for Data Science
  • S&DS 2650    Introductory Machine Learning
  • S&DS 4250    Statistical Case Studies

These courses teach fundamental methods for dealing with data. They range from the practical to the theoretical. Every major must take at least two of these courses.

  • S&DS 3120    Linear Models
  • S&DS 3170    Applied Machine Learning and Causal Inference
  • S&DS 3610    Data Analysis
  • S&DS 3630    Multivariate Statistics for Social Sciences
  • S&DS 3650    Intermediate Machine Learning
  • S&DS 4310    Optimization and Computation
     
  • CPSC 4460    Data and Information Visualization
  • CPSC 4520    Deep Learning Theory and Applications
  • CPSC 4770    Natural Language Processing

All students in the major must know linear algebra. If they have learned linear algebra through other courses (such as MATH 230/231), they may substitute another course from this category. Students pursuing the B.S. must take at least two courses from this list. Students who wish to pursue graduate school should take many of these courses.

  • MATH 2220    Linear Algebra with Applications
  • MATH 2250    Linear Algebra and Matrix Theory
  • MATH 2260    Linear Algebra - Intensive
  • MATH 2320    Advanced Linear Algebra with Applications
  • MATH 3400    Advanced Linear Algebra
  • MATH 2440    Discrete Mathematics
  • MATH 2550    Analysis 1
  • MATH 2560    Analysis 1 Intensive
  • MATH 3020    Vector Calculus and Integration on Manifolds
  • MATH 3050    Analysis 2: Lebesgue Integration and Fourier Series
  • MATH 3200    Measure Theory and Integration
  • MATH 3250    Introduction to Functional Analysis
     
  • S&DS 3640    Information Theory
  • S&DS 4000    Advanced Probability
  • S&DS 4100    Statistical Inference
  • S&DS 4110    Selected Topics in Statistical Decision Theory
  • S&DS 6690    Statistical Learning Theory
     
  • CPSC 3650    Algorithms
  • CPSC 3660    Intensive Algorithms
  • CPSC 4690    Randomized Algorithms

These courses are for students who want to do serious programming or implement large-scale analyses. None are required for the major. Students who wish to work in the software industry should take at least one of these.

  • CPSC 2230    Data Structures and Programming Techniques
  • CPSC 3230    Introduction to Systems Programming and Computer Organization
  • CPSC 4240    Parallel Programming Techniques
  • CPSC 4370    Database Systems

Students are encouraged to take courses that involve the study of data in application areas. These courses will teach students how these data are obtained, how reliable they are, how they are used, and the types of inferences that can be made from them. These course selections should be approved by the DUS. Examples of such courses include:

  • ANTH 3476   Observing and Measuring Behavior
  • GLBL 3191    Research Design and Survey Analysis
  • LING 2340    Quantitative Linguistics
  • LING 3800    Neural Networks and Language
  • PSYC 2200    Research Methods, Writing Intensive

These are methods courses in areas of applications. They help expose students to the cultures of fields that explore data. These course selections should be approved by the DUS. Examples of such courses include:

  • S&DS 3520   Biomedical Data Science, Mining and Modeling
  • BENG 4450   Biomedical Image Processing and Analysis
  • CPSC 4750    Computational Vision & Biological Perception
  • ECON 2136   Econometrics
  • ECON 4419   Financial Time Series Econometrics
  • LING 2270    Language and Computation I (same as PSYC 3327)
  • PSYC 3327    Language and Computation I (same as LING 2270)