S&DS Major FAQs

Questions about the Field

Anyone! No, seriously.

The S&DS major is designed to be extremely flexible, in part to reflect the broad scope of the field and the ever-expanding skill set that falls under the realm of data science. Courses in the S&DS major span a wide array of subdisciplines and subtopics, and students in our department span a wide array of interests and backgrounds. No matter where your interests lie, the tools that come with analyzing, interpreting, and presenting empirical data will be useful.

This is a good question, and it’s one that nobody knows the answer to (seriously, ask 5 people what data science is and you won’t get a consistent answer). Broadly speaking, “statistics” more typically refers to the question of inference: how can we estimate an uncertain quantity, and how uncertain are we? And broadly speaking, data science more typically refers to the entire process of working with data; in some ways, it is applied statistics. Data science is a newer term, but the techniques that comprise data science are not necessarily newer.

Data science can include all of the following, depending on who you ask:

  • Data collection (including web scraping)
  • Data analysis & analytics
  • Data manipulation & wrangling
  • Data visualization
  • Data engineering
  • Statistical inference and modeling
  • Machine learning and artificial intelligence
  • Randomized experiments and causal inference
  • Natural language processing
  • Spatial statistics and GIS

Much of modern machine learning and artificial intelligence builds on techniques that have traditionally been known under the realm of statistics: the underlying math of logistic regression (one of the most useful foundational machine learning models) has existed for decades under the umbrella of “statistics,” before it was repurposed under what many will consider “data science.”
 

Like the previous question, there’s no good answer to this question either. After all, modern “machine learning” draws heavily from foundations in statistical theory, yet industry roles that are centered on machine learning might fall under either the “data scientist” or “software engineer” job titles. Yale’s key gateway machine learning course is offered by the S&DS department (S&DS 3650), while some (perhaps even most) schools offer machine learning under the Computer Science department. No matter what department it is offered under, every solid machine learning course will emphasize statistical theory.

It’s probably most fair to say that statistics, data science, and computer science borrow heavily, and symbiotically, from each other. Machine learning methods rely on statistical theory in concept, but training large machine learning models relies on methods of efficient computing, where research traditionally falls in the realm of computer science. And certain subsets of data science, including data visualization and data engineering, rely on techniques that are sometimes taught in computer science courses (and have traditionally been claimed by the field of computer science as research topics).

Logistics

Yes, you can receive updates on the major by subscribing to the S&DS undergraduate student mailing list. The mailing list includes weekly DUS updates, announcements for new courses, and job and research opportunities. If you are somewhat seriously considering the S&DS major, even if you are not officially declared, we recommend you sign up, as it can be a useful source of information.

Note that after you sign up you’ll be asked to confirm—if you don’t confirm, you won’t be added to the mailing list.

The advising resources page on the Yale College website has information regarding declaring a major. 

You should first talk to your residential college dean. Once you’ve done so, work with your DUS to develop a proposed course schedule using the course checklist.

You will also need to complete the Petition to Complete the Requirements of Two Majors online form (the advising resources web page holds that information). You’ll be asked to list the courses you intend to use to fulfill the requirements of both majors but note that these choices are not binding.

If you plan to double major, be sure to speak to the DUS’s of both S&DS and your second major to ensure that you will be on track to complete both sets of requirements. Remember that only two courses can overlap between the majors. Also, note that the senior requirements cannot be double counted. You must take S&DS 4910 or 4920 and fulfill the senior requirement for your other major. 


 

No. The major requirements are more in-depth than those of the certificate and expand on the skills and knowledge you would gain in the certificate.

Major Requirements

The Yale College Program of Studies, the Major page, and S&DS checklist should always serve as the most authoritative sources of information about requirements for the major. If you have questions about the structure and requirements of the major, you should consult one of these resources, or email your questions to the DUS.

We offer two degrees in Statistics & Data Science: a Bachelor of Arts and a Bachelor of Science. The key difference is that the B.S. requires three more courses: the B.S. requires 14 courses, while the B.A. requires 11 courses. Note that both degrees require multivariable calculus as a prerequisite, which can be fulfilled through Math 120 or ENAS 151, or be waived by the DUS.

The B.A. in S&DS requires 11 term courses:

  • Linear algebra (either Math 222, 225, or 226)
  • 2 courses from Category A (S&DS 2410 is highly recommended but not required)
  • 2 courses from Category B
  • 2 courses from Category C
  • 3 electives from Categories A-G
  • Senior requirement (S&DS 4910 or 4920)

The B.S. in S&DS requires 3 additional term courses, for a total of 14 term courses:

  • 1 additional course from Category D
  • 2 additional courses from Categories A-E
  • One of your B.A. Category A courses must be S&DS 2420 and one of your B.A. Category C courses must be S&DS 3650.
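
As a quick sanity check on the counts above, the listed requirements can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary names and keys are made up for this example and are not an official checklist format.

```python
# Hypothetical tally of the term-course counts listed above.
# BA_REQUIREMENTS and BS_ADDITIONAL are illustrative names, not an official format.
BA_REQUIREMENTS = {
    "Linear algebra": 1,
    "Category A": 2,
    "Category B": 2,
    "Category C": 2,
    "Electives (Categories A-G)": 3,
    "Senior requirement": 1,
}

BS_ADDITIONAL = {
    "Category D": 1,
    "Categories A-E": 2,
}

# Sum the per-category counts for each degree.
ba_total = sum(BA_REQUIREMENTS.values())
bs_total = ba_total + sum(BS_ADDITIONAL.values())

print(f"B.A. term courses: {ba_total}")
print(f"B.S. term courses: {bs_total}")
```

The totals should match the 11 and 14 term courses stated above; the Program of Studies and the S&DS checklist remain the authoritative sources.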

Introductory courses numbered in the 100s, including S&DS 1000-1090 and S&DS 1230, do not count towards the major (with the exception of S&DS 1500, Data Science Ethics, which may count on a case-by-case basis, conditional on DUS approval).


 

Generally, the answer is no.

If you are currently a first-year or a sophomore (in either semester), you have more than enough time to complete all of the requirements of the major. You should take S&DS 2410 (Probability Theory), which is offered in the fall, as soon as possible, because it’s a prerequisite for many other courses. You should also plan to take S&DS 2420 (Theory of Statistics) if you’re interested in pursuing the B.S., or plan to take high-level courses, as soon as possible. Many students take the 2410/2420 sequence in their sophomore year, although some students do so in their junior year. 

If you’re a junior, you should be fine as long as you are taking S&DS 2410 no later than junior fall, if you haven’t already done so as part of your requirements for another major. 

The S&DS degree is extremely flexible, and you can use courses from many different departments — including Mathematics, Computer Science, and Economics — to fulfill your elective requirements. If you come from one of these disciplines, you may find that you have already met some or many of the requirements of the S&DS degree. We encourage you to use the S&DS major checklist to map out which requirements you’ve already satisfied, and which courses you would still need.

If you are in the middle of your junior year and still want to switch to S&DS, it will be difficult to do so without having taken S&DS 2410. But depending on your mathematical background (perhaps you’ve gained a good amount of math and statistics from another field), it may be worth speaking to the DUS about your situation, and whether paths might still be open. In any case, the certificate is an option.


 

Aim for the B.S., and if you realize that you don’t have the room to complete the B.S. requirements, or there are other non-S&DS courses you would rather take as you finish up at Yale, you can always choose to pursue the B.A. instead. Double majors can decide based on the course requirements of their other major and other considerations.

Generally speaking, pursuing the B.A. will give you a stronger foundation in the mathematical grounding of statistics and data science. The major will require that you take linear algebra, for instance, and you will gain deeper exposure to some concepts since you will be taking S&DS 2410 rather than potentially 2400, which can only be used for the certificate. You will also gain more applied data science skills just by virtue of taking more courses.

If you are already majoring in a heavily quantitative field, it may not make a huge difference whether you add on a B.A. in S&DS or a certificate in data science. Fields like Economics, Psychology, Computer Science, Mathematics, and some engineering fields would likely give you plenty of exposure to applied situations. Adding the certificate would give you an opportunity to develop some of your data science skills with more depth and complement your primary field of study without substantially burdening your course schedule. But also note that if you’re majoring in a quantitatively-adjacent field, should you choose to pursue the B.A., you’ll likely be able to double-count 2 courses across your majors, so the 11 required courses for the B.A. become 9 in practice, which is only 4 credits more than the 5 required credits for the certificate (since you’re not allowed to double-count certificate courses).

If you are planning on majoring in a completely unrelated subject, such as English or History, and particularly if you have several math requirements to backfill, you may find that even completing the certificate requires you to fill in additional courses beyond the 5 required credits of the certificate. Depending on how comfortable you are with mathematics, you may or may not find the additional course credits for the B.A. worth it, or you might not have the space in your schedule.

There’s no one clear answer, and you should speak to both the DUS and the director of the certificate program to discuss your specific situation. However, if you are at all considering the major, we recommend that you avoid the certificate-only courses (such as 2400) and take the major-focused versions (i.e. 2410) to ensure you can count them for the major, should you choose to pursue the major.

We encourage you to use the S&DS major checklist to keep track of your major requirements. We also recommend that, before finalizing your course schedule each semester, you email your checklist to the DUS to ensure that your course selections meet the requirements of the major.

If you have declared the S&DS major, you can view Degree Audit within Yale Hub to see your completed requirements for the major. Note that some electives may not immediately show up in Degree Audit, as they must be manually added by the DUS.

No courses may be taken Cr/D/F.

You need to provide the DUS with evidence demonstrating your proficiency in the course material. For example, if you took a similar course elsewhere, then provide information such as the syllabus, textbook used, exams, problem sets, transcript, etc. In addition, you should list any advanced courses you’ve taken at Yale that depend crucially on the course material.

For courses taken outside of Yale, there are two options for getting credit:

  • Get Yale credit for the course (this is done through your College Dean). Then, if the DUS has approved your course for the major, your total course count goes down by one.
  • Don’t get Yale credit. If the DUS has approved the course, the appropriate category requirement will be waived, but you’ll have to take an extra elective to maintain the total course count.

You must earn a grade of A or A- in three-quarters of the credits you take in the S&DS department, including your senior essay S&DS 4910 or 4920. Note that the denominator of this calculation includes both:

  1. Courses you take to fulfill the requirements of the S&DS major, including those without a S&DS course code, 
  2. Courses you take beyond the 11 (or 14) required courses of the S&DS major that happen to carry a S&DS designation. 

Marks of “W” for Withdrawal are not included in either the numerator or denominator of this calculation. In practice, if you complete the B.A. and take no other courses in the S&DS major, you will need to earn an A or A- in 9 out of 11 of those courses. If you complete the B.S. and take no other courses in the S&DS major, you will need to earn an A or A- in 11 out of 14 of those courses. (The Math 120 prerequisite isn’t factored into this calculation.)
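
The threshold arithmetic above can be sketched in Python. This is a minimal illustration that assumes every course carries one credit and that the three-quarters rule rounds up to the next whole course; the function name is hypothetical, and the DUS remains the authority on how the rule is actually applied.

```python
def min_a_grades(total_courses: int) -> int:
    """Smallest whole number of A/A- grades that is at least 3/4 of the courses.

    Assumes one credit per course. Courses marked "W" should be left out of
    total_courses before calling this, since they are excluded from the
    calculation.
    """
    # Integer ceiling of 3 * n / 4, avoiding floating-point rounding.
    return (3 * total_courses + 3) // 4


for degree, n in [("B.A.", 11), ("B.S.", 14)]:
    print(f"{degree}: at least {min_a_grades(n)} of {n} courses at A or A-")
```

Under these assumptions the sketch gives 9 of 11 for the B.A. and 11 of 14 for the B.S., matching the figures in the text. If you take additional S&DS-coded courses, pass the larger total instead.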

Choosing Courses

If you’re reasonably sure that you plan to major in S&DS, there are a few categories of courses you should look at taking:

  • Math courses. The S&DS major requires a course in linear algebra (Math 222/225/226), so depending on your mathematical background, you might just need to take linear algebra, or you might need to take the entire calculus sequence starting from Math 112/115/120. Either way, you should knock out your math sequence as quickly as you can.
  • S&DS 2200. If you’re likely to major in S&DS, we generally recommend that you take S&DS 2200 as your introductory course, since 100-level intro courses don’t count for the major. S&DS 2200 requires no background in statistics or programming, and while it’s time-intensive, it will equip you with a strong set of tools in statistics and programming for the rest of the major. We recommend taking S&DS 2200 even if you have prior statistics exposure from AP Statistics because 2200 covers a broader range of topics and in much more depth than AP Statistics.
  • S&DS 2410 (or 2380). Once you’ve taken multivariate calculus (Math 1200), you should try to take S&DS 2410 as soon as you can. Most students take this in their sophomore year. S&DS 2410 serves as a prerequisite for many other courses, so taking it early will ensure you have maximum flexibility later on. It also lets you take S&DS 2420 if that’s something you plan on doing, either because you want to, as a requirement for the B.S., or as a prerequisite for many other courses. S&DS 2410 is the more traditional option, although if S&DS 2380 sounds more like your cup of tea, that’s a good alternative as well. See the FAQ below for more details on the difference between S&DS 2380 and S&DS 2410.

If you plan on majoring in S&DS, it’s also a viable path to take S&DS 1000-1090/1230, followed by 2200 or 2300, but remember that 100-level courses don’t count toward the major.

Take S&DS 2200 if you want to learn more of the statistical theory in greater depth and learn how to run statistical simulations. Take S&DS 2300 if you want to get more applied data analysis experience.

If you’re now thinking of majoring in S&DS, you should also consider getting your math requirements out of the way (you’ll need to take all the way up to linear algebra), and you should take S&DS 2410 as soon as you can.

Take S&DS 2300. If you’re now thinking of majoring in S&DS, you should also consider getting your math requirements out of the way (you’ll need to take all the way up to linear algebra), and you should take S&DS 2410 as soon as you can.

Generally, you should contact the instructor and ask if you have a suitable background for the course, whether or not the prerequisites are strict, and whether prerequisites can be taken concurrently.

Once you’ve taken core courses like S&DS 2200/2300, 2410 + 2420, and linear algebra, you’re ready to start branching out into different areas of the major. 300-level courses like 3120 (Linear Models), 3150/3170 (causal inference), 3510 (Stochastic Processes), 3610/3630 (more advanced data analysis), 3640 (Information Theory), and 3650 (intermediate machine learning) will give you exposure to different areas of the major and help you hone your interests.

Note that 2650 (if you don’t have programming experience, see below) and 3650 are particularly useful to take, since they give you the foundational knowledge in machine learning that you would need for many projects, internships, and research opportunities. Machine learning isn’t required of all majors, but many students find the content to be valuable.

Note: Electives must be approved by the DUS. It is recommended that you email your checklist to the DUS (ideally before you enroll in the elective) to ensure that your course selection will count towards the major.

You have a lot of freedom to choose electives in the S&DS major, particularly since the field is so broad, which means that you should put some thought into making the courses you take for your S&DS major somewhat coherent and cohesive. (We promise, you’ll feel like you’ve gotten more out of your major that way.) Once you’re done with core courses for the major, you should try to pick a few things that you’re interested in within the field of statistics and data science, and try to align your studies along that path. You should, of course, give yourself room to pivot if you realize your interests change, and you should also allow yourself to explore and pick courses from several of these categories, but trying to design a cohesive sequence of courses will give you more opportunities to build on previous courses and find connections between material you encounter in different courses.

The next few questions give you examples of courses you can take within different subareas, but by no means are these the only areas that you can explore within the S&DS major! These are just a few ideas to get you started and guide you as you start exploring electives for the major. You might want to talk to professors of courses you’ve taken and peers (specifically peers in their junior or senior year) to get ideas for good courses to take.

Start out with S&DS 2650 if you don’t have programming experience and follow it up with S&DS 3650. If you do have programming experience, start with 3650. Then, consider courses like CPSC 452, CPSC 453, CPSC 464, and CPSC 477. It’s also worth searching “machine learning” in Yale Course Search to see which specific courses are being taught each semester, as the selection changes frequently. Some of the courses just mentioned above may no longer be offered. 

If you plan on focusing on machine learning and taking courses in the Computer Science department, note that you should also plan to take up to CPSC 223, which is a common prerequisite for higher-level computer science courses. Some courses may require up to CPSC 365.

Start out with S&DS 2650 if you don’t have programming experience and follow it up with S&DS 3650. If you do have programming experience, start with 3650. Then, consider courses like CPSC 477, CPSC 488, LING 229, LING 234, and LING 380. Look out for courses taught by Robert Frank in the Linguistics department. John Lafferty in S&DS is also a leading expert in text processing, although in recent semesters he has typically been teaching the machine learning courses.

In preparation for advanced computer science courses, you may also wish to plan to meet any computer science prerequisites, which often include CPSC 223.

You could start with S&DS 3170. S&DS 6160 and 6170 are other possibilities, as is PLSC 341, The Logic of Randomized Experiments. S&DS-affiliated professors who are experts in causal inference and randomized experiments include Joshua Kalla, Alex Coppock, P Aronow, and Jas Sekhon, among many others. Econometrics courses in the Economics department can also be a good option, such as ECON 136 and ECON 419/420. Check if and when these courses are offered, as offerings have differed based on which faculty are in the department.

The Psychology and S&DS majors are almost the perfect marriage. Understanding research in psychology requires a solid foundation of statistical methods, while the expanse of psychology makes it a great field to apply the tools of statistics and data science. The Psychology major specifically requires a class in statistics, usually PSYC 2000, but it is possible to substitute that requirement with a S&DS class that teaches ANOVA (consult with the DUS of Psychology to confirm eligibility).

Psychology courses that are data analysis heavy include research methods classes (e.g. PSYC 235, 238, 258, 438) and multivariate statistics classes (e.g. S&DS 3630, PSYC 518). Statistics courses focused on causal inference and experimental design are also important to psychology; see the previous question for relevant course recommendations. Other S&DS courses to give you more knowledge on the application of statistics include 2620, 3170, 3610, and 3630.

The introductory econometrics sequence (ECON 117/123) should give you a strong foundation in the skills you’ll need to conduct economic research. ECON 136 can be helpful for solidifying several more advanced economic techniques, such as panel data, difference-in-difference estimation, and regression discontinuity. Depending on your interests, there is generally an array of courses offered in the Economics department that have a strong empirical focus. ECON 301, 417, 418, 419, 420, 438, and 439 might be a few courses to look at, among others.

A few good options to consider: S&DS 3150 taught by Josh Kalla, PLSC 454 taught by Frederik Savje, and PLSC 341 taught by Alex Coppock. These professors do good applied work at the intersection of politics, policy, and statistical methodology, so they may be useful resources to reach out to in general. You should check course listings for the upcoming semester, as these courses change frequently and any given course number may only be offered once every couple of years.

In addition, you may wish to consult the above advice on causal inference and randomized experiments, as a lot of modern research in public policy is centered around testing causal questions using empirical data from field experiments or past policy implementation. Finally, you might also consider seeking research opportunities with professors in the political science department to assist with applied research questions they’re working on.

CPSC 446 and PLSC 349 (if it’s being offered) are two good courses with a very different focus from each other. S&DS 6740 could be useful to take if you are specifically interested in visualizing spatial data and generating maps. You should also take the time to really learn ggplot2, a really valuable R package for creating complex charts and visualizations.

These course selections should be approved by the DUS at the start of each semester. You can consult the major page for examples of courses that are often approved for Categories F and G, depending on what other courses are part of your proposed schedule. Remember that electives should build on your S&DS skills and should provide both breadth and depth of topics. Courses with no S&DS prerequisites are often not approved as electives. 

Specific Courses in the Department

According to the Math department webpage, regarding 222/225/226, “All three courses cover linear algebra. Math 222 focuses more on computational techniques and applications, while 225 and 226 emphasize mathematical proofs and a more conceptual approach. Math 225 (linear algebra) or 226 (intensive linear algebra) is recommended for students who wish to take further proof-based mathematics courses.” Starting in Fall 2021, Math 225 will include an explicit introduction to proof-writing, and will be designed to be accessible as a first-semester math course with no prior proof-writing experience. Math 226, new for Fall 2021, is a more intensive version of Math 225 that will not include an explicit introduction to proof-writing, and will explore topics in more depth. It is designed for students who already have prior exposure to proofs.

For most S&DS students interested in applied data science work, taking Math 222 will generally suffice. Taking Math 225 may provide a helpful introduction to proof-writing, which sometimes appears in higher-level, more theoretical courses in statistics and data science (as well as in courses such as CPSC 366, intensive algorithms, if that is a direction you are considering pursuing).
 

You can take both if you’d like, but only one of these courses can count towards the major. There may be an alternative to taking both that would make sense for you. Contact your DUS to discuss. 

S&DS 2380 and 2410 overlap substantially but are quite different, and choosing one over the other involves tradeoffs. They both formally require multivariate calculus as the prerequisite (either Math 118 or Math 120, or a previous multivariate calculus course).

If you are planning to take further statistics courses such as 2420 and 3510, then the more standard choice would be 2410, and I’d say you can’t go wrong with this choice. S&DS 2410 has been taught for a very long time, and the courses 2420 and 3510 were designed to follow 2410.

S&DS 2380 was added to our course offerings more recently. Our original question motivating the development of 2380 was: for a student who is thinking of taking just one course in the whole area of probability/statistics/data analysis, hoping to learn as much as possible in one semester, what would we teach them? It has turned out that many S&DS 2380 students go on to take more statistics (including declaring a statistics major), but that was the original concept.

If you are considering S&DS 2410, you should definitely not worry about not having taken a statistics course before. It is not true that 2410 is less suitable than 2380 for people with no prior experience with statistics: both courses assume no prior experience with statistics, and just assume some basic familiarity with the tools of multivariate calculus.

What are those tradeoffs? S&DS 2410 focuses on probability theory and tends to emphasize mathematical developments more, and S&DS 2380 includes a substantial amount of statistics and computing together with some math. That is, typically (of course it varies with different instructors) S&DS 2410 feels more like a math class, and S&DS 2380 mixes in statistical inference (from a Bayesian viewpoint, which is a bit unusual for a course at this level), computing, and some data analysis. You can expect to get more time and practice and depth with probability theory in 2410 than in 2380. S&DS 2380 includes topics that overlap (but from a somewhat different point of view and perhaps for different purposes) with 2420 and 3510, such as using likelihood for statistical inference (which also is done in 2420) and Markov chains (which are also done in 3510). From that point of view, students who come out of S&DS 2380 and then take 2420 or 3510 may feel that they are already comfortable with some concepts that others in those classes are seeing for the first time, which could be viewed as an advantage, but they may feel that their command of probability theory is being taxed more than the students coming out of 2410, which is a disadvantage.

In 2380 you would get enough probability theory that you would be prepared to take 2420 and 3510, and in this regard differences between how well individual students “got” the respective classes (2380 or 2410) are probably more important than which class they took, but the median student in 2410 probably has a more solid command of probability theory than the median student in 2380. The 2380 students probably have some additional useful perspectives and insights (as well as skills in computing and simulation) that could help them understand and appreciate some of the things they are about to learn in 2420 and 3510, and the hope would be if they feel a need to strengthen any particular aspect of probability theory while taking 2420 and 3510, it would not be a problem to do some review or a bit of extra reading, perhaps in a 2410 textbook.

S&DS 2400 was added to our course offerings much more recently in tandem with the recent addition of the certificate in data science. S&DS 2400 is a less mathematically rigorous treatment of the topics typically taught in a probability theory course like 2410, and as such it requires only Math 115 (or AP Calculus BC), rather than Math 118/120 (multivariate calculus). You will learn many of the same broad skills and topics, and in fact any of S&DS 2380, 2400, or 2410 will cover the background needed for S&DS 2420 (Theory of Statistics).

The key difference is that 2400 cannot be used to fulfill the requirements of the major, since the course is designed for those pursuing the certificate, while 2380/2410 can be used towards the major. If you are on the fence between pursuing the certificate and the major, you should try to take 2410 if your schedule allows (and assuming you meet the prerequisite), since it will give you the flexibility to pursue the major if you decide you want to take more courses in S&DS.
 

No. S&DS 2400 and 3550 are only accepted for the certificate.

The case studies course is a guided but independent data analysis practicum — that is, the professor helps you further develop your data analysis skills in a variety of applied situational contexts, filling in gaps that may have been left by earlier data analysis courses. There’s an emphasis on methods of choosing data, acquiring data, assessing data quality, and the issues posed by working with large, and potentially messy, data sets. In other words, while previous courses may have helped you develop your skills in specific areas, such as statistical inference or data visualization, S&DS 4250 will expose you to practical considerations across the entire data lifecycle. There’s also an emphasis on developing strong coding habits to ensure that your R scripts are readable and reproducible by both yourself as well as others who may need to use your code. Unlike previous courses, the S&DS 4250 professor will serve more in an advisory role rather than planning specific knowledge areas you must master. A good portion of the course will center on independently figuring out how different packages work together, comparing different methodologies to accomplish a certain task (for which there might not be one “best” answer), and how to interpret results in ambiguous contexts.

S&DS 4250 is a great course, and past students have said that every student should take this course at some point in their S&DS career. It will make you a better statistician, data analyst, and R programmer. Enrollment is limited and the course has an application process that has typically involved submitting a transcript and statement of interest in the course. Unfortunately, in recent years, due to the rapid growth of the undergraduate S&DS population, the department has only (and just barely) had enough capacity to accommodate one section of undergraduate S&DS majors per semester, with no room for students not in the major.

Conditions may change, but you should speak to the DUS to determine whether the case studies course might be right for you.

Combined B.S./M.A. in S&DS

Generally, about 2-3 students are accepted, although this can fluctuate from year to year.

You should make yourself aware of the pertinent deadlines in the Yale College Program of Studies, then reach out to the DUS for more information as soon as possible. Note that there are deadlines beginning in your fifth term of enrollment at Yale, but you should start planning your courses well in advance, particularly given that many courses in the S&DS department have cross-listed undergraduate and graduate course codes. There is a helpful PDF with more information, along with an S&DS Major checklist and a MA Checklist that you can use to start planning your schedule. 

Prior to formal admission into the program, you should always enroll in the graduate number. Following admission, you should speak to the DUS to plan out which courses you plan to use to fulfill the requirements of the B.S./M.A.

In short: you must take 20 courses in the S&DS department, 8 of which are taken at the graduate level.

In depth: per YCPS, the M.A. portion of your degree generally requires eight or more courses at the graduate level in addition to the standard requirements of the undergraduate degree. If you are pursuing the B.S./M.A., you can expect to take 14 credits in fulfillment of the requirements of the B.S. in S&DS and 8 additional credits to earn your M.A. Since you can use two courses at the graduate level for both your undergraduate and graduate degrees, you must take a total of 14 + 8 - 2 = 20 courses in the S&DS department, 8 of which are taken at the graduate level.

The 8 courses that are taken to complete the M.A. must cover the following four topic areas: probability, theory of statistics, data analysis, and computing/algorithms. These courses must all be at the graduate level. For any course that you would like to consider for the M.A., please consult the DUS to see if it would be appropriate and eligible for the degree. 

Note that these 8 courses cannot be entirely concentrated within your last year at Yale, and you must take at least six courses outside of the major within your last two years at Yale.

Programming

Computing is a core component of the S&DS major, and over the course of a B.A. or B.S., you will gain plenty of experience in using a programming language to aid you in your data analysis. You’ll find computing woven throughout courses in the major, and the hope is that at some point in the major, you will have some idea of what to do if someone throws you a large, million-row dataset and asks you to come up with an interesting, actionable conclusion from the data.

At the same time, it’s important to realize that programming does not automatically solve your problems in statistics and data science. Any modeling you do is only as good as the assumptions that you make, which is why it’s really important to learn what those assumptions are and how to assess whether those assumptions are reasonable.

Just a few years ago, most students moving through the major would come out with a strong foundation in R, but most or all of them likely wouldn’t have touched Python at all. That’s changing, now that courses like S&DS 2620 (Computational Tools for Data Science) and machine learning courses like S&DS 2650 and 3650 are being taught in Python.

It still depends a lot on the specific courses you take to fulfill the requirements of the major. Generally, if your coursework focuses more heavily on data analysis, statistical inference, econometrics, and causal inference—and if you take many courses in adjacent departments like Economics and Political Science—you will find yourself more exposed to R, since these are disciplines that rely much more heavily on R in applied research. However, if your coursework has a stronger emphasis on machine learning, and if you perhaps find yourself taking courses in the Computer Science department, there’s a good chance you will end up with a strong foundation in Python.

Python and R are each good for different things. There’s no one answer to the question of whether R or Python is better, and it’s more important that you have a solid grasp of the basics of both so that you can build on that foundation for whatever project you’re working on. The two languages are widely recognized as having much in common. Once you know one, it’s fairly easy to pick up the other, so don’t worry that you won’t get hired for a job just because you know one language but not the other.

There are no formal prerequisites. Some students come in having taken one or a few computer science courses, but this is generally not needed. If you are interested in eventually taking data science-focused courses in the Computer Science department (such as CPSC 477 Natural Language Processing or CPSC 470 Artificial Intelligence), you may wish to begin taking the computer science sequence (CPSC 100, 201, 223) as early as you can to ensure that you meet relevant prerequisites. Otherwise, you will learn the programming you need through courses in the S&DS department.

Advising

At the beginning of each semester (and during the pre-registration period), the DUS will typically host DUS office hours. These will be announced weekly through the S&DS major email list.

At other times during the semester, you can email the DUS at sds-dus@yale.edu to schedule a time to chat. We are happy to speak with students.

Any S&DS faculty member can be a sophomore advisor. We suggest you review faculty members’ websites and identify a few that are working on things of interest to you. Then, contact the faculty member and see if they are willing to advise you. (Note that once you are a declared major, the DUS automatically becomes your advisor.)

The DSAC is a good place to start: any of the students on DSAC would be more than willing to speak with you about their experiences in the major, course recommendations, job search advice, and any other questions you might have. See the next section for the current list of DSAC members.

Getting involved with the department

ULAs are Yale’s version of undergraduate teaching assistants. In S&DS courses, they primarily:

  • Grade quantitative homework assignments and provide feedback to students
  • Hold office hours to answer students’ questions about course content
  • Discuss student feedback and experiences in the course

The role typically takes 5-10 hours per week, with an average of 7.5. You generally don’t need to have taken a course to be a ULA for it, as long as you’ve taken higher-level courses or have attained comparable skills in similar courses. For a formal job description, visit the Yale Catalog’s page on ULAs (scroll down to “ULA Responsibilities”).

The department typically sends out ULA applications for a given semester at the end of the previous semester. The best way to ensure you receive notification of applications is by making sure you’re subscribed to the undergraduate email list.

Great ideas! Email the DUS at sds-dus@yale.edu to relay your feedback. If you’d like to help make these changes happen, you should also join the DSAC, which organizes projects and initiatives to improve the student experience in the department.

Broadly speaking, the DSAC makes the S&DS student experience better, smoother, and more welcoming. Activities vary from year to year, but some initiatives have included:

  • Organizing bluebooking events for students to ask questions about the major
  • Organizing social events for students to hang out and get to know each other
  • Developing resources for students, including the writing of this FAQ
  • Organizing sweatshirt and swag orders for the department
  • Informally advising the DUS on the major by relaying student feedback

The DSAC can also be useful for undergraduates to seek advice about the major.

Email the DUS at sds-dus@yale.edu. We’d love to have you!

Our listings of faculty and affiliates of S&DS, as well as members of the Institute for Foundations of Data Science (FDS), are a good place to start. You may also want to visit the S&DS 4910/4920 Senior Essay page for a list of previous FDS Project Match events to get an idea of some of the data science research that FDS members are conducting at Yale.

You may wish to reach out to S&DS professors and ask about research opportunities for undergraduates. You might also reach out to professors in adjacent departments such as Computer Science, Political Science, Economics, Global Affairs, Public Health, Biostatistics, and Management (and many others), given that much of modern social science research is reliant on statistical methodology. In particular, there are several professors who have joint appointments in Statistics & Data Science and Political Science who may be conducting interesting, applied research, where there are often more opportunities for undergraduates to get involved.

4910/4920 Senior FAQs

To fulfill the senior requirement, you must complete a senior project (S&DS 4910-4920). You can choose to take the senior project in either the Fall (S&DS 4910) or Spring (S&DS 4920) semester. 

The first step is to find an advisor. Ideally, a student will think of a research project and then find a faculty member who is willing to help supervise it. The faculty member will often suggest changes to the proposed project. Other students find advisors by approaching faculty members they know and asking if they can propose projects. It is strongly advised that you identify your advisor and research project the semester before you enroll in S&DS 4910-4920.

Once you know you want to complete a senior project, you should start reaching out to find an advisor soon. Many faculty members will receive requests from multiple students to advise their projects, and if you ask too late, faculty members may not have the time to take an additional advisee, even if you’ve proposed an interesting project.

Projects can be applied, computational, or theoretical—essentially anything that is relevant to statistics and data science.

If you already have a project idea in mind, you probably want to start by reaching out to a faculty member who is an expert in that area and asking whether they would be willing to advise your project. It would be smart to read a few relevant articles and papers on the topic before emailing the faculty member, who can then suggest changes to make the project more tractable and feasible within the course of one semester.

If you’re still searching for a project, you can try getting in touch with professors you’ve met in classes or through other venues to see if they have suggestions for projects.

It’s really important that you choose an advisor who is easy to reach: someone who responds to your emails quickly, and who is willing to set aside an hour or so every week to check in with you on your progress. Also note that your thesis advisor does not necessarily have to be a faculty member affiliated with the Statistics & Data Science department.

The Applied Math department’s FAQ provides really good advice on choosing advisors. With full credit given:

“A general rule of thumb is that the closer the professor’s main body of research is to your project, the better your project will turn out. This is for three important reasons:

  1. The professor will have expertise in the area and be able to provide you useful help
  2. The professor will be invested in your work because it is relevant to what the professor is spending time on his/herself,
  3. The professor will be better able to scope a project for you and figure out what work is feasible for someone at your level and will take about a semester to complete.

“The danger of working with a professor less familiar with what you want to do is that s/he may have less interest in or ability to help you, and you may end up with a project that hits a dead end six weeks in, or that turns out to be far too complicated for a semester’s worth of work. Going to an interesting professor first and asking for a project the professor already has an interest in can often help mitigate these risks.”

There are only two hard deadlines for the senior project:

  • By the end of Add/Drop, you must submit a 1–2 page project proposal to the instructor, roughly describing the project you have in mind. A proposal template is provided with more details. 
  • By the end of reading period, you should submit a project report and project poster to the course instructor. Your project will be graded by your project supervisor and the course instructor.

Much of the project is self-paced, which will require you to properly set milestones and hold yourself accountable to them. Most reports are around 20 pages, though those projects with extensive data or code may run longer. You will also have to present your work at the S&DS Poster Session, which is generally held during reading period. 

You can complete your senior project either in your senior fall or senior spring. There is no difference between these two options as far as evaluation or logistics. No matter which semester you choose to complete your thesis, you should begin reaching out to professors during the middle of the previous semester, and have your project finalized in the break prior to that semester.

Two factors you may wish to take into consideration:

  • The advisor you have in mind may have more availability during either the spring or fall semester, and it may be worth checking with them in advance.
  • If you plan to apply to graduate school in the fall, completing your thesis in the fall would allow your thesis advisor to write you a stronger letter of recommendation.

Students can take 4910/4920 in their last two semesters at Yale. If, for example, a student is graduating in December 2025, they can take S&DS 4910 in Fall 2025, right before graduating, or S&DS 4920 in Spring 2025.

Note that there is a poster session for senior thesis projects annually in May. If the student enrolls in S&DS 4920 in Spring 2025, they will participate in the poster session in May 2025. If the student enrolls in S&DS 4910 in Fall 2025, their poster session is in May 2026. Since many students are not in the New Haven area after graduation, many December graduates end up not participating in the poster session in May, so that’s something to think about. Students typically enjoy the poster session as a culmination of their senior project and as a forum in which to share their work with classmates, friends, and professors.

No.

No.