Strength in Numbers

The online MSDS is based on the same principles and rigorous course of study as our on-Grounds program. Students will move through a varied curriculum drawing from different disciplines, all with the unifying goal of becoming leaders in data science and analysis. Coursework includes quantitative methodology as well as a humanist component and plenty of collaboration to ensure that our students approach data science holistically, as part of a team, and with the utmost integrity.

MSDS Curriculum (2 years approximately, 14 graded courses, 32 credit hours)*

  • MSDS students will be automatically enrolled by the School of Data Science into all courses for the online MSDS program.
  • Students can expect one hour of live synchronous time per class, scheduled on weekday evenings. Approximately 15 to 20 hours of related course work (asynchronous sessions, homework, readings, etc.) can be expected for two 3-credit-hour courses (the typical part-time course load). However, this varies from course to course and also depends on the individual student’s background, skills and abilities, and learning style.
  • A minimum grade of B- in each class and a cumulative GPA of 3.00 are required to meet degree requirements.

*The MSDS curriculum is updated every academic year to keep up with industry standards and is therefore subject to change.

CS 5010: Programming and Systems for Data Science (3)

The objective of this course is to introduce basic data analysis techniques, including data analysis at scale, in the context of real-world domains such as bioinformatics, public health, marketing, and security. To facilitate data manipulation and analysis, students will be introduced to essential programming techniques in Python, an increasingly prominent language for data science and “big data” manipulation.

STAT 6021: Linear Models for Data Science (3)

An introduction to linear statistical models in the context of data science. Topics include simple and multiple linear regression, generalized linear models, time series, analysis of covariance, tree-based classification, and principal components. The primary software is R.

CS 5012: Foundations of Computer Science (3)

Provides a foundation in discrete mathematics, data structures, algorithmic design and implementation, computational complexity, parallel computing, and data integrity and consistency for non-CS, non-CpE students. Case studies and exercises will be drawn from real-world examples (e.g., bioinformatics, public health, marketing, and security).

SYS 6018: Data Mining (3)

Data mining describes approaches to turning data into information. Rather than the more typical deductive strategy of building models using known principles, data mining uses inductive approaches to discover the appropriate models. These models describe a relationship between a system’s response and a set of factors or predictor variables. Data mining in this context provides a formal basis for machine learning and knowledge discovery. This course investigates the construction of empirical models from data mining for systems with both discrete and continuous valued responses. It covers both estimation and classification, and explores both practical and theoretical aspects of data mining.

In the third term, students take two data science special topics courses aligned with the skills our industry partners identify as critical for success. Sample courses include:

DS 5001: Exploratory Text Analytics (3)

Introduction to text analytics with a focus on long-form documents, such as novels and newspapers. Students convert source texts into graph and vector-space representations and apply methods such as term frequency measures, topic models, and sentiment analyses in order to address problems of classification, clustering, and other areas, such as social event detection and structuralist poetics. Involves basic Python and some probability theory.

DS 5559: Big Data Analytics (3)

Increasingly, data scientists and data engineers are working with datasets that exceed the memory of a single machine. This motivates the need for a different paradigm of computing, and a different toolset. This course will prepare you for this use case. The focus of the course is learning Spark, an open-source, general-purpose computing framework that is scalable and blazingly fast. The fundamental data types and concepts will be covered (e.g., resilient distributed datasets, DataFrames). You will learn how to use Spark for large-scale analytics and machine learning, among other topics. Tools for data storage and retrieval will be covered, including AWS and the Hadoop ecosystem.

DS 6001: Practice and Application of Data Science I (2)

This course covers the practice of data science, including communication, exploratory data analysis, and visualization. Also covered is the selection of algorithms to suit the problem to be solved, user needs, and data. Case studies will explore the impact of data science across different domains.

DS 6002: Ethics of Big Data I (1)

This course examines the ethical issues arising around big data and provides frameworks, context, concepts, and theories to help students think through and deal with the issues as they encounter them in their professional lives.

DS 6011: Data Science Capstone Project Work I (1)

This course is designed for capstone project teams to meet in groups, with advisors, and with clients to advance work on their projects.

DS 6014: Bayesian Machine Learning (3)

Bayesian inferential methods provide a foundation for machine learning under conditions of uncertainty. Bayesian machine learning techniques can help us to more effectively address the limits to our understanding of world problems. This class covers the major related techniques, including Bayesian inference, conjugate prior probabilities, naive Bayes classifiers, expectation maximization, Markov chain Monte Carlo, and variational inference.

DS 6003: Practice and Application of Data Science II (1)

This course covers the practice of data science, including communication, exploratory data analysis, and visualization. Also covered is the selection of algorithms to suit the problem to be solved, user needs, and data. Students will use their capstone projects to explore the impact of data science on their project’s domain.

DS 6012: Ethics of Big Data II (1)

This course examines the ethical issues arising around big data and provides frameworks, context, concepts, and theories to help students think through and deal with the issues as they encounter them in their professional lives.

DS 6013: Data Science Capstone Project Work II (2)

This course is designed for capstone project teams to meet in groups, with advisors, and with clients to advance work on their projects.

SYS 6016: Machine Learning (3)

A graduate-level course on machine learning techniques and applications with emphasis on their application to systems engineering. Topics include: Bayesian learning, evolutionary algorithms, instance-based learning, reinforcement learning, and neural networks. Students are required to have sufficient computational background to complete several substantive programming assignments. Prerequisite: A course covering statistical techniques such as regression.