The online master’s in data science curriculum explores the theoretical concepts and technical skills that students need to succeed as quantitative problem solvers. This STEM-designated program features opportunities to conduct original research and to collaborate with faculty members and industry experts. By the time you graduate, you’ll be prepared to use statistical programming languages, manage data, implement automated tools, perform analyses, and generate valuable insights.

The curriculum consists of seven required courses and three electives. You’ll explore a variety of applications for data while using industry-standard programming languages such as Python and R, along with tools including Spark, Hadoop, MapReduce, MATLAB, and Weka. In the Analytics Capstone, you’ll develop an independent project to solve a real-world problem by synthesizing what you’ve learned throughout the program.

You can customize the curriculum with your choice of nine credits in elective courses selected from computer science, information systems, business analytics (finance, economics, marketing), and natural sciences (environmental science, biochemistry, molecular biology). However, you may take only one elective course (maximum three credits) outside of the computer science department.


Curriculum Structure

Fall 1 (9 credits)

  • CS 660 Mathematical Foundations of Analytics (3 credits)
  • CS 675 Introduction to Data Science (3 credits)
  • CS 673 Scalable Databases (3 credits)

Spring 1 (9 credits)

  • CS 676 Algorithms for Data Science (3 credits)
  • CS 619 Data Mining (3 credits)
  • CS 632M Machine Learning (3 credits)

Fall 2 (9 credits)

  • 2 Electives (3 credits each)
  • CS 668 Analytics Capstone Project (3 credits)

Spring 2 (3 credits)

  • Elective (3 credits)

Total Graduate Credits: 30


Prerequisite/Bridge Courses

Applicants must submit transcripts showing they have achieved the level of technical and mathematical knowledge needed to succeed in the program. Their background should include courses in calculus and linear algebra, as well as knowledge of probability, statistics, programming, and databases.

Students who do not meet the prerequisites for experience in programming and databases are required to complete the following online bridge courses before starting the core/foundation courses in the MS in Data Science curriculum:

Database management system installation and configuration, the database’s role as middleware in the system hierarchy, the Entity-Relationship (E-R) model for logical design, schema normalization and performance tradeoffs, database management with SQL through the database console, database programming through JDBC, event processing with triggers, efficient data processing with stored procedures, transaction management and ACID properties, database security, and crash recovery.
Prerequisite: Prior programming experience. This course introduces students to the Python programming language with an emphasis on Python’s data analytics libraries. Students will learn the fundamentals of Python and key modules including scipy, numpy, scikit-learn, pandas, statsmodels, and matplotlib. The course covers basic language syntax, object types, variables, reading data from files, and writing to files. Building on these concepts, students will create functions and learn how to control program flow. Students will use Python to extract, clean, and prepare data, conduct exploratory data analysis, and build predictive models.

MS in Data Science Course Descriptions

CS 660 Mathematical Foundations of Analytics (3 credits)
This course covers the fundamental mathematics needed for further study in data science, machine learning, and artificial intelligence. Students will learn the theory and application of linear algebra, analytic geometry, matrix decompositions, vector calculus, probability theory, and optimization. Building upon these mathematical foundations, the course culminates with an overview of some key machine learning concepts: linear regression, principal component analysis, density estimation, and support vector machines. The emphasis of this course is on the theory underlying data science methods and machine learning.

CS 675 Introduction to Data Science (3 credits)
This course introduces the concepts of data science. The course teaches students the interdisciplinary basis of data science and the data science process. Additionally, the course covers data visualization, data wrangling, ethics of designing and conducting data analysis and research, bias in research, data privacy issues surrounding the use of data, and research reproducibility. Students will learn about statistical learning methods and then move on to more advanced topics including database queries, working with spatial data, text mining, networks, and big data. The course also emphasizes writing technical reports and presenting results. The course prepares students for further study in data mining, machine learning, and artificial intelligence and introduces students to R.

CS 673 Scalable Databases (3 credits)
After reviewing relational databases and SQL, students will learn the fundamentals of alternative data storage schemas for handling large amounts of structured and unstructured data. The course covers big data and the development of the Hadoop file system, the MapReduce programming paradigm, and database management systems such as Cassandra, HBase, and Neo4j. Students will learn about NoSQL, distributed databases, and graph databases. The course emphasizes the differences between traditional database management systems and alternatives with respect to accessibility, cost, transaction speed, and structure. Part of the course is dedicated to accessing, handling, and processing data from different sources and of different types using Python. The course provides hands-on practice.

CS 676 Algorithms for Data Science (3 credits)
This course focuses on the efficiency and complexity of algorithms needed for data analytics and has a computational emphasis. Students will develop proficiency in Python and R as they build algorithms and analyze data. Topics include data reduction (data mapping, data dictionaries, scalable algorithms, Hadoop, and MapReduce); gaining information from data (data visualization, regression modeling, and cluster analysis); and predictive analytics (k-nearest neighbors, naïve Bayes, time series forecasting, analysis of streaming data, and optimization with gradient descent).

CS 619 Data Mining (3 credits)
This course provides an overview of topics including an introduction to data mining and knowledge discovery; data mining with structured and unstructured data; foundations of pattern clustering; clustering paradigms; clustering for data mining; data mining using neural networks and genetic algorithms; fast discovery of association rules; applications of data mining to pattern classification; and feature selection. The goal of this course is to introduce students to current machine learning and related data mining methods. It is intended to provide enough background to allow students to apply machine learning and data mining techniques to learning problems in a variety of application areas.

CS 632M Machine Learning (3 credits)
This course teaches students machine learning theory and algorithms. Students will learn about the probably approximately correct (PAC) learning framework and the empirical risk minimization, structural risk minimization, and minimum description length learning rules. Students will then study various machine learning algorithms, such as linear models, gradient descent, support vector machines, kernel methods, and trees, and how they connect to the theoretical framework. Finally, the course culminates with additional topics such as clustering, dimensionality reduction, generative models, and feature selection.

CS 668 Analytics Capstone Project (3 credits)
The purpose of the capstone project is for students to apply the knowledge and skills acquired during the data science program to a project involving actual data in a real-world setting. During the project, students will apply the entire data science process, from identifying a problem or opportunity, through collecting and processing actual data, to applying suitable analytic methods to find a solution. Both the problem statement for the project and the datasets will come from real-world domains similar to those that students might typically encounter within industry, government, or academic research. The course will culminate with each student presenting his or her work and submitting a final paper. This is a largely self-directed course, with guidance and suggestions provided along the way by the instructor.

Elective Courses

Applications of abstraction and divide-and-conquer in computer science (hardware, software, theory); essential algorithms including searching, sorting, hashing, and graphs; popular algorithms such as string matching, MapReduce, and RSA and their applications; complexity; computability; NP-hard problems, NP-complete problems, and undecidable problems; finite state automata vs. regular expressions.
Overview of fundamentals of complex systems science. Concepts covered include reductionism, emergence, self-organization, and evolution. Topics covered include competition/cooperation, complexity/scale, relationship/component-centric analyses, and bottom-up/top-down control. Examples will be drawn from disciplines such as neuroscience, healthcare, education, information theory and cybernetics.
Database management system installation and configuration, the database’s role as middleware in the system hierarchy, the Entity-Relationship (E-R) model for logical design, schema normalization and performance tradeoffs, database management with SQL through the database console, database programming through JDBC, event processing with triggers, efficient data processing with stored procedures, transaction management and ACID properties, database security, and crash recovery.
Theoretical, computational, and applied areas of linear and, to some extent, non-linear programming. Formulation of linear programs, solution by the simplex method, duality problems, and the importance of Lagrange multipliers will be discussed. Efficient computational techniques, degeneracy procedures, transportation problems, quadratic programming problems, and projection (active set) methods for solving non-linear programs will be reviewed.
Theory, data structures, and algorithms related to artificial intelligence and heuristic programming. Topics include description of cognitive processes, definition of heuristic vs. algorithmic methods, state space and problem reduction, search methods, theorem proving, natural language processing, and pattern recognition techniques.
This course will explore the latest algorithms for analyzing online social networks, considering both their structure and content. Fundamentals of social graph theory will be covered, including distance, search, influence, community discovery, diffusion, and graph dynamics. Fundamentals of text analysis will also be covered, with an emphasis on the type of text used in online social networks and common applications. Topics include information extraction, clustering, and topic modeling.
This course introduces students to the Python programming language with an emphasis on Python’s data analytics libraries. Students will learn the fundamentals of Python and key modules including scipy, numpy, scikit-learn, pandas, statsmodels, and matplotlib. The course covers basic language syntax, object types, variables, reading data from files, and writing to files. Building on these concepts, students will create functions and learn how to control program flow. Students will use Python to extract, clean, and prepare data, conduct exploratory data analysis, and build predictive models.
Students will discuss current issues in the field of analytics each week. The course will consist of assigned readings and discussions, including discussions led by industry leaders, on topics relevant to data analytics and computing.
This course introduces students to computer vision algorithms, methods, and concepts that will enable them to implement computer vision systems, with an emphasis on visual pattern recognition. Upon successful completion of this course, students will have general knowledge of image analysis and processing, pattern recognition techniques, and some experience with research in computer vision. Topics include data structures for visual pattern representation, feature extraction, basis theory, decision trees, nearest neighbor, artificial neural networks, and clustering. After completing the course, students should be able to conduct research in this area. Students will be required to critique a current paper from the literature in this area, present it to the class, implement the presented algorithm, and evaluate its strengths and shortcomings.
Effective operation of Windows and Linux computers. Installation and management of analytics software. Effective usage of analytics cloud servers. Introduction to IBM Cognos BI Administration. Configuration and Customization of the Cognos BI Environment.
Data modeling. Star and snowflake schemas. Multi-dimensional modeling. Inmon and Kimball approaches to data warehousing. Changing business with data insight. Architecting the data warehouse.
This course will provide an introduction to basic concepts and methodologies for digital image processing and its applications. Fundamental digital image processing techniques including enhancement, filtering, morphology, Fourier transform, and segmentation will be discussed. The course will also introduce students to MATLAB as an image processing tool. MATLAB-based course projects will be used to illustrate and practice the image processing techniques. Students will gain an understanding of algorithm design and hands-on experience processing and analyzing digital images using MATLAB.
Students will learn how to design, implement, and evaluate a pipeline for supervised classification of structured data, using a variety of machine learning techniques (e.g., logistic regression). They will apply deep learning techniques (e.g., convolutional neural networks, recurrent neural networks) to classify unstructured data, including images and text, and will learn to describe important considerations for applying machine learning in practice.
This course provides students with an understanding of the concepts, technologies and implementation considerations behind blockchain. Using a hands-on approach, the course covers a range of essential topics, from distributed systems to the cryptographic foundations of blockchain to consensus and smart contracts. Blockchain applications across sectors are presented, with a focus on financial applications and new developments.
The purpose of this course is to give students a thorough grounding in the core principles and foundations of computer science. After a review of foundational algorithm analysis, students will learn advanced algorithmic techniques such as randomized and approximation algorithms. Problems arising in number theory, such as primality testing and factorization, will lead to the study of the RSA public-key cryptosystem. Classical algorithms for string matching, such as Rabin-Karp and Knuth-Morris-Pratt, will also be studied, along with their applications to computational biology. Advanced data structures particularly suited for certain applications, such as B-trees, van Emde Boas trees, and skip lists, will be studied. The question of which problems are hard to compute will be addressed through the theory of NP-completeness, including the identification of NP-hard problems by reductions. Hard problems such as Traveling Salesman, Knapsack, and Vertex Cover will be studied in the context of approximation algorithms.
Advanced topics in Artificial Intelligence include planning, probabilistic reasoning, Markov decision processes, reinforcement learning, deep neural networks, Bayesian learning, and natural language processing.
This course covers advanced research topics in computer vision. Building on the introductory material covered in the Computer Vision prerequisite class, this class will prepare graduate students in both the theoretical foundations of computer vision and the practical approaches to building real computer vision systems. This course investigates current research topics in computer vision with an emphasis on recognition tasks and deep learning. Topics include optical flow, object tracking, object recognition, bag-of-features representation, and deep neural networks. We will examine data sources, features, and learning algorithms useful for understanding and manipulating visual data.
This course focuses on the fundamental concepts, theories, and algorithms for pattern recognition and machine learning. Diverse application areas such as optical character recognition, speech recognition, and biometrics are discussed. Topics covered include supervised and unsupervised (clustering) pattern classification algorithms, parametric and non-parametric supervised learning techniques, including Bayesian decision theory, discriminant functions, the nearest neighbor algorithm, and neural networks with emphasis on deep learning.

Request Information

To learn more about the online Master of Science in Data Science, fill out the fields in this form to download a free brochure. If you have any questions at any time, please contact an admission advisor at (866) 843-7205.
