The online master’s in data science curriculum explores the theoretical concepts and technical skills that students need to succeed as quantitative problem solvers. This STEM-designated program features opportunities to conduct original research and to collaborate with faculty members and industry experts. By the time you graduate, you’ll be prepared to use statistical programming languages, manage data, implement automated tools, perform analyses, and generate valuable insights.
The curriculum consists of eight required courses and two electives. You’ll explore a variety of applications for data while using industry-standard programming languages such as Python and R and tools including Spark, Hadoop, MapReduce, Matlab, and Weka. In the Analytics Capstone, you’ll develop an independent project to solve a real-world problem by synthesizing what you’ve learned throughout the program.
You can customize the curriculum with your choice of six credits in elective courses selected from computer science, information systems, business analytics (finance, economics, marketing), and natural sciences (environmental science, biochemistry, molecular biology). One elective course (three credits) must be in computer science and the other may be outside of the computer science department.
“During my coursework in machine learning, the final project we had to do was develop a handful of models AND develop explainers for those models. I learned various methods of explaining classification models like ELI5, LIME, and SHAP. Being able to see the primary variables that impact our classification models was interesting to see. It is one thing to build a model, but being able to explain how it worked and what affected it really opened my eyes.”
Mudassir Ali (MS in Data Science 2022)
Data Scientist, Northwell Health
Prerequisite/Bridge Courses
Applicants are required to submit transcripts showing they have achieved the necessary level of technical and mathematics knowledge to succeed in the program. Their background should include courses in calculus and linear algebra, as well as knowledge of probability, statistics, programming, and databases.
Students who do not meet the prerequisites for experience in programming and databases are required to complete the following online bridge courses before starting the core/foundation courses in the MS in Data Science curriculum:
Database management system installation and configuration, database’s role as a middleware in system hierarchy, Entity Relationship (E-R) model for logical design, schema normalization and performance tradeoffs, database management with SQL through database console, database programming through JDBC, event-processing with triggers, efficient data processing with stored-procedures, transactions management and ACID properties, database security, and crash recovery.
This course introduces students to the Python programming language with an emphasis on Python’s data analytics libraries. Students will learn the fundamentals of Python and key modules including: scipy, numpy, scikit-learn, pandas, statsmodels, and matplotlib. The course covers basic language syntax, object types, variables, reading data from files and writing to files. Building on these concepts, students will create functions, and learn how to control program flow. Students will use Python to clean and prepare data, conduct exploratory data analysis, and build predictive models.
MS in Data Science Course Descriptions
This course will provide an overview of topics such as introduction to data mining and knowledge discovery; data mining with structured and unstructured data; foundations of pattern clustering; clustering paradigms; clustering for data mining; data mining using neural networks and genetic algorithms; fast discovery of association rules; applications of data mining to pattern classification; and feature selection. The goal of this course is to introduce students to current machine learning and related data mining methods. It is intended to provide enough background to allow students to apply machine learning and data mining techniques to learning problems in a variety of application areas.
This course covers the fundamental mathematics needed for further study in data science, machine learning and artificial intelligence. Students will learn the theory and application of linear algebra, analytic geometry, matrix decompositions, vector calculus, probability theory and optimization. Building upon these mathematical foundations, the course culminates with an overview of some key machine learning concepts: linear regression; principal components analysis; density estimation; and support vector machines. The emphasis of this course is on the theory underlying data science methods and machine learning.
Students will discuss current issues in the field of analytics each week. The course will consist of assigned readings and discussions, including discussions led by industry leaders, on topics relevant to data analytics and computing. Students will learn how to prepare analytical results for presentation to stakeholders, how to translate technical material into non-technical discourse, and how to prepare for a career as a data scientist. This course prepares students for their Analytics Capstone Project.
*Effective Fall 2023 for new students. Current students in catalog years prior to Fall 2023 are not required to take CS 667; instead, they must take three electives, two of which must be CS courses.
After reviewing relational databases and SQL, students will learn the fundamentals of alternative data storage schemas to deal with large amounts of data (structured and unstructured). The course covers big data and the development of the Hadoop file system, the MapReduce programming paradigm, and database management systems such as Cassandra, HBase, and Neo4j. Students will learn about NoSQL, distributed databases, and graph databases. The course emphasizes the differences between traditional database management systems and alternatives with respect to accessibility, cost, transaction speed, and structure. Part of the course is dedicated to accessing, handling, and processing data from different sources and of different types using Python. The course provides hands-on practice.
This course introduces the concepts of data science. The course teaches students the interdisciplinary basis of data science and the data science process. Additionally the course covers data visualization, data wrangling, ethics of designing and conducting data analysis and research, bias in research, data privacy issues surrounding the use of data, and research reproducibility. Students will learn about statistical learning methods and then move on to more advanced topics including: database queries, working with spatial data, text mining, networks, and big data. The course also emphasizes writing technical reports and presenting results. The course prepares students for further study in data mining, machine learning, and artificial intelligence and introduces students to R.
This course focuses on the efficiency and complexity of algorithms needed for data analytics and has a computational emphasis. Students will develop proficiency in Python and R as they build algorithms and analyze data. Topics include data reduction: data mapping, data dictionaries, scalable algorithms, Hadoop, and MapReduce; gaining information from data: data visualization, regression modeling, and cluster analysis; and predictive analytics: k-nearest neighbors, naïve Bayes, time series forecasting, analyzing streaming data, and optimization with gradient descent.
This course teaches students machine learning theory and algorithms. Students will learn about probably approximately correct (PAC), empirical risk minimization (ERM), structural risk minimization (SRM), and minimum description length (MDL) learning rules. Students will then study various machine learning algorithms, such as linear models, gradient descent, support vector machines (SVM), kernel methods, and trees, and how they connect to the theoretical framework. Finally, the course culminates with additional topics such as clustering, dimensionality reduction, generative models, and feature selection.
The purpose of the capstone project is for the students to apply the knowledge and skills acquired during the data science program to a project involving actual data in a real-world setting. During the project, students will apply the entire data science process from identifying a problem or opportunity, and collecting and processing actual data to applying suitable and appropriate analytic methods to find a solution. Both the problem statement for the project and the datasets will come from real-world domains similar to those that students might typically encounter within industry, government, or academic research. The course will culminate with each student making a presentation of his or her work, and submitting a final paper. This is a largely self-directed course, with guidance and suggestions provided along the way by the instructor.
Past topics explored by students include how to better predict ventilator settings to improve COVID-19 survival rates; how the number of attempts it takes a student to solve a programming problem can predict future classroom performance; and an analysis of how IBM’s stock price relates to R&D spending.
Elective Courses
Mathematical analysis of biochemical data. Concentrates on statistical analysis, probability, and confidence limits, as applied to the evaluation of scientific data. The appropriate use and presentation of mathematical analysis in a scientific paper will be discussed.
Applications of abstraction and divide-and-conquer in computer science (hardware, software, theory); essential algorithms including searching, sorting, hashing, and graphs; popular algorithms such as string matching, MapReduce, and RSA and their applications; complexity; computability; NP-hard problems, NP-complete problems, and undecidable problems; finite state automata vs. regular expressions.
Overview of fundamentals of complex systems science. Concepts covered include reductionism, emergence, self-organization, and evolution. Topics covered include competition/cooperation, complexity/scale, relationship/component-centric analyses, and bottom-up/top-down control. Examples will be drawn from disciplines such as neuroscience, healthcare, education, information theory and cybernetics.
Database management system installation and configuration, database’s role as a middleware in system hierarchy, Entity Relationship (E-R) model for logical design, schema normalization and performance tradeoffs, database management with SQL through database console, database programming through JDBC, event-processing with triggers, efficient data processing with stored-procedures, transactions management and ACID properties, database security, and crash recovery.
Theoretical, computational, and applied areas of linear and, to some extent, non-linear programming. Formulation of linear programs, solution by the simplex method, duality problems, and the importance of Lagrange multipliers will be discussed. Efficient computational techniques, degeneracy procedures, transportation problems, quadratic programming problems, and projection (active set) methods for solving non-linear programs will be reviewed.
Theory, data structures, and algorithms related to artificial intelligence and heuristic programming. Topics include description of cognitive processes, definition of heuristic vs. algorithmic methods, state space and problem reduction, search methods, theorem proving, natural language processing, and pattern recognition techniques.
This course will explore the latest algorithms for analyzing online social networks, considering both their structure and content. Fundamentals of social graph theory will be covered, including distance, search, influence, community discovery, diffusion, and graph dynamics. Fundamentals of text analysis will also be covered, with an emphasis on the type of text used in online social networks and common applications. Topics include information extraction, clustering, and topic modeling.
This course introduces students to the Python programming language with an emphasis on Python’s data analytics libraries. Students will learn the fundamentals of Python and key modules including: scipy, numpy, scikit-learn, pandas, statsmodels, and matplotlib. The course covers basic language syntax, object types, variables, reading data from files and writing to files. Building on these concepts, students will create functions, and learn how to control program flow. Students will use Python to clean and prepare data, conduct exploratory data analysis, and build predictive models.
This course introduces students to computer vision algorithms, methods, and concepts that will enable them to implement computer vision systems, with an emphasis on visual pattern recognition. Upon successful completion of this course, students will have general knowledge of image analysis and processing, pattern recognition techniques, and some experience with research in computer vision. Topics to be studied include data structures for visual pattern representation, feature extraction, basis theory, decision trees, nearest neighbor, artificial neural networks, and clustering. After completing the course, students should be competent enough to conduct research in this area. Students will be required to critique a current paper from the literature in this area, present it to the class, implement the presented algorithm, and evaluate its strengths and shortcomings.
Effective operation of Windows and Linux computers. Installation and management of analytics software. Effective usage of analytics cloud servers. Introduction to IBM Cognos BI Administration. Configuration and Customization of the Cognos BI Environment.
Data modeling. Star and snowflake schemas. Multi-dimensional modeling. Inmon and Kimball approaches to data warehousing. Changing business with data insight. Architecting the data warehouse.
This course will provide an introduction to basic concepts and methodologies for digital image processing and its applications. Fundamental digital image processing techniques including enhancement, filtering, morphology, Fourier transform, and segmentation will be discussed. The course will also introduce students to Matlab as an image processing tool. Matlab-based course projects will be used to illustrate and practice the image processing techniques. Students will gain an understanding of algorithm design and hands-on experience in processing and analyzing digital images using Matlab.
Students will learn how to design, implement, and evaluate a pipeline for supervised classification of structured data, using a variety of Machine Learning techniques (e.g., Logistic Regression). They will also apply Deep Learning techniques (e.g., Convolutional Neural Networks, Recurrent Neural Networks) to classify unstructured data, including images and text, and describe important considerations for applying Machine Learning in practice.
This course provides students with an understanding of the concepts, technologies and implementation considerations behind blockchain. Using a hands-on approach, the course covers a range of essential topics, from distributed systems to the cryptographic foundations of blockchain to consensus and smart contracts. Blockchain applications across sectors are presented, with a focus on financial applications and new developments.
The purpose of this course is to give students a thorough grounding in the core principles and foundations of computer science. After a review of foundational algorithm analysis, students will learn advanced algorithmic techniques such as randomized and approximation algorithms. Problems arising in number theory, such as Primality Testing and Factorization, will lead to the study of the RSA public-key cryptosystem. Classical algorithms for String Matching, such as Rabin-Karp and Knuth-Morris-Pratt, will also be studied, along with their applications to computational biology. Advanced data structures particularly suited for certain applications, such as B-trees, van Emde Boas trees, and skip lists, will be studied. The question of what problems are hard to compute will be addressed by studying NP-completeness theory, including the identification of NP-hard problems by reductions. Hard problems such as Traveling Salesman, Knapsack, and Vertex Cover will be studied in the context of approximation algorithms.
Advanced topics in Artificial Intelligence include planning, probabilistic reasoning, Markov decision processes, reinforcement learning, deep neural networks, Bayesian learning, and natural language processing.
This course covers advanced research topics in computer vision. Building on the introductory materials covered in the Computer Vision pre-requisite class, this class will prepare graduate students in both the theoretical foundations of computer vision as well as the practical approaches to building real Computer Vision systems. This course investigates current research topics in computer vision with an emphasis on recognition tasks and deep learning. Topics include optical flow, object tracking, object recognition, bag-of-features representation, deep neural networks, etc. We will examine data sources, features, and learning algorithms useful for understanding and manipulating visual data.
This course focuses on the fundamental concepts, theories, and algorithms for pattern recognition and machine learning. Diverse application areas such as optical character recognition, speech recognition, and biometrics are discussed. Topics covered include supervised and unsupervised (clustering) pattern classification algorithms, parametric and non-parametric supervised learning techniques, including Bayesian decision theory, discriminant functions, the nearest neighbor algorithm, and neural networks with emphasis on deep learning.
Introduces the student to the principles of game theory and its application to business and economic situations in interactive settings. Concepts will be demonstrated through the use of business case studies and interactive experiments.
Familiarizes students with applied financial econometrics, with emphasis on empirical analysis of economic and financial data using statistical software packages. Teaches students to pursue applied data projects. Methods covered include simple and multiple linear regression models, regression with time series variables, volatility models, Granger causality, vector autoregressions, forecasting, and panel data analysis.
This one-semester lecture course focuses on improving students’ understanding of quantitative analysis tools in environmental science. Students will survey principles of sampling methodology, testing protocols, analytical tools, data evaluation, and statistics, as applied to environmental problems. This will prepare students for future careers as leading scientists and researchers in the environmental sciences. Demonstrations of experiments and exercises, with emphasis on environmental applications, will cover quantitative analytical methodologies such as titration, extraction, UV-VIS, Fluorescence, IR, AA, GC, HPLC, GC-MS, etc.
This course covers issues related to the proper manner in which to develop and conduct a research project. Statistical issues related to environmental evaluations will be discussed, including minimal detectable levels, proper sample size, and determination of proper methods for evaluation of data, using both parametric and nonparametric procedures.
Introduces advanced methodological tools required to do research in finance and investment analysis. Topics include study of simple linear regression, multiple regression analysis, analysis of variance, discriminant analysis, factor analysis, and non-parametric tests. Emphasizes modern portfolio theory. Use of computers is required.
This course teaches estimation and forecasting of time series models in finance. Students will learn how to measure and forecast financial volatility and correlations and become proficient with GARCH-type models and historical volatilities. These methods will be used to measure risk and analyze alternative approaches to calculating Value at Risk, dynamic portfolio selection, and risk control. The course also examines implied volatilities from options, variance swaps, credit risk models, market (in)efficiency, dynamic relationships between global financial markets, and high frequency volatility. The course teaches estimation, Monte Carlo simulations, and programming methods.
Survey of the types of artificial intelligence that exist. Algorithmic versus heuristic programming; search trees, search algorithms, information retrieval, robotics, and expert systems. State-of-the-art and future trends of these and other forms of artificial intelligence will be explored.
This course combines project management methods and structured systems development techniques and applies them to the complex world of information systems development. Change management is a complicated and crucial aspect of information systems implementation, and will also be addressed by this course. The central project management functions (planning, organizing, and controlling) are presented in the context of the systems development process. Topics include project planning, estimating, testing, implementation, documentation, management of change, utilization of services consultants, software houses, turn-key systems, and proprietary software packages.
This course covers one of the most debated current issues facing IT in the corporate environment: aligning with the business units to maximize profitability. The course covers the changing organizational role of IT departments and the new dynamics between the business managers, the CIO, and the CFO. The various financial methodologies for understanding the management of IT investments, such as return on investment (ROI) and total cost of ownership (TCO), will also be discussed.
This course is an introduction to database programming. Concepts and techniques of data definition and data manipulation using SQL will be stressed. Students will design and implement a database in a relational database environment. Topics covered include creating database structure, populating the database, maintaining data, retrieving data, administering the database, and optimizing queries.
This course provides a foundation for learning the basic concepts of data mining and visualization. The course has a distinctly “real-world” orientation that emphasizes application of data analysis over algorithm design and development in most topic areas. The course prerequisites are an understanding of database concepts and familiarity with information or business decision systems.
This course applies theoretical and applied aspects of database design to web-based applications. This course will review the basics of database technology, cover different development platforms, and develop projects that connect client-side interfaces to server-side databases.
This course provides an introduction to the analysis and design of geographic information systems. These are systems for which the data and solutions are location based. GIS systems are used in a variety of disciplines and applications including geoscience, environmental science, government, land management, non-profits, and business. Students will learn how to create comprehensive GIS systems in a range of application areas. Solutions to problems will be done in Esri’s ArcGIS desktop software.
Data and analytics are changing the world and the way we are making decisions, thanks to the enormous and increasing amount of data available to us. Behind this vast amount of data lies the greatest potential to understand reality and predict future events. As this potential is being realized, more organizations are investing substantial amounts of money in this discipline that is collectively known as Big Data. Yet, we are facing several challenges, both technological and organizational. From a technology perspective, we see an increasing need to collect more data from sources both internal and external. This is widening the analytical gap within the organization due to the inability to properly address the volume, variety, and velocity of the data. Moreover, organizations are struggling to streamline their advanced analytical capabilities and are unable to respond efficiently to the business’s need to make better decisions faster by converting data into insight. This course will explore the multifaceted reality of Big Data, and students will learn not only the underlying principles of data analytics but also the organizational challenges that Big Data poses to an enterprise. The objective of this course is to introduce students to data science approaches for mining large amounts of information and to the necessary tools, and to show through real use cases what a company needs in order to create Big Data Centers of Excellence that successfully turn data analysis into competitive advantage. Additionally, students will learn about using Hadoop and MapReduce to process and analyze large datasets, and about data mining algorithms used for classification, estimation, and prediction purposes.
Examines the use of research as a tool for decision-making. Topics include: defining information needs, value of information, scientific method, exploratory research, questionnaire construction, sample design, field work, editing, tabulation, report writing, and presentation.
This is an application-oriented course aimed at developing skills in getting, exploring, manipulating, analyzing, and presenting business data using data visualizations. It will employ visualization software such as Tableau.
Develops competence in a wide array of predictive analytical techniques used in business. Uses a case-based approach to enable application of analytical techniques to marketing activities such as segmentation, targeting, positioning, choice modeling, new product design, forecasting, advertising, promotion, and sales force management.
Develops skills in transforming business data into actionable information. Uses various predictive modeling tools such as decision trees, neural networks, and regression and pattern discovery tools such as cluster analysis, market basket analysis, and text analytics. Analysis conducted using SAS software such as Enterprise Miner, Text Miner, Enterprise Guide, and Forecast Studio.
Students are taught to apply statistical tools useful for making effective managerial decisions in a disorganized and uncertain environment. They will develop valuable data analysis skills and the ability to choose appropriate statistical methods and interpret them in realistic management cases. These tools include methods for collecting, organizing, presenting, and analyzing data utilizing Excel and commercial-level software add-ins. Topics include applying statistical methods and presenting results for making better decisions using descriptive statistics, probability theory (including important statistical distributions), estimation, hypothesis testing, and simple and multiple regression analysis. The emphasis will be on applying these tools to improve decision making in all functional areas of business. NOT OPEN TO STUDENTS WHO HAVE TAKEN MBA 628.
Business decision modeling is applied in all types of organizations to complement traditional approaches to managerial decision making. This course prepares managers to be active participants in the model building process by providing hands-on experience using realistic data and commercial-level Excel software add-ins. The modeling approach is successfully applied to decision problems in human resource management, service delivery systems, marketing, finance, production, and logistics. Applications of linear programming, nonlinear programming, PERT/CPM, forecasting, decision trees, queuing, and Monte Carlo simulation are covered in this course.