
Data scientists are in high demand everywhere you turn, with 35% growth in employment expected from 2022 to 2032—a rate much higher than the national average. Organizations across industries are increasingly understanding the importance of finding actionable insights from data and making strategic business decisions accordingly.

In healthcare, the sheer volume of data collected annually is staggering. On average, a hospital produces 50 petabytes of data each year (approximately 500 billion pages of standard printed text), but only 3% of that data is actually used. The amount of data and the expected growth of the healthcare big data market, which is predicted to reach $105.73 billion by 2030, make the work of data scientists all the more crucial.

This article covers the skills that healthcare data scientists need to be successful when working with vast amounts of data. It also looks at the current landscape of the data industry, including the ways data scientists have improved healthcare functions, how they are addressing data challenges in the industry, and what healthcare and data science advancements are on the horizon.

Data Science Skills and Opportunities in Healthcare

Since the healthcare industry is receiving an influx of data and information through a variety of avenues, healthcare data scientists are highly sought after and well compensated for their work. Data scientists working in healthcare earn around $89,500 annually, according to the labor analytics site Lightcast.

In general, data scientists are responsible for delivering important insights gathered from real-world data so that organizations can make informed decisions. While there are as many ways to practice data science as there is data to sift through, we’ve highlighted some common skills and knowledge that data scientists in healthcare need, as well as various applications.

Artificial Intelligence

Healthcare is predicted to be one of the industries most disrupted by artificial intelligence (AI). From uses in remote and mobile healthcare, to natural language processing, to machine learning and more, AI is an area where healthcare data scientists play a pivotal role.

The online Master of Science in Data Science offers an elective course in artificial intelligence that enables students to learn about the theory, data structures, and algorithms involved in artificial intelligence and heuristic programming. Among other things, the course content covers search methods, natural language processing, and pattern recognition techniques.

Applications in Mobile Health

Over the past five years, the use of healthcare applications and wearable healthcare devices by adults in the United States has grown steadily. According to a 2023 survey, 40% of adults in the US are using healthcare-related apps, such as MyChart or MyFitnessPal, and 35% are using wearables, such as a Fitbit or Apple Watch.

Data scientists ensure that users receive meaningful insights from their data input, and that clinicians remain up-to-date on an individual’s health. The mobile health model (mHealth) involves automatically connecting the data produced from wearables and electronic health records to a smartphone (and vice versa) and sending that data through various artificial intelligence models and tools, as well as big data analytics, to gather insights. The patient and clinician are then alerted to important findings.
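As a rough illustration of the alerting step described above, the following Python sketch smooths a series of wearable heart-rate readings and flags windows that fall outside a range. The function names, the sample readings, and the thresholds are all hypothetical and are not clinical guidance; a production mHealth pipeline would be considerably more involved.

```python
from statistics import mean

# Hypothetical heart-rate bounds in beats per minute; illustrative only,
# not clinical guidance.
LOW_BPM = 40
HIGH_BPM = 120


def rolling_mean(readings, window):
    """Average each run of `window` consecutive readings to smooth out noise."""
    return [mean(readings[i:i + window]) for i in range(len(readings) - window + 1)]


def find_alerts(readings, window=3):
    """Return (start_index, smoothed_bpm) pairs whose smoothed value is out of range."""
    return [
        (i, bpm)
        for i, bpm in enumerate(rolling_mean(readings, window))
        if bpm < LOW_BPM or bpm > HIGH_BPM
    ]


# Made-up readings from a wearable device.
heart_rate = [72, 75, 74, 130, 135, 140, 78, 76]
alerts = find_alerts(heart_rate)
```

Smoothing before comparing against the thresholds means a single noisy reading is less likely to trigger an alert than a sustained change.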

Natural Language Processing

Natural language processing (NLP) software trains computers to register human speech and text. NLP can allow computers and programs to understand even the most complicated of linguistic knowledge, and, with the help of computer scientists and data scientists, transform that knowledge into algorithms that solve problems and simplify tasks.

Data scientists assist in building NLP applications that aid in healthcare in a number of ways, including improving surgery. As a more everyday example, natural language processing can transcribe a nurse or doctor’s notes in real time, eliminating the need for time-consuming manual transcription, which can sometimes result in incorrect records.

NLP can also be used in patient billing and in scanning electronic health records (EHRs) for comorbidities or anything else a clinician should be aware of when a patient is undergoing treatment.
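To make the idea of scanning records for comorbidities concrete, here is a deliberately simple Python sketch that searches a free-text note for a few condition terms. The term list, the function name, and the sample note are assumptions made for illustration; real clinical NLP systems rely on curated vocabularies and far more sophisticated language models.

```python
import re

# Hypothetical term list; a real system would rely on a curated clinical vocabulary.
COMORBIDITY_TERMS = {
    "diabetes": r"\bdiabet(?:es|ic)\b",
    "hypertension": r"\bhypertensi(?:on|ve)\b|\bhigh blood pressure\b",
    "asthma": r"\basthma(?:tic)?\b",
}


def flag_comorbidities(note):
    """Return the sorted condition names whose patterns appear in a free-text note."""
    return sorted(
        name
        for name, pattern in COMORBIDITY_TERMS.items()
        if re.search(pattern, note, flags=re.IGNORECASE)
    )


# Made-up clinical note.
note = "Pt is a 58 y/o diabetic with high blood pressure, presenting for knee surgery."
flags = flag_comorbidities(note)
```

A keyword match like this cannot tell "diabetic" from "no history of diabetes," which is one reason production systems go well beyond pattern matching.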

Pattern Recognition

Pattern recognition involves using machine learning algorithms to analyze data for specific patterns. In healthcare, there are many applications, including one specific study that used electronic health records to determine whether there is a link between living in an urban environment and having a severe mental illness.

The Pace University online master’s in data science gives students the opportunity to complete an elective course on pattern recognition that teaches key concepts, theories, and algorithms, as well as various applications—including speech recognition and biometrics—and techniques.

Data Mining

Data mining involves the use of statistics and artificial intelligence to examine datasets by setting goals, preparing data, applying algorithms, reviewing findings, and using them as needed. In healthcare, data mining is often used to review the chemical composition of prescriptions and current research in order to help providers and patients in many ways, including guiding clinicians and patients to steer clear of dangerous food and drug interactions.

Another major issue in healthcare is insurance fraud, amounting to $3.1 billion in costs in 2021 alone. Through analytics, data mining warns of anything in claims that looks suspicious.
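One simple way to picture how analytics might surface a suspicious claim is outlier detection. The Python sketch below flags claim amounts that sit far from the median, using a modified z-score based on the median absolute deviation. The claim data, the function name, and the threshold are hypothetical; actual fraud detection draws on many more signals than the dollar amount.

```python
from statistics import median


def flag_suspicious(claims, threshold=3.5):
    """Flag claims whose amount is far from the median, using the modified z-score."""
    amounts = [amount for _, amount in claims]
    mid = median(amounts)
    mad = median(abs(a - mid) for a in amounts)  # median absolute deviation
    if mad == 0:
        # At least half the amounts equal the median, so the score is undefined.
        return []
    return [
        claim_id
        for claim_id, amount in claims
        if 0.6745 * abs(amount - mid) / mad > threshold
    ]


# Made-up (claim_id, amount) pairs.
claims = [
    ("C1", 120.0), ("C2", 135.0), ("C3", 110.0),
    ("C4", 128.0), ("C5", 2400.0), ("C6", 125.0),
]
suspicious = flag_suspicious(claims)
```

The median is used instead of the mean so that one extreme claim does not distort the baseline it is being compared against.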


Genomics

Genomic data science involves the use of computer science and statistics to examine DNA sequences in order to better understand human health and disease. DNA can be sequenced much more quickly than geneticists can decode the information it contains, which means genomic data science will be a major field for many years to come.
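As a small taste of what examining DNA sequences computationally can look like, the Python sketch below computes two basic summaries of a sequence: its GC content and its k-mer counts. The sequence is made up and the function names are hypothetical; real genomic analysis works at a vastly larger scale and typically uses specialized tools.

```python
from collections import Counter


def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Made-up DNA fragment.
dna = "ATGCGCGATATGC"
gc = gc_content(dna)
trimers = kmer_counts(dna, 3)
```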

Machine Learning

Machine learning, an additional subset of AI, has been enriching healthcare since its introduction to the field in the 1960s, but its astounding modern applications have exponentially increased its use in recent years. Data scientists working in healthcare are tasked with developing and implementing machine learning models to better manage and utilize data.

The Pace online master’s in data science includes a core course in machine learning, delving into theoretical frameworks, algorithms, and key methods. An elective course in deep learning provides the opportunity to harness machine learning techniques to classify structured data and apply deep learning techniques to classify unstructured data.

The many uses for machine learning in the healthcare industry include:

Electronic Health Record Analysis

A report by The Office of the National Coordinator for Health IT indicates that 96% of non-federal acute care hospitals and nearly 4 of 5 office-based physicians are using certified electronic health record (EHR) systems, a respective 243% and 135% increase in usage from 2011. Such widespread use of EHR systems means there is a wealth of easily accessible data for data scientists to use.

Data scientists analyze EHRs for many purposes, including to track the health of varying populations and to discover ideal treatment strategies for diseases, infections, and chronic conditions. Though these insights are an added benefit of EHRs, the records were not designed solely for research purposes, which should be taken into consideration when using them this way. Human error can lead to issues like sample selection bias and imprecise variable definitions. Clinical teams can help provide data scientists with the necessary background information on a dataset.

Medical Imaging

Medical imaging, including X-rays, CT scans, and MRIs, is used to see inside an individual’s body and diagnose issues. Computer vision makes it possible for computers to distinguish objects in images and videos. Machine learning can also identify certain conditions from medical images with accuracy comparable to that of medical experts.

Predictive Analytics

The goal and advantage of predictive analytics in healthcare is to enhance care and outcomes. One example of predictive analytics at work is a blood test that was created to help doctors more quickly determine whether treatment for HPV-positive throat cancer is working. Previously, the only other option was an imaging scan every few months, so this new development will improve patient care by allowing doctors to change course if the cancer is not responding to treatment.

Another use is building predictive models for menstrual cycle start dates utilizing self-tracked data in Clue, a mobile app for tracking periods. The developed model updates predictions throughout an individual’s cycle and takes into account their unique cycle history and experiences while also pulling in insights from the entire dataset.
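The idea of blending an individual’s history with insights from the whole dataset can be sketched with a simple weighted average. The Python example below is an illustration only and is not the model used in Clue: the function name, the population average of 29 days, and the weighting constant are all assumptions.

```python
from statistics import mean


def predict_cycle_length(user_cycles, population_mean=29.0, prior_weight=3):
    """Blend a user's average cycle length with a population-wide average.

    With little personal history the estimate stays near the population mean;
    as more cycles are recorded, the user's own average dominates.
    """
    n = len(user_cycles)
    if n == 0:
        return population_mean
    personal_mean = mean(user_cycles)
    return (n * personal_mean + prior_weight * population_mean) / (n + prior_weight)


# Made-up cycle lengths in days.
new_user = predict_cycle_length([])
few_cycles = predict_cycle_length([26, 27, 25])
many_cycles = predict_cycle_length([26] * 9)
```

Each newly logged cycle shifts the estimate toward the individual’s own pattern, which mirrors the way the prediction updates as more self-tracked data becomes available.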

Programming (Python or R)

The ability to program is a key skill that data scientists need. Programming languages are used to organize unstructured data and build machine learning models. Python and R rank among the most popular programming languages in the data science landscape.

Statistical Analysis

Healthcare data scientists use statistics and statistical analysis to interpret data. This includes:

  • Clustering
  • Descriptive statistics
  • Developing and using machine learning algorithms
  • Forecasting
  • Inferential statistics
  • Multivariate regression
  • Predictive modeling
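A couple of the techniques listed above can be shown in a few lines of Python. The sketch below computes descriptive statistics and fits a least-squares line with a single predictor, a simplified stand-in for multivariate regression, then uses the line for a forecast. The ages and blood pressure values are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean, median, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up patient ages and systolic blood pressure readings.
ages = [30, 40, 50, 60, 70]
systolic = [118, 123, 128, 133, 138]

# Descriptive statistics summarize the sample.
summary = {
    "mean": mean(systolic),
    "median": median(systolic),
    "stdev": stdev(systolic),
}

# Predictive modeling: fit the line, then estimate a reading at age 80.
slope, intercept = fit_line(ages, systolic)
forecast = slope * 80 + intercept
```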

Whether you want to advance in the profession or make a career switch, getting a master’s in data science can help you prepare for evolving challenges in the field. According to the Burtch Works 2022 Salary Report, 66% of data science professionals hold a graduate degree.

The Pace University online MS in Data Science offers more than 30 specialized elective topics and a culminating analytics capstone. While it’s expected that students are proficient in certain areas of math and have experience with programming and databases, two online bridge courses—Database Management Systems and Python Programming—are available for those who need them.

Why Healthcare Needs Data Scientists

It’s clear that data scientists are essential to advancing healthcare. As detailed above, their work helps to:

  • Assist in billing, note taking, and other processes
  • Enhance the research and treatment of cancer and other terminal diseases
  • Enrich patients’ self-tracking of health
  • Improve patient care and outcomes

This list is certainly not exhaustive; healthcare data scientists are making their mark in many ways.

Data Challenges in Healthcare

With the massive amount of data available in the healthcare field, there also come reasonable concerns about security, quality, and privacy. Healthcare has more data breaches than any other industry; health records are among the most-wanted items on the dark web, sometimes selling for $1,000. Outdated systems make healthcare organizations more susceptible to attacks.

Beyond compromised patient data, these types of attacks can cost lives. In a survey of health IT and security professionals, 45% of individuals whose healthcare organizations experienced a ransomware attack said it disrupted patient care, and 22% reported an increase in mortality rates during that time.

Programs and projects like the Data Modernization Initiative are underway to ensure public health departments are making the best use of healthcare data and protecting patient data as much as possible.

In addition to sometimes weak security, poor data quality is another issue plaguing healthcare. Errors when recording patient information, duplicated data, and minimal minority patient information all lead to inferior data quality. Work is in progress to obtain data from more societal groups so findings more accurately reflect all patients. Ensuring proper staff training on appropriate data management is another way of increasing data quality.

According to a survey from the American Medical Association (AMA), 75% of patients are concerned about the protection of their health data. A primary worry is that data collected in healthcare phone applications is being shared with third parties without patients’ knowledge or consent, leading the AMA to call for stricter regulations.

What Lies Ahead: Data Science in Healthcare

While data science has already had a huge impact on healthcare, it’s only the beginning. Here’s what to expect:

  • AI: Though there is a great deal already being done with AI, there is much more to come. More companies plan to adopt AI technologies to help manage tasks and make further improvements.
  • Blockchain: A health information exchange (HIE) that utilizes blockchain, “a distributed system recording and storing transaction records,” could result in a more efficient system with improved health outcomes. Issues remain; while this is something that cannot immediately be implemented, the US Department of Health and Human Services is keeping tabs on the technology.
  • Genomics: Between 2 and 40 billion gigabytes of genomic data are produced annually, a number that’s on the rise. This increasing amount of data raises the question of how data scientists will use the information to improve health outcomes.
  • Remote patient monitoring (RPM) and telehealth: During COVID, the use of remote patient monitoring and telehealth took off. More research needs to be done to ensure the right people are using RPM and determine how it can be used in conjunction with other types of care.

With all of these exciting advancements, both current and yet to come, in healthcare data science, now is a great time to advance in or enter the field.

About the Online MS in Data Science

The Pace University online Master of Science in Data Science was designed to help students take advantage of professional opportunities in the next generation of quantitative solutions. Our STEM-designated curriculum leverages the Seidenberg School’s decades of experience in online education to explore theoretical and practical approaches to data governance, machine learning, predictive analytics, and more. This flexible, 100% online program fits a combination of hands-on experience and asynchronous activities into your schedule, building the expertise you need to guide the future of data-driven organizations.