
Why Every Health Care Organization Needs a Data Science Strategy

Article · March 22, 2017

Harnessing the full potential of data requires developing an organization-wide data science strategy. Such strategies are now commonplace in many industries, such as banking and retail. Banks can offer their customers targeted needs-based services and improved fraud protection because they collect and analyze transactional data. Retailers such as Amazon routinely collect data on shopping habits and preferences to profile their customers and use sophisticated predictive algorithms to tailor marketing strategies to customer demand.

Health care is a glaring exception. Individual pieces of data can have life-or-death importance, but many organizations fail to aggregate data effectively to gain insights into wider care processes. Without a data science strategy, health care organizations can’t draw on increasing volumes of data and medical knowledge in an organized, strategic way, and individual clinicians can’t use that knowledge to improve the safety, quality, and efficiency of the care they provide.

A comprehensive data science strategy needs to address the quality of the underlying data, effective ways to analyze the data, and a framework for keeping it secure. If an organization tries to aggregate and analyze poor-quality data, it may derive useless or even dangerous conclusions. An inadequate security framework may lead to unauthorized access and undermine trust of patients and providers.

A carefully developed data science strategy will help achieve both precision medicine (helping to tailor treatments to patients) and the creation of learning health systems (helping to predict outcomes and identifying specific areas for improvement). Ideally, every decision a provider makes about a patient should be informed by the data of both that specific patient and other similar patients. In a learning health system, prior experiences improve future choices.

Organizations without an effective data science strategy may never realize returns on their investment in electronic health records (EHRs), may have disillusioned physicians, and may face potentially catastrophic security risks resulting from inadequate data protection. The stakes are high.

We believe that an effective data science strategy for health care organizations has five key components:

Key Components of Data Science Strategy in Healthcare Organizations


1. Repository.  A secure organization-wide data repository allows organizations to keep a complete inventory of their data assets. Planning a repository presents multiple challenges. Substantial groundwork is required to scope existing data, create metadata (that is, detailed descriptions of each data source), explore ways to combine data sources, and develop strategies to keep track of what data is produced, stored, used, and reused, and how, and by whom.

This process may require the formation of new organizational structures, such as designated centers for data science. Organizations have begun to establish such centers, including the Center for Healthcare Delivery Science at Beth Israel Deaconess Medical Center (BIDMC) and the Stanford Biomedical Data Science Initiative. The BIDMC center brings together expertise in health care delivery, analytics, management, epidemiology, biostatistics, and information technology. These experts can draw on a large repository of locally collected electronic health care data for quality improvement purposes.
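To make the inventory groundwork described above more concrete, here is a minimal sketch of a data catalog that stores metadata for each source and records who used it and why. Every name in it (DataSource, DataCatalog, the field names, the example sources) is a hypothetical illustration rather than a description of any organization's system, and a real repository would sit on a database with access controls rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataSource:
    """Metadata describing one data asset (all field names are illustrative)."""
    name: str
    description: str
    owner: str
    contains_patient_identifiers: bool
    access_log: list = field(default_factory=list)


class DataCatalog:
    """In-memory inventory of data sources plus a record of how they are used."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        # Refuse duplicates so the inventory stays unambiguous.
        if source.name in self._sources:
            raise ValueError(f"source already registered: {source.name}")
        self._sources[source.name] = source

    def record_use(self, name, user, purpose):
        # Track what data is used, how, and by whom.
        entry = {
            "user": user,
            "purpose": purpose,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        self._sources[name].access_log.append(entry)

    def uses_of(self, name):
        return [(e["user"], e["purpose"]) for e in self._sources[name].access_log]

    def names(self):
        return sorted(self._sources)


catalog = DataCatalog()
catalog.register(DataSource("ehr_encounters", "Encounter-level EHR extract",
                            "clinical informatics", True))
catalog.register(DataSource("lab_results", "Laboratory results feed",
                            "pathology", True))
catalog.record_use("lab_results", "analyst_a", "quality improvement")
print(catalog.names())
print(catalog.uses_of("lab_results"))
```

Keeping the usage log next to the metadata is one simple way to answer the "reused, and how, and by whom" question; an organization might equally keep audit records in a separate system.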

2. Integration.  Bringing different data sets together involves many challenges in reconciling formats and breaking down silos. Developing and consistently using an enterprise master patient index (EMPI) allows disparate data sources to be linked for individual patients, but it requires significant organizational and process changes across information systems. These include eliminating duplicate records and establishing new procedures for adding new patients.
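As a rough illustration of what linking records through a master index involves, the sketch below assigns one master identifier to records from different systems that agree on normalized name and date of birth. This is a deliberately simplified, deterministic rule chosen for the example; the record fields, the "MPI-" identifier format, and the function names are all assumptions. Production EMPIs typically use more identifiers, probabilistic matching, and manual review, because exact matching on name and birth date can both miss true matches and merge different people.

```python
import unicodedata


def normalize(text):
    """Lowercase and drop accents and non-letters so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalpha())


def match_key(record):
    """Illustrative matching rule: normalized names plus exact birth date."""
    return (
        normalize(record["family_name"]),
        normalize(record["given_name"]),
        record["birth_date"],
    )


def assign_master_ids(records):
    """Map (source, local_id) to a master ID; records sharing a key share an ID."""
    master_for_key = {}
    index = {}
    for rec in records:
        key = match_key(rec)
        if key not in master_for_key:
            master_for_key[key] = f"MPI-{len(master_for_key) + 1:06d}"
        index[(rec["source"], rec["local_id"])] = master_for_key[key]
    return index


records = [
    {"source": "ehr", "local_id": "E1", "family_name": "Garcia",
     "given_name": "José", "birth_date": "1980-02-14"},
    {"source": "lab", "local_id": "L9", "family_name": "GARCIA",
     "given_name": "Jose", "birth_date": "1980-02-14"},
    {"source": "ehr", "local_id": "E2", "family_name": "Garcia",
     "given_name": "Jose", "birth_date": "1975-07-01"},
]
index = assign_master_ids(records)
for key, master_id in index.items():
    print(key, master_id)
```

In this toy data set the first two records, which differ only in capitalization and an accent, receive the same master ID, while the third, with a different birth date, receives its own.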

If creating an EMPI for initial data collection poses too many challenges, either administrative or technical, an organization may achieve a reasonable equivalent by using a “data lake,” a technology platform that allows linking of highly disparate data. Data lakes keep source data in its original state for analysis if needed but also allow organizations to navigate across different sources and explore new relationships among them. Mercy Health in St. Louis uses a data lake to integrate data across its locations. This is fed with real-time data from its electronic health record as well as an enterprise resource management system and several other sources. This combination of data from disparate sources pulls together patient-specific information across a range of operational and clinical issues. This information can be fed back in real-time to clinicians at the bedside and can also be used for operational and strategic planning and overall quality analysis.

3. Security.  Protecting privacy and anonymity is always paramount, and that task becomes more complex when an organization uses a patient’s data for purposes that go beyond immediate patient care. This is particularly important given that some health systems, including BIDMC, are moving to the use of private space on public clouds. Organizations need to create data governance frameworks to ensure those protections, and commit money to cybersecurity measures.

For example, organizations need to address issues of staff training, how to handle access to data by visiting workers, how to guard against data breaches, and how to mitigate the damage from any breaches that occur. Existing technologies should meet International Organization for Standardization (ISO) data security standards, and organizations should schedule periodic risk assessment and mitigation of technical, administrative, and physical vulnerabilities.

Large digital data repositories may increase concerns about the security of cloud storage systems and data lakes and may undermine patients’ and clinicians’ trust. Breaches can be costly in both money and institutional reputation. New and potentially expensive approaches may be needed to prevent them, including the development of anonymization algorithms and machine-learning-based security models that can adapt to changing threats and/or circumstances.
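One common building block for the anonymization approaches mentioned above is keyed pseudonymization: replacing a patient identifier with a value derived from it using a secret key, and coarsening quasi-identifiers such as date of birth. The sketch below uses Python's standard hmac and hashlib modules. The record fields, the 16-character truncation, and the choice to keep only birth year are assumptions made for illustration. Pseudonymized data is not fully anonymous, since combinations of remaining attributes may still re-identify a patient, and the approach is only as safe as the handling of the key.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id, secret_key):
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    # Truncation is an illustrative choice to keep the pseudonym short.
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a copy with direct identifiers dropped and birth date coarsened."""
    return {
        "pseudo_id": pseudonymize_id(record["patient_id"], secret_key),
        "birth_year": int(record["birth_date"][:4]),  # keep the year only
        "diagnosis_code": record["diagnosis_code"],
    }


# Hypothetical key; in practice it would come from a managed secret store.
key = b"example-key-not-for-production"
record = {
    "patient_id": "MRN-0012345",
    "name": "Example Patient",
    "birth_date": "1980-02-14",
    "diagnosis_code": "E11.9",
}
safe = deidentify(record, key)
print(sorted(safe))
```

Because the same identifier and key always produce the same pseudonym, records for one patient can still be linked for analysis without exposing the original identifier to the analyst.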

4. Support.  Organizations need teams with a wide range of skills in data processing and cleaning, statistics, computer science, visualization, operational research and change management, artificial intelligence, and archiving/curation. Of particular importance are “boundary spanners” who can establish links among data science staff, the organization’s management, and its clinicians. They can identify data query priorities that are both organizationally and clinically relevant and can help users of data understand the full range of analysis that is available to them (such as near real-time queries regarding particular patient populations, medications, or treatment outcomes).

5. Feedback.  An effective data science strategy relies not only on well-structured databases and advanced analytics but also on having solid underlying data. Predictive analytics can be extremely valuable but require high-quality data for reliable insights. Strategic approaches to analysis should create a virtuous cycle in which data are repeatedly scrutinized as they are reused for different purposes, driving improvements in data quality. Such work should harness innovative analytical tools that employ artificial intelligence approaches such as machine learning and deep learning. A complete service redesign may also be required so that insights from data can inform important organizational and service delivery decisions in real time. To achieve this level of effectiveness, frontline staff may need to change how they work in order to incorporate these insights and act on them at the point of care.
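The virtuous cycle described above depends on routinely surfacing data problems so they can be fixed at the source. A minimal sketch of that scrutiny step follows: each record is checked against a few rules, and the problems are tallied into a report that could be fed back to the teams entering the data. The field names and the plausible heart-rate range of 20 to 300 beats per minute are assumptions chosen for the example, not clinical guidance.

```python
def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not record.get("patient_id"):
        problems.append("missing patient_id")
    heart_rate = record.get("heart_rate")
    if heart_rate is None:
        problems.append("missing heart_rate")
    elif not 20 <= heart_rate <= 300:  # illustrative plausibility bounds
        problems.append("heart_rate out of plausible range")
    return problems


def quality_report(records):
    """Count how often each problem occurs across a batch of records."""
    counts = {}
    for rec in records:
        for problem in check_record(rec):
            counts[problem] = counts.get(problem, 0) + 1
    return counts


batch = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "", "heart_rate": 72},
    {"patient_id": "p3", "heart_rate": 720},  # likely a data-entry error
    {"patient_id": "p4"},
]
report = quality_report(batch)
print(report)
```

Running the same checks each time a data set is reused gives a simple trend measure of whether data quality is actually improving.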

Implementation of a data science strategy represents one of the cornerstones of better care, as well as greater operational efficiency and, eventually, more effective approaches to population health. Our health care system will increasingly depend on data to improve care, reduce costs, and expand access.
