Care Redesign
Relentless Reinvention

Why Every Health Care Organization Needs a Data Science Strategy

Article · March 22, 2017

Harnessing the full potential of data requires developing an organization-wide data science strategy. Such strategies are now commonplace in industries such as banking and retail. Banks can offer their customers targeted, needs-based services and improved fraud protection because they collect and analyze transactional data. Retailers such as Amazon routinely collect data on shopping habits and preferences to profile their customers and use sophisticated predictive algorithms to tailor marketing strategies to customer demand.

Health care is a glaring exception. Individual pieces of data can have life-or-death importance, but many organizations fail to aggregate data effectively to gain insights into wider care processes. Without a data science strategy, health care organizations can’t draw on increasing volumes of data and medical knowledge in an organized, strategic way, and individual clinicians can’t use that knowledge to improve the safety, quality, and efficiency of the care they provide.

A comprehensive data science strategy needs to address the quality of the underlying data, effective ways to analyze the data, and a framework for keeping it secure. If an organization tries to aggregate and analyze poor-quality data, it may derive useless or even dangerous conclusions. An inadequate security framework may lead to unauthorized access and undermine the trust of patients and providers.

A carefully developed data science strategy will help achieve both precision medicine (helping to tailor treatments to patients) and the creation of learning health systems (helping to predict outcomes and identifying specific areas for improvement). Ideally, every decision a provider makes about a patient should be informed by the data of both that specific patient and other similar patients. In a learning health system, prior experiences improve future choices.

Organizations without an effective data science strategy may never realize returns on their investment in electronic health records (EHRs), may have disillusioned physicians, and may face potentially catastrophic security risks resulting from inadequate data protection. The stakes are high.

We believe that an effective data science strategy for health care organizations has five key components:

Key Components of Data Science Strategy in Healthcare Organizations

1. Repository.  A secure organization-wide data repository allows organizations to keep a complete inventory of their data assets. Planning a repository presents multiple challenges. Substantial groundwork is required to scope existing data, create metadata (that is, detailed descriptions of each data source), explore ways to combine data sources, and develop strategies to keep track of which data are produced, stored, used, and reused, as well as how and by whom.

This process may require the formation of new organizational structures, such as designated centers for data science. Organizations have begun to establish such centers, including the Center for Healthcare Delivery Science at Beth Israel Deaconess Medical Center (BIDMC) and the Stanford Biomedical Data Science Initiative. The BIDMC center brings together expertise in health care delivery, analytics, management, epidemiology, biostatistics, and information technology. These experts can draw on a large repository of locally collected electronic health care data for quality improvement purposes.
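The inventory and metadata groundwork described for the repository can be illustrated with a small sketch. Everything here is hypothetical: the `DataSourceEntry` fields, the `DataInventory` class, and the example source names are assumptions chosen for illustration, not a description of any organization's actual catalog.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceEntry:
    """Metadata describing one data source (fields are illustrative assumptions)."""
    name: str
    description: str
    owner: str
    contains_patient_identifiers: bool
    used_by: list = field(default_factory=list)  # teams that have used this source


class DataInventory:
    """A minimal in-memory inventory of data sources and who uses them."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse to silently overwrite an existing description of a source.
        if entry.name in self._entries:
            raise ValueError(f"duplicate data source: {entry.name}")
        self._entries[entry.name] = entry

    def record_use(self, source_name, team):
        # Track reuse: note each team once per source.
        entry = self._entries[source_name]
        if team not in entry.used_by:
            entry.used_by.append(team)

    def users_of(self, source_name):
        return list(self._entries[source_name].used_by)

    def sources_with_identifiers(self):
        return sorted(
            e.name for e in self._entries.values() if e.contains_patient_identifiers
        )


inventory = DataInventory()
inventory.register(
    DataSourceEntry("ehr_encounters", "Encounter-level EHR extract",
                    "clinical informatics", True)
)
inventory.register(
    DataSourceEntry("bed_occupancy", "Daily bed occupancy counts",
                    "operations", False)
)
inventory.record_use("ehr_encounters", "quality_improvement")
print(inventory.sources_with_identifiers())
print(inventory.users_of("ehr_encounters"))
```

A production catalog would likely sit in a database with access controls and audit logging; the point of the sketch is only that each source carries a description, an owner, and a record of who uses it.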

2. Integration.  Bringing different data sets together involves many challenges in reconciling formats and breaking down silos. Developing and consistently using an enterprise master patient index (EMPI) allows disparate data sources to be linked for individual patients, but it requires significant organizational and process changes across information systems. These include eliminating duplicate records and establishing new procedures for adding new patients.
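The linkage idea behind an EMPI can be sketched as follows. This is a deliberately simplified, deterministic match on normalized name and birth date; real master patient indexes typically rely on probabilistic matching and manual review. The `PatientRecord` fields, the `EMPI-` identifier format, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    source: str       # hypothetical source system name
    local_id: str     # identifier used within that source
    family_name: str
    given_name: str
    birth_date: str   # ISO 8601 date, e.g. "1970-01-31"


def match_key(rec):
    """Build a normalized key; records sharing a key are treated as one patient."""
    return (
        rec.family_name.strip().lower(),
        rec.given_name.strip().lower(),
        rec.birth_date,
    )


def build_index(records):
    """Group records by match key and assign one enterprise ID per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    index = {}
    for n, key in enumerate(sorted(groups), start=1):
        index[f"EMPI-{n:06d}"] = groups[key]
    return index


records = [
    PatientRecord("ehr", "A17", "Rivera", "Ana", "1970-01-31"),
    PatientRecord("lab", "L-902", " rivera ", "ANA", "1970-01-31"),
    PatientRecord("ehr", "A18", "Chen", "Wei", "1985-06-02"),
]
index = build_index(records)
for enterprise_id, linked in index.items():
    print(enterprise_id, [(r.source, r.local_id) for r in linked])
```

In this sketch the two "Rivera" records from different systems end up under a single enterprise identifier despite differences in case and whitespace. Exact matching of this kind misses misspellings and can wrongly merge distinct patients who share a name and birth date, which is one reason the process changes described above matter.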

If creating an EMPI for initial data collection poses too many challenges, either administrative or technical, an organization may achieve a reasonable equivalent by using a “data lake,” a technology platform that allows linking of highly disparate data. Data lakes keep source data in its original state for analysis if needed but also allow organizations to navigate across different sources and explore new relationships among them. Mercy Health in St. Louis uses a data lake to integrate data across its locations. The data lake is fed with real-time data from its electronic health record as well as an enterprise resource management system and several other sources. This combination of data from disparate sources pulls together patient-specific information across a range of operational and clinical issues. This information can be fed back in real time to clinicians at the bedside and can also be used for operational and strategic planning and overall quality analysis.

3. Security.  Protecting privacy and anonymity is always paramount, and that task becomes more complex when an organization uses a patient’s data for purposes that go beyond immediate patient care. This is particularly important given that some health systems, including BIDMC, are moving to the use of private space on public clouds. Organizations need to create data governance frameworks to ensure those protections and commit money to cybersecurity measures.

For example, organizations need to address issues of staff training, how to handle access to data by visiting workers, how to guard against data breaches, and how to mitigate the damage from any breaches that occur. Existing technologies should meet International Organization for Standardization (ISO) data security standards, and organizations should schedule periodic risk assessment and mitigation of technical, administrative, and physical vulnerabilities.

Large digital data repositories may increase concerns about the security of cloud storage systems and data lakes and may undermine patients’ and clinicians’ trust. Breaches can be costly in both money and institutional reputation. New and potentially expensive approaches may be needed to prevent them, including the development of anonymization algorithms and machine-learning-based security models that can adapt to changing threats and circumstances.
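One small building block related to the anonymization approaches mentioned above is pseudonymization, shown here as a minimal sketch. It replaces a patient identifier with a keyed hash and drops direct identifiers. This is not full anonymization, since remaining fields can still allow re-identification. The field names, the set of direct identifiers, and the placeholder key are assumptions for illustration; in practice the key would need to be managed securely.

```python
import hashlib
import hmac

# Illustrative assumption: which fields count as direct identifiers.
DIRECT_IDENTIFIERS = {"patient_id", "name", "address", "phone"}


def pseudonymize(patient_id, secret_key):
    """Return a stable keyed pseudonym (HMAC-SHA256 hex digest) for an identifier."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_identifiers(record, secret_key):
    """Drop direct identifiers and attach a pseudonym so records stay linkable."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Placeholder key for the example only; never hard-code a real key.
example_key = b"replace-with-a-managed-secret"
raw = {
    "patient_id": "MRN-0001",
    "name": "Ana Rivera",
    "phone": "555-0100",
    "diagnosis_code": "E11.9",
}
safe = strip_identifiers(raw, example_key)
print(sorted(safe))
```

Because the hash is keyed, the same patient maps to the same pseudonym across data sets for analysis, while someone without the key cannot recompute the pseudonym from a known identifier.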

4. Support.  Organizations need teams with a wide range of skills in data processing and cleaning, statistics, computer science, visualization, operational research and change management, artificial intelligence, and archiving/curation. Of particular importance are “boundary spanners” who can establish links among data science staff, the organization’s management, and its clinicians. They can identify data query priorities that are both organizationally and clinically relevant and can help users of data understand the full range of analysis that is available to them (such as near real-time queries regarding particular patient populations, medications, or treatment outcomes).

5. Feedback.  An effective data science strategy relies not only on well-structured databases and advanced analytics but also on having solid underlying data. Predictive analytics can be extremely valuable but require high-quality data for reliable insights. Strategic approaches to analysis should create a virtuous cycle in which data are repeatedly scrutinized as they are reused for different purposes, driving improvements in data quality. Such work should harness innovative analytical tools that employ artificial intelligence approaches such as machine learning and deep learning. A complete service redesign may be required so that insights from data can inform important organizational and service delivery decisions in real time. To achieve this level of effectiveness, frontline staff may need to change how they work in order to incorporate these insights and act on them at the point of care.
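The idea of scrutinizing data each time they are reused can be sketched as a simple set of quality checks whose results are summarized and fed back to the people who enter or maintain the data. The field names and the plausibility range below are placeholders chosen for illustration, not clinical guidance.

```python
from collections import Counter

# Placeholder plausibility bounds for the example only, not clinical guidance.
HEART_RATE_RANGE = (20, 300)


def check_record(rec):
    """Return a list of data quality issues found in one record."""
    issues = []
    if not rec.get("patient_id"):
        issues.append("missing patient_id")
    heart_rate = rec.get("heart_rate")
    if heart_rate is None:
        issues.append("missing heart_rate")
    elif not HEART_RATE_RANGE[0] <= heart_rate <= HEART_RATE_RANGE[1]:
        issues.append("implausible heart_rate")
    return issues


def summarize_issues(records):
    """Count each issue type across records to show where quality work is needed."""
    summary = Counter()
    for rec in records:
        summary.update(check_record(rec))
    return summary


sample = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "", "heart_rate": 80},
    {"patient_id": "p3", "heart_rate": 0},
    {"patient_id": "p4"},
]
summary = summarize_issues(sample)
for issue, count in sorted(summary.items()):
    print(issue, count)
```

Running checks like these on every reuse, and reporting the counts back to the source departments, is one way to realize the virtuous cycle described above.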

Implementation of a data science strategy represents one of the cornerstones of better care, as well as greater operational efficiency and, eventually, more effective approaches to population health. Our health care system will increasingly depend on data to improve care, reduce costs, and expand access.
