Using It or Losing It? The Case for Data Scientists Inside Health Care

Article · May 4, 2017

As much as 30% of the entire world’s stored data is generated in the health care industry. A single patient typically generates close to 80 megabytes each year in imaging and electronic medical record (EMR) data. This trove of data has obvious clinical, financial, and operational value for the health care industry, and the new value pathways that such data could enable have been estimated by McKinsey to be worth more than $300 billion annually in reduced costs alone.

However, we believe that the health care industry does not currently appreciate the inherent value of these data, which can only be fully harnessed through better data analytics. Furthermore, we maintain that if appropriate investments in data science are not made in-house, then hospitals and health systems will run the risk of becoming reliant on outsiders to analyze the data that ultimately will be used to inform decisions and drive innovation. In order to be successful, therefore, health care organizations will need to capture the value of their data by investing in data analytics and establishing this skill as a core competency.

The Crucial Value of Data — and Data Analytics

The value of big data in health care is realized only when this raw information is converted into knowledge that changes practice. That value arises through, for example, better and faster identification of shortfalls in adherence, compliance, and evidence-based care; more comprehensive sharing of data and insights inside a hospital and with health insurance partners and community stakeholders; and more customized partnering with individual patients to drive understanding of chronic conditions, enhance adherence and compliance, boost self-care, and avoid more costly treatments at more costly sites of care within a hospital’s overall population base.

Driving value improvements and ensuring that the right patient receives the right care from the right provider at the right time and place requires both data and data science professionals. Without good data science, good delivery science and good implementation science are impossible. Above all, realizing the value of data downstream requires a willingness to invest in smart data analytics upstream.

Those analytics are sorely needed. Health Catalyst’s Analytics Adoption model describes a best-practice sequence of analytics skills; many hospitals sit at levels 2 and 3 of this 8-level model, and as few as 10% have robust data warehousing and data analytics abilities.

When Amazon and Jeff Bezos anecdotally claim never to throw data away, the unstated corollary is that they actually use their data in-house to drive actionable insights in their field, such as identifying highest-propensity shoppers and personalizing product selection displays on their webpage on the basis of customer preferences, past history, and predicted future interests. Do we in health care harness our data in a similar way? If not, what are the consequences?

The Important Role of Data Scientists

The professionals who analyze data are known as data scientists. These experts pull data from source systems, turn the data into insights and information, and then deploy those insights to the business for transformation into action. It is possible to envision some data-rich clinical specialties such as radiology as providing a fertile training environment for information specialists in the future, but currently these individuals are in very short supply.

Of the approximately 6,000 data scientists in the United States (as indicated by LinkedIn research conducted by Stitch), only 180 are estimated to work in the hospital and health care field. Given that there are nearly 6,000 hospitals and just 400 academic medical centers (AMCs) in the United States, that’s stretching the available labor force a bit thin.

The actual number of data scientists required is of course dependent on the specific needs of an organization, but it can be estimated on the basis of other industry contributions to gross domestic product (GDP) and the distribution of data scientists across industries. On the basis of Stitch’s LinkedIn research, we estimated how much several different industries invest in data science professionals annually for each million dollars of GDP contributed by each industry. Specifically, we estimated that the health care sector employs 6 times fewer data scientists in comparison with the relatively mature U.S. banking and insurance systems, 18 times fewer data science professionals in comparison with more innovative industries such as information technology (IT) and related services, and 60 times fewer data scientists in comparison with an almost purely knowledge-based industry such as management consulting. On the basis of these estimations, we contend that at least 10 to 20 times more data scientists are needed in health care than the fewer than 200 that exist today. That could place 5 to 10 data scientists in each AMC or 1 to 2 in each hospital with at least 200 beds. However, the question remains: Why are there so few data scientists inside health care today?
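The staffing arithmetic in the paragraph above can be checked with a short sketch. It uses only figures quoted in this article (a 10- to 20-fold increase and roughly 400 AMCs). Rounding "fewer than 200" up to 200 is an assumption made here for convenience, and the function names are illustrative.

```python
# Figures quoted in the article. Treating "fewer than 200" as exactly 200
# is a simplifying assumption, not a number reported by the authors.
CURRENT_HEALTH_CARE_DATA_SCIENTISTS = 200
ACADEMIC_MEDICAL_CENTERS = 400
MULTIPLIERS = (10, 20)  # the "10 to 20 times more" range


def target_workforce(current: int, multiplier: int) -> int:
    """Total data scientists implied by scaling today's headcount."""
    return current * multiplier


def per_center(total: int, centers: int) -> float:
    """Average headcount if the whole workforce sat in the given centers."""
    return total / centers


if __name__ == "__main__":
    for multiplier in MULTIPLIERS:
        total = target_workforce(CURRENT_HEALTH_CARE_DATA_SCIENTISTS, multiplier)
        share = per_center(total, ACADEMIC_MEDICAL_CENTERS)
        print(f"{multiplier}x: {total:,} data scientists, about {share:.0f} per AMC")
```

Under these assumptions the range works out to 2,000 to 4,000 data scientists, or 5 to 10 per AMC, which matches the range given in the text. The per-hospital figure is not reproduced because the article does not state how many hospitals have at least 200 beds.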

Why Are Data Analytics So Under-Resourced in Health Care?

  • Lack of Internal Demand: A lack of internal demand is a key part of the explanation. After spending a median of >10% of overall capital expenses annually on an information system to generate and store the data and then spending another 2% to 3% of net revenues annually to operate and maintain that system, a hospital may balk at the additional annual cost of at least $750,000 to support even a small, 3- to 5-person homegrown data science team to pull and analyze the data.
  • Substitution of Data Scientists with Internal Professionals: The substitution of full-time data scientists with informal internal professionals is another part of the explanation. Interested clinical professionals, often chief medical officers or health information officers who are used to implementing and upgrading EMRs, may allow a hospital to informally add some rudimentary abilities in data science. For example, the experience that is gained by adding functionality to EMRs (e.g., fall alerts, alerts regarding high risk of venous thromboembolism, etc.) may be a stepping stone to adding more analytical functionality later. An AMC may seek to leverage existing research strengths at its affiliated school or college of medicine. In such ways, biostatisticians in the affiliated medical school may be able to bring their theoretical expertise to bear on applied problems in the hospital.
  • Ad Hoc Combinations of Computer Science and Clinical Personnel: Moreover, ad hoc combinations of computer science experts and clinical domain experts may spring up to address smaller initiatives that do not and cannot truly span the entire enterprise. A cardiology division, for example, may build a standalone predictive model for heart failure readmission that relies on ad hoc manual batch data feeds rather than deploying a solution embedded in the EMR to flag, in real time, patients who have a high risk of readmission.
  • Reliance on Instinct Rather than Data: Another reason why data analytics is under-resourced stems from a self-reinforcing cycle. Because senior leaders are relatively unfamiliar with data analytics products and lack the high-level experience to incorporate the outputs of such products into their decision-making, they may not yet fully see the immediate utility of data science. If administrators tend to rely on instinct instead of data to drive decisions, then data analytics will not receive the level of management attention that it needs.

Who Should “Do” Analytics Internally?

Reflecting the above shortfalls in supply and demand, very few health care employees hold the title of chief data scientist (CDS) or chief analytics officer (CAO). While each of the top three for-profit hospital systems in the United States (Community Health Systems, Hospital Corporation of America, and Tenet Healthcare) has at least the functional equivalent of a CDS, only a handful of AMCs, such as Penn Medicine’s Predictive Healthcare, have filled the equivalent role.

Such leaders report to a chief medical officer or system president, often with a dotted line to a chief information officer (CIO), but they focus exclusively on analytics to ensure that organizational data are fully exploited. A subtle but crucial distinction is between, on the one hand, the individual who “owns” the data, ensures the provenance and quality of the data, and is responsible for the architecture where the data reside and, on the other hand, the individual who owns the conversion of data into insights through the use of machine learning, advanced statistics, and sophisticated data engineering skills. The former is the role of the CIO, whereas the latter is the role of the CDS or CAO.

While these two roles are clearly complementary, confusion about the boundaries can lead to dysfunctionality. For example, the data science group must not develop its own staging databases (to “ensure faster access” and avoid having to work with the data marts of IT), and the IT group should not seek to move beyond data and basic reporting toward predictive analytics (e.g., what might happen to particular groups of patients) or prescriptive analytics (e.g., what should be done for these groups of patients).

Market Responses to Data Science Opportunities

Third parties have been quick to recognize market inefficiencies. External commercial contractors are responding to these market imbalances between supply and demand by marketing and supplying analytical services to short-handed hospitals. For example, PeraHealth implements proprietary predictive algorithms for the prediction of clinical deterioration, and naviHealth offers similar services for the prediction of out-of-hospital outcomes. Cardinal Health, Aculyst, Health Catalyst, Truven Health Analytics (along with other IBM units), Vizient, the Advisory Board Company, and many other vendors also have robust and appealing options. Yet the advent of such commercial options leads to a familiar executive decision: Should we build internally or simply buy in?

The Build-Versus-Buy Dilemma

In one view, outsourcing data analytics makes a lot of sense for smaller hospitals and financially constrained health systems. When an analytics product is relatively robust (for example, predictive analytics for in-hospital outcomes is rapidly maturing), reinventing the wheel does not make sense. Defining a palette of such basic services and selling them to a busy hospital C-suite is likely to ensure speed to market, guaranteed results, variable costs, and the ability to keep up with similar-sized peers.

However, another view is less sanguine about the collective failure of hospitals to develop a dedicated corps of analysts in-house, to provide them with adequate resources, and to ensure that their leaders have a seat at the top table. There are very few core competencies of a hospital that do not hinge on intensive data analytics informed by domain experts who are familiar with the hospital’s internal and external environment and who respond to the hospital’s business strategy. The American Hospital Association (AHA) makes this clear in its articulation of the “must-do strategies” and “core competencies” that hospitals must adopt in order to survive in a value-based economics model.

In this view, if health systems and hospitals depend on vendors to define the pertinence of information and to drive the agenda at clinical and administrative team meetings, then it is difficult to see how locally tailored, timely, and responsive strategic changes and tactical modifications can be made to innovations in, for example, care delivery. If we similarly allow external vendors to decide what data, information, and insights should be brought to the attention of senior leadership, then we disempower our own employees and internal domain experts.

The answer, as always, likely lies somewhere between these two extreme views. Without locally customized and owned data science, delivery science can at best offer a standard, average utility service. But for many smaller hospitals and systems, relying on trustworthy vendors and mature external products is likely to be an acceptable and cost-effective solution.

However, becoming a utility service is not likely to be the aspiration of the AMCs in the United States. Leading the way in advancing health care analytics is consistent with and builds on the AMCs’ research, education, and care missions. AMCs should make large-scale personnel investments in data science; these 1,000 to 2,000 additional professionals will collectively cost the AMC side of the industry just $150 million to $300 million annually. The data are “free” and are already there.
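As a rough consistency check on the cost figures, the sketch below derives the per-person cost implied by the $150 million to $300 million estimate and compares it with the $750,000 team cost quoted earlier for a 3- to 5-person team. All inputs come from this article; the helper name is illustrative, and because the team cost is stated as "at least" $750,000, the team-based figures should be read as lower bounds.

```python
def cost_per_head(total_cost_usd: float, headcount: int) -> float:
    """Average annual cost per data scientist for a given total and headcount."""
    return total_cost_usd / headcount


# Sector-wide AMC estimate: 1,000 to 2,000 hires for $150M to $300M per year.
amc_low = cost_per_head(150_000_000, 1_000)
amc_high = cost_per_head(300_000_000, 2_000)

# Single-hospital estimate: at least $750,000 for a 3- to 5-person team.
team_of_three = cost_per_head(750_000, 3)
team_of_five = cost_per_head(750_000, 5)

if __name__ == "__main__":
    print(f"AMC estimate, low end:  ${amc_low:,.0f} per person")
    print(f"AMC estimate, high end: ${amc_high:,.0f} per person")
    print(f"3-person team:          ${team_of_three:,.0f} per person")
    print(f"5-person team:          ${team_of_five:,.0f} per person")
```

Both ends of the AMC estimate imply about $150,000 per person per year, which equals the per-person cost of a 5-person team at the minimum budget. A 3-person team at the same budget implies $250,000 per person, so the sector-wide estimate sits at the low end of the range suggested by the team figure.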

Down the line, these investments can be recouped. As AMC analytics products that are developed in-house are trialed and validated, they can be spun off as tried-and-tested R&D in commercial ventures or simply sold as valuable intellectual property to commercial vendors. Even simply sharing analytics results and methods for free between AMCs and smaller hospitals could allow financially constrained hospitals and systems to increase their analytics options, reduce adoption costs, and enable their own in-house analytics.

Putting the Pieces Together

Clearly, compared with other data-rich industries, academic health care systems currently under-resource data analytics and run the risk of becoming reliant on outsiders when using internal data to inform decisions and drive innovation. Relatively modest internal investments in skilled personnel and advanced data science would allow AMCs to successfully capture the value inherent in their data. Putting these pieces together will help the overall health care sector to achieve the same much-needed improvements in cost, outcomes, access, and experience that the data revolution has achieved in so many other industries.
