Care Redesign

Personalized Hospital Ratings — Transparency for the Internet Age

Article · September 17, 2018

Each release of new overall hospital ratings is captivating to journalists, hospital leaders, and health care consumers in the United States. These overall ratings, whether published by U.S. News, Consumer Reports, or Hospital Compare, aggregate a wide array of underlying measures into a single score for each hospital. Without such composite scores, it would be impossible to create rankings, star ratings, and “honor rolls” based on overall performance. As every hospital chief executive (and college president) knows, boards of directors rarely ignore these ratings.

Responsible creators of overall performance ratings carefully consider the validity and reliability of individual measures. They ask questions such as, Is risk adjustment adequate? and Is the signal-to-noise ratio reasonable? These are important questions, and methodologic guides based on measurement science can help answer them.1

Where mathematics ends, however, there is an inescapable value judgment: In computing the overall hospital rating, how much weight should each measure (or group of measures) receive? Measurement science is largely silent on this question, as it should be. No equation can help report makers decide how much relative weight to place on fundamentally different dimensions of inherently desirable performance, such as technical quality, patient experience, and efficiency of care. Thus, as currently constructed, the weighting systems that underlie overall hospital performance ratings are expressions of the values, preferences, and tastes of their creators.

Is this approach appropriate? Why should the opinions of report creators hold sway, if the intent is to inform patient choice? Instead, why not ask patients what’s important to them? Report creators could survey patients to estimate weights that reflect the population mean. Such an approach might help align overall performance ratings with the preferences of the average patient. However, individual patients vary considerably in their needs and preferences. A report tailored to the “average patient” will probably be a poor fit for most.

We therefore suggest creating overall performance scores that can be modified, in real time, by each user in accordance with the user’s individual needs, values, and preferences. Although such an approach would have been impossible before the Internet age, it is feasible with current technology. To illustrate how such a report might look, we have created a mock-up based on the 2016 version of the Centers for Medicare and Medicaid Services (CMS) Overall Hospital Quality Star Rating System, available on Hospital Compare.

The 2016 Hospital Compare overall hospital star ratings used latent variable modeling techniques to combine 57 process and outcomes measures into seven domains of quality (mortality, safety of care, readmissions, patients’ experience, timeliness of care, effectiveness of care, and efficient use of medical imaging).2 Overall star ratings were computed from a weighted average of these seven domain scores, using weights chosen for consistency with existing CMS policies and priorities, incorporating input from stakeholders such as members of the agency’s panel of technical experts. Mortality, safety of care, readmissions, and patient experience each received weights of 22%, and the remaining domains each received weights of 4%. In other words, report creators considered readmissions to be 5.5 times as important as the effectiveness of care. We sought to allow the report users to make their own determinations.
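In symbols, the weighted average described above can be written as follows. The notation is ours rather than that of CMS: S_h denotes the summary score of hospital h, s_{h,d} its score in domain d, and w_d the weight of domain d. The expression covers only the weighting step, not the latent variable modeling that produces the domain scores or the conversion of summary scores into stars.

```latex
S_h = \sum_{d=1}^{7} w_d \, s_{h,d},
\qquad
\sum_{d=1}^{7} w_d = 4(0.22) + 3(0.04) = 1,
\qquad
\frac{w_{\text{readmissions}}}{w_{\text{effectiveness}}} = \frac{0.22}{0.04} = 5.5
```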

To do so, we modified the original program that CMS had used to create the 2016 Overall Hospital Quality Star Ratings to allow report users to set their own weights (choosing from 100, 50, 22, 4, and 0 points — corresponding to “extremely important,” “very important,” “quite important,” “minimally important,” and “unimportant”). We then applied this modified program to individual measure scores from the publicly available 2016 Hospital Compare database, recomputing overall hospital stars for every possible combination of weights and dividing each domain’s points by the total (so that the weights always summed to 100%). Finally, we created a Web-based report card on which report users could display customized overall hospital ratings, using weights that reflect their own assessments of the relative importance of the seven domains. Overall ratings were sensitive to customized weights, as a few examples can demonstrate.
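The point-to-weight step described above can be sketched in a few lines of Python. This is a minimal illustration and not the modified CMS program: the names `DOMAINS`, `POINT_CHOICES`, and `normalize` are hypothetical, and rejecting the all-zero assignment is an assumption, made because a weighted average is undefined when every domain receives zero points.

```python
from itertools import product

# Domain names follow the article; the identifiers themselves are illustrative.
DOMAINS = (
    "mortality", "safety", "readmissions", "patient_experience",
    "timeliness", "effectiveness", "imaging",
)

# Point choices offered to report users, as listed in the article.
POINT_CHOICES = (100, 50, 22, 4, 0)


def normalize(points):
    """Convert per-domain points into weights that sum to 1."""
    total = sum(points.values())
    if total == 0:
        # Assumption: an all-"unimportant" selection is rejected.
        raise ValueError("at least one domain needs nonzero points")
    return {domain: p / total for domain, p in points.items()}


# Every assignment of one point value to each of the seven domains.
all_combinations = list(product(POINT_CHOICES, repeat=len(DOMAINS)))
# Drop the single assignment in which every domain gets zero points.
usable = [combo for combo in all_combinations if sum(combo) > 0]

# The default CMS weights expressed as points (they already total 100).
default_points = dict(zip(DOMAINS, (22, 22, 22, 22, 4, 4, 4)))
weights = normalize(default_points)

print(len(all_combinations), len(usable))
print(weights["mortality"], weights["imaging"])
```

With five point choices across seven domains there are 5^7 = 78,125 possible assignments, which is small enough to precompute; in this sketch, the one all-zero assignment is excluded.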

Let’s say Patient A is a pregnant woman in Taunton, Massachusetts. She wonders whether to establish obstetrical care locally or at one of the well-known hospitals in downtown Boston. The default Hospital Compare overall star ratings give four stars to Massachusetts General Hospital (downtown), four stars to Saint Anne’s Hospital in Fall River (a closer option), and three stars to Sturdy Memorial Hospital (another nearby option). However, Patient A prefers to assign zero points to mortality, readmissions, and efficient use of medical imaging, since these variables are based on conditions and services that have questionable relevance to obstetrical care. She then assigns 100 points to effectiveness (which includes some obstetrical measures), safety (she is concerned about postoperative complications, should she need a cesarean section), and timeliness (she wants to be seen right away if she presents to the emergency department) and 50 points to patient experience (all else being equal, she values a good night’s sleep). Using these personalized weights, she finds that both her local options receive five-star ratings, whereas Massachusetts General Hospital receives three stars.

Patient B, a generally healthy 45-year-old man living in West Covina, California, recently had a bike accident. The nearest hospitals offering elective knee surgery are Chino Valley Medical Center (four-star default overall rating on Hospital Compare) and Methodist Hospital of Southern California (five stars). Perusing the underlying measures, the man decides that effectiveness and safety are most relevant to his surgery and assigns them each 100 points. Because his job allows only limited sick leave, he cares greatly about avoiding readmission, so he assigns that domain maximum weight as well. He assigns 50 points to patient experience and gives the remaining domains four points each because he considers them minimally important to his surgery. With these personalized weights, the hospitals’ relative rankings are reversed, with Chino Valley now at five stars and Methodist at four.
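A reversal of this kind can be reproduced with a toy calculation. In the sketch below, the point assignments are Patient B's, as described above, but the domain scores for "Hospital X" and "Hospital Y" are invented for illustration and do not represent Chino Valley, Methodist, or any real hospital. The sketch also compares weighted summary scores directly rather than star ratings, and `summary_score` is a hypothetical helper name.

```python
DOMAINS = (
    "mortality", "safety", "readmissions", "patient_experience",
    "timeliness", "effectiveness", "imaging",
)

# Default weights and Patient B's points, in the order of DOMAINS.
default_points = dict(zip(DOMAINS, (22, 22, 22, 22, 4, 4, 4)))
patient_b_points = dict(zip(DOMAINS, (4, 100, 100, 50, 4, 100, 4)))

# HYPOTHETICAL standardized domain scores (higher is better), made up
# solely to show the mechanism; they are not Hospital Compare data.
hospital_x = dict(zip(DOMAINS, (-1.0, 0.5, 0.5, 0.0, 0.0, 1.0, 0.0)))
hospital_y = dict(zip(DOMAINS, (1.0, 0.0, 0.0, 0.5, 0.0, -0.5, 0.0)))


def summary_score(scores, points):
    """Weighted average of domain scores, with points normalized to sum to 1."""
    total = sum(points.values())
    return sum(points[d] / total * scores[d] for d in DOMAINS)


for label, points in (("default", default_points),
                      ("Patient B", patient_b_points)):
    x = summary_score(hospital_x, points)
    y = summary_score(hospital_y, points)
    leader = "X" if x > y else "Y"
    print(f"{label}: X={x:.3f}, Y={y:.3f}, Hospital {leader} ranks higher")
```

Hospital Y leads under the default weights because it is strong on mortality, a domain to which Patient B gives only four points; once safety, readmissions, and effectiveness dominate the weighting, Hospital X moves ahead.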

Professor C’s interest in hospital ratings is less personal than that of Patients A and B. She is a researcher living in Chicago who questions the validity of the measures underlying the safety domain.3 She sets the weight for safety to zero points and leaves the remaining default weights unchanged. Having thus removed the safety domain from the star calculations, she notes substantial changes in Chicago-area hospital ratings: four hospitals’ ratings drop from five to four stars, whereas those of five others, including Northwestern Memorial Hospital, increase from three to four stars.
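Setting one domain to zero does more than delete it: because points are divided by their total, the weight that safety carried is redistributed proportionally across the remaining domains. The following sketch shows the arithmetic for Professor C's choice; the function and variable names are hypothetical, and the calculation covers only the weights, not the star ratings themselves.

```python
DOMAINS = (
    "mortality", "safety", "readmissions", "patient_experience",
    "timeliness", "effectiveness", "imaging",
)

# Default weights expressed as points (they total 100).
default_points = dict(zip(DOMAINS, (22, 22, 22, 22, 4, 4, 4)))


def normalize(points):
    """Divide each domain's points by the total so the weights sum to 1."""
    total = sum(points.values())
    return {domain: p / total for domain, p in points.items()}


# Professor C zeroes out safety and leaves every other default unchanged.
professor_c_points = dict(default_points, safety=0)
weights = normalize(professor_c_points)

for domain in DOMAINS:
    print(f"{domain}: {weights[domain]:.1%}")
```

The remaining points total 78, so each formerly 22% domain now carries 22/78 (about 28.2%) and each formerly 4% domain carries 4/78 (about 5.1%). The ratio between the two groups stays at 5.5.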

Thus, overall hospital ratings are sensitive to the inherently subjective weights applied to the underlying performance measures. One-size-fits-all weighting, which was necessary when performance ratings were published only in print, can be replaced with user-determined weights in the Internet age. By allowing such personalization, creators of performance reports can enhance the value of their overall ratings and rankings to the consumers who might use them.

Our illustrative report card, which is intended only as an example and not as a consumer-ready performance-rating site, does not seek to address every methodologic challenge of public reporting. Rather, it is intended to give users an intuitive understanding of how different weightings can affect overall hospital performance ratings. With further development to assist consumers in setting their personalized weights (perhaps by suggesting weights on the basis of their responses to a questionnaire) and to help them interpret the resulting ratings,4,5 we believe user-determined weights could become a highly desirable feature of future hospital ratings.


SOURCE INFORMATION

From the Northland District Health Board, Whangarei, New Zealand (J.R.-S.); Rand Corporation, Santa Monica, CA (J.R.-S., J.G.), and Boston (M.W.F.); and Brigham and Women’s Hospital and Harvard Medical School — both in Boston (M.W.F.).

1. Friedberg MW, Damberg CL. Methodological considerations in generating provider performance scores for use in public reporting: a guide for community quality collaboratives. Rockville, MD: Agency for Healthcare Research and Quality, September 2011.
2. Yale New Haven Health Services Corporation/Center for Outcomes Research and Evaluation. Overall hospital quality star rating on Hospital Compare: December 2016 updates and specifications report. Woodlawn, MD: Centers for Medicare and Medicaid Services, October 20, 2016 (https://www.rand.org/content/dam/rand/www/external/health/projects/hospital-performance-report-card/StrRtgDec16PrevQUS_rept_110416.pdf).
3. Rajaram R, Barnard C, Bilimoria KY. Concerns about using the patient safety indicator-90 composite in pay-for-performance programs. JAMA 2015;313:897-898.
4. Hibbard JH, Peters E, Slovic P, Finucane ML, Tusler M. Making health care quality reports easier to use. Jt Comm J Qual Improv 2001;27:591-604.
5. Hibbard J, Sofaer S. Best practices in public reporting no. 1: how to effectively present health care performance data to consumers. Rockville, MD: Agency for Healthcare Research and Quality, June 2010.

This Perspective article originally appeared in The New England Journal of Medicine.
