Care Redesign

Personalized Hospital Ratings — Transparency for the Internet Age

Article · September 17, 2018

Each release of new overall hospital ratings is captivating to journalists, hospital leaders, and health care consumers in the United States. These overall ratings, whether published by U.S. News, Consumer Reports, or Hospital Compare, aggregate a wide array of underlying measures into a single score for each hospital. Without such composite scores, it would be impossible to create rankings, star ratings, and “honor rolls” based on overall performance. As every hospital chief executive (and college president) knows, boards of directors rarely ignore these ratings.

Responsible creators of overall performance ratings carefully consider the validity and reliability of individual measures. They ask questions such as, Is risk adjustment adequate? and Is the signal-to-noise ratio reasonable? These are important questions, and methodologic guides based on measurement science can help answer them.1

Where mathematics ends, however, there is an inescapable value judgment: In computing the overall hospital rating, how much weight should each measure (or group of measures) receive? Measurement science is largely silent on this question, as it should be. No equation can help report makers decide how much relative weight to place on fundamentally different dimensions of inherently desirable performance, such as technical quality, patient experience, and efficiency of care. Thus, as currently constructed, the weighting systems that underlie overall hospital performance ratings are expressions of the values, preferences, and tastes of their creators.

Is this approach appropriate? Why should the opinions of report creators hold sway, if the intent is to inform patient choice? Instead, why not ask patients what’s important to them? Report creators could survey patients to estimate weights that reflect the population mean. Such an approach might help align overall performance ratings with the preferences of the average patient. However, individual patients vary considerably in their needs and preferences. A report tailored to the “average patient” will probably be a poor fit for most.

We therefore suggest creating overall performance scores that can be modified, in real time, by each user in accordance with the user’s individual needs, values, and preferences. Although such an approach would have been impossible before the Internet age, it is feasible with current technology. To illustrate how such a report might look, we have created a mock-up based on the 2016 version of the Centers for Medicare and Medicaid Services (CMS) Overall Hospital Quality Star Rating System, available on Hospital Compare.

The 2016 Hospital Compare overall hospital star ratings used latent variable modeling techniques to combine 57 process and outcomes measures into seven domains of quality (mortality, safety of care, readmissions, patients’ experience, timeliness of care, effectiveness of care, and efficient use of medical imaging).2 Overall star ratings were computed from a weighted average of these seven domain scores, using weights chosen for consistency with existing CMS policies and priorities and incorporating input from stakeholders such as members of the agency’s panel of technical experts. Mortality, safety of care, readmissions, and patient experience each received weights of 22%, and the remaining domains each received weights of 4%. In other words, report creators considered readmissions to be 5.5 times as important as effectiveness of care. We sought to allow report users to make their own determinations.

To do so, we modified the original program that CMS had used to create the 2016 Overall Hospital Quality Star Ratings to allow report users to set their own weights (choosing from 100, 50, 22, 4, and 0 points — corresponding to “extremely important,” “very important,” “quite important,” “minimally important,” and “unimportant”). We then applied this modified program to individual measure scores from the publicly available 2016 Hospital Compare database, recomputing overall hospital stars for every possible combination of weights and dividing each domain’s points by the total (so that the weights always summed to 100%). Finally, we created a Web-based report card on which report users could display customized overall hospital ratings, using weights that reflect their own assessments of the relative importance of the seven domains. Overall ratings were sensitive to customized weights, as a few examples can demonstrate.
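The reweighting step described above can be sketched in a few lines of code. This is an illustrative simplification, not the CMS program: the domain names and the example hospital's domain scores (on a hypothetical 0-to-1 scale) are invented for demonstration, while the point scale (100, 50, 22, 4, 0) and the default weights (22% for four domains, 4% for three) come from the article.

```python
# Seven quality domains from the 2016 Hospital Compare star rating
DOMAINS = ["mortality", "safety", "readmissions", "patient_experience",
           "timeliness", "effectiveness", "imaging_efficiency"]

# CMS default weights: 22 points for four domains, 4 points for three
DEFAULT_POINTS = {"mortality": 22, "safety": 22, "readmissons": 22,
                  "patient_experience": 22, "timeliness": 4,
                  "effectiveness": 4, "imaging_efficiency": 4}
DEFAULT_POINTS["readmissions"] = DEFAULT_POINTS.pop("readmissons")

def overall_score(domain_scores, points):
    """Weighted average of domain scores, with points normalized
    so the effective weights always sum to 100%."""
    total = sum(points[d] for d in DOMAINS)
    return sum(domain_scores[d] * points[d] / total for d in DOMAINS)

# Hypothetical hospital: domain scores on a 0-1 scale (higher is better)
hospital = {"mortality": 0.9, "safety": 0.6, "readmissions": 0.8,
            "patient_experience": 0.7, "timeliness": 0.5,
            "effectiveness": 0.8, "imaging_efficiency": 0.4}

default = overall_score(hospital, DEFAULT_POINTS)

# A Patient A-style reweighting: zero out three domains,
# mark three as "extremely important" and one as "very important"
custom = dict(DEFAULT_POINTS, mortality=0, readmissions=0,
              imaging_efficiency=0, safety=100, timeliness=100,
              effectiveness=100, patient_experience=50)
personalized = overall_score(hospital, custom)
```

Because the normalization happens inside `overall_score`, users can assign points freely without worrying about making them sum to 100; a final step (not shown) would map each continuous score onto the one-to-five-star scale.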

Let’s say Patient A is a pregnant woman in Taunton, Massachusetts. She wonders whether to establish obstetrical care locally or at one of the well-known hospitals in downtown Boston. The default Hospital Compare overall star ratings give four stars to Massachusetts General Hospital (downtown), four stars to Saint Anne’s Hospital in Fall River (a closer option), and three stars to Sturdy Memorial Hospital (another nearby option). However, Patient A prefers to assign zero points to mortality, readmissions, and efficient use of medical imaging, since these variables are based on conditions and services that have questionable relevance to obstetrical care. She then assigns 100 points to effectiveness (which includes some obstetrical measures), safety (she is concerned about postoperative complications, should she need a cesarean section), and timeliness (she wants to be seen right away if she presents to the emergency department) and 50 points to patient experience (all else being equal, she values a good night’s sleep). Using these personalized weights, she finds that both her local options receive five-star ratings, whereas Massachusetts General Hospital receives three stars.

Patient B, a generally healthy 45-year-old man living in West Covina, California, recently had a bike accident. The nearest hospitals offering elective knee surgery are Chino Valley Medical Center (four-star default overall rating on Hospital Compare) and Methodist Hospital of Southern California (five stars). Perusing the underlying measures, the man decides that effectiveness and safety are most relevant to his surgery and assigns them each 100 points. Because his job allows only limited sick leave, he cares greatly about avoiding readmission, so he assigns that domain maximum weight as well. He assigns 50 points to patient experience and gives the remaining domains four points each because he considers them minimally important to his surgery. With these personalized weights, the hospitals’ relative rankings are reversed, with Chino Valley now at five stars and Methodist at four.

Professor C’s interest in hospital ratings is less personal than that of Patients A and B. She is a researcher living in Chicago who questions the validity of the measures underlying the safety domain.3 She sets the weight for safety to zero points and leaves the remaining default weights unchanged. Having thus removed the safety domain from the star calculations, she notes substantial changes in Chicago-area hospital ratings: four hospitals’ ratings drop from five to four stars, whereas those of five others, including Northwestern Memorial Hospital, increase from three to four stars.

Thus, overall hospital ratings are sensitive to the inherently subjective weights applied to the underlying performance measures. One-size-fits-all weighting, which was necessary when performance ratings were published only in print, can be replaced with user-determined weights in the Internet age. By allowing such personalization, creators of performance reports can enhance the value of their overall ratings and rankings to the consumers who might use them.

Our illustrative report card, which is intended only as an example and not as a consumer-ready performance-rating site, does not seek to address every methodologic challenge of public reporting. Rather, it is intended to give users an intuitive understanding of how different weightings can affect overall hospital performance ratings. With further development to assist consumers in setting their personalized weights (perhaps by suggesting weights on the basis of their responses to a questionnaire) and to help them interpret the resulting ratings,4,5 we believe user-determined weights could become a highly desirable feature of future hospital ratings.


From the Northland District Health Board, Whangarei, New Zealand (J.R.-S.); RAND Corporation, Santa Monica, CA (J.R.-S., J.G.), and Boston (M.W.F.); and Brigham and Women’s Hospital and Harvard Medical School — both in Boston (M.W.F.).

1. Friedberg MW, Damberg CL. Methodological considerations in generating provider performance scores for use in public reporting: a guide for community quality collaboratives. Rockville, MD: Agency for Healthcare Research and Quality, September 2011.
2. Yale New Haven Health Services Corporation/Center for Outcomes Research and Evaluation. Overall hospital quality star rating on Hospital Compare: December 2016 updates and specifications report. Woodlawn, MD: Centers for Medicare and Medicaid Services, October 20, 2016.
3. Rajaram R, Barnard C, Bilimoria KY. Concerns about using the patient safety indicator-90 composite in pay-for-performance programs. JAMA 2015;313:897-898.
4. Hibbard JH, Peters E, Slovic P, Finucane ML, Tusler M. Making health care quality reports easier to use. Jt Comm J Qual Improv 2001;27:591-604.
5. Hibbard J, Sofaer S. Best practices in public reporting no. 1: how to effectively present health care performance data to consumers. Rockville, MD: Agency for Healthcare Research and Quality, June 2010.

This Perspective article originally appeared in The New England Journal of Medicine.


