The quality-measurement enterprise in the U.S. health care industry faces a conundrum. Should we collect data on hundreds of measures like those in Medicare’s merit-based incentive payment system (MIPS) and continue to expand the measure set by vetting ever more measures, or should we scale back to a core set of measures as exemplified by the Meaningful Measures initiative recently launched by the Centers for Medicare and Medicaid Services (CMS)? An expanded measurement approach is expensive (U.S. physician practices already spend $15.4 billion each year reporting their quality measures1), and the results are often plagued by small sample sizes and unmeasured case-mix differences among providers. Apart from the expense of gathering data for these measures, the whole quality-measurement activity can be distracting and can divert energy from other important quality-improvement activities.2 Frustration with quality measurement leads to complaints from clinicians and provider organizations that such a “more is better” approach is just too burdensome.3
Scaling back to fewer “meaningful measures” has its own weaknesses. Quality is multidimensional, and measures are often not well correlated with each other. Any core measure set will leave considerable gaps (which the extra measures can only partially fill). Individual delivery systems or health plans may do better, since some have access to very detailed data, but much quality measurement, particularly by CMS, must apply as broadly as possible.
Addressing this conundrum requires recognition that there is no universal solution. In particular, there is tension among quality assurance (identifying poor performers), quality improvement (providing data to help all providers do better), and directing patients to the best providers for their care. The latter two goals require detailed data on many measures. It is the pursuit of these goals, at least on a national level, that has led to the current vast and burdensome, yet still incomplete, quality-measurement system. We believe that targeted supplemental data collection may be a valuable approach to balancing data needs with data-collection costs, in particular for quality assurance.
At a national or even state level, a focus on quality assurance would be valuable. Once low-performing providers were identified, they could be sanctioned (e.g., with financial penalties or prohibition from sharing savings in alternative payment models). Identifying low-performing providers and excluding them from alternative payment models or narrow networks could offer some comfort that these innovations were not promoting substandard care.
It is reasonable to debate the merits of focusing on providers in the lower tail of the quality distribution, but the largest opportunities to improve quality most likely exist in low-performing delivery systems, and being labeled low-performing may motivate those systems to improve. Moreover, whereas capacity issues make it impractical to limit participation in coverage networks to high performers only, it is easier to exclude the lowest performers.
Identifying low-quality providers requires an approach that avoids universal collection of data that are expensive to gather but still offers better insights than a core measure set alone. We propose a targeted supplemental data-gathering approach, in which data on core measures would be collected from all providers, and those performing below a specified threshold would be required to provide additional data (e.g., other measures or data for case-mix adjustment). Because supplemental data would be collected on only a subset of provider organizations, the burden would be reduced for the others. Such an approach may provide a cost-effective way to observe the quality of care offered by providers of concern while avoiding the cost of collecting excessive data on all providers.
The “initial pass” core measures should be ones that are hard to game and that impose a low administrative burden. Mortality and patient-reported access to and experience with care are good candidates for inclusion in this core set. Others, ideally those related to outcomes for major diseases, would need to be developed. The key is that the expense of adding a core measure must be justified by its ability to improve health.
Providers with high scores on these core measures might be exempted from reporting supplemental data (or from reporting them as frequently), while for those falling below a set threshold, more detailed data, probably involving chart review, could be gathered. For example, if we wanted to identify the lowest 10% of providers, we might need to gather additional data on only the 20% or 30% of providers that perform the worst on the core measures. Providers whose supplemental data reveal them to actually be performing well could then be exempted from the added data requirements for a certain period, unless there was deterioration in observed performance on the core measures.
The targeted supplemental data approach is analogous to a screening test: a positive result often does not confer a diagnosis but instead identifies when further exploration is warranted. Many providers initially categorized as such will not in fact be low-quality (false positives) but will only appear to be poor performers because of random variation in the core-measure data or unmeasured confounders (such as inability to adjust for socioeconomic factors). Analysis of targeted supplemental data would reveal that these providers’ performance is actually adequate. Such expanded data collection might protect providers serving disadvantaged populations that are more likely to be in the bottom tail of performance on core measures because of a difficult case mix.4 Currently, these providers may arbitrarily be subject to reductions in payment or disadvantageous network placement because of inadequate data collection.
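The screening logic above can be made concrete with a small simulation. The numbers here are purely illustrative assumptions (10,000 providers, standard-normal true quality, Gaussian noise on the observed core-measure score, a bottom-30% flagging threshold), not estimates from the article: flagging the observed bottom 30% for supplemental data collection captures most of the true bottom 10%, while most flagged providers are false positives whom the supplemental data would clear.

```python
import random

random.seed(0)

N = 10_000       # hypothetical number of providers (assumption)
NOISE_SD = 0.7   # assumed measurement noise on the core measures

# True quality and the noisy observed core-measure score for each provider
true_quality = [random.gauss(0, 1) for _ in range(N)]
observed = [q + random.gauss(0, NOISE_SD) for q in true_quality]

def bottom_indices(scores, frac):
    """Indices of the lowest-scoring fraction of providers."""
    k = int(len(scores) * frac)
    return set(sorted(range(len(scores)), key=scores.__getitem__)[:k])

true_bottom = bottom_indices(true_quality, 0.10)   # true lowest 10%
flagged = bottom_indices(observed, 0.30)           # targeted for supplemental data

# Share of genuinely low performers caught by the screen,
# and share of flagged providers who are false positives
capture_rate = len(true_bottom & flagged) / len(true_bottom)
false_positive_share = 1 - len(true_bottom & flagged) / len(flagged)

print(f"share of true bottom-10% flagged for supplemental data: {capture_rate:.2f}")
print(f"share of flagged providers that are false positives: {false_positive_share:.2f}")
```

Because the flagged group is three times the size of the true bottom decile, at least two thirds of flagged providers must be false positives even under perfect capture, which is exactly why the supplemental data step is needed to separate the two groups.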
Though some observers may be disappointed that the approach described here is not appropriate for certain other important purposes, such as supporting provider improvement across the entire spectrum of care or supporting patients’ choices of providers, we should not expect a single measurement system to work for everything. Moreover, this approach does not solve the problem of multiple, often not harmonized, measurement systems. Yet quality assurance is an important goal in itself, and we believe the measurement system described here would be helpful for this purpose.
This approach is intended to reduce the cost of an expansive measure set but avoid the gaps in a core measure set. Reducing the number of providers targeted for detailed measurement would reduce the administrative burden on providers, approximating a core-measure approach. If the number of providers targeted for supplemental data collection were large, this approach would alleviate concerns about falsely identifying low performers as adequate — but, like the status quo, would be more burdensome. The advantage is that it could identify low performers more accurately than a core measure approach alone, and it could do so at lower cost than a system in which data on all measures are gathered for all providers. It thus reflects a balance between retreating to a core measure set (with its inherent limitations) and creating an elaborate, extensive measure system that would be burdensome yet still less complete and less accurate than we want.
Although more data on all providers would clearly be better if they cost less to collect, such an approach is impractical because of the financial and time burdens it would impose. With a targeted approach, we may not be able to get exactly what we want, but we may get what we need.
From the Department of Health Care Policy, Harvard Medical School, Boston.
1. Casalino LP, Gans D, Weber R, et al. US physician practices spend more than $15.4 billion annually to report quality measures. Health Aff (Millwood) 2016;35:401-406.
2. Penso J. A health care paradox: measuring and reporting quality has become a barrier to improving it. STAT News. December 13, 2017 (https://www.statnews.com/2017/12/13/health-care-quality/).
3. Meyer GS, Nelson EC, Pryor DB, et al. More quality measures versus measuring what matters: a call for balance and parsimony. BMJ Qual Saf 2012;21:964-968.
4. Roberts ET, Zaslavsky AM, McWilliams JM. The Value-Based Payment Modifier: program outcomes and implications for disparities. Ann Intern Med 2018;168:255-265.
This Perspective article originally appeared in The New England Journal of Medicine.