
It’s Time to Rethink How We Measure Remission from Depression

Article · February 5, 2017

For the estimated 15.7 million U.S. adults with depression, the growing focus on treatment is promising, particularly in primary care settings. This shift in care delivery also demands rigorous measurement of the quality of depression services and their effects on population health. Several measure stewards and clearinghouses, including the National Quality Forum (NQF), the National Committee for Quality Assurance, and Minnesota Community Measurement (MNCM), as well as the Centers for Medicare & Medicaid Services (CMS), are converging on measures that emphasize discrete, narrow follow-up windows.

Our collaborative care for depression program at NYC Health + Hospitals has shown us that measures are most useful when the time frame for follow-up is broader. In this brief article, we aim to make a case for reconsidering how remission from depression is measured.

Widen the Time Frame for Follow-Up Assessment

Our program’s bottom-line quality metric, aligned with New York State Office of Mental Health standards, uses this fraction to measure depression improvement:

  • Numerator: Patients with demonstrable clinical improvement
  • Denominator: Patients enrolled in collaborative care for ≥70 days (patients must have a PHQ-9 score >9 to enroll in the program)

We define clinical improvement as either a Patient Health Questionnaire (PHQ)-9 score <10 (i.e., no or mild depression) or a current PHQ-9 score that is <50% of the patient’s baseline score. For patients enrolled in the program during Quarter 1 of 2016, our improvement rate was 57.6%.
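As an illustration only, the improvement fraction above could be computed along these lines. This is a minimal sketch, not the program's actual registry logic; the `Enrollee` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Enrollee:
    # Hypothetical record layout, assumed for this sketch.
    baseline_phq9: int   # PHQ-9 score at enrollment
    current_phq9: int    # most recent PHQ-9 score
    days_enrolled: int   # days in collaborative care


def is_improved(p: Enrollee) -> bool:
    # Improvement: current score < 10, or current score < 50% of baseline.
    return p.current_phq9 < 10 or p.current_phq9 < 0.5 * p.baseline_phq9


def improvement_rate(patients: Sequence[Enrollee]) -> Optional[float]:
    # Denominator: enrolled >= 70 days with a qualifying baseline (PHQ-9 > 9).
    denominator = [
        p for p in patients
        if p.baseline_phq9 > 9 and p.days_enrolled >= 70
    ]
    if not denominator:
        return None  # rate is undefined for an empty denominator
    return sum(is_improved(p) for p in denominator) / len(denominator)
```

Because the denominator has no follow-up window, any patient enrolled long enough counts, regardless of when the latest PHQ-9 was taken.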

Contrast that method with the NQF measure of depression remission (derived from MNCM and adopted by CMS):

  • Numerator: Adults with major depression or dysthymia and an initial PHQ-9 score >9 who achieve remission at 12 months
  • Denominator: Adults with major depression or dysthymia and an index PHQ-9 score >9

Notably, NQF defines remission as a PHQ-9 score <5 (no depression) at 12 months (+/–30 days); if multiple scores are taken within that 60-day window, the most recent score is used.
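A rough sketch of that rule follows, under stated assumptions: "12 months +/–30 days" is approximated as days 335 to 395 after the index score, and a patient with no score in the window is treated as not in remission. The function names and the tuple layout are illustrative and are not taken from the NQF specification.

```python
from typing import List, Optional, Tuple

# Each follow-up is (days_since_index_score, phq9_score).
FollowUp = Tuple[int, int]

# Assumption: 12 months +/- 30 days approximated as 365 +/- 30 days.
NQF_WINDOW = (335, 395)


def nqf_remission(followups: List[FollowUp],
                  window: Tuple[int, int] = NQF_WINDOW) -> bool:
    lo, hi = window
    in_window = [f for f in followups if lo <= f[0] <= hi]
    if not in_window:
        # Assumed handling: no score in the window means no credit,
        # even if the patient remitted earlier.
        return False
    # If several scores fall in the window, the most recent one is used.
    _, latest_score = max(in_window, key=lambda f: f[0])
    return latest_score < 5


def nqf_remission_rate(
    cohort: List[Tuple[int, List[FollowUp]]]
) -> Optional[float]:
    # cohort items are (index_phq9, followups); denominator needs index > 9.
    denominator = [fu for index_score, fu in cohort if index_score > 9]
    if not denominator:
        return None
    return sum(nqf_remission(fu) for fu in denominator) / len(denominator)
```

Under this sketch, a patient whose only follow-up is a PHQ-9 of 3 at day 90 stays in the denominator but never reaches the numerator.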

Although improvement and remission are separate clinical outcomes, we nevertheless believe that a metric similar to the one we use for improvement would be a better metric of remission than what the NQF currently endorses. That’s because the NQF approach narrows the time frame for scoring the follow-up PHQ-9 to 60 days (i.e., between months 11 and 13 after the initial PHQ-9). This narrowing is not rooted in the evidence. For example, in a study in the Journal of the American Board of Family Medicine, the median time to remission for depression in collaborative-care programs was 86 days (i.e., by month 3). Therefore, the NQF numerator does not capture patients who are successfully identified and treated before the 60-day follow-up window opens at month 11.

We deepened our analysis with data from our own depression registry and electronic health record. Specifically, using the NQF measure, we created three time windows for follow-up PHQ-9 scoring: 11 to 13 months (the prescribed window), 9 to 15 months, and 3 to 12 months after the baseline PHQ-9. The table shows what we found.

Broadening the PHQ-9 Follow-Up Timeframe Captures More Remissions


The data show that most patients enrolled in collaborative care (305 of 430, or about 71%) had no PHQ-9 administered within the narrow follow-up time frame of 11 to 13 months. Using a broader time frame, anchored roughly on median time to remission (86 days) and ending one year from baseline (i.e., 3–12 months), increases the number of patients with a follow-up PHQ-9 within that time frame and dramatically increases the measured remission rate.

In our program, when patients show significant, persistent improvement, they are “graduated” from collaborative care (to free up resources for other patients). Graduates receive the same care that the rest of our primary care population does, including universal screening for depression, generally with the PHQ-2, a two-question validated screening tool often used to decide whether to administer a PHQ-9 (a “yes” answer to at least one PHQ-2 question prompts a PHQ-9). Using either a PHQ-2 result equal to zero or a PHQ-9 score <5 as an indicator of depression remission, we found that the remission rate increases dramatically when the follow-up window is wider: from 26.0% (using the 11–13 month time frame) to 51.6% (using the 3–12 month time frame). This wider time frame better reflects how many of our patients in collaborative care actually experience remission from depression, rather than identifying only patients who improved and also happened to undergo rescreening during the narrower (60-day) window.
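The window comparison can be sketched as follows. The cohort here is invented toy data, not the registry data behind the figures above, and the choice to use the most recent screening inside each window is an assumption carried over from the NQF rule.

```python
from typing import Dict, List, Tuple

# Each screening is (months_since_baseline, instrument, score).
Screening = Tuple[float, str, int]

WINDOWS: Dict[str, Tuple[float, float]] = {
    "11-13 months": (11, 13),
    "9-15 months": (9, 15),
    "3-12 months": (3, 12),
}


def indicates_remission(instrument: str, score: int) -> bool:
    # Remission indicator: PHQ-9 < 5 or PHQ-2 equal to zero.
    return (instrument == "PHQ-9" and score < 5) or (
        instrument == "PHQ-2" and score == 0
    )


def remission_rate(cohort: List[List[Screening]],
                   window: Tuple[float, float]) -> float:
    lo, hi = window
    remitted = 0
    for screenings in cohort:
        inside = [s for s in screenings if lo <= s[0] <= hi]
        if inside:
            # Assumption: score the most recent screening in the window.
            _, instrument, score = max(inside, key=lambda s: s[0])
            remitted += indicates_remission(instrument, score)
    # Patients with no screening in the window stay in the denominator.
    return remitted / len(cohort)


# Toy cohort of four patients (hypothetical values).
toy_cohort: List[List[Screening]] = [
    [(4, "PHQ-9", 3)],                     # remits early, then graduates
    [(6, "PHQ-9", 8), (12, "PHQ-2", 0)],   # negative PHQ-2 at 12 months
    [(5, "PHQ-9", 14)],                    # still symptomatic
    [(14, "PHQ-9", 2)],                    # rescreened late
]

rates = {name: remission_rate(toy_cohort, w) for name, w in WINDOWS.items()}
```

In this toy cohort the first patient is credited only under the 3–12 month window, which is the effect described above: the narrow window misses patients who remitted and graduated before month 11.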

Finding the Optimal Measure of Remission

These results cast doubt on whether the NQF-endorsed measure optimally captures depression remission, particularly in a primary care population. Indeed, narrow follow-up windows end up excluding most of the population whose quality of care we wish to assess. Nevertheless, the NQF depression measure is now included in multiple measure sets, including a consensus set being defined by the Core Quality Measure Collaborative, led by America’s Health Insurance Plans, CMS, and the NQF. The need for the measure to account for depression services delivered in primary care is underlined by CMS’ move toward paying for collaborative care.

An optimal depression-remission measure might instead follow the example of NQF-endorsed quality metrics for other chronic diseases, such as hypertension and diabetes. For instance, the hypertension measure uses adults with diagnosed hypertension as its denominator and patients with adequately controlled blood pressure at their most recent visit in the measurement year as its numerator. For one core diabetes measure, the denominator is adults with a diagnosis of diabetes (type 1 or 2), and the numerator is patients whose most recent HbA1c level during the measurement year was >9.0% (poor control) or for whom no HbA1c test was done during that year, so a lower rate indicates better performance.

A feasible revised depression measure might use the current NQF-specified denominator, but with a numerator representing patients with depression or dysthymia whose most recent PHQ-9 is <5 (using the most recent PHQ-2 if it is newer than the PHQ-9). Another option, similar to the NQF-endorsed diabetes-control measure, could have a numerator that includes patients with depression or dysthymia who have a PHQ-9 score ≥10 or who lack a PHQ-2 or PHQ-9 for the measurement year. In either case, a 12-month performance period (i.e., without a prespecified follow-up window) could simplify reporting while measuring the population-wide effect of depression care more accurately.

Perhaps the best and simplest solution would roughly mirror the NQF-endorsed hypertension measure: Use a denominator of patients with diagnosed depression or dysthymia (but without the NQF measure’s initial elevated PHQ-9 requirement) and a numerator of patients whose symptoms are adequately controlled (most recent PHQ-9 score <5 or PHQ-2 = 0), again with a 12-month performance period.
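One way this hypertension-style measure might be computed is sketched below. It assumes every patient passed in already has a diagnosis of depression or dysthymia, and that a patient with no PHQ-2 or PHQ-9 during the performance period counts as not controlled; neither detail is specified above, and the names are illustrative.

```python
from datetime import date
from typing import List, Optional, Tuple

# Each screening is (date_administered, instrument, score).
Screening = Tuple[date, str, int]


def is_controlled(screenings: List[Screening],
                  period_start: date, period_end: date) -> bool:
    in_period = [s for s in screenings
                 if period_start <= s[0] <= period_end]
    if not in_period:
        # Assumption: no screening in the period means not controlled.
        return False
    # Use whichever screening (PHQ-9 or PHQ-2) is most recent.
    _, instrument, score = max(in_period, key=lambda s: s[0])
    return (instrument == "PHQ-9" and score < 5) or (
        instrument == "PHQ-2" and score == 0
    )


def control_rate(diagnosed_patients: List[List[Screening]],
                 period_start: date, period_end: date) -> Optional[float]:
    # Denominator: all diagnosed patients, with no baseline PHQ-9 threshold.
    if not diagnosed_patients:
        return None
    controlled = sum(
        is_controlled(s, period_start, period_end)
        for s in diagnosed_patients
    )
    return controlled / len(diagnosed_patients)
```

The only time constraint is the 12-month performance period itself, so a patient who remits at month 3 and is rescreened at any later point in the year is counted.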


It’s heartening to see care for depression getting more attention, but that progress is not enough. We must ensure that systems of care are built on rigorous, meaningful quality measures. Our experience with collaborative care for depression leads us to favor practical changes in how remission of depression is measured. We hope our concrete suggestions advance the conversation.


The views expressed in this article are those of the authors and do not necessarily represent the views or policy of NYC Health + Hospitals.

This article originally appeared in NEJM Catalyst on November 9, 2016.
