For the estimated 15.7 million U.S. adults with depression, the growing focus on treatment, particularly in primary care settings, is promising. This shift in care delivery also demands rigorous measurement of the quality of depression services and their effects on population health. Several measure stewards and clearinghouses (the National Quality Forum [NQF], the National Committee for Quality Assurance, and Minnesota Community Measurement [MNCM]), as well as the Centers for Medicare and Medicaid Services (CMS), are converging on measures that emphasize discrete, narrow follow-up windows.
Our collaborative care for depression program at NYC Health + Hospitals has shown us that measures are most useful when the time frame for follow-up is broader. In this brief article, we aim to make a case for reconsidering how remission from depression is measured.
Widen the Time Frame for Follow-Up Assessment
Our program’s bottom-line quality metric, aligned with New York State Office of Mental Health standards, uses this fraction to measure depression improvement:
- Numerator: Patients with demonstrable clinical improvement
- Denominator: Patients enrolled in collaborative care for ≥70 days (patients must have a PHQ-9 score >9 to enroll in the program)
We define clinical improvement as either a Patient Health Questionnaire (PHQ)-9 score <10 (i.e., no or mild depression) or a current PHQ-9 score that is <50% of the patient’s baseline score. For patients enrolled in the program during Quarter 1 of 2016, our improvement rate was 57.6%.
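The improvement fraction above can be sketched in a few lines of Python. This is a minimal illustration, not our program's reporting code. The `Enrollee` record is hypothetical, and so are two choices it makes: it treats the most recent follow-up PHQ-9 as the "current" score, and it counts a patient with no follow-up score as not improved.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Enrollee:
    """Hypothetical registry record for one collaborative-care patient."""
    enrolled_on: date
    baseline_phq9: int            # enrollment requires a PHQ-9 > 9
    latest_phq9: Optional[int]    # most recent follow-up PHQ-9, if any


def is_improved(baseline: int, current: int) -> bool:
    """Clinical improvement: PHQ-9 < 10, or current score < 50% of baseline."""
    return current < 10 or current < 0.5 * baseline


def improvement_rate(patients: list[Enrollee], as_of: date) -> float:
    # Denominator: patients enrolled for at least 70 days as of the report date.
    eligible = [p for p in patients if (as_of - p.enrolled_on).days >= 70]
    if not eligible:
        return math.nan  # the rate is undefined with an empty denominator
    # Numerator: eligible patients whose latest follow-up score shows improvement.
    # Assumption: a patient with no follow-up score counts as not improved.
    improved = [
        p for p in eligible
        if p.latest_phq9 is not None and is_improved(p.baseline_phq9, p.latest_phq9)
    ]
    return len(improved) / len(eligible)


cohort = [
    Enrollee(date(2016, 1, 10), 18, 8),     # score < 10: improved
    Enrollee(date(2016, 2, 1), 24, 11),     # 11 < 12 (half of 24): improved
    Enrollee(date(2016, 3, 5), 15, 12),     # meets neither criterion
    Enrollee(date(2016, 3, 20), 14, None),  # no follow-up score yet
    Enrollee(date(2016, 6, 1), 20, 5),      # only 29 days enrolled: excluded
]
print(improvement_rate(cohort, as_of=date(2016, 6, 30)))  # 0.5 (2 of 4 eligible)
```

Either criterion alone is enough, so a patient who starts very high (24) can count as improved while still scoring above 10.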
Contrast that method with the NQF measure of depression remission (derived from MNCM and adopted by CMS):
- Numerator: Adults with major depression or dysthymia and an initial PHQ-9 score >9 who achieve remission at 12 months
- Denominator: Adults with major depression or dysthymia and an index PHQ-9 score >9
Notably, NQF defines remission as a PHQ-9 score <5 (no depression) at 12 months (+/–30 days); if multiple scores are taken within that 60-day window, the most recent score is used.
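Written compactly in our own notation (not the measure steward's specification text), the NQF measure looks like this. Let $D$ be the denominator population, $t_i$ patient $i$'s index PHQ-9 date, $W_i$ the follow-up window, and $s_i$ the most recent PHQ-9 score recorded in $W_i$:

$$
\begin{aligned}
W_i &= \left[\, t_i + 12\ \text{months} - 30\ \text{days},\; t_i + 12\ \text{months} + 30\ \text{days} \,\right] \\
\text{Remission rate}_{\text{NQF}} &= \frac{\left|\left\{\, i \in D : s_i \text{ exists and } s_i < 5 \,\right\}\right|}{|D|}
\end{aligned}
$$

Because the denominator does not require a follow-up score, a patient with no PHQ-9 inside $W_i$ stays in $D$ but cannot enter the numerator.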
Although improvement and remission are separate clinical outcomes, we nevertheless believe that a metric similar to the one we use for improvement would measure remission better than the metric the NQF currently endorses. That’s because the NQF approach narrows the time frame for scoring the follow-up PHQ-9 to 60 days (i.e., between months 11 and 13 after the initial PHQ-9). This narrowing is not rooted in the evidence. For example, in a study in the Journal of the American Board of Family Medicine, the median time to remission for depression in collaborative-care programs was 86 days (i.e., within about 3 months). Therefore, the NQF numerator misses patients who are successfully identified and treated and who reach remission well before the window opens at month 11, unless they also happen to be rescreened within that window.
We deepened our analysis with data from our own depression registry and electronic health record. Specifically, using the NQF measure, we created three time windows for follow-up PHQ-9 scoring: 11 to 13 months (the prescribed window), 9 to 15 months, and 3 to 12 months after the baseline PHQ-9. The table shows what we found.
The data show that most patients enrolled in collaborative care (305 of 430, about 71%) had no PHQ-9 administered within the narrow follow-up time frame of 11 to 13 months. Using a broader time frame that starts at roughly the median time to remission (86 days) and ends one year from baseline (i.e., 3–12 months) increases the number of patients with a follow-up PHQ-9 within that time frame and dramatically increases the remission rate.
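A window comparison like the one above can be approximated with a short Python sketch. It is an illustration, not the code behind the table. The `Patient` record and the helper names are hypothetical, months are converted to days at an assumed 365/12 days per month, the NQF rule is applied (most recent PHQ-9 in the window, remission if <5), and patients without an in-window score stay in the denominator.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Patient(NamedTuple):
    """Hypothetical registry record: baseline date plus dated follow-up PHQ-9s."""
    baseline_on: date
    followups: list[tuple[date, int]]  # (date administered, PHQ-9 score)


def months_to_days(months: float) -> timedelta:
    # Assumption: a month is approximated as 365/12 days.
    return timedelta(days=round(months * 365 / 12))


def latest_in_window(p: Patient, start_m: float, end_m: float) -> Optional[int]:
    """Most recent PHQ-9 dated within [start_m, end_m] months after baseline."""
    lo = p.baseline_on + months_to_days(start_m)
    hi = p.baseline_on + months_to_days(end_m)
    in_window = [(d, s) for d, s in p.followups if lo <= d <= hi]
    if not in_window:
        return None
    return max(in_window, key=lambda ds: ds[0])[1]  # score from the latest date


def window_summary(patients: list[Patient], start_m: float, end_m: float) -> dict:
    scores = [latest_in_window(p, start_m, end_m) for p in patients]
    found = [s for s in scores if s is not None]
    remitted = sum(1 for s in found if s < 5)  # NQF remission: PHQ-9 < 5
    return {
        "no_followup": len(patients) - len(found),
        "remitted": remitted,
        # Patients without an in-window score stay in the denominator.
        "rate": remitted / len(patients) if patients else float("nan"),
    }


cohort = [
    Patient(date(2015, 1, 5), [(date(2015, 5, 1), 3)]),    # early remission
    Patient(date(2015, 1, 5), [(date(2015, 6, 1), 4),
                               (date(2016, 1, 10), 4)]),   # rescreened near month 12
    Patient(date(2015, 1, 5), []),                         # never rescreened
]
windows = {"11-13 months": (11, 13), "9-15 months": (9, 15), "3-12 months": (3, 12)}
for label, (start, end) in windows.items():
    print(label, window_summary(cohort, start, end))
```

In this toy cohort the 3–12 month window finds a score for two patients instead of one and credits the early remission that the 11–13 month window misses.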
In our program, when patients show significant, persistent improvement, they are “graduated” from collaborative care to free up resources for other patients. Graduates receive the same care as the rest of our primary care population, including universal screening for depression, generally with the PHQ-2. The PHQ-2 is a validated two-question screening tool often used to decide whether to administer a PHQ-9: a “yes” answer to at least one PHQ-2 question prompts a PHQ-9. Using either a PHQ-2 result of zero or a PHQ-9 score <5 as an indicator of depression remission, we found that the remission rate increases dramatically when the follow-up window is wider: from 26.0% (using the 11–13 month time frame) to 51.6% (using the 3–12 month time frame). This wider time frame better reflects how many of our patients in collaborative care actually experience remission from depression, rather than identifying only patients who improved and also happened to be rescreened during the narrower 60-day window.
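The combined PHQ-2/PHQ-9 indicator can be sketched the same way. In this Python illustration, `Screen`, `in_remission`, and `remitted_in_window` are hypothetical names. The most recent screen in the window decides the result, and a PHQ-9 outranks a PHQ-2 given on the same day, on the assumption that the same-day PHQ-9 was prompted by that PHQ-2 and is the more informative score. Window bounds are given in days after baseline; days 91 to 365 approximate the 3–12 month window.

```python
from datetime import date, timedelta
from typing import Literal, NamedTuple


class Screen(NamedTuple):
    """Hypothetical record of one depression screen."""
    on: date
    instrument: Literal["PHQ-2", "PHQ-9"]
    score: int


def in_remission(screen: Screen) -> bool:
    """Remission indicator from the text: PHQ-2 == 0 or PHQ-9 < 5."""
    if screen.instrument == "PHQ-2":
        return screen.score == 0
    return screen.score < 5


def remitted_in_window(baseline_on: date, screens: list[Screen],
                       start_day: int, end_day: int) -> bool:
    """True if the latest screen in [start_day, end_day] after baseline shows remission."""
    lo = baseline_on + timedelta(days=start_day)
    hi = baseline_on + timedelta(days=end_day)
    window = [s for s in screens if lo <= s.on <= hi]
    if not window:
        return False  # no screen in the window: not counted as remitted
    # Latest date wins; on a same-day tie, the PHQ-9 outranks the PHQ-2.
    latest = max(window, key=lambda s: (s.on, s.instrument == "PHQ-9"))
    return in_remission(latest)


baseline = date(2015, 1, 5)
graduate = [
    Screen(date(2015, 5, 1), "PHQ-9", 7),   # mild symptoms remain
    Screen(date(2015, 10, 1), "PHQ-2", 0),  # routine primary care screen
]
print(remitted_in_window(baseline, graduate, 91, 365))   # 3-12 months: True
print(remitted_in_window(baseline, graduate, 335, 395))  # 11-13 months: False
```

The graduate's only post-graduation screen falls at about month 9, so a 60-day window around month 12 never sees it.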
Finding the Optimal Measure of Remission
These results cast doubt on whether the NQF-endorsed measure optimally captures depression remission, particularly in a primary care population. Indeed, narrow follow-up windows end up excluding most of the population whose quality of care we wish to assess. Nevertheless, the NQF depression measure is now included in multiple measure sets, including a consensus set being defined by the Core Quality Measure Collaborative, led by America’s Health Insurance Plans, CMS, and the NQF. The need for the measure to account for depression services delivered in primary care is underlined by CMS’ move toward paying for collaborative care.
An optimal depression-remission measure might instead follow the example of NQF-endorsed quality metrics for other chronic diseases, such as hypertension and diabetes. For instance, the hypertension measure uses adults with diagnosed hypertension as its denominator and patients with adequately controlled blood pressure at their most recent visit in the measurement year as its numerator. For one core diabetes measure, the denominator includes adults with a diagnosis of diabetes (type 1 or 2), and the numerator includes patients whose most recent HbA1c level during the measurement year was >9.0% (poor control) as well as those for whom no HbA1c test was done during that year. Because this numerator counts poor control, a lower rate indicates better care.
A feasible revised depression measure might use the current NQF-specified denominator, but with a numerator of patients with depression or dysthymia whose most recent PHQ-9 score is <5 (or, when a PHQ-2 is more recent than the last PHQ-9, whose most recent PHQ-2 score is 0). Another option, modeled on the NQF-endorsed diabetes-control measure, would be an inverse measure whose numerator includes patients with depression or dysthymia who have a PHQ-9 score ≥10 or who lack any PHQ-2 or PHQ-9 for the measurement year. In either case, a 12-month performance period (i.e., without a prespecified follow-up window) could simplify reporting while measuring the population-wide effect of depression care more accurately.
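Both candidate numerators can be computed from the same screening history. The Python sketch below is hypothetical throughout. It treats the measurement year as a calendar year and takes the denominator as already selected. For the inverse measure, it also assumes that only a most recent PHQ-9 ≥10 (not a nonzero PHQ-2) counts as poor control.

```python
from datetime import date
from typing import Literal, NamedTuple, Optional


class Screen(NamedTuple):
    """Hypothetical record of one depression screen."""
    on: date
    instrument: Literal["PHQ-2", "PHQ-9"]
    score: int


def latest_screen(screens: list[Screen], year: int) -> Optional[Screen]:
    """Most recent PHQ-2 or PHQ-9 in the (assumed calendar-year) performance period."""
    in_year = [s for s in screens if s.on.year == year]
    if not in_year:
        return None
    # Latest date wins; on a same-day tie, the PHQ-9 outranks the PHQ-2.
    return max(in_year, key=lambda s: (s.on, s.instrument == "PHQ-9"))


def controlled(screens: list[Screen], year: int) -> bool:
    """Direct numerator: latest PHQ-9 < 5, or latest PHQ-2 == 0 if it is newer."""
    s = latest_screen(screens, year)
    if s is None:
        return False
    return s.score == 0 if s.instrument == "PHQ-2" else s.score < 5


def poorly_controlled(screens: list[Screen], year: int) -> bool:
    """Inverse numerator: latest screen is a PHQ-9 >= 10, or no screen at all."""
    s = latest_screen(screens, year)
    if s is None:
        return True
    return s.instrument == "PHQ-9" and s.score >= 10


def measure_rates(denominator: list[list[Screen]], year: int) -> tuple[float, float]:
    """Return (direct rate, inverse rate); assumes a non-empty denominator."""
    n = len(denominator)
    direct = sum(controlled(p, year) for p in denominator) / n
    inverse = sum(poorly_controlled(p, year) for p in denominator) / n
    return direct, inverse


denominator = [
    [Screen(date(2016, 2, 1), "PHQ-9", 14), Screen(date(2016, 9, 15), "PHQ-2", 0)],
    [Screen(date(2016, 11, 3), "PHQ-9", 12)],
    [Screen(date(2015, 6, 1), "PHQ-9", 3)],  # no screen during 2016
    [Screen(date(2016, 5, 5), "PHQ-9", 7)],  # neither remitted nor poorly controlled
]
print(measure_rates(denominator, 2016))  # (0.25, 0.5)
```

The two rates need not sum to 1: the last patient, whose latest PHQ-9 is 7, is in neither numerator. For the inverse rate, as for the diabetes measure, lower is better.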
Perhaps the best and simplest solution would roughly mirror the NQF-endorsed hypertension measure: Use a denominator of patients with diagnosed depression or dysthymia (but without the NQF measure’s initial elevated PHQ-9 requirement) and a numerator of patients whose symptoms are adequately controlled (most recent PHQ-9 score <5 or PHQ-2 = 0), again with a 12-month performance period.
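Stated formally in our own notation, this option differs from the current NQF measure in its denominator (no baseline PHQ-9 threshold) and its time frame (a 12-month performance period instead of a 60-day window). Let $D'$ be all patients with diagnosed depression or dysthymia, $P$ the 12-month performance period, and $m_i$ patient $i$'s most recent PHQ-9 or PHQ-2 result during $P$:

$$
\text{Control rate} = \frac{\left|\left\{\, i \in D' : \left(m_i \text{ is a PHQ-9 score} < 5\right) \lor \left(m_i \text{ is a PHQ-2 score} = 0\right) \,\right\}\right|}{\left|D'\right|}
$$

As an assumption for this sketch, a patient with no PHQ-2 or PHQ-9 during $P$ remains in $D'$ but is not counted as controlled.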
It’s heartening to see care for depression getting more attention, but that progress is not enough. We must ensure that systems of care are built on rigorous, meaningful quality measures. Our experience with collaborative care for depression leads us to favor practical changes in how remission of depression is measured. We hope our concrete suggestions advance the conversation.
The views expressed in this article are those of the authors and do not necessarily represent the views or policy of NYC Health + Hospitals.
This article originally appeared in NEJM Catalyst on November 9, 2016.