Tom Lee: This is Tom Lee on behalf of NEJM Catalyst, and I’m speaking today with Arnold Milstein. He’s a Professor of Medicine at Stanford, where he directs the Clinical Excellence Research Center where he and his colleagues study scalable health care delivery innovations that provide better care with fewer resources. In a way, this is a continuation of work that Arnie’s been doing for decades now. I first met him 20 years ago.
Arnie, you were working as a consultant to health care purchasers, and I was in senior management at Partners Healthcare at the time. And, I’ll be honest, you were giving them advice on how to put pressure on providers like my colleagues and me to improve quality and efficiency, and I feel a little sheepish today about how we pushed back on things that you were advocating like transparency, that frankly, I think, virtually everyone embraces now. Now, I’m an optimist, and I like to think that the overall arc of the story is that we’re making progress and the health care system is getting better, and we’re on the right path. But what’s your take? Do you feel like things are headed in the right direction? Do you see signs of progress, too?
Arnold Milstein: I do. Change is certainly a lot slower than policymakers would like, but my intuition is that, precisely because the actual rate of change has been limited, there has been deeper thinking about how to create a tailwind for change. That thinking has begun to unfold very actively, really since the very end of the Clinton administration, when the National Quality Forum was launched as an effort to simplify performance measurement in the health care delivery system, covering all three primary domains of value: patient experience, total cost of resources used, and measures of clinical effectiveness, both process and outcome.
That was the beginning, and then there has been a steady series of improvements that created tailwinds for a better-performing health care system. First came the early CMS efforts at hospital transparency, and then, over time, various congressional and state legislative initiatives to make performance comparisons more easily available and transparent. More recently, comparative performance measures have begun to affect what Medicare pays and, in the private sector, which providers are included in networks or carry favorable copays for consumers.
Lee: You’ll probably say I’m giving you too much credit, but when I think of you, I think you’re definitely in a core group of people behind several innovations that were sort of stunning when they were introduced, ranging from transparency to tiered networks, narrowed networks, profiling individual physicians based upon their costs over episodes of care as well as their quality of care. When you think back on some of these new ideas that you hatched or were part of hatching, which of them do you think turned out best, and why?
Milstein: Generally speaking, making performance comparisons more widely available, both to clinicians and to patients and the public, I believe has had the most beneficial effect. Among the beneficial effects have been, first of all, a realization that we have to do a much better job of collecting data in order to create measures that everybody has confidence in. And secondly, there is, I think, a reasonable amount of evidence — particularly when clinicians are made aware of performance differences — that it does affect the rate at which they improve whatever facet of performance is being measured and made more visible.
Lee: One of the reasons why my colleagues and my partners actually enjoyed sitting down with you, even though the things that you were introducing were making us grumpy, was that you were listening and you were thinking and responding to the objections that came up and learning when things didn’t go well. Were there innovations that had unexpected adverse consequences that sent you back to the drawing board to either unwind them or at least tweak them?
Milstein: Very definitely so, Tom. Early on, it became apparent that the risk adjustment methods that we were using for many of the measures were not sufficiently taking into account the challenges that clinicians face in highly specialized medical subspecialties where, by the nature of the very specialties themselves, the physicians were getting a group of patients who were widely different than other patients in that same clinical category.
An example might be an ophthalmologic surgeon who specializes in a very narrow and high-risk domain of ophthalmologic surgery. The initial performance measures tended to lump all ophthalmologic surgeons together for purposes of judging resource use or quality, and on the face of it, anyone probably could have intuited that those comparisons would not, at the beginning, have been fair.
It was partly out of the kind of very constructive dialogue, Tom, that you and other physician leaders in Massachusetts initiated, that we began to realize that, among other things, we would need as part of this public reporting and tiering process to give physicians a chance to review the data and not only suggest refinements in the measuring method, but also to comment on, even if the measure was good, why in their particular case, perhaps flaws in how the data were collected were resulting in a mischaracterization of their relative performance.
Lee: One thing I’ve never asked you in the couple decades we’ve known each other is, is there such a thing as being too responsive to providers who are bringing up issues? I wouldn’t have asked that when the issue was being responsive to me — I of course wanted that. But if you’re responsive to everyone and every issue they bring up, can you end up with something that’s too complex or that is simply not scalable?
Milstein: Very definitely so, and I think for me, the illustrations come from other domains, where pretty good measures have generally resulted in a real healthy self-consciousness on the part of professionals with respect to their performance and efforts to improve. I’m reminded of some of the early days, for example, when Michelin began to bring its rating system to the U.S. Like any new rating system, there were a myriad of execution flaws, and many American chefs were able to surface a lot of complaints about how it was done. Some of them were quite valid, but generally speaking, the consensus among the chefs was that bringing a form of rating system that was so widely respected to the U.S. did do a lot for how motivated they were to do their very best every day.
Lee: One quote from you that I still use often to this day is, you said, “Measures only get better when you use them.” And you were making the point that probably no measure is ever perfect, but even when they might be bad when you start or the data might not be very good, they’ll only get better when you get going. Can you give any examples of that? Do you still feel that way?
Milstein: Definitely. Probably the very first and most primitive effort to publicly compare quality performance in the U.S. was Medicare’s original effort to compare hospitals on death rates, which was publicly reported. This was, I think, under then-HEW Secretary Joe Califano, and it surfaced a lot of problems. For example, hospitals that had more well-developed hospice services ended up being unfairly mischaracterized, because back then, hospices were part of inpatient units. If a hospital was a regional referral center for patients who wanted inpatient hospice care, you can imagine what that did to its comparative score when it was publicly reported.
But the imperfections of that then led to very constructive responses. For example, as politicians from both parties began to push for public performance reporting, it was pointed out that coming up with a better measure would require much better coding of Do Not Resuscitate orders, some way of designating which patients were hospice patients and having those carved out of the inpatient reporting system, and so on. It also turned out that in those days, hospitals were not required to differentiate whether the diagnoses listed beyond the reason for admission (the so-called principal diagnosis) were present on admission or not. It quickly became apparent that if you wanted to validly compare hospitals on both complications and mortality, hospitals would need to begin to code whether a secondary diagnosis was or was not present on admission. So, it was the public transparency that led to a variety of corrective actions. I think it was about 10 years ago that, for the first time, hospitals reporting to Medicare differentiated which secondary diagnoses were and were not present on admission.
Lee: That’s such a great example. It just shows that getting started and then making something better is so much more fruitful than analyzing how it might occasionally yield perverse findings. Today, you’re working really hard at studying innovations at Stanford, and let’s just close by asking for one example of something that has you really excited. You’re an optimistic guy, and I’m sure that you have some ideas of things that are sustaining your optimism. If you could bring up your favorite of the moment, that’ll be a good way to end our podcast.
Milstein: Sure. I’ll just pull something from our Care Redesign Fellowship Program, which is an effort to bring postdoctoral and MD-trained young people together to reexamine a large facet of American health care and discern better, more affordable ways of delivering it. Through their eyes I’ve learned a lot. They do methodical literature searches. They visit so-called positive outlier sites on value around the country. What I’ve learned from them this year, among other things, is that when patients with almost any form or degree of dementia are admitted to the hospital and, as most of them do, suffer delirium there, the measured rate of progression of their dementia accelerates after they return home. Our fellows are beginning to articulate a theory that, unless there is a very high predicted beneficial response to hospital therapy, there should be among clinicians and families in the U.S. a realization that hospitalizing someone with dementia should be regarded as an almost-never event.
Lee: Well, Arnie, I’m grateful to you for spending the time with us today, and of course you should be proud of the work you have done over the past decades. I know there’s much more ahead, and I hope that from time to time you’ll share some of it with the visitors to the NEJM Catalyst website. So, thanks again, and I look forward to our next conversation.
Milstein: You’re welcome, Tom. Thanks again for being a great partner when you were at Partners in helping us move toward better measures rather than waiting another decade.