In health care, variation is the state of nature. Despite efforts to introduce protocols, care pathways, order sets, checklists, and standardized quality measures, providers vary in their practice to a remarkable degree. Differences in outcomes and cost of care are visible when comparing countries, geographical regions, hospitals within a single city, and providers in the same practice. Some of this variation is doubtless benign. In other cases, it contributes to poor outcomes. And sometimes, perhaps frequently, it results in waste — greater cost without better outcomes.
Selective Pressure on Practice
The situation in which we health care practitioners find ourselves might be described with an evolutionary analogy. In nature, random mutations in DNA sequences create genotypic variation, which results in phenotypic variation (observable differences among organisms). In the presence of selective pressure, some phenotypes propagate their genes more successfully than others. In medical practice, “genotypes” are the many individual decisions that constitute a provider’s method of practice, such as the choice of a medication, a surgical approach, or a communication style. “Phenotypes” are the cost and clinical outcomes of care. Controlling for patient differences, it is variation in these many elements of practice (genotype) that drives differences in cost and outcomes (phenotype). Contemporary health care is characterized by abundant variation in practice and very little selective pressure to weed out practices that don’t add value. Over a century and a half of modern medicine, a great variety of practices has developed, largely in the absence of forces that reward practices leading to good outcomes and low cost.
Times are changing. Governments, consumer groups, and patient safety organizations are advocating for more transparency in the outcomes of care, though the metrics generally promoted by such groups encompass only a small fraction of the outcomes desired by patients when seeking care. Perhaps more importantly, health care has become increasingly unaffordable, resulting in downward pressure on reimbursement to providers. Hospitals are especially affected, experiencing negative margins on once-lucrative procedures. Health care administrators, having negotiated the best rates they can on supplies, and having cut labor costs as much as they feel possible, are left with an uncomfortable reality. If their hospitals are to succeed, they will need to address a much more challenging source of cost: physician practice.
Administrators want physicians to lower cost. Physicians want to optimize patient outcomes. While these conversations can go badly, they also have the potential to result in something very good — the selective pressure that health care has long lacked. But making that selective pressure productive requires a method of measuring variation in the cost and outcomes of care and tying that variation to discrete differences in clinical practice that can be changed.
Good Doctors, Bad Doctors — a False Dichotomy
At Providence St. Joseph Health, the third-largest nonprofit health system in America, with 24,000 physicians in seven states, we have developed a method for simultaneously measuring cost and outcomes and “drilling down” into the specific practices that drive variation. We refer to this as a Value-Oriented Architecture (VOA). The ability to drill down is essential. Consider the example of cost and outcomes variation for total knee replacements. Figure 1a is a Value Plot, displaying what our evolutionary analogy would call phenotypic variation. Each circle represents the performance of one high-volume orthopedic surgeon for primary elective unilateral knee replacement. The y-axis represents average direct cost per case (plotted in reverse order so that lower costs are higher on the graph); the x-axis is a composite outcome score (better scores are further to the right). Circles in the right upper quadrant thus seem to represent “high-value” practitioners.
At this summary level of analysis, where in the past our conversation might have ended, there is a tendency to overinterpret the results. Misleadingly, the graphic makes us think of “good doctors” and “bad doctors,” and suggests that the best solution might be to pick out “bad doctors” for remediation or removal. While that might be possible, it would be a very incomplete solution to the problem of health care waste. A more productive activity is to dive deeper and identify the drivers of variation.
Figure 1b simplifies the view by displaying only the cost dimension of total knee replacements. A different story emerges when cost is broken down into its primary component parts: implants, OR/anesthesia time, pharmacy, room and board, and supplies. Figure 1c plots each surgeon’s costs within each of those categories as a ratio to the system average (marks above 1 indicate higher-than-average costs; marks below 1 indicate lower-than-average costs). A “good doctor” in this view would be someone with low costs in all categories. Figure 1d shows that while such physicians exist, they are rare. So are “bad doctors”: providers with high costs across the board. In fact, nearly all surgeons display “crossover” in this graph; that is, their cost is relatively high in some categories and relatively low in others.
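The ratio-to-average view and the “crossover” pattern can be sketched in a few lines of Python. This is an illustration, not the VOA implementation: the surgeon labels and dollar figures are made up, the function names are hypothetical, and the system average is assumed to be the unweighted mean of surgeon averages (a production version would presumably weight by case volume).

```python
from statistics import mean

# Made-up average direct cost per case (dollars) for three hypothetical surgeons.
costs = {
    "A": {"implants": 100, "pharmacy": 30},
    "B": {"implants": 300, "pharmacy": 10},
    "C": {"implants": 200, "pharmacy": 20},
}

def cost_ratios(costs):
    """Express each surgeon's cost in each category as a ratio to the system average."""
    categories = list(next(iter(costs.values())))
    # Assumption: system average = unweighted mean across surgeons.
    system_avg = {c: mean(s[c] for s in costs.values()) for c in categories}
    return {
        surgeon: {c: by_cat[c] / system_avg[c] for c in categories}
        for surgeon, by_cat in costs.items()
    }

def crossover(ratios):
    """True when a surgeon is above average in some category and below in another."""
    return {
        surgeon: any(r > 1 for r in rs.values()) and any(r < 1 for r in rs.values())
        for surgeon, rs in ratios.items()
    }

ratios = cost_ratios(costs)
print(ratios["A"])        # {'implants': 0.5, 'pharmacy': 1.5}
print(crossover(ratios))  # {'A': True, 'B': True, 'C': False}
```

In this toy data, surgeon C sits exactly at the average in every category and is the only one without crossover.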
This phenomenon becomes even clearer as we move more deeply into the data. Figure 1d seems to indicate that within the category of pharmaceuticals, some physicians are high cost and others are low cost. However, even that distinction hides important variation. Figure 1e is a deeper view of pharmacy cost. Here, overall pharmacy costs are broken into subcategories that group similar agents together: in this case, analgesic and anesthetic agents and hematological agents (overall, the two largest pharmaceutical cost drivers in total knee replacement). The y-axis again expresses each surgeon’s cost as a ratio to the system average. Again, we see that some physicians cross over between categories: they are low cost in some and high cost in others. With respect to cost, “good practice” and “bad practice” often coexist within the same doctor.
Everyone Has Something to Learn
Underneath the pharmaceutical agent categories lies the genotypic view. Figure 1f shows the breakdown of costs for one pharmaceutical subcategory, hematological agents. Here we discover that the use of Tisseel, a branded fibrin sealant, and tranexamic acid, an agent used to control blood loss, are the underlying practice variants. These practice differences are driving variation in hematological cost, which in turn contributes to variation in pharmaceutical cost and eventually to overall cost.
For each clinical area that we have explored using VOA, there are many small differences in practice like this one, which together have a large cumulative impact on cost. For example, Table 1 shows that for total knee replacements, the cost per case of a hypothetical surgeon whose practice consisted only of the most cost-effective practice variations is 35% lower than the system average. Today, Providence St. Joseph’s most cost-effective provider is 20% lower than the system average, meaning that even he has opportunity for improvement.
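One plausible way to construct a hypothetical surgeon like the one in Table 1 is to take, in each cost category, the lowest per-case cost any surgeon achieves and sum those minima. The article does not specify how the Table 1 figure was derived, so treat this as a sketch under that assumption; the data and function names are made up. It also assumes categories are independent (a cheaper choice in one category does not raise cost in another). Because each category's minimum can come from a different surgeon, the composite beats every real surgeon whenever crossover exists.

```python
from statistics import mean

# Made-up per-case costs (dollars) by category for three hypothetical surgeons.
costs = {
    "A": {"implants": 100, "pharmacy": 30},
    "B": {"implants": 300, "pharmacy": 10},
    "C": {"implants": 200, "pharmacy": 20},
}

def system_average_total(costs):
    """Sum over categories of the (unweighted) mean per-case cost."""
    categories = list(next(iter(costs.values())))
    return sum(mean(s[c] for s in costs.values()) for c in categories)

def composite_savings(costs):
    """Fractional saving vs. system average for a surgeon using the cheapest variant in every category."""
    categories = list(next(iter(costs.values())))
    best_total = sum(min(s[c] for s in costs.values()) for c in categories)
    return 1 - best_total / system_average_total(costs)

def best_actual_savings(costs):
    """Fractional saving vs. system average for the lowest-cost real surgeon."""
    best_total = min(sum(s.values()) for s in costs.values())
    return 1 - best_total / system_average_total(costs)

print(composite_savings(costs))  # 0.5
print(best_actual_savings(costs))
```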
The work of uncovering the specific elements of practice that drive cost differences has been illuminating and at times surprising. We have uncovered many examples of administrative quirks (“I have no idea how that got into my preference card”); cost ignorance (“This costs how much?”); misleading information (“The rep told me this was cheap”); and a lack of knowledge about the standard of care (“So I’m the only one doing this?”).
As we have explored variation across clinical areas, our philosophy on how value can be improved has evolved. If, indeed, low- or high-value practice variations can exist within a single practitioner, then an approach of keeping the good surgeon and removing the bad surgeon would be ineffective. Rather, a discovery-driven approach was needed to uncover the higher-value practices that lay beneath each cost category and could be codified and spread. Our mantra has become “everyone has something to learn.” VOA is designed to be the tool to enable such learning.
Keys to a Value-Oriented Architecture
VOA is a method of organizing and displaying cost and quality data, allowing detailed drill-down into the drivers of variation. In developing it, we have learned several lessons that we hope will guide others in similar efforts:
1. Partner with clinicians.
Anyone who has worked in health care analytics has heard a physician say, “The data is wrong.” In our experience, the best way to win physician buy-in is to short-circuit that objection. After preparing a preliminary data set, we deliver it to physicians and say, “We know the data is wrong; now help us make it more useful to you.” In promulgating VOA, partnerships with clinicians to develop condition-specific cohort definitions, methods of risk-adjustment, and outcomes have been essential to getting the data “right.” Providence St. Joseph Health has created physician and administrative leadership committees (institutes) that organize thought leadership along service lines across our 51 hospitals. These institutes have been a powerful mechanism for bringing clinicians to the table early on in the VOA development process and ensuring ongoing engagement with the data.
2. Build cohorts that make sense to clinicians.
Historical calculations of cost are often limited to administratively defined cohorts, usually based on Diagnosis-Related Groups (DRGs) for inpatient care. But DRGs are frequently misaligned with how physicians group their patients. Take spinal fusion surgery as an example. DRGs do not differentiate between cases based on the number of spinal levels fused, even though this is an important driver of cost and clinical outcomes. We have found that comparisons of cost and outcomes across surgeons within a single DRG are not meaningful because of differences in the number of levels fused. To correct this problem, we developed a natural language processing tool that reads surgeons’ operative notes and identifies the number of levels fused in each case. This allows us to compare costs for surgeons operating on the same number of levels. Similarly, cardiac surgeons are not comfortable with the DRG grouping for their cases, so we instead cohort patients by risk of mortality, using a linkage to our Society of Thoracic Surgeons (STS) National Database data.
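The authors do not describe how their natural language processing tool works. As a deliberately simple stand-in, a regular expression can pull vertebral ranges such as “L4-S1” out of an operative note and count the levels they span. Everything here is a hypothetical sketch: the function and constant names are invented, and it assumes every range in the note describes a fusion, so it would miscount notes that list decompression-only levels, describe levels in prose (“two-level fusion”), or abbreviate differently.

```python
import re

# Vertebrae in craniocaudal order; each adjacent pair bounds one level.
VERTEBRAE = (
    [f"C{i}" for i in range(1, 8)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + ["S1"]
)
POSITION = {v: i for i, v in enumerate(VERTEBRAE)}

# Matches ranges such as "L4-L5", "L4-5", "L5-S1" and "T12 - L2".
RANGE = re.compile(r"\b([CTLS])(\d{1,2})\s*-\s*([CTLS])?(\d{1,2})\b")

def levels_fused(note: str) -> int:
    """Count distinct levels covered by vertebral ranges in an operative note."""
    levels = set()
    for region1, num1, region2, num2 in RANGE.findall(note.upper()):
        start = region1 + num1
        end = (region2 or region1) + num2  # "L4-5" means L4 to L5
        if start not in POSITION or end not in POSITION:
            continue  # not a recognized vertebra, e.g. "L7"
        lo, hi = sorted((POSITION[start], POSITION[end]))
        levels.update(range(lo, hi))  # level k spans vertebrae k and k+1
    return len(levels)

print(levels_fused("Posterior instrumented fusion L4-S1."))  # 2
print(levels_fused("Fusion L3-5 and L5-S1."))                # 3
```

Counting levels as a set means overlapping or adjacent ranges in the same note are not double-counted.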
3. Stratify on risk, but be comfortable with imperfection.
Sometimes a doctor’s patients really are sicker. Risk adjustment that is credible with clinicians is critical. At times this has required us to adopt risk stratification methods based on registry data (e.g., the STS database); at other times, we have created our own. We do believe, however, that it is important not to succumb to the temptation to overinvest in risk adjustment. We will sometimes pose this question to clinicians: “Would you rather have data that is 80% adjusted now or data that is 95% adjusted next year?” Usually, people pick the former. We get to that 80% quickly through simple stratification; more thorough adjustment requires more sophisticated regression models that we develop over time. Comfort with 80% adjusted data comes in part from our discovery-driven philosophy of change. As we explain to clinicians, VOA is not designed to tell us what to do; it’s designed to tell us which conversations to have. We believe the historical demand for near-perfect risk adjustment stems from an overemphasis on external reporting of physician or hospital rankings rather than on the practice variation that lies beneath those relative distributions.
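“Simple stratification” can take several forms; one common choice is indirect standardization, sketched below as an assumption rather than as VOA’s actual method. Each case is assigned to a risk stratum, the system-wide event rate in each stratum is applied to a surgeon’s case mix to get expected events, and the surgeon is summarized by an observed-to-expected (O/E) ratio. The data and function name are made up.

```python
from collections import defaultdict

def observed_to_expected(cases):
    """Indirect standardization by simple risk strata.

    cases: iterable of (surgeon, risk_stratum, had_event) tuples.
    Returns {surgeon: observed events / expected events}, where expected events
    apply the system-wide event rate of each stratum to the surgeon's case mix.
    """
    cases = list(cases)
    n = defaultdict(int)
    events = defaultdict(int)
    for _, stratum, had_event in cases:
        n[stratum] += 1
        events[stratum] += int(had_event)
    stratum_rate = {s: events[s] / n[s] for s in n}

    observed = defaultdict(int)
    expected = defaultdict(float)
    for surgeon, stratum, had_event in cases:
        observed[surgeon] += int(had_event)
        expected[surgeon] += stratum_rate[stratum]
    return {s: observed[s] / expected[s] for s in observed if expected[s] > 0}

# Made-up data: X operates mostly on high-risk patients, Y mostly on low-risk ones.
cases = (
    [("X", "high", i < 3) for i in range(8)]
    + [("X", "low", False) for _ in range(2)]
    + [("Y", "high", i < 1) for i in range(2)]
    + [("Y", "low", i < 1) for i in range(8)]
)
oe = observed_to_expected(cases)
print(oe)
```

Here X’s crude event rate (30%) is higher than Y’s (20%), yet after stratification X has fewer events than expected (O/E below 1) and Y more (O/E above 1): the kind of reversal that makes even crude adjustment worth having early.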
4. Make sure they “give a darn.”
To ensure clinical buy-in to value comparisons, we engage deeply with practicing clinicians from across our system to define outcomes that matter to them. This is the most time-consuming part of building VOA. We use an exercise we call the “Give a Darn Test.” In weighing potential outcomes, we ask physicians to consider two questions: “If we told you that you’re better than your peers on this outcome, would you feel good about yourself? If we told you that you’re worse, would you feel motivated to change your practice?” If the answer to both of those questions is yes, the outcome passes the Give a Darn Test, and we invest substantial effort to bring that outcome into VOA. As examples, our current data sets include patient experience, patient-reported outcomes, infection rates, readmissions, reoperations, return to ED, mortality, and surgical complications. The multiple dimensions of outcomes are brought into a single composite score for each cohort, borrowing statistical principles from the Society of Thoracic Surgeons composite methodology. This allows plotting of outcomes and costs on a single graph.
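The STS composite methodology the authors borrow from is considerably more sophisticated than anything shown here. As a simplified stand-in (a different, much cruder technique), a composite can be formed by standardizing each outcome across surgeons as a z-score, flipping the sign for adverse outcomes so that higher is always better, and taking a weighted mean. Measure names, weights, data, and the function name are all hypothetical.

```python
from statistics import mean, pstdev

def composite_scores(outcomes, higher_is_better, weights=None):
    """Weighted mean of per-measure z-scores, oriented so that higher is always better.

    outcomes: {surgeon: {measure: value}}
    higher_is_better: {measure: bool}, e.g. False for readmission rate
    weights: optional {measure: weight}; equal weights by default (an assumption)
    """
    surgeons = list(outcomes)
    measures = list(higher_is_better)
    weights = weights or {m: 1.0 for m in measures}
    total_weight = sum(weights[m] for m in measures)
    scores = {s: 0.0 for s in surgeons}
    for m in measures:
        values = [outcomes[s][m] for s in surgeons]
        mu, sd = mean(values), pstdev(values)
        for s in surgeons:
            z = 0.0 if sd == 0 else (outcomes[s][m] - mu) / sd
            if not higher_is_better[m]:
                z = -z  # e.g. a low readmission rate should raise the score
            scores[s] += weights[m] * z / total_weight
    return scores

# Made-up outcomes for two hypothetical surgeons.
outcomes = {
    "A": {"readmission_rate": 0.05, "patient_experience": 90},
    "B": {"readmission_rate": 0.10, "patient_experience": 80},
}
scores = composite_scores(outcomes, {"readmission_rate": False, "patient_experience": True})
print(scores)  # A is better on both measures, so A's composite is higher
```

With small case volumes, raw z-scores are noisy; more elaborate composite methods typically add statistical shrinkage for that reason.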
5. Develop a normalized patient-level view of costing.
Our detailed genotypic analyses are built on a foundation of patient-level activity-based cost accounting. This method attempts to allocate to each patient the cost of the resources actually consumed. We benefit from major investments in our cost accounting system over the last 5 years that make this level of granularity possible (though we also point out to users that our current system is still evolving and is not perfect). Armed with patient-level costing data, one can reasonably distinguish and then accumulate costs across the many resource types consumed in a given episode of care. These distinctions allow us to determine the cost impact of differences in practice. For example, by estimating the cost of a minute in the operating room or an hour in a nursed bed for a given patient type, we can estimate the cost impact of differences in time efficiency in either of those spheres. To make this data comparable across facilities, we also need to normalize it. Different facilities inherently have different cost structures that are outside the control of practicing physicians (such as regional labor cost differences or facility age and extent of depreciation), as shown in Figure 2a. Normalizing assigns each activity its system average cost. The result represents the typical cost impact of differences in practice and keeps practice differences from being confused with accounting differences (see Figure 2b).
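The normalization step can be sketched as re-costing every activity at a system-average unit cost. The record layout, the function name, and the choice of a quantity-weighted average (total cost across facilities divided by total quantity) are assumptions for illustration; the data are made up.

```python
from collections import defaultdict

def normalized_patient_costs(records):
    """Re-cost each patient's activities at the system-average unit cost.

    records: iterable of (patient_id, activity, quantity, facility_unit_cost).
    The system-average unit cost of an activity is its total cost across all
    facilities divided by its total quantity (a quantity-weighted average).
    """
    records = list(records)
    total_cost = defaultdict(float)
    total_qty = defaultdict(float)
    for _, activity, qty, unit_cost in records:
        total_cost[activity] += qty * unit_cost
        total_qty[activity] += qty
    system_unit_cost = {
        a: total_cost[a] / total_qty[a] for a in total_cost if total_qty[a] > 0
    }
    normalized = defaultdict(float)
    for patient, activity, qty, _ in records:
        # Activities with zero total quantity contribute nothing.
        normalized[patient] += qty * system_unit_cost.get(activity, 0.0)
    return dict(normalized)

# Made-up data: identical 100-minute cases at a $40/min and a $60/min facility.
records = [
    ("p1", "or_minute", 100, 40.0),
    ("p2", "or_minute", 100, 60.0),
]
print(normalized_patient_costs(records))  # {'p1': 5000.0, 'p2': 5000.0}
```

The two cases cost $4,000 and $6,000 on each facility’s books, but identical practice yields identical normalized cost; a case that used more OR minutes would still show a proportionally higher normalized cost, so genuine practice differences survive.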
6. Organize costs into a hierarchical taxonomy.
The volume of different activities and resources used in the support of patients is tremendous. Outputting full detail of these activities without an intuitive hierarchy would be overwhelming. We have built a classification system using a combination of existing taxonomies (Current Procedural Terminology, Uniform Billing Revenue codes, and United Nations Standard Products and Services Code), as well as internally developed classifications for other costed items (Figure 3). In addition to simplifying the data exploration process, this taxonomy is critical for algorithmic analysis. Typically, we aim for these intermediate classifications to be the lowest level of substitutable practice. In the earlier example, the grouping of hematological agents underneath pharmacy should contain all practice variants used to control blood loss for knee replacement patients. We can then use data mining algorithms to “sniff out” practice areas where there is a high degree of variation in cost, and surface the drivers of those differences.
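A hierarchical taxonomy and a simple “sniff out” pass can be sketched as follows. Each cost line carries a path such as (“Pharmacy”, “Hematological agents”, …); costs are rolled up to every level of the path, intermediate nodes at a chosen depth (the substitutable level) are ranked by coefficient of variation across surgeons, and the children of a flagged node are listed as candidate drivers. The ranking statistic, function names, agent names, and costs are all hypothetical; the authors do not say which data mining algorithms they use.

```python
from collections import defaultdict
from statistics import mean, pstdev

def rollup(lines):
    """Aggregate cost lines up a hierarchical taxonomy.

    lines: iterable of (surgeon, path, cost_per_case), where path is a tuple
    such as ("Pharmacy", "Hematological agents", "Agent P").
    Returns {node_path: {surgeon: cost}} for every prefix of every path.
    """
    tree = defaultdict(lambda: defaultdict(float))
    for surgeon, path, cost in lines:
        for depth in range(1, len(path) + 1):
            tree[path[:depth]][surgeon] += cost
    return tree

def most_variable(tree, surgeons, depth, top=3):
    """Rank nodes at one taxonomy depth by coefficient of variation across surgeons."""
    scored = []
    for node, by_surgeon in tree.items():
        if len(node) != depth:
            continue
        values = [by_surgeon.get(s, 0.0) for s in surgeons]
        avg = mean(values)
        if avg > 0:
            scored.append((pstdev(values) / avg, node))
    scored.sort(reverse=True)
    return scored[:top]

def drivers(tree, node, surgeons):
    """Per-surgeon cost of each child of `node`, i.e. the practice variants beneath it."""
    return {
        child[-1]: {s: tree[child].get(s, 0.0) for s in surgeons}
        for child in tree
        if len(child) == len(node) + 1 and child[: len(node)] == node
    }

# Made-up per-case costs; agent names are placeholders.
lines = [
    ("A", ("Pharmacy", "Hematological agents", "Agent P"), 400.0),
    ("A", ("Pharmacy", "Analgesic/anesthetic agents", "Agent R"), 100.0),
    ("B", ("Pharmacy", "Hematological agents", "Agent Q"), 20.0),
    ("B", ("Pharmacy", "Analgesic/anesthetic agents", "Agent R"), 110.0),
]
tree = rollup(lines)
print(most_variable(tree, ["A", "B"], depth=2, top=1))
print(drivers(tree, ("Pharmacy", "Hematological agents"), ["A", "B"]))
```

Ranking at the substitutable level rather than at the leaves matters: a leaf used by only one surgeon always looks maximally variable, whereas variation within a category of interchangeable agents points to a real practice choice.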
7. Create visuals that engage.
All of the organizational and architectural work must come to life in visuals that speak to clinicians. The summary Value Plot is where that conversation begins (Figure 4). Several of its features are especially effective. Transparency heightens the social recognition of physicians’ relative standing and stimulates them to want to explore underlying practice drivers. Circle size based on case volume communicates relative stature. Seeing a surgeon with high volume, good outcomes, and low cost stimulates interest in understanding his or her practice. Statistical significance guides conclusions, helping to separate signal from noise. Trends add context and indicate whether outcome or cost performance is sustained or may represent a blip in time. Filtering and highlighting allow the user to answer ad hoc questions in real time, such as “How do my cost and outcomes compare to peers only within my facility?” or “How do my supply costs compare to a certain peer facility or surgeon?” Any combination of outcome or cost category can be compared across surgeons, facilities, or regions. This dynamic data allows for rapid-cycle hypothesis testing, moving quickly from the “but what about” stage to the “ah, we should really explore what’s going on there” stage.
Cost and Outcomes
People often ask us how physicians respond to this data. We have found they are hungry for it. Like hospitals, individual physicians increasingly face pressure to demonstrate their value. Without data, it is difficult to find actionable ways to change. Sometimes those changes come easily, sometimes not.
Returning to our example of primary unilateral knee replacements, the genotypic analysis reveals that bone cement commercially impregnated with antibiotics is a particularly costly practice variant. Because VOA includes outcomes data, we can compare the incidence of prosthetic joint infection between heavy users of antibiotic-impregnated bone cement and non-users. The result: no difference in infection rates between the two cohorts. Most of our orthopedic surgeons, when presented with this data, have been willing to change practice (Figure 5). In aggregate, small changes like these have already reduced the average cost per case across the Providence St. Joseph system by over $200. With 10,000 cases performed each year, those small changes add up to more than $2 million annually.
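A cohort comparison like the infection-rate check above is often done with a two-proportion z-test. The article does not say which test was used, so this is a sketch of one standard option; the function name and the counts in the usage line are made up.

```python
import math

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two event rates (pooled variance)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no events (or all events) in both groups: no detectable difference
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Made-up counts: infections among heavy users vs. non-users of impregnated cement.
z, p = two_proportion_z_test(6, 1000, 5, 1000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With infections this rare, the normal approximation is shaky, and an exact test (such as Fisher’s) is a common alternative; a non-significant result also shows absence of a detected difference rather than formal equivalence.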
Our data also reveal large cost differences driven by things like vendor selection. The cost of the core components of a knee varies by over $1,000 between vendors. But here, the path to change is more difficult. Meaningful comparison of cost and outcomes data between implants, focusing on longevity, would take over a decade. And the relationships between surgeon and vendor can be deep and difficult to disrupt. So what then? Can we only achieve the easy changes? We don’t think so. But we do think it will take more than data to make change happen for some types of practice.
What will enable progress on those more difficult practice changes? We believe greater alignment between physicians and hospitals on proving value will be key. In that way, vendors will bear the onus of proving the net value of their products to both the hospital and the physician, rather than pitting the two parties against each other.
Coupled with transparency around cost and outcomes, and with alignment between clinicians and hospitals, we believe the “selective pressure” of a data-driven Value-Oriented Architecture can drive adoption of higher-value clinical practices that have often lain dormant among a handful of providers. In this light, existing variation will become the path to change rather than a headache for administrators. As Charles Darwin recognized, variation contains the key to survival. It may also point the way to higher-value care.