Evaluation of the Use of AHRQ and Other Quality Indicators
Chapter 4. Evaluation of the AHRQ QI Program
In this section, we discuss the results of our environmental scan and interviews with regard to the evaluation of the AHRQ QIs. We organize the discussion according to four factors that are used as criteria for evaluating quality indicators: importance, scientific soundness, usability, and feasibility. Since this report focuses on the AHRQ QI program as a whole, the comments and insights should be interpreted broadly, and not as critiques of individual indicators. For example, "importance" here refers mainly to interviewees' perceptions of the AHRQ QI program as a whole, not the importance of the constructs underlying specific AHRQ QIs.
4.1.1. Users' general views on the importance of the AHRQ QI program
Representatives of nearly all of the organizations stressed the importance of the AHRQ QI program. When asked an open-ended question about the role of AHRQ in quality measurement, 11 of 54 interviewees identified AHRQ as a "national leader" in measurement development and research. The AHRQ QI program was described by a vendor as "a major player, both nationally and internationally...a leader, the top of the pyramid." One interviewee captured this sentiment:
AHRQ is a very important player and has a rich history of research and evidence basis. The products they provide help everyone develop measures, such as the National Guideline Clearinghouse. The measures they have done to date have an audience, a place and a role—I know states use them.
Interviewees stressed that without the AHRQ QIs, they would have few alternatives and would likely have to drastically change or eliminate their quality reporting and/or measurement activities. As discussed in more detail below, the scientific soundness of the QIs was highly regarded, as was the transparency of the QI evidence review and validation that was conducted as a part of the AHRQ QI development process.
Interviewees generally felt that it was important that a federal agency like AHRQ, which is regarded as credible and respected, develop and support a quality indicator set for public use. AHRQ's credibility and the transparency of the AHRQ QI methods were often mentioned as key factors in overcoming opposition to quality measurement and reporting by stakeholders, particularly providers. We were told:
There is a lot of good documentation regarding how rigorously the indicators have been analyzed by AHRQ, researchers, academics, etc., in a collaborative effort. This is important, especially for hospital administrators, who have to convince medical staff that at least there is a rigorous process behind the indicators.
Overcoming this type of opposition is particularly important for public reporting and pay-for-performance initiatives, where providers' reputations and revenues are at stake. In the scenarios described by many of our interviewees, providers are typically not opposed on conceptual grounds to increasing the transparency of the quality of care they provide. However, providers are sensitive to being evaluated using measures that are unreliable or invalid, and they value the opportunity to be able to review and evaluate the measures they are subjected to and to raise objections to the results, where appropriate.
4.1.2. Importance of the Individual AHRQ QIs and Underlying Constructs
Although interviewees were not asked to comment on the importance of the constructs underlying the AHRQ QIs or on individual indicators, a few interviewees raised these issues. When asked why they use the AHRQ QIs, some interviewees mentioned that the AHRQ QIs provide a "good estimate" or that they offer a "reflection of reality."
Several interviewees also remarked that they appreciated having access to the evidence showing that the AHRQ QIs represent important opportunities for quality improvement, which is made available in the AHRQ technical documentation under the headings "face validity" and "fosters real quality improvement."35 A number of interviewees (10 of 54) mentioned that the availability of this information in the documentation is a key reason why they decided to use the AHRQ QIs, or described the documentation as a factor that facilitated the use of AHRQ QIs in the face of opposition from stakeholders.
4.1.3. Impact of AHRQ QI use
Although only one organization in our sample had formally measured the impact of AHRQ QIs on the quality of care delivered to patients, many interviewees provided anecdotal evidence of the effect of the indicators on quality. The one organization that did report conducting a study of the impact of its use of the AHRQ QIs was The Alliance, a Wisconsin employer-purchasing cooperative that publishes a quality report called QualityCounts. The evaluation of the impact of QualityCounts was conducted by Judith Hibbard and a team from the University of Oregon and was published in Health Affairs.36 The study found that public reporting resulted in increased hospital quality in the clinical areas included in the QualityCounts public report. The improvement appears to be driven by hospital administrators' concerns about their reputation.
When asked whether they had measured the impact of using the AHRQ QIs, a number of interviewees (9 of 29 answering this question) responded that indicator use began too recently to allow for observation of any impact. In addition, several interviewees stated that the results of the AHRQ QIs can be difficult to track longitudinally, since the specifications of the indicators have changed over time.
However, 12 of the 29 interviewees who answered the question on impact reported anecdotal evidence that their or their clients' use of the AHRQ QIs was having some type of impact on quality of care. The impacts observed usually consisted of an activity such as putting a new quality improvement process in place, rather than an improvement in outcomes. Examples of this type of anecdote include:
- A hospital representative reported:
We've definitely seen an impact on quality in areas flagged by the AHRQ QIs. Some have been data problems and some have been actual quality improvements. For example, using the infection indicator (PSI 7) we were able to see improvement after implementing the ventilator and central line bundles. Similarly with the sepsis indicator (PSI 13), we implemented the Surgical Care Procedure Practices, a group of interventions to decrease sepsis, and we saw improvements.
- A hospital network representative reported that staff were able to observe the impact of a quality report card on quality improvements in network hospitals. Two interventions introduced in response to the report card were: 1) new guidelines on the angle of the hospital bed for ventilator-assisted pneumonia patients and 2) implementation of a rapid response team.
- From a hospital using a vendor to implement AHRQ QIs:
We identified that we had high failure to rescue rates... This was the information we needed to present to our executive team and board to obtain resources to effectively establish and run a rapid response team.
- A hospital association representative reported:
There have been some changes in [the AHRQ QIs] data [over time], but I don't know if they've been caused [by our use of the AHRQ QIs for quality improvement]. From 2001 to 2004 there is less variation among hospitals, and mortality has decreased for several indicators; on the other hand, fewer hospitals are at or above the volume thresholds. We have looked at trends in other available data and, to the extent there is overlap, there is some correlation and indication that quality is improving.
- A representative of another hospital association provided anecdotal evidence of quality improvements, and also revealed a barrier to conducting more rigorous assessments of impact:
Hospitals have taken action in terms of identifying individual cases [from the numerator of AHRQ QIs where a problem is flagged], reviewing them [using clinical data], and developing improvement plans (especially moderate cases, such as infection). There are no published impact studies. The climate (in terms of lawsuits, etc.) stands in the way of publishing studies and until the climate is supportive, hospitals will not publish anything.
- A representative of a state that publicly reports AHRQ QIs noted:
One example of where the report had an immediate impact was one hospital that wasn't hitting the volume threshold for carotid endarterectomy [IQI 7]. They decided to stop performing them. We would like to evaluate effectiveness of reports at some point but don't have specific plans at this point.
- An insurance company representative using the AHRQ QIs for pay-for-performance believes that the program has had an impact by garnering attention for quality improvement from hospital management:
The indicators for patient safety have raised awareness. Because real money is now on the table, the result has been that the hospitals' financial people now have a more substantive dialogue with the quality people.
- A researcher who participated in a study that used the AHRQ QIs to evaluate a state-wide hospital policy change reported substantial press coverage of the results and an effect on other states considering the same policy.
The primary type of impact observed, however, was improvement to data quality. Representatives of several organizations stated that they viewed improved data quality as a natural progression in the implementation of a quality measurement program. When a potential quality problem is first flagged using the AHRQ QIs, the first response is to investigate whether the observed issue is due to a problem in the data or a problem with the actual quality of care. Once the provider organization has determined that the result in question is not a data artifact, the provider often examines clinical data and/or performs some other type of quality improvement activity to determine the cause of the quality problem. One state government representative described this process:
The first step hospitals take, naturally, when they see a potential problem is to ensure that it is not a data artifact. Hospitals found that they were consistently up-coding or down-coding measures. They usually started with initiatives to fix their data. Hospitals in some cases threw up red flags and started quality initiatives but the first step is to answer the question: is it an artifact of data or real issue? One hospital had 3 flags [potential quality problems indicated by the AHRQ QIs]; two turned out to be data problems, but one (stroke mortality) was a quality problem. However, most of the feedback from hospitals has been around trying to make data better. We don't have plans to evaluate the impact of our program because we just don't have the resources.
Users largely felt that the AHRQ QIs can be reliably constructed from hospital discharge data, but that there was a certain learning curve during which hospital coding departments had to adjust to the requirements for the QIs. Thus far, coders had mainly been trained to apply coding rules to fulfill reimbursement requirements, but now they had to understand that coding practices also had implications for quality reporting. In selected instances, we heard concerns about ambiguity in the coding rules—that the coding rules did not provide sufficient guidance on whether to code an indicator-relevant diagnosis. For example, we heard repeatedly that coders found it difficult to apply coding rules for vaginal trauma during birth (5 of 36 users).
Our interviewees were impressed by the quality and level of detail of the AHRQ documentation on the face validity of the indicators and stated that the indicators captured important aspects of clinical care. Very rarely were indicators challenged on conceptual grounds. One exception was the VBAC measures (IQIs 22 and 34), because a current American College of Obstetricians and Gynecologists (ACOG) guideline37 recommends VBAC only for facilities with a sufficient infrastructure for emergency C-section, which is commonly not present in smaller hospitals. Two interviewees who do public reporting with AHRQ QIs challenged the validity of the volume-based IQIs, because they did not consider the scientific evidence for a positive impact of high volumes of complex procedures on outcomes to be unambiguous.
Sample size issues (whether due to the rarity of certain procedures or events or the infrequency with which some procedures are conducted at certain facilities) were repeatedly mentioned as a threat to the validity of the indicators. In particular, the adverse events underlying some of the PSIs (e.g., PSI 5: foreign body left in during procedure) fortunately occur quite rarely, even in larger facilities. Smaller facilities, such as rural hospitals, are commonly able to report on only three QIs, the mortality indicators for acute myocardial infarction (AMI) and pneumonia (IQIs 15, 20, and 32), because they do not have the minimum required number of cases (20) for other indicators. While interviewees agreed on the face validity of the indicators, nearly a third of the interviewees (16 of 54) argued that such sample size limitations would render some indicator rates unstable and thus hard to interpret.
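The instability that interviewees described can be illustrated with a short calculation. The sketch below is not part of the AHRQ QI software; it assumes a simple binomial model and uses the Wilson score interval, a standard textbook method, to show how much wider the uncertainty around an observed rate becomes when the denominator is close to the 20-case minimum. The function names and the example counts are hypothetical.

```python
import math

MIN_CASES = 20  # minimum number of cases cited in the text above


def wilson_interval(events, cases, z=1.96):
    """Approximate 95% Wilson score interval for an observed rate.

    Assumes each case is an independent trial with the same underlying
    probability of the event (a simplification for illustration only).
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = events / cases
    denom = 1 + z ** 2 / cases
    center = (p + z ** 2 / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z ** 2 / (4 * cases ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def reportable(cases, min_cases=MIN_CASES):
    """True if the denominator meets the minimum case threshold."""
    return cases >= min_cases


if __name__ == "__main__":
    # Same observed rate (10%) at a small and a large hypothetical facility.
    for events, cases in [(2, 20), (200, 2000)]:
        low, high = wilson_interval(events, cases)
        print(f"{events}/{cases}: rate={events / cases:.3f}, "
              f"interval=({low:.3f}, {high:.3f}), reportable={reportable(cases)}")
```

Under these assumptions, both facilities have the same observed rate, but the interval for the small facility spans roughly a quarter of the possible range, which is one way to see why rates near the threshold are hard to interpret.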
On construct validity, most users stated that the indicators were correctly operationalized within the constraints of the underlying data source. Isolated findings of specification errors were brought to our attention, but interviewees emphasized that the AHRQ team was always able to address those quickly. The limitations of administrative data were frequently mentioned as a threat to validity, because the UB-92 format would not provide a sufficient level of clinical detail to account for all factors that should be considered in constructing the measures. Several potential improvements were mentioned, such as the addition of flags for conditions that were present on admission or for do-not-resuscitate orders. The AHRQ QI team is incorporating functionality for a condition present-on-admission flag into the next iteration of QI specifications.
Some users thought that formal validation studies should be conducted to assess the validity of the indicator results in relation to indicators based on medical records data. As discussed above, we learned that hospitals are conducting analyses to find out whether poor performance on a given QI is due to an organization's coding practices or points to a real quality problem. But those efforts are typically driven by unusually poor performance, are not systematically analyzed, and focus on identifying false positive events (i.e., an adverse event was flagged by the indicator that could not be ascertained through chart review). False negative events (i.e., the indicator algorithm failed to identify an actual adverse event) were rarely researched.
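The distinction between false positives and false negatives maps onto two standard validation measures: positive predictive value (the share of flagged cases confirmed by chart review) and sensitivity (the share of actual adverse events that the indicator flagged). The following is a minimal sketch of how a formal validation study might summarize its findings; the function name and the counts are hypothetical and do not come from any study described in this report.

```python
def validation_metrics(true_pos, false_pos, false_neg):
    """Summarize indicator accuracy against chart review.

    true_pos:  flagged by the indicator and confirmed in the chart
    false_pos: flagged by the indicator but not confirmed in the chart
    false_neg: found in the chart but missed by the indicator
    Returns None for a measure whose denominator is zero.
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    ppv = true_pos / flagged if flagged else None
    sensitivity = true_pos / actual if actual else None
    return {"ppv": ppv, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(validation_metrics(true_pos=30, false_pos=10, false_neg=20))
```

Reviewing only flagged cases, as the hospitals described above typically do, yields the positive predictive value but says nothing about sensitivity, because missed events can only be found by also reviewing charts that the indicator did not flag.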
4.2.3. Risk Adjustment
Since the AHRQ IQIs and PSIs generally represent health outcomes, they are sensitive to the morbidity burden of the patient population and must be risk-adjusted to provide a valid comparison of quality. The IQIs and PSIs currently use different risk adjustment methods, although AHRQ will move to a single method for all of the QIs in the future. Currently, the IQIs use the All Patient Refined Diagnosis-Related Groups (APR-DRGs), a proprietary system owned by 3M Health Information Systems. The PSIs use a public-domain risk-adjustment system developed by AHRQ. The current risk adjustment methods for both the PSIs and the IQIs were regarded as adequate.
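To illustrate why risk adjustment matters for comparison, the sketch below applies indirect standardization, a generic textbook technique: a hospital's observed rate is divided by the rate expected given its patients' predicted risks and then scaled by a reference population rate. This is offered only as an assumption-laden illustration; it is not the APR-DRG logic or the AHRQ PSI model, and the function name, risk values, and reference rate are hypothetical.

```python
def risk_adjusted_rate(outcomes, predicted_risks, reference_rate):
    """Indirectly standardized rate: (observed / expected) * reference rate.

    outcomes:        list of 0/1 flags, one per patient (1 = event occurred)
    predicted_risks: model-predicted event probability for each patient
    reference_rate:  event rate in the reference population
    """
    if not outcomes or len(outcomes) != len(predicted_risks):
        raise ValueError("outcomes and predicted_risks must be non-empty and equal length")
    observed = sum(outcomes) / len(outcomes)
    expected = sum(predicted_risks) / len(predicted_risks)
    if expected == 0:
        raise ValueError("expected rate is zero; ratio is undefined")
    return observed / expected * reference_rate


if __name__ == "__main__":
    # Two hypothetical hospitals with the same observed rate (25%)
    # but different case mix.
    sicker = risk_adjusted_rate([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5], 0.10)
    healthier = risk_adjusted_rate([1, 0, 0, 0], [0.125, 0.125, 0.125, 0.125], 0.10)
    print(sicker, healthier)
```

In this example the hospital with the sicker patient population receives the lower adjusted rate even though both hospitals have identical observed rates, which is the kind of correction interviewees expected a risk adjustment method to make.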
Users particularly emphasized that the AHRQ method for the PSIs had the advantage of being transparent and easy to understand. Even though the APR-DRGs are based on proprietary software, interviewees were generally comfortable with using them for IQI risk adjustment, because they already used the software for other purposes, such as payment, and were familiar with its structure and logic. However, 22% (12 of 54) of interviewees thought that the risk adjustment approach used for the AHRQ QIs should be improved. In particular, interviewees would like to see both PSIs and IQIs using the same risk adjustment method and would like AHRQ's method to be aligned with that of CMS, University Healthsystem Consortium, and other developers.
As discussed in detail above, the AHRQ QIs have been used by many types of organizations and for a variety of purposes. Most interviewees stated that the AHRQ QI products provided a workable solution for their needs and were very appreciative of the support that the AHRQ QI team provides for implementation and ongoing use. Despite these overall favorable impressions of the usability of the QIs, two issues were raised repeatedly: the need for development of reporting templates, and the need for clearer guidance on the use of the AHRQ QIs for public reporting and pay-for-performance programs.
4.3.1. Reporting Template
A number of interviewees (9 of 54) highlighted as a top priority the need for a standard format for reporting AHRQ QI results. At the simplest level, some interviewees wanted AHRQ-supported, standard, basic names for the AHRQ QIs in plain language, as some of the current indicator names are difficult for non-clinical audiences to understand. Other interviewees expressed a desire for more guidance and support on other aspects of presentation. Currently, many organizations have developed their own reporting formats. Interviewees were interested in information such as:
- How should indicators be analyzed and reported?
- How should outliers be identified?
- Which indicators are consumers expected to respond to most?
- How should consumers interpret the results of the indicators?
- How do results compare to national, state, or other benchmarks?
4.3.2. Composite indicators
Twelve interviewees expressed a desire for an AHRQ-supported methodology for constructing a composite indicator. Forming composites would allow organizations to summarize the results of multiple indicators in one statistic, which is easier to grasp and communicate, in particular for non-expert audiences. Composites would also help overcome sample size limitations by allowing information to be pooled. Four organizations whose representatives participated in our interviews have developed their own AHRQ QI composite indicators, but most would prefer an AHRQ-developed approach. The AHRQ QI team is currently working on the development of composite indicators to meet those needs.
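One simple way to form a composite, shown here purely as an illustration, is a weighted average of each indicator's observed-to-expected ratio. This is an assumed, generic construction; it is not the methodology under development by the AHRQ QI team, nor that of any organization interviewed, and the function name, weights, and counts are hypothetical.

```python
def composite_ratio(indicators):
    """Weighted average of observed/expected ratios across indicators.

    indicators: list of (observed_events, expected_events, weight) tuples.
    A value above 1.0 means more events than expected overall.
    """
    if not indicators:
        raise ValueError("at least one indicator is required")
    total_weight = sum(weight for _, _, weight in indicators)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = 0.0
    for observed, expected, weight in indicators:
        if expected <= 0:
            raise ValueError("expected events must be positive")
        weighted_sum += weight * (observed / expected)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Two hypothetical indicators: one better and one worse than expected.
    print(composite_ratio([(5, 10, 1), (30, 20, 1)]))   # equal weights
    print(composite_ratio([(5, 10, 3), (30, 20, 1)]))   # first indicator weighted more
```

The choice of weights drives the result, which is one reason interviewees preferred a single, AHRQ-developed approach over organization-specific composites.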
4.3.3. Guidance on the use of the AHRQ QIs for public reporting and pay-for-performance
Not surprisingly, our questions on suitability of the AHRQ QIs for public reporting and pay-for-performance programs led to vivid and often emotionally charged discussions and comments. Interviewees who are currently using the AHRQ QIs for public reporting and pay-for-performance generally felt that they provided a workable solution for their needs. The introduction of those programs typically followed a similar sequence: following the initial decision to start a public reporting or pay-for-performance program, a controversial debate would start on the merits of such initiatives in general, and the suitability of administrative data for quality measurement in this context in particular.
Then, hospitals and physicians would slowly start to participate rather than resist. Many interviewees told us that AHRQ's reputation for high quality research, the excellent documentation of the scientific basis of the indicators, the transparency of the method, and the ease of implementation and use were crucial factors in obtaining buy-in. The first release of the data was commonly accompanied by media attention and anxiety on the part of providers. Both would subside in subsequent releases, as all stakeholders became more familiar and comfortable with the program.
Still, half of the interviewees who use AHRQ QIs for public reporting stated that additional standards and guidance on the reporting of AHRQ QI results were needed. Some interviewees (10 of 54) expressed dissatisfaction with the current AHRQ stance on the appropriateness of the AHRQ QIs for public reporting. These interviewees described the current guidance as "difficult to find," "weak," and presenting "mixed messages." The lack of clarity is perceived to be largely due to shifts in AHRQ's stance on appropriate uses of the QIs.
Previously published guidance contained much stronger caveats against inappropriate uses than the current guidance. Interviewees felt that clearer guidance from AHRQ would help to counter opposition from those who argue that the AHRQ QIs should only be used for quality monitoring and improvement and research, but not as a public reporting or pay-for-performance tool.
Taking the opposing view were several interviewees (mostly hospitals) who would like to see AHRQ make a clear statement that the AHRQ QIs are not appropriate for use in public reporting, pay-for-performance, or other reporting activities. A representative of one hospital told us:
The AHRQ QIs are fabulous tools, but they are assessment tools, not judgment tools. AHRQ's white paper was very clear in saying that this was not AHRQ's intent. The issue is that AHRQ allowed folks to go too far without a caveat. They tried with that white paper, but now they're endorsing states using it for public reporting—it's not appropriate.
We were told consistently that a major advantage of the AHRQ QIs was the feasibility of their implementation. They require only administrative data in the UB-92 format to which many users have routine access, since those data are already being used for billing and other administrative purposes and have to be collected and reported by hospitals in most states.
Interviewees emphasized that another substantial advantage of the AHRQ QIs is that the indicators have clearly defined and publicly available specifications, which helps with implementation of measurement. These specifications were regarded as of particular importance for hospitals, as the originators of the data, because the specifications enable hospitals to work with their coding departments to ensure that the required data elements were abstracted from medical records consistently and with high quality. In addition, users who analyze data with the QIs, such as researchers, appreciated the fact that they could dissect the indicator results and relate them back to individual records. That capability helped researchers gain a better understanding of the indicator logic and distinguish data quality issues from actual quality problems.