Decisions Encountered During Key Task Number 2: Selecting the Measures That Will Be Used To Evaluate Provider Performance

Methodological Considerations in Generating Provider Performance Scores

A. Which measures will be included in a performance report?

There are many things a Chartered Value Exchange (CVE) may want to consider when choosing performance measures for a report. For the purposes of this paper, we divide the measure selection process into an "early stage," a "middle stage," and a "late stage."

  • The early stage occurs before a CVE knows exactly what kinds of performance data are available (Key Task 3) and before the quality and completeness of these data are known (Key Task 4). In the early stage of measure selection, a CVE may want to be as inclusive as possible: Under ideal conditions, what performance measures would CVE stakeholders like to report? A separate AHRQ publication titled Selecting Quality and Resource Use Measures: A Decision Guide for Community Quality Collaboratives provides a broader and complementary discussion of how CVEs might engage in early-stage quality measure selection.14 The process outlined in this complementary decision guide is intended to assist CVEs in choosing measures with good intrinsic properties (i.e., those measures that cover the desired domains of performance and meet standards of importance, scientific acceptability, and usability). Once these early-stage standards have been met, a CVE can proceed to the middle and late stages, which depend on how local factors interact with the intrinsic characteristics of the measures.
  • In the middle stage of measure selection, a CVE may discover local data limitations that prevent the construction of certain performance measures. In the decision points of Key Tasks 3 and 4, we provide some options for addressing these data problems. However, data problems may still make it impossible to report all of the measures identified in the early stage of measure selection. Some measures may need to be set aside at this point. This middle stage of measure selection will be easier if CVE stakeholders can reach an earlier consensus on criteria for setting measures aside.
  • In the late stage of measure selection, the remaining measures can be calculated for a "mockup" performance report, and the local risk of misclassification due to chance can be calculated. For a detailed discussion of misclassification risk, refer to Appendixes 1 and 2. Even though a CVE can find many ways to limit the risk of misclassifying provider performance, some measures may not be publicly reported due to excessive misclassification risk for a large number of providers. When this is the case, additional measures may need to be set aside.

Figure 4 illustrates the way a measure selection process might occur. In a way, the middle and late stages of measure selection are "filters" on the early stage. Of course, this way of thinking about measure selection is simplified and does not include every factor that might influence which measures make it into a performance report. But from a purely methodological point of view, this approach illustrates some of the major considerations.
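
The "filter" view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measure names, the three fields on each candidate, and the `max_risk` threshold are all hypothetical, and a real CVE would arrive at these judgments through stakeholder negotiation and statistical analysis rather than a single flag or number.

```python
from dataclasses import dataclass


@dataclass
class CandidateMeasure:
    name: str
    meets_general_criteria: bool   # early stage: importance, acceptability, usability
    local_data_available: bool     # middle stage: can the measure be built locally?
    misclassification_risk: float  # late stage: hypothetical 0-1 risk estimate


def filter_measures(candidates, max_risk):
    """Apply the early, middle, and late stages as successive filters."""
    early = [m for m in candidates if m.meets_general_criteria]
    middle = [m for m in early if m.local_data_available]
    late = [m for m in middle if m.misclassification_risk <= max_risk]
    return early, middle, late


# Hypothetical candidates, invented purely for illustration.
candidates = [
    CandidateMeasure("diabetes_cholesterol_check", True, True, 0.05),
    CandidateMeasure("care_coordination_survey", True, False, 0.10),
    CandidateMeasure("rare_condition_mortality", True, True, 0.40),
    CandidateMeasure("unvetted_local_measure", False, True, 0.05),
]

early, middle, late = filter_measures(candidates, max_risk=0.20)
print([m.name for m in late])
```

In this toy run, one measure is set aside for lack of local data and another for excessive misclassification risk, mirroring the middle and late stages.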

A key take-away point is that a measure may satisfy general requirements, but local factors particular to each CVE are also crucial determinants of which measures may be included in a performance report. (Examples of general requirements include the National Quality Forum's criteria of importance, scientific acceptability, usability, and feasibility.15 Examples of local factors include negotiated "value judgments," data availability, and misclassification risk in the local provider community.) Whether performance measures are right for reporting in "your backyard" can only be determined in later stages of the measure selection process.

In the early stage of measure selection, CVEs may want to choose a broad set of measures that:

  • Measure care for conditions that are common in the population.
  • Measure care for conditions that are important to members of the population.
  • Measure outcomes of care, such as patient mortality.
  • Measure costs, utilization, or efficiency of care.
  • Measure processes of care, such as checking cholesterol in patients with diabetes.
  • Measure patient experience or patient satisfaction.
  • Measure coordination of care.
  • Are known to have room for performance improvement.
  • Come with "prepackaged" risk-adjustment methods.
  • Are relatively easy for the target audience to understand.
  • Are part of consensus-based measure sets, such as those endorsed by the National Quality Forum, or
  • Are part of a local, regional, or national improvement effort.16

Other considerations may come into play when choosing measures in the early stage, and lists of existing measures may provide a useful starting point. Additional guidance on making initial choices about performance measures is available from a variety of sources.15,17-20


Example: Choosing performance measures

The Wisconsin Healthcare Value Exchange ambulatory performance report (http://www.wchq.org) initially started with diabetes quality of care measures because this health condition was a high public health priority in Wisconsin. Measures of preventive care quality were added next, because these measures were "in demand." The selection of ambulatory measures has been "organic" over the years, without a formal, standardized process. Existing measures, national measurement trends, and member preferences have been taken into consideration in measure selection.

For hospitals (http://www.wicheckpoint.org), however, measures were chosen in a well-defined selection process, guided by a board of directors and steering committee. The workgroup members think "big picture," starting with ideas about what people would want to improve. They then proceed through 20 criteria, such as:

  • Are there existing measures?
  • Is there evidence that the measure being considered actually reflects the clinical practice that we're trying to improve?
  • How does the measure align with national priorities?
  • How does the measure align with State priorities?


B. How will the performance measures be specified?

Once performance measures are agreed on in the early stage of selection, the next step is to decide exactly how each measure will be specified. Here, specification refers to the exact ways raw data about patient care (e.g., data that come directly from administrative sources or medical records) are used to construct performance measures. For example, a measure may consist of a numerator and a denominator. The numerator counts the number of times a clinical service (e.g., an immunization) is provided, while the denominator counts the number of times a clinical service should be provided (e.g., the number of patients who should be immunized in a given year).

The measure specifications are the criteria for determining which patients are eligible for the service and which clinical services are received by these patients. Eligible patients are counted in the denominator, and services received are counted in the numerator. The specifications of a measure are the "DNA" of a measure, and small changes in specifications can have large effects on a performance report.
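
To make the numerator and denominator concrete, here is a minimal sketch in Python. The `Patient` fields, the age-based eligibility rule, and the four patients are hypothetical; real specifications involve many more criteria. The sketch simply shows how changing one eligibility criterion changes the resulting score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    immunized: bool  # service documented during the measurement year


def measure_score(patients, min_age):
    """Return (numerator, denominator, rate) for a hypothetical age-based spec."""
    eligible = [p for p in patients if p.age >= min_age]  # denominator
    received = [p for p in eligible if p.immunized]       # numerator
    rate = len(received) / len(eligible) if eligible else None
    return len(received), len(eligible), rate


# Hypothetical patients, invented for illustration.
patients = [Patient(50, False), Patient(64, False), Patient(66, True), Patient(70, True)]

spec_a = measure_score(patients, min_age=65)  # narrower denominator
spec_b = measure_score(patients, min_age=50)  # broader denominator
print(spec_a, spec_b)
```

With the same underlying care, the narrower specification yields a rate of 1.0 and the broader one a rate of 0.5, which is the kind of sensitivity to specification that the text describes.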

Whether a CVE can develop its own measure specifications depends on how "raw" the available performance data are. If these data are claims, unprocessed clinical data, or survey responses, then a CVE could construct performance measures according to its own specifications (e.g., the CVE can decide which patients count toward the denominator and which services count toward the numerator). However, if a third party already has constructed "prescored" performance measures (e.g., Leapfrog measures), then the task of specification already has been performed by the body that constructed the measure.

Here are some options a CVE may consider in deciding how a performance measure will be specified. This is not an exhaustive list, but it illustrates the pros and cons of some commonly available options.

  1. Option 1: Use measure specifications that are endorsed by national bodies. Many nationally endorsed performance measures, such as those developed by the National Committee for Quality Assurance and AHRQ, come with detailed specifications. These measures also may have national performance benchmarks. Altering the specifications of these measures may invalidate comparisons to these benchmarks.

    In addition, some nationally endorsed measure specifications may come with established methodologies to adjust for differences in case mix among providers. Case mix adjustment is discussed in more detail in the section on Task Number 5. In a nutshell, case mix adjustment refers to statistical techniques that are intended to ensure that performance comparisons do not systematically misclassify providers. In other words, case mix adjustment seeks to avoid "comparing apples to oranges."

    Advantages:

    • May allow valid comparisons to national benchmarks and to performance scores reported by other reporting programs.
    • May already have case mix adjustment methodologies developed.

    Disadvantages:

    • Nationally endorsed measures may not optimally address a CVE's local priorities.
    • Nationally endorsed measures may not be usable "off-the-shelf," since all data elements may not be available for their construction. It is common for local collaboratives to slightly modify national specifications (e.g., using data that are available to identify patients for the denominator).
    • When the scientific evidence behind a measure changes, nationally endorsed measure specifications may be slower to incorporate the new scientific evidence than CVE stakeholders would like.
  2. Option 2: Use locally modified measure specifications. When CVEs examine the specifications of existing performance measures, stakeholders may want to consider modifying these specifications. Reasons to modify these specifications may include wanting to take advantage of data that are available locally but are rarely available nationally, such as data about a clinically important comorbidity. Also, data elements that are included in the national specifications may not be available locally, precluding precise adherence to the national specifications. However, modifying measure specifications involves important tradeoffs. Comparisons to national benchmarks may not be valid, and new case mix adjustment methodologies may need to be developed.

    In general, performance measurement experts advise against modifying nationally endorsed measure specifications unless modification is unavoidable. As an alternative to modifying measure specifications in its own performance reports, a CVE may choose to convey ideas for measure modification to the measure developer. The goal is to improve subsequent revisions of the nationally endorsed measure.

    Advantages:

    • Modified measures may better address a CVE's local priorities.

    Disadvantages:

    • Valid comparisons to national benchmarks probably will not be possible.
    • Deviating from nationally endorsed specifications may open the way for constant negotiation over further changes.
    • New case mix adjustment methodologies will need to be developed, requiring the assistance of a statistician with expertise in performance measurement (go to the section on Task Number 5).
  3. Option 3: Use measure specifications that are included in proprietary software packages. Proprietary software packages are available that compute measures of provider performance using locally obtained data (usually administrative data such as health plan claims). Some software packages are widely used, so performance comparisons to external benchmarks may be possible. The software packages already may incorporate case mix adjustment methodologies. However, because these software packages are proprietary, it may not be possible to know exactly how the measures that they generate are specified. Information about how the performance measures are calculated may be available from the software vendors.

    Using proprietary software to construct performance measures also may raise concerns about systematic performance misclassification and the risk of misclassification due to chance. Case mix adjustment methods included in the software may be inadequate, allowing systematic performance misclassification. The software may not generate performance data with enough detail to calculate the risk of performance misclassification for each provider (i.e., enough detail to calculate within-provider measurement error; go to Appendix 2 for a more detailed discussion). In that case, it may be impossible to know whether the overall risk of misclassification is acceptable to CVE stakeholders. This could happen if the software provides each provider's score on a performance measure without indicating how much uncertainty there is about that score.

    Advantages:

    • May allow comparisons to external benchmarks.
    • May already incorporate case mix adjustment methodologies.
    • Relatively easy to use.
    • No need for measure development.

    Disadvantages:

    • Proprietary software packages may function like "black boxes" that turn raw performance data into performance scores. In other words, such packages may not reveal detailed measure specifications to CVEs. This lack of transparency can make it difficult to really understand what is being reported, undermining stakeholder trust (especially among providers). Moreover, it may be impossible to assess the construct validity of "black box" measures—one of the most basic requirements of a valid performance report. Construct validity is discussed in Appendix 1.
    • If performance data are not generated with the right level of detail, assessing the risk of misclassification due to chance may be difficult.
    • CVEs may need to check the performance of the case mix methodologies included in the software. To detect systematic performance misclassification due to inadequate case mix adjustment (a threat to the validity of performance reports, as discussed in Appendix 1), a CVE will need the assistance of a statistician.
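
The concern raised under Option 3, that a score reported without its underlying counts hides how uncertain it is, can be illustrated with a short Python sketch. It uses the normal-approximation standard error of a proportion as a deliberately simple stand-in; it is not the method described in Appendix 2, and the two practices and their counts are hypothetical.

```python
import math


def standard_error(numerator, denominator):
    """Normal-approximation standard error of an observed proportion."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate = numerator / denominator
    return math.sqrt(rate * (1 - rate) / denominator)


# Two hypothetical providers with the same score (0.80) but different volumes.
small_practice = standard_error(8, 10)
large_practice = standard_error(800, 1000)
print(round(small_practice, 4), round(large_practice, 4))
```

Both providers would appear identical if only the score were available, yet the smaller practice's score is roughly ten times less precise. Software output that omits the counts leaves a CVE unable to make this distinction.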

Example: Deciding how performance measures will be specified

For reports of hospital performance (http://www.wicheckpoint.org and http://www.wchq.org), the Wisconsin Healthcare Value Exchange uses the following strategy for determining measure specifications:
  • If there is a nationally endorsed measure, use its specifications.
  • If there is no nationally endorsed measure, use regionally endorsed measure specifications.
  • If there are no nationally or regionally endorsed measures, then the "last-case scenario" is for the CVE to design and test its own measure.

For "HEDIS-like"* measures of ambulatory provider performance (http://www.wchq.org), the CVE tries to adhere as closely as possible to the national measure specifications, which are intended for use with claims data. However, because the CVE obtains performance data directly from providers rather than from health plan claims, the measure specifications must be translated for this alternative data source. The main goal is to try to capture "the essence" of the denominator that might be applied to claims data.

As a side benefit to providers, the list of patients included in each measure denominator can also be used as a patient registry, and this functionality makes the reporting effort very well accepted by providers.

* HEDIS is the Healthcare Effectiveness Data and Information Set of the National Committee for Quality Assurance.


C. What patient populations will be included?

In the early stage of measure selection and specification, CVE stakeholders also may want to consider which patient populations they would like to include in measuring provider performance. This "included" patient population consists of all the patients whose care generates the performance data that will be used to create performance measures.

The choice of patient population is important for at least two major reasons. First, reported provider performance will have the greatest meaning for patients who belong to the population contributing performance data. For example, if only patients who are Medicare beneficiaries generate performance data, then performance reports will have the most meaning for patients age 65 and older. Provider performance for these patients may or may not accurately indicate how well a provider delivers care to a much younger population.

Second, due to segmentation of the U.S. health care system, certain patient populations will require different data sources than others. This segmentation is a particular concern when constructing performance measures based on health plan claims data. For example, if a CVE wants to include patients age 65 and older, the CVE generally will need to access performance data from traditional fee-for-service Medicare or from a Medicare Advantage plan. If a CVE wants to include patients from vulnerable sociodemographic groups, performance data from Medicaid or uninsured patients may be needed. Some data sources may be difficult or impossible to access.

Even if all potential sources of performance data are available, not all patients captured in these data at any given time can be included in some performance measures. In their specifications, some performance measures have "continuous enrollment criteria," usually meaning that to be included in a measure, a patient must be a member of the same health plan for at least 1 or 2 years. Similarly, some measures may require that a patient receive care from a given provider for 1 or 2 years. However, a significant percentage of patients may switch health plans or providers from year to year, becoming ineligible for inclusion in a measure.
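
A continuous enrollment criterion can be sketched as a simple check over month-by-month plan membership. The Python below is a minimal illustration under assumptions: the data layout (one plan identifier per month, `None` for a gap in coverage), the 12-month window, and the three patients are all hypothetical, and actual measure specifications often permit short enrollment gaps that this sketch ignores.

```python
def continuously_enrolled(monthly_plans, required_months=12):
    """True if the most recent `required_months` entries are all the same plan."""
    if len(monthly_plans) < required_months:
        return False
    window = monthly_plans[-required_months:]
    return window[0] is not None and all(plan == window[0] for plan in window)


# Hypothetical enrollment histories: one plan identifier per month.
enrollment = {
    "patient_a": ["plan_1"] * 12,                  # stayed in one plan
    "patient_b": ["plan_1"] * 6 + ["plan_2"] * 6,  # switched plans midyear
    "patient_c": ["plan_1"] * 11 + [None],         # lost coverage in the last month
}

eligible = [pid for pid, months in enrollment.items() if continuously_enrolled(months)]
share_eligible = len(eligible) / len(enrollment)
print(eligible, round(share_eligible, 2))
```

Here only one of the three patients remains eligible, which shows how quickly switching and coverage gaps can shrink the population that a measure actually describes.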

Care for patients who switch plans or providers may differ from care for patients who do not switch. Performance reports may therefore be less meaningful for patients who switch than for those who stay in the same health plan and maintain the same provider. We discuss a way to quantify the extent of this potential problem in the section on Task Number 4, in the bullet point titled "Compute overall number of patients who qualify for a measure."

Here are two "extreme" options for deciding which patient populations to include in a performance report. These "extreme" options are intended only to illustrate the tradeoffs that may be involved.

  • "Extreme" Option 1: Include all patient groups present in a CVE's local area.

    Advantages:

    • May maximize the usefulness of performance reports to a broad population. May also enhance the usefulness of performance reports to providers who may otherwise receive multiple competing reports, each representing the care delivered to a different patient population.
    • May reduce the risk of misclassification due to chance because the number of observations for each provider is increased. However, including more patient groups does not guarantee more reliable performance estimates.

    Disadvantages:

    • Population-based data are generally segmented into different sources (e.g., Medicare, Medicaid, commercial insurance). Obtaining and pooling data from multiple sources may be difficult. For some data sources, legal restrictions on data use may be a barrier to their inclusion in a report.
  • "Extreme" Option 2: Only include patient groups for which performance data are readily available.

    Advantages:

    • May be easier to generate performance reports.

    Disadvantages:

    • May limit the usefulness of performance reports, especially for patients from populations whose care is not reflected in the reports.
    • May have a high risk of performance misclassification due to chance related to lower numbers of observations per provider.
    • The patients whose care is reflected in performance reports may differ from the patients who use the reports. This dissimilarity raises the possibility that, from the point of view of patients using the reports, providers will be systematically misclassified, resulting in "selection bias" (a threat to the validity of performance reports; briefly discussed in Appendix 1).

    Examples: Patient populations included in ambulatory care* performance reports

    Most of the CVE stakeholders we interviewed were reporting provider performance using claims-based performance measures. Therefore, these CVEs could include only the patient populations for whom claims data were available (typically commercially insured patients, plus Medicaid enrollees in some cases). However, Aligning Forces for Quality-South Central Pennsylvania (http://www.aligning4healthpa.org) relies on provider medical record reviews (rather than claims) to generate performance data and can therefore include all patients regardless of health plan coverage. Similarly, the Wisconsin Healthcare Value Exchange (http://www.wchq.org) uses provider electronic health record data, including clinical data and lab results, as the basis for most performance measures.

    Organizations leading the Minnesota Healthcare Value Exchange (http://www.mnhealthscores.org/ and http://www.mnhospitalquality.org/) also report some measures of ambulatory care quality based on provider medical records; for these measures, all patient populations are included. The same is true for measures based on claims data, with one exception. Because Medicare claims data are unavailable for performance reporting purposes, no patients with Medicare fee-for-service coverage can be included in the claims-based measures.

    *For hospital performance reports, all-payer State hospital discharge databases (depending on the State) may enable reporting that includes patients covered by Medicare fee for service, Medicaid, and commercial insurance, as well as uninsured patients. (For a list of State data contacts, go to http://www.hcup-us.ahrq.gov/partners.jsp).

Page last reviewed September 2011
Internet Citation: Decisions Encountered During Key Task Number 2: Selecting the Measures That Will Be Used To Evaluate Provider Performance: Methodological Considerations in Generating Provider Performance Scores. September 2011. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/professionals/quality-patient-safety/quality-resources/value/perfscoresmethods/perfsctask2.html