Selecting Quality and Resource Use Measures: A Decision Guide for Community Quality Collaboratives

Part IV. Selecting Quality and Resource Use Measures (continued)

Question 22. What criteria should we use when screening measures of quality for public reporting or other purposes?

Screening Framework

Once a community quality collaborative establishes its purposes in assessing health care quality (e.g., pay for performance, public reporting, internal quality improvement), it must screen a large number of established quality measures. This process can be daunting, but community quality collaboratives can take advantage of several evidence-based and consensus-based evaluation frameworks that leading organizations have developed to prioritize measures.107, 131, 136 Although these evaluation frameworks were developed for somewhat different purposes and were initially applied to different sets of candidate indicators, they are actually quite similar and are therefore useful to collaboratives that hold a variety of measurement agendas.

As shown in Appendix A, the National Quality Forum (NQF) built on earlier work by The Joint Commission, National Committee for Quality Assurance (NCQA), and Institute of Medicine (IOM) (on behalf of AHRQ's National Healthcare Quality Report [NHQR]137) to propose a four-domain scheme for evaluating quality measures. Within each of these domains are several key questions or criteria, as described below. Depending on local priorities, a community quality collaborative may put more weight on one domain and less weight on others, and it may choose to focus on a single criterion or set of criteria within each domain. In some cases, a collaborative may accept NQF endorsement of an indicator as both necessary and sufficient evidence that the indicator is acceptable for public reporting. In other cases, a collaborative may set a lower or higher threshold than NQF, perhaps because of differing local views about the importance of a quality-related problem, local availability of better or worse data, or local interest in testing a measure that may be submitted for endorsement in the future.

Here is a brief summary138 of NQF's four domains for evaluation:

Domain: Description
Importance: There should be a leverage point for improving quality, considerable variation in quality of care, or suboptimal performance in the area of interest.
Scientific acceptability: The measure should be well defined, precisely specified, and reliable, producing the same results a high proportion of the time when applied to the same population. It should be valid, accurately representing the concept being evaluated. The measure should be precise, adequately discriminating between real differences in provider performance, and adaptable to patient preferences and a variety of settings. An adequate and specified risk-adjustment strategy should be available, and there should be evidence linking process measures to outcomes.
Usability: The measure can be used for making decisions and implementing change. Performance differences should be statistically, practically, and clinically meaningful. The measure should provide for appropriate risk stratification, risk adjustment, and other forms of recommended analyses.
Feasibility: Data collection should be linked to care delivery when feasible, and timing and frequency of measure collection must be specified. The benefit of implementation should be evaluated against financial and administrative burden. Confidentiality concerns should be addressed and an audit strategy should be available.
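
To make the idea of weighting domains concrete, the Python sketch below scores candidate measures against the four NQF domains and keeps those that clear a cutoff. The 1-5 ratings, the domain weights, the example measures, and the 3.5 cutoff are all hypothetical; a collaborative would substitute its own values based on local priorities, and could treat NQF endorsement as a separate pass/fail gate.

```python
# Hypothetical domain weights reflecting one collaborative's local priorities.
DOMAIN_WEIGHTS = {
    "importance": 0.35,
    "scientific_acceptability": 0.30,
    "usability": 0.20,
    "feasibility": 0.15,
}


def weighted_score(ratings: dict[str, float], weights: dict[str, float] = DOMAIN_WEIGHTS) -> float:
    """Combine 1-5 domain ratings into a single weighted average."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[d] * ratings[d] for d in weights) / total_weight


def screen(candidates: dict[str, dict[str, float]], cutoff: float = 3.5) -> list[str]:
    """Return the measures whose weighted score meets the cutoff, best first."""
    scored = {name: weighted_score(r) for name, r in candidates.items()}
    kept = [name for name, s in scored.items() if s >= cutoff]
    return sorted(kept, key=lambda name: scored[name], reverse=True)


# Hypothetical ratings assigned by a measurement committee.
candidates = {
    "Measure A": {"importance": 5, "scientific_acceptability": 4, "usability": 4, "feasibility": 3},
    "Measure B": {"importance": 3, "scientific_acceptability": 2, "usability": 3, "feasibility": 5},
    "Measure C": {"importance": 4, "scientific_acceptability": 5, "usability": 3, "feasibility": 2},
}
print(screen(candidates))
```

Shifting weight toward feasibility, for example, would favor measures like Measure B that are cheap to collect even if their scientific acceptability is weaker.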

Scientific acceptability, which aligns with the NCQA and IOM/NHQR domains of “scientific soundness,” may be the most difficult domain for community quality collaboratives to assess, because it draws on complex concepts such as reliability, validity, and lack of bias. Many chartered value exchanges (CVEs) lack the technical expertise and resources to assess reliability, validity, and bias, but they should be intelligent consumers of information about these performance characteristics from other sources (such as measure developers). The following questions from the National Quality Measures Clearinghouse, which measure developers/sponsors are expected to answer, are useful for understanding measure validity139:

Q1. How strong is the scientific evidence supporting the validity of this measure as a quality measure?
Q2. Are all individuals in the denominator equally eligible for inclusion in the numerator?
Q3. Is the measure result under control of those whom the measure evaluates?
Q4. How well do the measure specifications capture the event that is the subject of the measure?
Q5. Does the measure provide for fair comparisons of the performance of providers, facilities, health plans, or geographic areas?
Q6. How well does your intended use of the measure match the developer's intended use?
Q7. Does the quality of data available meet the measure standards (e.g., reliability, appropriate sample size, accessibility)?
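
Q7 asks whether the available data support an appropriate sample size. One common way to relate sample size to reliability is the signal-to-noise formulation, in which the reliability of a provider's score equals the between-provider variance divided by the sum of the between-provider variance and the within-provider variance over n patients. The Python sketch below applies that formula to find the smallest number of patients per provider needed to reach a target reliability. The 80% average rate, the 5-percentage-point spread among providers, and the 0.70 target are hypothetical values for illustration, not recommended standards.

```python
import math


def reliability(n: int, between_var: float, within_var: float) -> float:
    """Signal-to-noise reliability of a provider score based on n patients."""
    return between_var / (between_var + within_var / n)


def min_sample_size(target: float, between_var: float, within_var: float) -> int:
    """Smallest whole number of patients whose reliability reaches the target."""
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    # Solve target = b / (b + w / n) for n, then round up.
    return math.ceil(target * within_var / ((1 - target) * between_var))


# Hypothetical inputs: average rate of 80%, provider-level standard deviation of 5 points.
p = 0.80
within = p * (1 - p)   # binomial variance for a single patient
between = 0.05 ** 2    # variance of true performance across providers
n_needed = min_sample_size(0.70, between, within)
print(n_needed, round(reliability(n_needed, between, within), 3))
```

Because within-provider variance is divided by n, small practices tend to produce unreliable scores, which is one reason measure developers are asked to state sample-size requirements.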

Screening Process

Three comprehensive databases are available to community quality collaboratives to help them identify and screen potential measures.133 The National Quality Measures Clearinghouse (NQMC), supported by AHRQ, summarizes measures that are submitted to it by many different measure sponsors. These include government agencies, accrediting bodies, research institutions, professional societies, and even individual hospitals and health systems. The NQMC provides detailed information about each measure, based on a template of measure attributes that includes each measure's:

  • Title and source
  • Domain of measurement (e.g., structure, process, outcome)
  • Description and rationale
  • Supporting evidence of value
  • Current use
  • Care setting
  • Professionals responsible
  • Target population age and gender
  • Incidence or prevalence
  • Association with vulnerable populations
  • Burden of illness
  • Associated utilization and costs
  • IOM domain
  • Sampling frame
  • Denominator and numerator definitions and time windows
  • Data source
  • Scoring
  • Risk adjustment
  • Standards of comparison (e.g., benchmarks)
  • Evidence of reliability or validity
  • Endorsements

An online search utility allows users to find the subset of measures that meet specific criteria, based on this template of attributes, while a measure comparison utility allows users to generate side-by-side comparisons for any combination of two or more measures. Links to full-text measures and ordering details are provided when available.

The National Quality Forum (NQF) maintains an ongoing list of approved measures, which it refers to as National Voluntary Consensus Standards. The list currently available at the NQF Web site includes 545 standards covering all domains of inpatient and outpatient care. Each standard is assigned an official number and title, succinctly described, attributed to a specific “steward,” and classified in terms of the following:

  • Care setting (e.g., ambulatory care, hospital, home health, hospice, nursing home, dialysis center)
  • Type (e.g., structure, process, outcome, patient experience)
  • Level of measurement (e.g., facility, individual clinician)
  • Data source (e.g., paper medical record, electronic claims, clinical registry)
  • Endorsement status and date

This NQF list can be searched to identify, for example, the subset of endorsed measures related to hospital readmissions. Note that some of the listed information was not available online at the time of this publication.
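
A collaborative that keeps its own working list of candidate measures (for example, assembled from NQMC and NQF searches) can apply the same kind of attribute filtering locally. The Python sketch below assumes a hypothetical record layout modeled on the NQF classification fields listed above; the identifiers and titles are invented for illustration and do not correspond to actual NQF-endorsed standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureRecord:
    measure_id: str    # local identifier (hypothetical)
    title: str
    care_setting: str  # e.g., "hospital", "ambulatory care"
    measure_type: str  # e.g., "structure", "process", "outcome"
    level: str         # e.g., "facility", "individual clinician"
    data_source: str   # e.g., "electronic claims", "clinical registry"
    endorsed: bool


def find_measures(catalog: list[MeasureRecord], keyword: str, **criteria: str) -> list[MeasureRecord]:
    """Endorsed records whose title contains the keyword and whose fields match all criteria."""
    keyword = keyword.lower()
    return [
        m for m in catalog
        if m.endorsed
        and keyword in m.title.lower()
        and all(getattr(m, field) == value for field, value in criteria.items())
    ]


catalog = [
    MeasureRecord("M-001", "30-day readmission after heart failure admission",
                  "hospital", "outcome", "facility", "electronic claims", True),
    MeasureRecord("M-002", "Readmission within 30 days of any discharge",
                  "hospital", "outcome", "facility", "electronic claims", False),
    MeasureRecord("M-003", "Diabetes HbA1c testing",
                  "ambulatory care", "process", "individual clinician", "electronic claims", True),
]
for m in find_measures(catalog, "readmission", care_setting="hospital"):
    print(m.measure_id, m.title)
```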

The third resource, the Centers for Medicare & Medicaid Services (CMS) Quality Measures Management Information System (QMIS), is a comprehensive, Web-based repository of quality measures used by all CMS health care quality initiatives. It was designed as an electronic tool to support the CMS Measures Management System, a set of processes and decision criteria used to oversee the development, implementation, and maintenance of health care quality measures throughout their life cycle. The QMIS serves as the authoritative repository of information on the quality measures used by CMS, including their technical specifications, justification, and history. It is also being used to track the development of new measures and the maintenance of existing measures, providing a consistent mechanism by which requests, inquiries, and comments pertaining to measures can be processed. Users can browse or search for measures by name, description, approval status, clinical condition, developer, and contractor.

Community Collaborative Example

The Alliance, a purchaser coalition in Wisconsin, uses a two-phase process to reduce the number of measures to a manageable number, and it completely reevaluates the measure set every other year. The Alliance included hospitals from the beginning of the evaluation to help determine the measure set and initially circulated a list of almost 200 measures for comment. This process proved very cumbersome and was subsequently revised to circulate a shorter list of the most viable measures. The Alliance solicited feedback on the shortened list and considered alternative measures suggested by hospitals. This new process was less overwhelming for the hospitals and resulted in what Alliance leaders consider to be a strong measure set.140

The Alliance also created a matrix that identifies which measure criteria relate to its pay-for-performance program and which relate to public reporting. Most of the criteria apply to both goals, but some apply to only one. For instance, evidence that a measure shows variation in care is more important for public reporting than for pay for performance, because pay-for-performance rewards are based on meeting set performance standards rather than on differences among providers.


Question 23. Against which benchmarks should we measure our local performance?

The benchmark concept, which originated in the manufacturing industry, is a critical tool for health care quality performance measurement because it provides context for assessing individual performance. A benchmark is a reference point or standard against which individual performance can be assessed. Kiefe, et al., state that benchmarks should reflect the best care achieved for at least 10% of the eligible patient population. This standard means that a benchmark will always surpass average performance while still representing an attainable (clinically realistic) level of excellence. This approach has been described as “Achievable Benchmarks of Care” (ABC).141-142
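
The ABC approach can be computed from provider-level numerators and denominators. The Python sketch below follows a simplified version of the method as it is commonly described: each provider's performance is smoothed with an adjusted performance fraction, (numerator + 1) / (denominator + 2), so that very small practices with perfect scores do not dominate; providers are ranked on that fraction; top-ranked providers are added until they account for at least 10% of all eligible patients; and the benchmark is the pooled rate among those providers. Tie handling and other refinements of the published method are omitted, and the provider data are hypothetical.

```python
def achievable_benchmark(providers: list[tuple[int, int]], patient_share: float = 0.10) -> float:
    """Pooled rate among top-ranked providers covering at least `patient_share` of patients.

    providers: one (numerator, denominator) pair per provider.
    """
    total_patients = sum(den for _, den in providers)
    if total_patients == 0:
        raise ValueError("no eligible patients")
    # Adjusted performance fraction damps extreme rates from providers with few patients.
    ranked = sorted(providers, key=lambda p: (p[0] + 1) / (p[1] + 2), reverse=True)
    pooled_num = pooled_den = 0
    for num, den in ranked:
        pooled_num += num
        pooled_den += den
        if pooled_den >= patient_share * total_patients:
            break
    return pooled_num / pooled_den


# Hypothetical practices: (patients receiving recommended care, eligible patients).
providers = [(10, 10), (45, 50), (80, 100), (150, 200), (300, 500), (60, 140)]
overall = sum(n for n, _ in providers) / sum(d for _, d in providers)
print(f"average = {overall:.3f}, ABC benchmark = {achievable_benchmark(providers):.3f}")
```

Here the three top-ranked practices together cover 160 of 1,000 eligible patients, so their pooled rate becomes the benchmark, well above the overall average.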

Examples of benchmark use can be found through QualityCheck, sponsored by The Joint Commission, and CMS Hospital Compare, both of which offer the national average and a “benchmark” representing the top 10% of hospitals (also known as the 90th percentile) reporting that measure. Wessell, et al. (2008), demonstrated the feasibility of applying this ABC method in primary care settings through the Practice Partner Research Network, which includes 87 EMR-equipped practices with 712,000 patients across 35 States.143
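
For comparison, the Python sketch below computes the two reference points described above, a national average and a 90th-percentile benchmark, from a list of hospital-level rates using the standard-library statistics module. The rates are hypothetical, and the percentile interpolation method (method="inclusive") is an assumption; QualityCheck and Hospital Compare may calculate their benchmarks differently.

```python
from statistics import mean, quantiles


def national_reference_points(rates: list[float]) -> tuple[float, float]:
    """Return (average, 90th percentile) of hospital-level performance rates."""
    average = mean(rates)
    # quantiles(..., n=10) returns the nine decile cut points; the last is the 90th percentile.
    p90 = quantiles(rates, n=10, method="inclusive")[-1]
    return average, p90


# Hypothetical hospital-level rates for one process measure.
rates = [0.62, 0.70, 0.74, 0.78, 0.81, 0.83, 0.85, 0.88, 0.91, 0.95, 0.97]
average, benchmark = national_reference_points(rates)
local_rate = 0.85
print(f"average={average:.3f}, 90th percentile={benchmark:.3f}")
print("above average" if local_rate > average else "at or below average",
      "| meets benchmark" if local_rate >= benchmark else "| below benchmark")
```

A hospital can therefore look good against the average while still falling well short of the top-10% benchmark.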

In practice, however, benchmarks are often based on average or median performance scores, especially for risk-adjusted outcome comparisons such as those published by New Jersey, Pennsylvania, California, and many other States. Average benchmarks may be more palatable to the organizations being evaluated and easier to incorporate into statistical analyses, given the so-called “null hypothesis” that all organizations perform at the same, average level.
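
When the benchmark is the average, the comparison is usually framed as a test of that null hypothesis. The Python sketch below uses a simple two-sided normal-approximation test of one provider's unadjusted rate against the average rate; the 1.96 critical value corresponds to the conventional 5% significance level. Published risk-adjusted report cards use more elaborate models that compare observed with expected outcomes, so this is only an illustration of the logic, and the counts are hypothetical.

```python
import math


def compare_to_average(events: int, n: int, average_rate: float, z_crit: float = 1.96) -> tuple[str, float]:
    """Two-sided normal-approximation test of a provider's rate against the average rate."""
    rate = events / n
    # Standard error under the null hypothesis that the provider performs at the average.
    se = math.sqrt(average_rate * (1 - average_rate) / n)
    z = (rate - average_rate) / se
    if z > z_crit:
        return "higher than average", z
    if z < -z_crit:
        return "lower than average", z
    return "not significantly different", z


# Hypothetical mortality counts for three hospitals, each with 300 cases; average rate 3%.
for name, deaths in [("Hospital A", 18), ("Hospital B", 10), ("Hospital C", 2)]:
    label, z = compare_to_average(deaths, 300, 0.03)
    print(f"{name}: {label} (z = {z:.2f})")
```

For an adverse outcome such as mortality, "higher than average" signals worse performance; for a recommended process of care, it signals better performance.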

Local/Regional Versus National Benchmarks

The question of whether to use benchmarks derived from local, State, regional, or national data depends on several factors, including the availability of benchmark data, the collaborative's objectives, and local performance levels.

The availability of reliable benchmarks is the foremost consideration. National benchmarks for measure sets developed by national organizations (e.g., National Committee for Quality Assurance's [NCQA's] Healthcare Effectiveness Data and Information Set, The Joint Commission's QualityCheck, AHRQ's HCUPNet utility and annual National Healthcare Quality Report and State Snapshots) are generally easy to obtain and offer reliable and valid comparisons to the Nation's performance. However, benchmarks for many physician group performance measures are not yet available at the national level. Instead, local and regional initiatives are establishing their own benchmarks, such as for coronary artery bypass surgery mortality in Pennsylvania, New York, California, and a few other States.

Benchmarks at the local or State level may also be required if a collaborative creates unique performance measures or if local population factors (e.g., a shortage of primary care physicians) have a substantial impact on performance at the local or State level. Although local benchmarks reflect local practice, they have an important disadvantage: they are susceptible to undesirable local variation in the quality and pattern of care.144 If an area performs poorly compared with other areas nationally, a local benchmark offers little real benefit, because it sets the bar below what is attainable elsewhere. Other challenges for creating and using local benchmarks include accruing a sufficiently large sample and overcoming financial and political barriers that may hinder their creation.

Another factor influencing the choice of benchmark is the reason for measurement (pay for performance, internal quality improvement, or public reporting) in the context of current local performance levels. For example, organizations using pay for performance as an incentive for improved quality may require providers to achieve a national benchmark, also known as a threshold, for supplemental payment. Using a national benchmark that is already vetted should allay provider concerns about its reliability and validity.

Dudley, et al., explore using either relative (local) or absolute performance thresholds to inform pay-for-performance programs.145 Either is useful, depending on whether the primary program goal is to improve the quality of care delivered by all eligible providers or to reward the highest quality providers.145 When performance results are publicly reported, benchmarks help consumers put the information into a larger context. For practices that are high-volume Medicaid providers, such as Federally Qualified Health Centers, using NCQA clinical benchmarks that are specific to Medicaid may be perceived as fairer than the alternatives.
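
The mechanical difference between the two kinds of threshold is easy to see in code. In the Python sketch below, an absolute threshold rewards every provider whose score reaches a fixed level, so the number of qualifying providers can grow as overall quality improves, while a relative threshold rewards a fixed share of top-ranked providers no matter how high scores rise. The clinic names, scores, 0.80 threshold, and 20% share are hypothetical, and ties at the relative cutoff are not handled specially.

```python
import math


def absolute_winners(scores: dict[str, float], threshold: float) -> list[str]:
    """Providers whose score meets a fixed performance threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


def relative_winners(scores: dict[str, float], top_share: float) -> list[str]:
    """Providers ranked within the top share of the distribution (at least one provider)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.floor(len(ranked) * top_share))
    return sorted(ranked[:k])


scores = {"Clinic A": 0.91, "Clinic B": 0.84, "Clinic C": 0.78, "Clinic D": 0.88, "Clinic E": 0.70}
print(absolute_winners(scores, threshold=0.80))
print(relative_winners(scores, top_share=0.20))
```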

Peer Group Benchmarking

Another important question is whether to compare each provider organization to all of the other provider organizations in the market, or only to those with similar structural characteristics. The latter approach is also known as peer group benchmarking, because it involves identifying a peer group of similar organizations for each organization being evaluated. Peer group benchmarking has face validity in the provider community and has been shown to reduce the number of statistical outliers, presumably because organizational characteristics (e.g., size, teaching status, ownership) explain some of the variation in outcomes that would otherwise be attributed to the individual organization.146

It is not always clear how a peer group of similar organizations should be constructed, as different approaches may yield different results.147-148 At the extreme, some provider organizations might argue that they are unique in geography and structure, and therefore they do not have any peer group to which they can be compared. Even if an appropriate peer group can be identified, many question whether meaningful performance differences across different types of provider organizations should be “covered up” by attributing those differences to immutable organizational characteristics rather than to the organization itself.149 Therefore, most report card sponsors now benchmark provider organizations against all of the organizations with which they compete in a geographic market, without regard to their size, volume, or teaching status.
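
The Python sketch below shows why peer grouping tends to shrink apparent differences. Each hypothetical hospital's rate is compared both with the market-wide average and with the average of its own peer group (here, teaching versus community hospitals); the gap relative to peers is smaller whenever peer group membership itself explains part of the variation. The hospitals, groups, and rates are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def benchmark_gaps(hospitals: dict[str, tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each hospital to (gap versus market average, gap versus its peer-group average)."""
    market_avg = mean(rate for _, rate in hospitals.values())
    by_group: dict[str, list[float]] = defaultdict(list)
    for group, rate in hospitals.values():
        by_group[group].append(rate)
    group_avg = {group: mean(rates) for group, rates in by_group.items()}
    return {
        name: (rate - market_avg, rate - group_avg[group])
        for name, (group, rate) in hospitals.items()
    }


# Hypothetical complication rates: (peer group, rate).
hospitals = {
    "H1": ("teaching", 0.12),
    "H2": ("teaching", 0.10),
    "H3": ("community", 0.06),
    "H4": ("community", 0.08),
}
for name, (vs_market, vs_peers) in benchmark_gaps(hospitals).items():
    print(f"{name}: {vs_market:+.3f} vs market, {vs_peers:+.3f} vs peers")
```

Hospital H1 sits 3 points above the market average but only 1 point above its teaching peers; whether the other 2 points should be "covered up" is exactly the policy question raised above.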

Community Collaborative Example

Both the Wisconsin Health Care Value Exchange and the Michigan—Greater Detroit Area Health Council are using AHRQ's NHQR to help determine their CVEs' performance improvement priorities. The report presents more than 220 measures at the State and national levels, allowing these CVEs to gauge their local challenges and opportunities. The Wisconsin CVE, which is a leader in data collection and measurement, used the NHQR to confirm many of its own findings. The Michigan CVE found that the State's performance on asthma care was below average, whereas its performance on diabetes care was about average. Although asthma appeared to be the more problematic condition, the CVE chose to focus its resources on diabetes because the State of Michigan had recently implemented an asthma care plan.


Question 24. When and how should providers review data before public reports are released?

Studies show that provider acceptance of performance measurement and public reporting is largely dependent on the perceived validity of the measures.142 Most organizations pursuing performance measurement understand the necessity of measure validation and are working to integrate constructive provider feedback before reports are released. This review and feedback may focus on one or more of three areas: (1) general analytic methods; (2) attribution of cases to specific providers; and (3) information about cases that would affect denominator exclusion, numerator determination, or risk adjustment.

Denominator exclusion relates to the concept of “exception reporting,” which allows providers to identify specific patients who should not be eligible for inclusion in the quality measure. Numerator determination relates to whether the patient actually experienced an adverse outcome or actually failed to receive appropriate therapy. Risk adjustment relates to whether the data capture the patient's true severity of illness and therefore his or her true risk of an adverse outcome.

Exception reporting is a commonly used physician review method in the United Kingdom's pay-for-performance program, as well as in The Joint Commission's Core Measures program. It was developed to allow providers to pursue quality improvement while avoiding penalties for patients who did not meet measure specifications for reasons that could not be captured in administrative data (e.g., the patient was newly diagnosed within the practice or had an allergy or other contraindication to treatment). Similarly, The Joint Commission “accepts” any physician statement in the record that a patient had a medical contraindication to the medication of interest, even if that contraindication is not supported by clinical evidence. Previous studies have explored concerns about providers “gaming” the system (shrinking the denominator by excluding patients who should be treated), with mixed conclusions.150-152 Exception reporting may be particularly favored for Medi-Cal beneficiaries because providers perceive higher nonadherence and greater barriers to care in this population.
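
The effect of exception reporting on a measure rate can be sketched in a few lines. In the Python example below, patients with a documented exception are removed from the denominator before the rate is computed, and the exception rate is reported alongside the result, since an unusually high exception rate is one signal a collaborative might review when concerns about gaming arise. The patient records and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    eligible: bool   # meets the measure's denominator definition
    received: bool   # received the recommended therapy (numerator)
    exception: bool  # provider documented a reason for exclusion


def measure_rate(patients: list[Patient], allow_exceptions: bool = True) -> float:
    """Share of denominator patients who received therapy, optionally honoring exceptions."""
    denominator = [
        p for p in patients
        if p.eligible and not (allow_exceptions and p.exception)
    ]
    if not denominator:
        raise ValueError("empty denominator")
    return sum(p.received for p in denominator) / len(denominator)


def exception_rate(patients: list[Patient]) -> float:
    """Share of eligible patients removed through exception reporting."""
    eligible = [p for p in patients if p.eligible]
    return sum(p.exception for p in eligible) / len(eligible)


# Hypothetical panel: 10 eligible patients, 7 treated, 2 of the 3 untreated have documented contraindications.
panel = (
    [Patient(True, True, False)] * 7
    + [Patient(True, False, True)] * 2
    + [Patient(True, False, False)]
)
print(measure_rate(panel, allow_exceptions=False))  # 7 of 10 patients
print(measure_rate(panel))                           # 7 of 8 patients
print(exception_rate(panel))                         # 2 of 10 patients excluded
```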

Mechanisms for Soliciting Provider Review and Feedback

There are several possible mechanisms for soliciting provider review and feedback. Community quality collaboratives may choose to use one or more of these mechanisms, depending on how much is already known about the quality of the data and the validity of the analyses based on those data. For example, when Medicaid claims data are used for reporting, continuous tracking and physician attribution may be problematic because of frequent eligibility changes within a 12-month period. The mechanisms listed below are ordered from the most costly and time-consuming to the least.

  • Send each provider (either routinely or upon request) a patient-level or claims-level data file summary with the specific cases and quality-related data elements attributed to him or her. Allow a limited period (typically 21 days to 3 months) for providers to challenge the specific cases attributed to them, the quality-related data elements, or both. A more limited version of this option, less susceptible to manipulation, would be to send only the attributed case list to each provider.
  • Send each provider (either routinely or upon request) a patient-level or claims-level data file with the specific cases and quality-related data elements attributed to him or her. Allow a limited period (typically 21 days to 3 months) for providers to prepare a public response, without allowing them to challenge anything.
  • Send each provider (either routinely or upon request) a draft copy of the report, Web materials, and other documents related to the planned public release. Allow a limited period (typically 7-28 days) for providers to suggest specific changes to any of these documents.
  • Send each provider (either routinely or upon request) a draft copy of the report, Web materials, and other documents related to the planned public release. Allow a limited period (typically 7-28 days) for providers to review and prepare a public response, which may accompany the final release.
  • Send each provider (either routinely or upon request) an advance copy of the report, Web materials, and other documents related to the planned public release. Do not solicit any suggestions or comments, but alert providers that they should expect inquiries from media organizations and others.

Community Collaborative Examples

Washington-Puget Sound Health Alliance (WPSHA), The Alliance in Wisconsin, and the BQI Project provide examples of alternative provider review processes. WPSHA published the Reasonableness Review Process for Medical Groups, which details how medical groups can access draft results for review and provide feedback through a secure online portal. WPSHA provided explanations of patient attribution and details about each measure, as well as an appeals process. Specifically, “volunteer data suppliers” and medical groups worked together to confirm that specific measure results reflected a given clinic's patients. Patients were reidentified for medical groups, which then verified that each patient met the measure criteria and received a particular service from a particular clinician and clinic according to the measure specifications.

Wisconsin's purchaser coalition, The Alliance, sends results and documentation to hospitals before hosting a conference call during which technical specifications, including numerator/denominator definitions and risk-adjustment methods, are presented. Hospitals receive “rich” spreadsheets with far more detail than is publicly reported, at least 30 days before the scheduled release, and are encouraged to identify mistakes and raise concerns. On several occasions, hospitals have reported duplicate record submissions that altered their results; The Alliance corrected these mistakes and reported the proper results. Other issues that hospitals frequently raise relate to risk-adjustment methods and challenges to exclusions (e.g., hospital transfers for AHRQ Patient Safety Indicators, or PSIs). The Alliance believes its methodology is sound and transparent and will consider revisions only if a method is technically invalid; it has made such a revision in the past, after a hospital discovered bias in one of the PSI specifications.
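
Duplicate submissions like those the hospitals reported can often be caught before results are calculated. The Python sketch below keeps the first record for each hospital and encounter identifier and shows how a single duplicated adverse-event record inflates a rate. The field names and records are hypothetical, and in practice a collaborative would need to confirm that matching identifiers truly represent duplicates rather than separate encounters.

```python
def deduplicate(records: list[dict], key_fields: tuple[str, ...] = ("hospital_id", "encounter_id")) -> list[dict]:
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def event_rate(records: list[dict]) -> float:
    """Share of records flagged with the adverse event."""
    return sum(r["event"] for r in records) / len(records)


submitted = [
    {"hospital_id": "H1", "encounter_id": 101, "event": True},
    {"hospital_id": "H1", "encounter_id": 101, "event": True},  # duplicate submission
    {"hospital_id": "H1", "encounter_id": 102, "event": False},
    {"hospital_id": "H1", "encounter_id": 103, "event": False},
    {"hospital_id": "H1", "encounter_id": 104, "event": False},
    {"hospital_id": "H1", "encounter_id": 105, "event": False},
]
print(f"{event_rate(submitted):.3f}")               # 2 of 6 records
print(f"{event_rate(deduplicate(submitted)):.3f}")  # 1 of 5 encounters
```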

The six pilot sites in the Better Quality Information (BQI) project invited physician feedback on patient attribution. In California, physicians were sent letters advising them to request data online and to contact the Pacific Business Group on Health (PBGH) to identify any errors. However, the Quality Improvement Organization confidentiality rules, which protect both patient and physician privacy, prohibited PBGH from sharing patient information that was not generated by the physician requesting the data. In this case, the auditing process did not work optimally for physicians with a high percentage of Medicare patients. The Massachusetts Chartered Value Exchange (CVE) offered an interactive Web-based tool so that providers could update their medical group affiliations. As part of the Quality Alliance Steering Committee's High-Value Health Care project, a similar tool has been created so that physicians can review and correct the list of individual patients attributed to them. An active physician registry, when available, enhances proper patient attribution. In summary, the BQI pilot reaffirmed the importance of physician involvement to ensure proper attribution as well as to improve acceptance of the measurement process.


Question 25. What are the critical success factors for selecting useful performance measures?

Established community quality collaboratives, such as Chartered Value Exchanges (CVEs), report several critical factors that contribute to a successful process of selecting measures and data.

  1. It is important to have healthy partnerships with diverse stakeholders who support a common mission of performance measurement (e.g., pay for performance, public reporting, confidential reporting for quality improvement).
  2. It is critical to establish common goals because the goals chosen can affect the relative weights assigned to the evaluation criteria described in Question 22. For example, a measure may be more relevant if a collaborative's primary goal is to inform consumer choice than if the primary purpose is to drive providers' quality improvement efforts.
  3. Continued, active engagement of key stakeholders will help maintain support for the common mission established at the collaborative's inception, while allowing its goals to evolve over time as needed. Community quality collaboratives in the early stages of organizing should note that “key stakeholders” include consumer and provider representatives. Both offer viewpoints that are critical to the sustainability of the effort and the usability of the selected measures.

Successful Steps to Measure Selection

Roski and Pawlson153 suggest that after community quality collaboratives agree on the mission, they should:

  1. Address the goals or scope of measurement (e.g., measuring adherence to a single guideline versus assessing the overall “quality” of care). More ambitious goals and a broader scope will inevitably require more measures. An emphasis on accountability and transparency may lead to a larger set of measures, with more variable reliability, than an emphasis on improving consumer decisionmaking.
  2. Determine the number of measures required to meet the goal, which varies according to the desired level of validity, reliability, and other technical performance characteristics. More measures are not necessarily better if their reliability or validity is questionable or if they are not relevant to the intended audience.
  3. Assess data source availability, reliability, and affordability (e.g., electronic claims; pharmacy, laboratory, and medical records; paper records and surveys). Improving these features of the data may allow the goals or scope of measurement to expand, leading to a cycle of program improvement.

By following this three-step process, community quality collaboratives can ensure that they are making fully informed measure selections that are aligned with their specific goals and the needs of their stakeholders. In addition, they can ensure that their selections are consistent with the data and other resources available to them.

Page last reviewed October 2014
Page originally created May 2010
Internet Citation: Part IV. Selecting Quality and Resource Use Measures (continued). Content last reviewed October 2014. Agency for Healthcare Research and Quality, Rockville, MD.