Selecting Quality and Resource Use Measures: A Decision Guide for Community Quality Collaboratives
Part III. Introduction to Resource Use/Efficiency Measures
Resource use measurement is in the early stages of development. Although public and private payers express considerable interest in calculating the value of health care services, it remains a challenge to develop and implement nationally accepted measures. Questions 14-18 describe both theoretical (e.g., types of measures, measure construction) and applied (e.g., measure use in comparing providers, national efforts) aspects of resource use measurement.
Question 14. What are the main types of resource use measures?
The term “resource use measures” is intended to broadly capture indicators of the cost and efficiency of health care provision. Health care resource use measures reflect the amount or cost of resources used to create a specific product of the health care system. The specific product could be a visit or procedure, all services related to a health condition, all services during a period of time, or a health outcome. “Efficiency” measures are a subset of resource use measures that relate resource use to the production of products of a specified level of quality.1, 106 Most resource use measures currently in use are not efficiency measures by this definition, because they do not explicitly incorporate a measurement of the quality of the product.
A systematic review of available resource use measures was published by AHRQ at www.ahrq.gov/qual/efficiency/index.html. Three main groups of resource use measures have been developed:
- Relatively simple measures of the resources used to produce health care, such as mean length of stay, mean charges or estimated costs, and readmission rates for hospitals; and consultation or test ordering rates for outpatients with common complaints such as low back pain.
- More complex measures of health care resource use, including both inpatient and outpatient services, using econometric or mathematical programming techniques to account for multiple outputs.
- Measures of the resources used in an episode of care for a patient, or to treat a patient with a specified burden of comorbidity for a specified period of time.
Relatively Simple Measures
The first group of measures includes relatively straightforward measures long used in hospital management. The most common measure of this type is the average length of hospital stay, adjusted for case mix. This provides an estimate of the resources used to care for a hospitalized patient with a particular diagnosis. Other measures focus on whether the hospitalization itself was a necessary use of resources; potentially avoidable readmissions following hospital stays are a commonly used measure of this type. Finally, charges or estimated costs associated with specific services are sometimes presented as resource use measures, although these measures may be distorted by cost shifting, anticompetitive behavior, differences in quality, and a variety of other manifestations of market failure.
More Complex Measures
The second group of measures, typically published in the peer-reviewed literature,107 reflects the amount and type of various resources used to produce a mix of hospital services, such as hospital discharges, outpatient visits, and procedures. These measures use complex methods to account for different mixes of resources used and services produced. The complexity of these methods may have inhibited the broad use of these measures beyond academic research, because measurement results can be sensitive to many specification choices and difficult to interpret. However, a measure of this type is included in AHRQ's National Healthcare Quality Report (refer to Question 17). Using a related approach called Data Envelopment Analysis (DEA),108 Valdmanis and colleagues compared the number of hospital staff and beds used to produce a mix of inpatient and outpatient services across 1,377 urban hospitals in 34 States operating in 2004. Their study found that hospitals could increase the total amount of outputs produced by an average of 26%, without increasing inputs, by eliminating inefficiency.
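As a rough illustration of the DEA idea, the sketch below covers only the simplest special case: one input and one output per hospital under constant returns to scale. In that case the output-oriented DEA score reduces to comparing each hospital's output-per-input ratio with the best ratio observed in the sample. The hospital labels and figures are hypothetical; studies such as the one cited use several inputs and outputs and solve a linear program for each hospital.

```python
# Minimal DEA sketch: ONE input and ONE output, constant returns to scale.
# In this special case the output-oriented efficiency score equals
# (best output/input ratio in the sample) / (the hospital's own ratio).
# Multi-input, multi-output DEA instead solves a linear program per provider.

def dea_single_ratio(hospitals: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return, for each hospital, the factor by which it could scale up output
    without using more input (1.0 means it lies on the frontier).

    hospitals: name -> (input, output), e.g. (staffed beds, adjusted discharges).
    """
    ratios = {name: out / inp for name, (inp, out) in hospitals.items()}
    best = max(ratios.values())
    return {name: best / r for name, r in ratios.items()}

# Hypothetical data: (beds, adjusted discharges)
hospitals = {"A": (100, 5000), "B": (200, 8000), "C": (150, 7500)}
scores = dea_single_ratio(hospitals)
for name, phi in scores.items():
    print(f"{name}: could raise output by {100 * (phi - 1):.0f}% with the same input")
```

Here hospitals A and C define the frontier at 50 discharges per bed, so hospital B (40 per bed) would be scored as able to raise its output by 25% without additional beds.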
Episode- and Population-Based Measures
The third group includes two main approaches to resource measurement: (1) “episode-based” measures of resources used for an “episode of care,” including all services related to a particular medical condition or acute event; or (2) “population-based” measures of resources used in providing all care to an individual with one or more chronic conditions for a period of time. Of the two approaches, episode-based measures have been used more widely by commercial payers and have been recommended for use in Medicare by the Congressional Budget Office and the Medicare Payment Advisory Commission, among others.
Episodes are defined using “grouper” tools, such as the Episode Treatment Groups (ETGs) developed by Symmetry Health Data Systems and Medstat Episode Groups (MEGs) developed by Thomson Medstat. These tools group related services into episodes primarily using diagnosis codes; episodes include services furnished by different providers in different care settings. The cost or resources used to produce each episode are then tallied across providers.
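The basic grouping logic can be sketched in a deliberately simplified form: claims for the same patient that map to the same condition, and that fall within a fixed gap of one another, are chained into one episode, and costs are summed across every provider involved. The diagnosis-prefix map, the 60-day “clean period,” and the claim fields below are hypothetical; commercial groupers such as ETGs and MEGs use proprietary and far more detailed rules.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical map from diagnosis-code prefix to condition category.
CONDITION_BY_DX_PREFIX = {"E11": "diabetes", "J45": "asthma", "M54": "low back pain"}
CLEAN_PERIOD = timedelta(days=60)  # assumed gap that closes an episode

@dataclass
class Claim:
    patient: str
    provider: str
    dx: str            # diagnosis code, e.g. "E11.9"
    service_date: date
    cost: float

@dataclass
class Episode:
    patient: str
    condition: str
    start: date
    end: date
    cost: float = 0.0
    providers: set[str] = field(default_factory=set)

def group_episodes(claims: list[Claim]) -> list[Episode]:
    episodes: list[Episode] = []
    open_eps: dict[tuple[str, str], Episode] = {}  # (patient, condition) -> latest episode
    for c in sorted(claims, key=lambda cl: cl.service_date):
        condition = CONDITION_BY_DX_PREFIX.get(c.dx[:3])
        if condition is None:
            continue  # claim not assignable to a tracked condition
        key = (c.patient, condition)
        ep = open_eps.get(key)
        if ep is None or c.service_date - ep.end > CLEAN_PERIOD:
            ep = Episode(c.patient, condition, c.service_date, c.service_date)
            episodes.append(ep)
            open_eps[key] = ep
        ep.end = c.service_date
        ep.cost += c.cost              # costs tallied across all providers
        ep.providers.add(c.provider)
    return episodes

claims = [
    Claim("p1", "dr_a", "E11.9", date(2024, 1, 5), 120.0),
    Claim("p1", "lab_x", "E11.9", date(2024, 1, 20), 40.0),
    Claim("p1", "dr_c", "J45.909", date(2024, 2, 1), 90.0),
    Claim("p1", "dr_b", "E11.65", date(2024, 6, 1), 150.0),
]
for ep in group_episodes(claims):
    print(ep.condition, ep.start, ep.end, ep.cost, sorted(ep.providers))
```

The June diabetes claim opens a second diabetes episode because more than 60 days have passed since the previous one ended, while the January physician visit and laboratory test are combined into one episode even though they come from different providers.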
A population-based approach to efficiency measurement, such as Diagnostic Cost Groups (DCGs), classifies a patient population according to morbidity burden in a given period (e.g., one year). The cost or resources used for all health care for each patient over that period are then measured.
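A population-based measure starts from the patient rather than the condition. The sketch below assumes a simplified, hypothetical morbidity classification (a count of distinct chronic conditions in the year, capped at three) as a stand-in for a DCG-style risk score. It assigns each patient to a morbidity class and sums all of that patient's costs, whatever the diagnosis on each claim.

```python
from collections import defaultdict

# Hypothetical stand-in for a DCG-style classification: number of distinct
# chronic conditions recorded in the year, capped at 3.
CHRONIC = {"diabetes", "asthma", "heart failure", "copd"}

def annual_cost_by_morbidity(claims):
    """claims: iterable of (patient_id, condition_or_None, cost) for one year.
    Returns mean annual cost per patient within each morbidity class."""
    conditions = defaultdict(set)
    total_cost = defaultdict(float)
    for patient, condition, cost in claims:
        total_cost[patient] += cost  # ALL care counts, not just one condition
        if condition in CHRONIC:
            conditions[patient].add(condition)
    by_class = defaultdict(list)
    for patient, cost in total_cost.items():
        by_class[min(len(conditions[patient]), 3)].append(cost)
    return {k: sum(v) / len(v) for k, v in sorted(by_class.items())}

claims = [
    ("p1", "diabetes", 500.0), ("p1", None, 200.0),
    ("p2", None, 100.0),
    ("p3", "asthma", 300.0), ("p3", "diabetes", 900.0),
]
result = annual_cost_by_morbidity(claims)
print(result)
```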
Question 15. What types of data are used to construct resource use measures? How is “cost” measured?
Resource use measures are typically constructed using administrative data. Hospital-focused measures can use administrative data sets such as those collected and disseminated by statewide health data organizations.109 Measures that cover a broader range of services and care settings may require the use of insurance claims for medical, ancillary, and pharmacy services. While administrative data do not include much clinical information, they have the advantage of being readily available and reasonably standardized. Some measures append additional data from other sources on provider characteristics, such as the American Hospital Association Annual Survey.
One of the main challenges in using administrative data is that each insurer's data include only a portion of all care provided by each provider. Data from one insurer may not be sufficient to allow for stable measurement or may not be representative of a provider's entire practice. For this reason, some initiatives have aggregated data across multiple sources. For example, statewide health data organizations collect all-payer hospital data, and six communities have aggregated data from multiple sources for the Centers for Medicare & Medicaid Services (CMS) Better Quality Information project (described under Question 13).44 However, aggregation of cost data can be both technically and politically challenging, because insurers and providers are reluctant to share sensitive information, such as pricing arrangements.
Alternative Sources for Capturing Cost
For a profit-maximizing firm in a perfectly competitive market with symmetric information, the marginal cost of producing a service equals the marginal revenue or transacted price of that service. However, health care markets are not perfectly competitive, and transacted prices in the commercial market are generally unknown. Therefore, analysts commonly use one of two practical alternatives to actual prices (payments) when measuring costs. One option is to use the amounts that providers charge payers, which are more readily available than either prices (payments) or hospital costs. However, prices (payments) often differ significantly from charges because of negotiated discounts, bundling of claims, and shared risk or capitated payment. For hospitals, charges can be used to estimate hospital resource costs by applying cost-to-charge ratios calculated from Medicare hospital cost reports or similar all-payer systems established by several States (e.g., California, Florida, Massachusetts, New Jersey). Although the accuracy of these estimates at the service level has been questioned, they appear to have reasonable validity at the hospital level.110-111
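Converting charges to estimated costs with cost-to-charge ratios (CCRs) is simple arithmetic: estimated cost = charge × CCR. The sketch below assumes department-level ratios of the kind derived from Medicare cost reports; the department names and ratio values are hypothetical.

```python
# Hypothetical department-level cost-to-charge ratios for one hospital.
CCR = {"routine": 0.45, "pharmacy": 0.30, "radiology": 0.25}

def estimated_cost(charges_by_dept: dict[str, float], ccr: dict[str, float]) -> float:
    """Estimate the resource cost of a stay as the sum of charge x CCR by department."""
    return sum(charge * ccr[dept] for dept, charge in charges_by_dept.items())

stay = {"routine": 10_000.0, "pharmacy": 2_000.0, "radiology": 1_200.0}
print(estimated_cost(stay, CCR))
```

Department-specific ratios are somewhat finer than a single hospital-wide ratio, but they still assume that markups are uniform within each department, which is one reason service-level estimates are less trustworthy than hospital-level totals.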
Another alternative, most commonly applied to ambulatory care, is to use standardized units to assign the relative resource use of different services. For example, relative value units (RVUs), which are used by Medicare and other payers to determine relative payment rates for various procedures, could be used instead of the price paid for services. One group that has followed this approach is the Washington-Puget Sound Health Alliance. The Alliance has constructed a regional all-payer database that is being used for quality and resource use measurement, but the data suppliers do not submit any financial information. Instead, the Alliance uses a system of RVUs developed by Milliman to score different services (including not only physician services but also other types of services without Medicare RVUs) using a common metric for relative resource use.112
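The standardized-unit approach replaces dollars with a common resource metric, so no financial data are needed. The sketch below totals RVUs per provider; the service codes and RVU values are hypothetical and are not Medicare's or Milliman's actual weights.

```python
from collections import defaultdict

# Hypothetical relative value units per service code.
RVU = {"office_visit": 1.3, "xray_spine": 0.8, "mri_spine": 6.0}

def rvus_by_provider(services):
    """services: iterable of (provider_id, service_code, units).
    Returns total RVUs per provider as a common measure of resource use."""
    totals = defaultdict(float)
    for provider, code, units in services:
        totals[provider] += RVU[code] * units
    return dict(totals)

services = [
    ("dr_a", "office_visit", 2), ("dr_a", "mri_spine", 1),
    ("dr_b", "office_visit", 2), ("dr_b", "xray_spine", 1),
]
totals = rvus_by_provider(services)
print(totals)
```

Both providers saw the patient twice, but dr_a's choice of MRI over plain x-ray makes its RVU total more than twice as large, regardless of what any payer actually paid.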
Standardized prices, which are the same for all providers and payers, answer different questions than charges or actual prices, which differ across providers and payers. Standardized prices address whether a health care service could be produced faster, with fewer people, fewer labor hours, or fewer supplies (i.e., fewer inputs). Charges or actual prices address whether the output could be produced less expensively (i.e., reducing the total cost of labor, supplies, and capital, either by using fewer inputs or by procuring those inputs at lower cost).
Consumers and purchasers may favor using actual prices, which reflect what they actually pay. For example, the price of an imaging study is likely to be higher at a teaching hospital than at a community hospital, even though the same real resources may be used in each setting.113 In this hypothetical example, a measure based on actual prices would reflect the fact that consumers and payers pay more for the imaging study at a teaching hospital than at a community hospital. By comparison, a measure based on standardized prices might show that the same quantity of resources was used for the imaging study in both settings.
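The teaching-hospital example can be put in numbers. In the sketch below, two hypothetical hospitals perform the same imaging study with the same inputs, so a standardized-price measure scores them identically, while an actual-price measure shows the higher amount paid at the teaching hospital. All input quantities and prices are invented for illustration.

```python
# Same imaging study, same real resources (inputs) at both hospitals.
inputs = {"technician_hours": 1.0, "radiologist_hours": 0.25, "contrast_units": 1}

# Standardized prices: identical for every provider and payer (hypothetical).
STANDARD_PRICE = {"technician_hours": 40.0, "radiologist_hours": 200.0, "contrast_units": 60.0}

# Actual amounts paid per study differ by hospital (hypothetical).
ACTUAL_PAYMENT = {"teaching": 950.0, "community": 600.0}

def standardized_cost(resource_use: dict[str, float]) -> float:
    """Value the inputs at standardized prices, ignoring who performed the study."""
    return sum(qty * STANDARD_PRICE[item] for item, qty in resource_use.items())

for hospital, paid in ACTUAL_PAYMENT.items():
    print(f"{hospital}: standardized={standardized_cost(inputs):.2f} actual={paid:.2f}")
```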
Question 16. What is known about the validity of available resource use measures, including their advantages and disadvantages?
The state of the art in health care resource use measurement contrasts sharply with that of health care quality measurement. Little is known about the validity of resource use measures or the advantages and disadvantages of different measures. Only a few resource use measures (length of stay and readmission measures) have been endorsed by the National Quality Forum (NQF). In contrast to most quality measures, current resource use measures are not typically derived from practice standards in the research literature, professional medical associations, or expert panels, and they have been subjected to few rigorous evaluations of their reliability and validity.
Differences Among Resource Use Measures
Several differences among resource use measures could guide a community collaborative's choice of measures. Many resource use measures focus on hospitals, including simple measures such as mean length of stay and more complex multiple-output measures using econometric or mathematical programming techniques. Hospitals account for a high proportion of total health spending, and so may be of particular interest for resource use measurement. However, a focus on hospital care omits many types of services and does not capture coordination of care across settings, where many inefficiencies in delivery occur.
Commercial measures, both episode based and population based, reflect care provided in multiple settings. Population-based measures, although typically adjusted for patients' risk of higher resource use, reflect the “probability risk” that patients will acquire a condition that requires higher than expected resources during the data collection period. Episode-based measures, in contrast, reflect only the resources used in treatment of a particular condition, beginning at the onset of the episode of care for that condition.
After weighing these considerations, several national groups, including NQF and CMS, have expressed a preference for episode-based measures over population-based or hospital-focused measures. CMS lists the following advantages of episode-based measures114:
- “Compare more similar patients than per capita calculations, as they are defined by similar procedures or conditions;
- Capture the multiple ways in which services can be combined and substituted to produce the best outcome at the lowest cost;
- Reflect patients' view of care as they move between and across settings and managers of their care, rather than simply measuring resources used for just a part of their care in one setting, and
- Encourage improved coordination across settings included in the episode.”
Many hospital-focused measures, such as average length of stay and readmission rates, are widely used and relatively simple to construct. For example, UnitedHealth Group sponsors an NQF-endorsed measure described as “overall inpatient 30-day hospital readmission rate.” However, “single output” measures of this type may be misleading, because the services needed to avert readmissions (e.g., longer inpatient stays) may actually consume more resources than the “preventable service” itself. Readmission may be an undesirable outcome for some patients in some settings, but a desirable outcome for other patients in other settings.
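A crude 30-day all-cause readmission rate can be computed from discharge records as shown below. This is a simplified sketch with hypothetical data: it treats every discharge as an eligible index stay and counts any admission within 30 days of discharge, whereas endorsed measures apply detailed exclusions (e.g., planned readmissions, transfers, stays without complete follow-up) and risk adjustment that are not reproduced here.

```python
from datetime import date

def readmission_rate(stays, window_days=30):
    """stays: list of (patient_id, admit_date, discharge_date).
    Returns the share of discharges followed by any admission within window_days."""
    by_patient = {}
    for patient, admit, discharge in stays:
        by_patient.setdefault(patient, []).append((admit, discharge))
    index_count = readmitted = 0
    for patient_stays in by_patient.values():
        patient_stays.sort()
        for i, (_, discharge) in enumerate(patient_stays):
            index_count += 1
            # Compare with the patient's next admission, if any
            if i + 1 < len(patient_stays):
                next_admit = patient_stays[i + 1][0]
                if 0 <= (next_admit - discharge).days <= window_days:
                    readmitted += 1
    return readmitted / index_count if index_count else 0.0

stays = [
    ("p1", date(2024, 1, 1), date(2024, 1, 5)),
    ("p1", date(2024, 1, 20), date(2024, 1, 25)),  # 15 days after discharge
    ("p2", date(2024, 3, 1), date(2024, 3, 4)),
    ("p2", date(2024, 6, 1), date(2024, 6, 3)),    # 89 days later: not counted
]
print(readmission_rate(stays))
```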
More complex multiple-output hospital measures are published in the peer-reviewed literature. However, they are generally published in one-off studies and use complex methodology, so that community quality collaboratives would need to reconstruct such measures at considerable cost. Commercial episode-based and population-based measures are proprietary and are available to be licensed for application to existing data sets. Many commercial insurers are using these measures, although there is little evidence about the relative merits of competing products from different vendors. Because of concerns over the proprietary nature of these measures, some collaboratives, such as the Washington-Puget Sound Health Alliance, have elected instead to use public domain episode-based measures that are currently under development.
Further Methodological Questions
Several methodological questions that are important to establishing credible resource use measurement remain unresolved. These questions apply to most types of measures.
- Reliability: Reliability refers to the extent to which variation in measured resource use reflects true differences in performance rather than measurement error. The reliability of various resource use measures is largely unknown, and the number of observations required to produce stable resource use estimates is uncertain. Health plans currently use arbitrary cutoffs, such as 30 episodes per physician, and as a result are often unable to profile as many as one-third of the eligible physicians in their networks.
- Provider Attribution: A key issue for resource measurement for care provided by more than one provider, such as episodes of care, is how to attribute primary accountability for the resources used. Various algorithms, mainly based on visit counts and payment amounts, have been used. Different algorithms lead to different assignments, and every algorithm needs to be adjusted based on market characteristics such as the availability of subspecialists and geographic or cultural isolation. No national consensus guidelines for provider attribution are available.
- Risk adjustment: Variation in resource use may be driven largely by differences in patient risk. While several risk-adjustment methods are used in various applications, most notably by vendors such as 3M Health Information Systems (distributor of Severity of Illness scores for All Patient Refined DRGs), limited testing has been done in some resource measurement applications, such as episodes of care.
- Treatment of outliers: The distribution of resource use across individuals is highly skewed, with some people having no encounters or prescriptions and others having hundreds of encounters per year. Some users exclude outliers, but a preferable approach is probably to cap extreme values at a chosen threshold, such as a high percentile (an approach also called truncation or Winsorizing), which reduces their influence on subsequent analyses without discarding them.
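Three of the issues above can be made concrete with small sketches. The reliability function uses one common signal-to-noise formulation from provider profiling, R = σ²between / (σ²between + σ²within / n); the variance values are hypothetical. The attribution rule (the provider with the most visits, if that provider has at least 30% of them) is a hypothetical example of a visit-count algorithm, not any plan's actual rule. The outlier function caps values at a nearest-rank percentile rather than dropping them.

```python
import math
from typing import Optional

def reliability(var_between: float, var_within: float, n: int) -> float:
    """Signal-to-noise reliability of a provider's mean resource use over n episodes."""
    return var_between / (var_between + var_within / n)

def min_episodes(var_between: float, var_within: float, target: float) -> int:
    """Smallest n whose reliability reaches the target (the formula solved for n)."""
    return math.ceil(target * var_within / (var_between * (1 - target)))

def attribute(visits_by_provider: dict[str, int], min_share: float = 0.3) -> Optional[str]:
    """Hypothetical rule: assign the episode to the provider with the most visits,
    if that provider has at least min_share of all visits (ties broken arbitrarily)."""
    total = sum(visits_by_provider.values())
    if total == 0:
        return None
    provider, visits = max(visits_by_provider.items(), key=lambda kv: kv[1])
    return provider if visits / total >= min_share else None

def winsorize(costs: list[float], upper_pct: float = 0.99) -> list[float]:
    """Cap values above the upper_pct percentile (nearest-rank) instead of dropping them."""
    ranked = sorted(costs)
    cap = ranked[max(0, math.ceil(upper_pct * len(ranked)) - 1)]
    return [min(c, cap) for c in costs]

# Hypothetical variance components: between-provider 100, within-provider 2500.
print(round(reliability(100.0, 2500.0, 30), 3))   # reliability at a 30-episode cutoff
print(min_episodes(100.0, 2500.0, 0.7))           # episodes needed for reliability 0.7
print(attribute({"dr_a": 3, "dr_b": 2, "dr_c": 5}))
print(winsorize([100, 200, 300, 400, 10_000], upper_pct=0.8))
```

Under these assumed variances, a 30-episode cutoff yields reliability of only about 0.55, which illustrates why a fixed cutoff chosen without reference to the data may be either too strict or too lenient.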
Question 17. Which national groups are developing or endorsing resource use measures?
The Agency for Healthcare Research and Quality (AHRQ) includes a chapter on “efficiency” in its annual National Healthcare Quality Report.115 The chapter includes several “potential” measures of efficiency that AHRQ believes “should be viewed as preliminary and designed to stimulate productive ongoing discussions about health care efficiency.” These measures include trends in potentially avoidable hospitalizations and related costs, rehospitalization for heart failure, and an application of stochastic frontier analysis, an econometric technique that models provider-level inefficiency as a departure from an estimated best-practice frontier.108, 116 A set of resource use and efficiency measures is currently being developed for broader application under AHRQ's Quality Indicators program.
The Centers for Medicare & Medicaid Services (CMS) is developing reporting of physician resource use using episode-based measures.117 CMS has been evaluating two commercial episode groupers, Episode Treatment Groups (ETGs) and Medstat Episode Groups (MEGs). It also has funded a study developing alternative approaches to the commercial groupers. It is continuing to explore ways to improve on the commercial measures and is considering funding the development of new groupers for use with Medicare claims. CMS's publicly reported, National Quality Forum (NQF)-endorsed measures of 30-day readmissions after hospitalization for heart failure, pneumonia, or heart attack may also be interpreted as hospital-level resource use measures.
The National Committee for Quality Assurance (NCQA) has developed Relative Resource Use (RRU) measures. The RRUs are population-based measures that are used to compare health plans on resources used to care for beneficiaries with six conditions: asthma, diabetes, low back pain, cardiovascular disease, hypertension, and chronic obstructive pulmonary disease. Published tables allow organizations to match severity-adjusted resource use within service categories (Inpatient Facility, Surgery and Procedure, Evaluation and Management (E&M), and Pharmacy) to a standardized allowed payment in order to calculate total standard costs for their eligible members across different areas of clinical care.
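The standard-cost step can be sketched as units of service multiplied by a standardized price per unit within each service category, summed per member and averaged. NCQA's actual tables, unit definitions, and severity adjustment are not reproduced here; the categories' units and prices below are hypothetical.

```python
# Hypothetical standardized allowed payments per unit of service by category.
STANDARD_PRICE = {
    "inpatient_facility": 2500.0,  # per inpatient day (assumed unit)
    "surgery_procedure": 400.0,    # per procedure
    "e_and_m": 90.0,               # per E&M visit
    "pharmacy": 55.0,              # per prescription
}

def total_standard_cost(units_by_category: dict[str, float]) -> float:
    """Price one member's service use at standardized prices, summed across categories."""
    return sum(units * STANDARD_PRICE[cat] for cat, units in units_by_category.items())

def per_member_standard_cost(members: dict[str, dict[str, float]]) -> float:
    """Average total standard cost per eligible member."""
    return sum(total_standard_cost(u) for u in members.values()) / len(members)

members = {
    "m1": {"e_and_m": 4, "pharmacy": 12},
    "m2": {"inpatient_facility": 3, "e_and_m": 2, "pharmacy": 6},
}
print(per_member_standard_cost(members))
```

Because every plan's utilization is priced with the same table, differences in the result reflect differences in the quantity and mix of services rather than in negotiated prices.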
The Leapfrog Group is a purchaser coalition that publishes information on hospital quality and resource use for coronary artery bypass graft surgery, percutaneous coronary intervention, heart attack, and pneumonia. Efficiency of care for each procedure and condition is a blend of a hospital's quality score for that procedure or condition (based on statewide outcome data, process-of-care measures, and volume) with its resource utilization score for that procedure or condition (Table 5). The resource utilization score for each procedure or condition is based on a hospital's standardized, risk-adjusted, geometric mean length of stay for that procedure or condition, inflated by the hospital's 14-day all-cause readmission rate for that condition.
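The sketch below shows one plausible reading of the resource utilization score described above: the hospital's geometric mean length of stay, expressed relative to an expected value, then inflated by its readmission rate. The observed-to-expected form and the multiplicative (1 + readmission rate) inflation are assumptions for illustration, not Leapfrog's published formula. A geometric mean is used because length of stay is right-skewed, and taking logs dampens the pull of a few very long stays.

```python
import math

def geometric_mean(values):
    """exp of the mean log value; less sensitive to long-stay outliers than the arithmetic mean."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def resource_utilization_score(los_days, expected_geo_mean_los, readmit_rate):
    """Hypothetical score: (observed geometric mean LOS / expected) x (1 + readmission rate).
    Values above 1.0 indicate more resource use than expected."""
    observed = geometric_mean(los_days)
    return (observed / expected_geo_mean_los) * (1 + readmit_rate)

# Hypothetical hospital: stays of 2, 4, and 8 days; expected 4.0 days; 10% readmitted.
print(round(resource_utilization_score([2, 4, 8], 4.0, 0.10), 3))
```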
The Quality Alliance Steering Committee (QASC) is a collaborative effort among government agencies, physicians, nurses, pharmacists, hospitals, health insurers, employers, consumers, and others.117 To support the generation of effective health care performance information, the QASC is working to foster coordinated episode- and patient-level measures across the care continuum. It is aggregating data from different national health plans and Medicare to enable measurement of physicians' care for their entire practice panels. The QASC is also developing resource use measures for 20 high-cost/priority conditions, which will include both episode-based and per capita resource use measures.
The National Quality Forum (NQF) has developed a draft measurement framework for efficiency. A draft report, Measurement Framework: Evaluating Efficiency Across Patient-Focused Episodes of Care, has been endorsed by the NQF.106 This episode-based framework will be used to develop a comprehensive set of performance measures, including resource use and quality measures, for selected clinical conditions.
The Consumer-Purchaser Disclosure Project (CPDP) is a multistakeholder collaboration involving consumer, employer, and labor organizations. It has published a “Patient Charter for Physician Performance Measurement, Reporting and Tiering Programs.” This charter lays out a set of principles to guide physician performance measurement, which community quality collaboratives might consider as they develop reporting programs about resource use. The principles are118:
- Measures should be meaningful to consumers and reflect a diverse array of physician clinical activities.
- Those being measured should be actively involved.
- Measures and methodology should be transparent and valid.
- Measures should be based on national standards to the greatest extent possible.
Question 18. How have resource use measures been used to compare providers to benchmarks?
Provider resource use is typically compared to benchmarks of “peer” providers. A common approach is to calculate a resource use score by dividing a provider's observed resource use by the “expected” resource use derived from a benchmark population. This approach allows for aggregation of resource use measurements across multiple episodes (or other units of service). The resource use score is then used to identify outliers that have significantly higher or lower resource use than peers.
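The observed-to-expected calculation can be sketched as follows, assuming that the expected cost of each episode is simply the benchmark population's mean cost for that episode type. Vendor indices such as the MEG-based Risk-Adjusted Cost Index refine the expected value further (severity, comorbidity, specialty, region). The episode types and costs are hypothetical.

```python
from collections import defaultdict

def expected_costs(benchmark):
    """benchmark: iterable of (episode_type, cost) from the peer population.
    Returns the mean cost per episode type."""
    sums, counts = defaultdict(float), defaultdict(int)
    for etype, cost in benchmark:
        sums[etype] += cost
        counts[etype] += 1
    return {t: sums[t] / counts[t] for t in sums}

def resource_use_score(provider_episodes, expected):
    """Observed total / expected total, aggregated across all of a provider's episodes."""
    observed = sum(cost for _, cost in provider_episodes)
    exp_total = sum(expected[etype] for etype, _ in provider_episodes)
    return observed / exp_total

benchmark = [("diabetes", 1000.0), ("diabetes", 1400.0),
             ("back_pain", 500.0), ("back_pain", 700.0)]
provider = [("diabetes", 1500.0), ("back_pain", 600.0), ("back_pain", 600.0)]
print(resource_use_score(provider, expected_costs(benchmark)))
```

A score of 1.125 would mean this provider used 12.5% more resources than peers would be expected to use for the same mix of episodes; summing before dividing weights each episode type by how often the provider treats it.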
Vendors use slightly different methods in these calculations, and the differences could have a significant effect on the results.119 For example, Medstat Episode Groups (MEGs) can be used to calculate a Risk-Adjusted Cost Index (RACI), which is the ratio of the total allowed costs of qualified episodes for which the provider is attributed responsibility to the total expected costs. The expected cost is based on the average cost of similar episodes based on MEGs, severity of illness (classified from 0 to 3), comorbidity burden (defined by the Diagnostic Cost Group [DCG] Relative Risk Score), provider specialty, and geographic region. With Symmetry Episode Treatment Groups (ETGs), the estimated ratio is somewhat different: a physician's normalized, actual resource use for a given set of ETGs serves as the numerator, and his or her specialty's normalized, average resource use for the same set of ETGs serves as the denominator.
Arbitrary thresholds are typically used to determine which providers are categorized as “high resource use” or “low resource use.” These thresholds are often set at percentiles in the distribution of resource use scores (e.g., the providers in the top decile of resource use scores are labeled “high resource use”). The Washington-Puget Sound Health Alliance provides another example of how resource use is compared. Following the format of its quality reports, the Alliance plans to report provider resource use for episodes of care as “above the regional average,” “at the regional average,” or “below the regional average.”
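Both labeling schemes can be sketched in a few lines. The decile rule follows the example above; the three-way regional comparison uses a hypothetical ±5% tolerance band, which is an assumption for illustration (an actual program might instead test for statistically significant differences).

```python
def label_by_decile(scores: dict[str, float]) -> dict[str, str]:
    """Label the bottom decile 'low resource use' and the top decile 'high resource use'.
    Assumes at least two providers."""
    ordered = sorted(scores, key=scores.get)  # provider ids, lowest score first
    k = max(1, len(ordered) // 10)            # size of one decile (at least one provider)
    labels = {p: "average" for p in ordered}
    for p in ordered[:k]:
        labels[p] = "low resource use"
    for p in ordered[-k:]:
        labels[p] = "high resource use"
    return labels

def label_vs_regional_average(score: float, regional_avg: float, tolerance: float = 0.05) -> str:
    """Three-way label; the tolerance band is a hypothetical choice."""
    if score > regional_avg * (1 + tolerance):
        return "above the regional average"
    if score < regional_avg * (1 - tolerance):
        return "below the regional average"
    return "at the regional average"

scores = {f"p{i}": 0.6 + 0.1 * i for i in range(10)}  # hypothetical O/E scores
labels = label_by_decile(scores)
print(labels["p0"], "|", labels["p5"], "|", labels["p9"])
print(label_vs_regional_average(1.2, 1.0))
```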
Different peer groups have been used in resource use comparisons. One decision is whether to use a benchmark of providers from the same geographic area or a national benchmark. Practice patterns vary widely between regions. For example, the cost per episode and the number of episodes per beneficiary were found to differ widely between Minneapolis and Miami.120 A regional benchmark would control for these differences, while a national benchmark would compare providers to national practice standards.
A second decision is whether to use a benchmark of providers of the same type (e.g., specialty) or providers of all types (e.g., multiple specialties). Many conditions are treated by multiple specialties, and practice patterns often differ widely by specialty. For example, endocrinologists and primary care physicians both provide care for diabetes. A measure of resource use for diabetes treatment could compare endocrinologists only to other endocrinologists, or to a combined group of endocrinologists and primary care physicians.
A single-specialty benchmark holds providers accountable for the standards of their specialty, while a multiple-specialty benchmark compares resource use across specialties. Similarly, hospital comparisons could be limited to hospitals of the same teaching status or safety-net status. Comparing providers only to similar providers has the advantage of reducing variation that is beyond the provider's control, but the disadvantage is that providers are not held accountable to the level of performance achieved by other types of providers caring for similar patients.
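The effect of the peer-group choice can be seen with hypothetical numbers. In the sketch below, the same endocrinologist's mean diabetes episode cost appears close to average against a specialty-only benchmark but well above average against an all-specialty benchmark. All costs are invented for illustration.

```python
# Hypothetical diabetes episode costs by treating physician's specialty.
episodes = [
    ("endocrinology", 1800.0), ("endocrinology", 2200.0),
    ("primary_care", 1000.0), ("primary_care", 1200.0), ("primary_care", 1400.0),
]

def benchmark(episodes, specialty=None):
    """Mean cost per episode among peers; specialty=None pools all specialties."""
    costs = [c for s, c in episodes if specialty is None or s == specialty]
    return sum(costs) / len(costs)

provider_mean = 2100.0  # a hypothetical endocrinologist's mean episode cost
print("vs. endocrinologists:", round(provider_mean / benchmark(episodes, "endocrinology"), 2))
print("vs. all specialties: ", round(provider_mean / benchmark(episodes), 2))
```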