Executive Summary
The measurement of health care efficiency has lagged behind the measurement of health care
quality. Providers, payers, purchasers, consumers, and regulators all could benefit from more
information on value for money in health care. Purchasers, particularly large employers, have
been demanding that health plans incorporate economic profiling into their products and
information packages. Despite the importance, there has not been a systematic and rigorous
process in place to develop and improve efficiency measurement as there has been for other
domains of performance. Recognizing the importance of improving efficiency measurement, the
Agency for Healthcare Research and Quality (AHRQ) has sponsored this systematic review and
analysis of available measures. Our work was designed to reach a wide variety of stakeholders,
each of which faces different pressures and values in the selection and application of efficiency
measures. Thus, we anticipate that some sections of the report will be more useful to some
readers than to others. This report should be viewed as the first of several steps that are necessary
to create agreement among stakeholders about the adequacy of tools to measure efficiency.
Methods
Typology
Because we found that many stakeholders attach different meanings to the word "efficiency,"
we first developed a definition of efficiency. We believe that being explicit about how the term
is being used is helpful in advancing the dialogue among stakeholders. In this report, we define
efficiency as an attribute of performance that is measured by examining the relationship between
a specific product of the health care system (also called an output) and the resources used to
create that product (also called inputs). Under our definition, a provider in the health care system
(e.g., hospital, physician) would be efficient if it was able to maximize output for a given set of
inputs or to minimize inputs used to produce a given output.
Building on this definition, we created a typology of efficiency measures. The purpose of the typology is to make explicit the content and use of a measure of efficiency. Our typology has three levels:
- Perspective: who is evaluating the efficiency of what entity and what is their objective?
- Outputs: what type of product is being evaluated?
- Inputs: what resources are used to produce the output?
The first tier in the typology, perspective, requires an explicit identification of the entity that is evaluating efficiency, the entity that is being evaluated, and the objective or rationale for the assessment. We distinguish between four different types of entities:
- Health care providers (e.g., physicians, hospitals, nursing homes) that deliver health care services.
- Intermediaries (e.g., health plans, employers) who act on behalf of collections of either providers or individuals (and, potentially, their own behalf) but do not directly deliver health care services.
- Consumers/patients who use health care services.
- Society, which encompasses the first three.
Each of these types of entities has different objectives for considering efficiency, has control
over a particular set of resources or inputs, and may seek to deliver or purchase a different set of
products. Efficiency for society as a whole, or "social efficiency," refers to the allocation of
available resources; social efficiency is achieved when it is not possible to make a person or
group in society better off without making another person or group worse off. The perspective
from which efficiency is measured has strong implications for the measurement approach,
because what looks efficient from one perspective may look inefficient from another. For
example, a physician may produce CT scans efficiently in her office, but the physician may not
appear efficient to a health plan if a less expensive diagnostic test could have been substituted in
some cases. The intended application of an efficiency measure (e.g., pay-for-performance,
quality improvement) offers another way of assessing perspective.
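As an illustration of the social efficiency definition above, the following minimal sketch checks whether any feasible allocation makes one group better off without making another worse off. The well-being scores, the list of feasible allocations, and the function names are hypothetical and are not drawn from the reviewed literature.

```python
from typing import Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if allocation a leaves no group worse off than b and at least one group better off."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def is_socially_efficient(candidate: Sequence[float], feasible: Sequence[Sequence[float]]) -> bool:
    """An allocation is efficient if no feasible alternative dominates it."""
    return not any(dominates(other, candidate) for other in feasible)

# Hypothetical well-being scores for three groups under four feasible allocations.
feasible = [
    (5, 5, 5),
    (6, 5, 5),  # improves group 1 at no cost to the others
    (4, 8, 5),
    (6, 5, 4),
]

efficient = [a for a in feasible if is_socially_efficient(a, feasible)]
print(efficient)
```

Note that more than one allocation can pass this test; the definition identifies allocations with no remaining costless improvements and does not rank them against one another.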
The second tier of the typology identifies the outputs of interest and how those will be
measured. We distinguish between two types of outputs: health services (e.g., visits, drugs,
admissions) and health outcomes (e.g., preventable deaths, functional status, clinical outcomes
such as blood pressure or blood sugar control). The typology addresses the role of quality (or
effectiveness) metrics in the assessment of efficiency. A key issue that arises in external
evaluations of efficiency is whether the outputs are comparable. Threats to comparability arise
when there is (perceived or real) heterogeneity in the content of a single service, the mix of
services in a bundle, and the mix of patients seeking or receiving services. Pairing quality
measures with efficiency measures is one approach that has been suggested by AQA and others
to assess comparability directly.
In this typology, we do not require that the health service outputs be constructed as
quality/effectiveness metrics. For example, an efficiency measure could consider the relative
cost of a procedure without evaluating whether the use of the procedure was appropriate.
Similarly, an efficiency measure could evaluate the relative cost of a hospital stay for a condition
without considering whether the admission was preventable or appropriate. However, the
typology allows for health service outputs to be defined with reference to quality criteria. That
is, the typology is broad enough to include either definition of health services. We deliberately
constructed the typology in this way to facilitate dialogue among stakeholders with different
perspectives on this issue.
The third tier of the typology identifies the inputs that are used to produce the output of
interest. Inputs can be measured as counts by type (e.g., nursing hours, bed days, days supply of
drugs) or they can be monetized (real or standardized dollars assigned to each unit). We refer to
these, respectively, as physical inputs or financial inputs. The way in which inputs are measured
may influence the way the results are used. Efficiency measures that count the amounts of
different inputs used to produce an output (physical inputs) help to answer questions about
whether the output could be produced faster, with fewer people, less time from people, or fewer
supplies. In economic terms, the focus is on whether the output is produced with the minimum
amount of each input and is called technical efficiency. Efficiency measures that monetize the
inputs (financial inputs) help to answer questions about whether the output could be produced
less expensively—whether the total cost of labor, supplies, and other capital could be reduced. A
focus on cost minimization corresponds to the economic concept of productive efficiency, which
incorporates considerations related to the optimal mix of inputs (e.g., could we substitute nursing
labor for physician labor without changing the amount and quality of the output?) and the total
cost of inputs.
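The distinction between physical and financial inputs can be sketched with a small example. The two clinics, the hours, and the unit prices below are hypothetical; the sketch only shows that counting inputs and monetizing them can lead to different comparisons for the same output.

```python
# Hypothetical data: both clinics produce the same output (1,000 visits).
visits = 1000
physical_inputs = {
    "clinic_a": {"physician_hours": 400, "nurse_hours": 200},
    "clinic_b": {"physician_hours": 250, "nurse_hours": 400},
}
# Assumed dollars per hour, used to monetize the physical inputs.
unit_prices = {"physician_hours": 100.0, "nurse_hours": 40.0}

def hours_per_visit(inputs: dict, output: int) -> dict:
    """Physical-input view: amount of each input used per unit of output."""
    return {name: qty / output for name, qty in inputs.items()}

def cost_per_visit(inputs: dict, prices: dict, output: int) -> float:
    """Financial-input view: total monetized inputs per unit of output."""
    total_cost = sum(qty * prices[name] for name, qty in inputs.items())
    return total_cost / output

for clinic, inputs in physical_inputs.items():
    print(clinic, hours_per_visit(inputs, visits), cost_per_visit(inputs, unit_prices, visits))
```

In this made-up case, clinic_b uses more total hours per visit but has a lower cost per visit because it substitutes nursing labor for physician labor, which is the kind of input-mix question that a cost-based measure is meant to capture.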
This typology provides a framework within which stakeholders can have an explicit
discussion about the intended use of measures, the choice and measurement of outputs, and the
choice and measurement of inputs. Requesting that groups use a standard format, such as that
suggested by the typology, allows stakeholders to systematically examine what is being
measured and whether the measure (and available data) is appropriate for the purpose.
Evidence Sources and Searches
We searched Medline® and EconLit for articles published between 1990 and 2005 describing
measures of health care efficiency. Titles, abstracts and articles were reviewed by two
independent reviewers, with consensus resolution. We focused on studies reporting efficiency of
U.S. health care, and excluded studies focusing on other countries. Data were abstracted onto
Evidence Tables and also summarized narratively.
Because we expected some of the most commonly used efficiency measures might not
appear in the published literature, we developed a list of organizations that we knew had
developed or were considering developing their own efficiency measures. We contacted key
people at these organizations in an attempt to collect the information necessary to describe and
compare their efficiency measures to others we abstracted from articles.
A Technical Expert Panel (TEP) advised the project staff on the typology and sources of information, and reviewed a draft of this report. The TEP is listed in Appendix D of this report.
Results
We found little overlap between the peer-reviewed literature that describes the development,
testing, and application of efficiency measures and the vendor-based efficiency metrics that are
most commonly used. From the perspective of policymakers and purchasers, the published
literature provides little guidance for solving current challenges to managing rising health care
costs. From the perspective of measurement experts, the vendor-based metrics are largely
untested and as such the results may be problematic to interpret accurately. These observations
have implications for the recommendations we make at the end of the report regarding future
research.
Published Literature
In total, RAND reviewers examined 4,324 titles for the draft version of this report. Of these, 563 articles were retrieved and reviewed. There were 158 articles describing measures of health care efficiency in the United States.
The majority of peer-reviewed literature on health care efficiency has been related to the
production of hospital care. Of the 158 priority articles abstracted, 93 articles (59%) measured
the efficiency of hospitals. Studies of physician efficiency were second most common (33
articles, 21%), followed by fewer articles on the efficiency of nurses, health plans, other
providers, or other entities. None of the abstracted articles reported the efficiency of health care
at the national level, although two articles examined efficiency in the Medicare program.
Almost all of the measures abstracted from the articles used health services as outputs.
Common health service types used as outputs included inpatient stays, physician visits, and
procedures. Only four measures were found that included health outcomes as outputs. In
addition, none of the outputs explicitly accounted for the quality of service provided. A small
subset of measures attempted to account for quality by including it as an explanatory variable in
a regression model in which efficiency was the dependent variable. Some articles also
conducted analyses of outcomes separately from analyses of efficiency.
The abstracted health care efficiency measures were divided between those using physical inputs and those using financial inputs. More articles used physical inputs than financial inputs. No articles contained measures of social efficiency.
Most of the measures abstracted from the peer-reviewed literature used econometric or
mathematical programming methodologies for measuring health care efficiency. Two
approaches were most common: data envelopment analysis (DEA) and stochastic frontier
analysis (SFA). DEA is a non-parametric deterministic approach that solves a linear
programming problem in order to define efficient behavior. SFA is a parametric approach that
defines efficient behavior by specifying a stochastic (or probabilistic) model of output and
maximizing the probability of the observed outputs given the model. These techniques can
explicitly account for multiple inputs and multiple outputs. For example, DEA and SFA could
be used to measure the efficiency of hospitals that use nursing labor and supplies to produce
inpatient stays and ambulatory visits. DEA and SFA differ in a number of respects. DEA makes
fewer assumptions than SFA about how inputs are related to outputs. DEA compares the
efficiency of an entity to that of its peers (rather than an absolute benchmark) and typically
ignores statistical noise in the observed relationship between inputs and outputs.
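For readers unfamiliar with these methods, one standard textbook formulation of each is sketched below. The specific variants shown (an input-oriented DEA model with constant returns to scale, and a log-linear stochastic production frontier with a single output) are assumptions chosen for illustration; the abstracted articles may use other specifications.

```latex
% DEA (input-oriented, constant returns to scale) for the entity indexed 0,
% with n entities, m inputs x_{ij}, and s outputs y_{rj}:
\[
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{subject to}\quad
 & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{i0}, && i = 1,\dots,m \\
 & \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{r0}, && r = 1,\dots,s \\
 & \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
\]

% SFA (single-output production frontier) for entity j:
\[
\ln y_j = \beta_0 + \sum_{i=1}^{m} \beta_i \ln x_{ij} + v_j - u_j,
\qquad v_j \sim N(0, \sigma_v^2), \quad u_j \ge 0
\]
```

In the DEA program, a score of \theta = 1 means that no weighted combination of peers produces at least the same outputs with proportionally fewer inputs. In the SFA model, v_j captures statistical noise and u_j captures inefficiency, so that technical efficiency is estimated as \exp(-u_j); the distribution assumed for u_j (for example, half-normal) is part of the model specification.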
Some measures were ratio-based. Ratios were more common for physician efficiency
measures than hospital efficiency measures. The main difference between the various
measurement approaches is that ratio-based measures include only single inputs and outputs
(although various elements are sometimes aggregated to a single quantity), whereas SFA, DEA,
and regression-based approaches explicitly account for multiple inputs and outputs.
An example of a measure that uses multiple physical inputs and multiple health services
outputs comes from Grosskopf.1 This DEA-based measure used the following inputs (counts): physicians; nurses; other personnel; and hospital beds. As outputs it used (again, counts):
outpatient procedures; inpatient procedures; physician visits in outpatient clinics; hospital
discharges; and emergency visits. In comparison, a typical example of a measure that uses a
single physical input and health services output (ratio) was the number of hospital days (input)
divided by the number of discharges (output)—the average length of stay.2
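The ratio-based example above can be written out in a few lines. The hospital names and counts are hypothetical and serve only to show the calculation.

```python
# Hypothetical annual counts for two hospitals.
hospitals = {
    "hospital_a": {"hospital_days": 12000, "discharges": 2500},
    "hospital_b": {"hospital_days": 9000, "discharges": 2000},
}

def average_length_of_stay(hospital_days: float, discharges: float) -> float:
    """Single physical input (hospital days) divided by single output (discharges)."""
    if discharges <= 0:
        raise ValueError("discharges must be positive")
    return hospital_days / discharges

alos = {
    name: average_length_of_stay(h["hospital_days"], h["discharges"])
    for name, h in hospitals.items()
}
print(alos)
```

A lower ratio indicates fewer days used per discharge, but the comparison is meaningful only if the discharges are comparable across hospitals (for example, in patient mix), which a single ratio does not address.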
Vendors and Stakeholder Interviews
Thirteen organizations were selected using a purposive reputational sampling approach. The
results presented here are based on information gathered from eight vendors and five
stakeholders who responded to our request for an interview. The TEP, which included various
stakeholders and experts on efficiency measurement, also provided input into the search and
reviewed this report. The TEP members are listed in Appendix D.
Most of the measures used by purchasers and payers are proprietary. The main application of
these measures by purchasers and plans is to reduce costs through pay-for-performance, tiered
product offerings, public reporting, and feedback for performance improvement. These measures,
for the purpose of assessing efficiency, generally take the form of a ratio, such as observed-to-expected
ratios of costs per episode of care, adjusting for risk severity and case-mix. Efforts to
validate and test the reliability of these algorithms as tools to create relevant clinical groupings
for comparison are documented in either internal reports or white papers. External evaluations
of performance characteristics of these measures are beginning to emerge from the Medicare
Payment Advisory Commission (MedPAC), the Centers for Medicare and Medicaid Services
(CMS), and other research groups including RAND. Our scan identified seven major developers
of proprietary software packages for measuring efficiency, with other vendors providing
additional analytic tools, solution packages, applications, and consulting services that build on
top of these existing platforms.
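The general form of an observed-to-expected cost ratio can be sketched as follows. This is not any vendor's algorithm: the episode records, the severity levels, and the choice to define expected cost as the average cost of all episodes of the same type and severity are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (physician, episode_type, severity_level, observed_cost)
episodes = [
    ("dr_a", "diabetes", 1, 900.0),
    ("dr_a", "diabetes", 2, 1800.0),
    ("dr_a", "diabetes", 2, 2200.0),
    ("dr_b", "diabetes", 1, 1100.0),
    ("dr_b", "diabetes", 1, 1000.0),
    ("dr_b", "diabetes", 2, 2000.0),
]

def observed_to_expected(records):
    """Return each physician's total observed cost divided by total expected cost."""
    # Expected cost for a cell = mean cost of all episodes with that type and severity.
    cell_costs = defaultdict(list)
    for _, episode_type, severity, cost in records:
        cell_costs[(episode_type, severity)].append(cost)
    expected = {cell: sum(costs) / len(costs) for cell, costs in cell_costs.items()}

    observed_total = defaultdict(float)
    expected_total = defaultdict(float)
    for physician, episode_type, severity, cost in records:
        observed_total[physician] += cost
        expected_total[physician] += expected[(episode_type, severity)]
    return {p: observed_total[p] / expected_total[p] for p in observed_total}

oe_ratios = observed_to_expected(episodes)
print(oe_ratios)
```

In this made-up data, dr_a has the higher average cost per episode but the lower observed-to-expected ratio, because dr_a treats more high-severity episodes. A ratio below 1 indicates lower cost than expected for the physician's mix of episodes.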
The proprietary measures fall into two main categories: episode-based or population-based.
An episode-based approach to measuring efficiency uses diagnosis and procedure codes from
claims/encounter data to construct discrete episodes of care, which are a series of temporally
contiguous health care services related to the treatment of a specific acute illness, a set time
period for the management of a chronic disease, or provided in response to a specific request by
the patient or other relevant entity. On the other hand, a population-based approach to
efficiency measurement classifies a patient population according to morbidity burden in a given
period (e.g., one year).
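The idea of grouping temporally contiguous services into episodes can be sketched as below. Proprietary groupers rely on detailed diagnosis and procedure code logic that is not reproduced here; the condition labels, the 30-day gap rule, and the function name are hypothetical simplifications.

```python
from datetime import date, timedelta

# Assumed rule: a gap longer than this between services starts a new episode.
CLEAN_PERIOD = timedelta(days=30)

# Hypothetical claims: (patient, condition, service_date)
claims = [
    ("patient_1", "bronchitis", date(2005, 1, 3)),
    ("patient_1", "bronchitis", date(2005, 4, 2)),
    ("patient_1", "bronchitis", date(2005, 1, 10)),
    ("patient_2", "bronchitis", date(2005, 1, 5)),
]

def build_episodes(claims, clean_period=CLEAN_PERIOD):
    """Group claims for the same patient and condition into contiguous episodes."""
    episodes = []
    last_key = None
    # Sorting orders claims by patient, then condition, then service date.
    for patient, condition, service_date in sorted(claims):
        key = (patient, condition)
        if key == last_key and service_date - episodes[-1]["end"] <= clean_period:
            # Service falls within the clean period: extend the current episode.
            episodes[-1]["end"] = service_date
            episodes[-1]["n_claims"] += 1
        else:
            episodes.append({
                "patient": patient,
                "condition": condition,
                "start": service_date,
                "end": service_date,
                "n_claims": 1,
            })
        last_key = key
    return episodes

episode_list = build_episodes(claims)
print(len(episode_list))
```

A population-based approach would skip this grouping step and instead assign each patient to a morbidity category for the whole period.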
We contacted a sample of stakeholders to seek their insights on efficiency measurement. We
used their input to cross-validate our selection of vendors described above. Our sample included
two coalitions on the national level; two coalitions on the state level; and an accrediting agency.
We asked these stakeholders to provide the definition of efficiency they used to guide their
efforts; describe desirable attributes they considered as they searched for available measures;
comment on their interest or objectives in developing and/or implementing efficiency measures;
and list proprietary measures they have considered.
While the stakeholders used different definitions of "efficiency," they shared a number of
common concerns related to efficiency measurement. Many concerns were related to
methodological issues such as data quality, attribution of responsibility for care to providers, risk
adjustment, and identification of outliers. The stakeholders also shared a number of concerns
related to the use of efficiency measures, including the perceptions of providers and patients, and
the cost of using proprietary measures and transparency of the methods used to construct the
measures.
Evaluation
Measures of any construct can rarely be evaluated in the abstract. The evaluation must take into account the purpose or application of the measure; some measures that work well for research, for example, may be unusable for internal quality improvement.
We suggest that measures of health care efficiency be evaluated using the same framework as measures of quality:
- Important—is the measure assessing an aspect of efficiency that is important to
providers, payers, and policymakers? Has the measure been applied at the level of interest to those planning to use the measure? Is there an opportunity for improvement? Is the measure under the control of the provider or health system?
- Scientifically sound—is the measure reliable and reproducible? Does the measure appear
to capture the concept of interest? Is there evidence of face, construct, or predictive validity?
- Feasible—are the data necessary to construct this measure available? Are the cost and
burden of measurement reasonable?
- Actionable—are the results interpretable? Can the intended audience use the information
to make decisions or take action?
An ideal health care efficiency measure does not exist, and therefore the selection of
measures will involve tradeoffs between these criteria. We summarize the results of our review
of measures below.
Important
The measurement of efficiency meets the test of importance because of the interest and intent
among stakeholders in finding and implementing such measures for policy and operations.
Although we found differences in the content of measures from peer-reviewed versus vendor-developed
sources, they have in common the specification of one or more outputs and one or
more inputs in constructing a measure.
The "importance" of measures abstracted from peer-reviewed literature appears low because
these have not generally been used in practice and there is no apparent consensus in the academic
literature of an optimal method for measuring efficiency. Some academic experts have indicated
skepticism that the construct can be adequately measured. Although many peer-reviewed
articles identified factors that were found to influence efficiency, the findings appear to be
difficult to translate into policy. We found no clear evidence that efficiency measures developed
by academics had influenced policy decisions made by providers or policymakers.
The vendor-developed measures meet the importance criterion because they are being widely used by purchasers and plans to inform operational decisions. Some of the vendor-developed measures are based on methods originally developed in the academic world (e.g., Adjusted Clinical Groups).
Scientifically Sound
Very little research on the scientific soundness of efficiency measures has been published to
date. This includes measures developed by vendors as well as those published in the peer-reviewed
literature. Although academics are more likely to publish articles evaluating scientific
soundness, we found little peer-reviewed literature on the reliability and validity of efficiency
measures. Several studies have examined some of the measurement properties of vendor-developed
measures, but the amount of evidence available is still limited at this time. Vendors
typically supply tools (e.g., methods for aggregating claims to construct episodes of care or
methods for aggregating the costs of care for a population) from which measures can be
constructed; thus, the assessment of scientific soundness requires an evaluation of the application
as well as the underlying tools. Significant questions about the scientific soundness of efficiency
measures have been raised. The lack of testing of the scientific soundness of efficiency measures
reflects in part the pressure to develop tools that can be used quickly and with relative ease of
implementation.
Feasible
The focus of vendor-developed measures is on producing tools that are feasible for routine
operational use. Most of the measures abstracted from the peer-reviewed literature were based
on available secondary data sources (i.e., claims data). These measures could feasibly be
reconstructed at little cost and measurement burden. The vendor-developed measures also rely
largely on claims data. Most of the vendor-developed measures require that the user obtain and
pay for a license either directly or through a value added reseller. This has prompted some
organizations to begin developing open-source, public domain measures of efficiency. This
work is at an early stage.
Actionable
For efficiency metrics to have the effects intended by users, the information produced from measures must be actionable. We found little research on the degree to which the intended audiences for these measures (e.g., consumers, physicians, hospitals) were able to readily use the information to choose or deliver care differently.
Conclusions
We found little overlap between the measures published in the peer-reviewed literature and
those in the grey literature, suggesting that the driving forces behind research and practice result
in very different choices of measure. We found gaps in some measurement areas, including: no
established measures of social efficiency, few measures that evaluated health outcomes as the
output, and few measures of providers other than hospitals and physicians.
Efficiency measures have been subjected to relatively few rigorous evaluations of their
performance characteristics, including reliability (over time, by entity), validity, and sensitivity
to methods used. Measurement scientists would prefer that steps be taken to improve these
metrics in the laboratory before implementing them in operational uses. Purchasers and health
plans are willing to use measures without such testing under the belief that the measures will
improve with use.
The lack of consensus among stakeholders in defining and accepting efficiency measures that
motivated this study was evident in the interviews we conducted. An ongoing process to develop
consensus among those demanding and using efficiency measures will likely improve the
products available for use. A major goal of the AQA has been to develop a consensus around
use of language in describing measures of economic constructs. The National Quality Forum is
similarly working to achieve consensus on criteria for evaluating measures. Both groups support
the use of clear language in describing particular metrics, which may be easier to implement than
a consensus definition of efficiency.
Future Research
Research is already underway to evaluate vendor-developed tools for scientific soundness, feasibility, and actionability. For example, we identified studies being done or funded by the
Government Accountability Office (formerly the General Accounting Office), MedPAC, CMS, Department of Labor, Massachusetts Medical
Society, and the Society of Actuaries. A research agenda is needed in this area to build on this
work. We summarize some of the key areas for future research here but do not intend to signal a
prioritization of needed work.
Filling Gaps in Existing Measures
Several stakeholders recognize the importance of using efficiency and effectiveness metrics together, but relatively little research has been done on the options for constructing such approaches to measurement. Much of the developmental work currently underway at AQA is focused on this gap.
We found few measures of efficiency that used health outcomes as the output measure.
Physicians and patients are likely to be interested in measures that account for the costs of
producing desirable outcomes. We highlight some of the challenges of doing so, which
parallel the challenges of using outcome measures in other accountability applications; thus, a
program of research designed to advance both areas would be welcome.
We found a number of gaps in the availability of efficiency measures within the classification
system of our typology. For example, we found no measures of social efficiency, which might
reflect the choice of U.S.-based research. Nonetheless, such measures may advance discussions
related to equity and resource allocation choices as various cost containment strategies are
evaluated.
Evaluating and Testing Scientific Soundness
There are a variety of methodological questions that should be investigated to better
understand the degree to which efficiency measures are producing reliable and valid information.
Some of the key issues include whether there is enough information to evaluate performance
(e.g., do available sample sizes allow for robust scores to be constructed?); whether the
information is reliable over time and in different purchaser data sets (e.g., does one get the same
result when examining performance in the commercial versus the Medicare market?); methods
for constructing appropriate comparison groups for physicians, hospitals, health plans, markets;
methods for assigning responsibility (attribution) for costs to different entities; and the use of
different methods for assigning prices to services. Remarkably little is known about these
various methodological issues, and a program of systematic research to answer these questions is
critical given the increasing use of efficiency measures in operational applications.
Evaluating and Improving Feasibility
One area for investigation is the opportunity to create easy-to-use products based on methods such as DEA or SFA. This would require work to bridge from tools used for academic research to tools that could be used in operational applications.
Another set of investigations would identify data sources or variables useful for expanding the inputs and outputs measured (e.g., measuring capital requirements or investment, accounting for teaching status or charity care).
Making Measures More Actionable
Considerable research needs to be conducted to develop and test tools for decisionmakers to
use for improving health care efficiency (e.g., relative drivers of costs, best practices in efficient
care delivery, feedback and reporting methods) and for making choices among providers and
plans. Research could also identify areas for national focus on reducing waste and inefficiency in
health care. Evaluating the relative utility of measurement and reporting on efficiency versus other
methods (e.g., Toyota's Lean approach, Six Sigma) could also be worthwhile for setting national
priorities.