Pay for Performance: A Decision Guide for Purchasers

Phase 2. Design (continued)

Question 11. Where do we find the money?

Potential sources of funds for a P4P initiative include:

  • New money.
  • Redirection of annual payment updates.
  • Reallocation of payment among providers, e.g., through a combination bonus-penalty payment scheme.
  • Cost savings resulting from improved quality and special cases of shared savings.
  • For Medicaid, disproportionate share funds and, for the special case of Medicaid managed care, preferential auto-assignment formulas, which provide a financial incentive in the form of a greater volume of patients.
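
To make the reallocation option concrete, here is a minimal Python sketch of a budget-neutral bonus-penalty scheme. It assumes, purely for illustration, that the purchaser withholds a fixed share of each provider's payment and redistributes the pool to providers at or above a score threshold in proportion to their base payments. The `Provider` class, `withhold_rate`, `threshold`, and the figures in the usage lines are hypothetical, not features of any program described in this guide.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    base_payment: float  # payment the provider would receive without P4P
    score: float         # performance score on a hypothetical 0-100 scale


def budget_neutral_adjustments(providers, withhold_rate, threshold):
    """Return each provider's net gain (+) or loss (-) under a withhold pool.

    Every provider forfeits `withhold_rate` of its base payment; the pooled
    amount is paid out to providers scoring at or above `threshold`, in
    proportion to their base payments, so total payments do not change.
    """
    winners = {p.name for p in providers if p.score >= threshold}
    if not winners:
        # No one qualifies: release the withhold so no provider loses money.
        return {p.name: 0.0 for p in providers}
    pool = sum(p.base_payment * withhold_rate for p in providers)
    winner_base = sum(p.base_payment for p in providers if p.name in winners)
    net = {}
    for p in providers:
        withheld = p.base_payment * withhold_rate
        bonus = pool * p.base_payment / winner_base if p.name in winners else 0.0
        net[p.name] = bonus - withheld
    return net


providers = [
    Provider("A", 100_000, 90),
    Provider("B", 100_000, 60),
    Provider("C", 200_000, 85),
]
print(budget_neutral_adjustments(providers, withhold_rate=0.02, threshold=80))
```

Because the net adjustments sum to zero, the purchaser spends no new money: the low scorer (B) funds the bonuses paid to A and C, which is exactly the winners-and-losers redistribution such schemes imply.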

Many private payers that have introduced P4P programs frame the bonus potential as "increased" payments to providers, but sustaining such increases is difficult to imagine given recent double-digit growth rates in spending. If performance pay is to account for more than a small share of provider compensation in the near term, the money will have to come from one of three places: significant redistribution among providers (creating winners and losers); savings that at least partially offset the additional costs of improving quality; or, most likely, cumulatively directing all or a portion of annual payment updates to incentive pay. From an employer's perspective, estimates of offsetting savings should also account for gains in employee productivity.
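
The cumulative effect of redirecting updates can be seen with a short calculation. This Python sketch assumes a constant annual update rate and an incentive pool that simply accumulates the redirected increments from year to year; the function name and the 5 percent update rate in the usage lines are hypothetical inputs, not figures from this guide.

```python
def incentive_share(update_rate, fraction_redirected, years):
    """Performance-based share of total payments after `years` years.

    Assumes total payments grow by `update_rate` each year and that
    `fraction_redirected` of each year's increase is placed in an incentive
    pool that carries forward, rather than being added to base rates.
    """
    total = 1.0  # total payments, normalized to 1.0 in the starting year
    pool = 0.0   # cumulative incentive pool, in the same units
    for _ in range(years):
        increase = total * update_rate
        total += increase
        pool += increase * fraction_redirected
    return pool / total


for years in (1, 3, 5):
    share = incentive_share(update_rate=0.05, fraction_redirected=1.0, years=years)
    print(f"after {years} year(s): {share:.1%} of payments tied to performance")
```

Under these assumptions the performance-based share equals f(1 − (1 + u)^−n) for redirected fraction f, update rate u, and n years, so even redirecting an entire update builds a sizable incentive only over several years.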

Whether improved quality will pay for itself in financial terms is a matter of some debate. In general, whether the costs of quality improvement will be offset by savings (e.g., from reduced hospitalizations) depends first on whether the incentive seeks to remedy misuse, underuse, or overuse. Even within each of these categories, however, results will differ by clinical area. For incentives to correct underuse of cancer screening, for example, most of the health and financial gains are long term, so a positive financial return in the short term is unlikely. Reducing underuse of prescription drugs and educational services among patients with chronic illnesses and substantial risks of high-cost hospitalizations or procedures may be more likely to yield savings in the near term.
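
The timing argument can be expressed as a simple break-even calculation. The Python sketch below takes per-year costs and savings of a quality improvement effort and reports the first year in which cumulative discounted savings cover cumulative costs. The function name and the two cost-savings profiles in the usage lines are hypothetical illustrations, not estimates for any real intervention.

```python
def break_even_year(costs, savings, discount_rate=0.0):
    """First year (1-based) in which cumulative discounted savings cover
    cumulative discounted costs, or None if that never happens within the
    horizon covered by the two equal-length lists."""
    if len(costs) != len(savings):
        raise ValueError("costs and savings must cover the same years")
    cumulative = 0.0
    for year, (cost, saving) in enumerate(zip(costs, savings), start=1):
        # Discount each year's net amount back to the present.
        cumulative += (saving - cost) / (1 + discount_rate) ** year
        if cumulative >= 0:
            return year
    return None


# Hypothetical five-year profiles (arbitrary units).
screening = break_even_year(costs=[100] * 5, savings=[0] * 5)
chronic_care = break_even_year(costs=[100] * 5, savings=[0, 150, 150, 150, 150])
print(screening, chronic_care)  # prints: None 3
```

The first profile stands in for an intervention, such as screening, whose savings fall outside a five-year window; the second breaks even in year 3, and a positive `discount_rate` pushes break-even later.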

At present, for many of the most commonly used measures of quality (e.g., HEDIS® measures), the promise that improvement will result in cost offsets sufficient to ensure financial sustainability of pay for performance is uncertain at best. In light of these clinical realities and current budgetary constraints, purchasers might choose to explicitly incorporate cost-saving measures into their P4P programs. For example, the Wellpoint Physician Quality Incentive Program rewards physicians for generic prescribing and administrative efficiency measures such as electronic claims filing. The Integrated Healthcare Association in California is also currently evaluating the inclusion of an aggregate cost-efficiency measure in its P4P program. On the other hand, there is some risk that orienting a P4P program toward cost control may undermine the credibility of the quality improvement aspects of pay for performance.

Question 12. How much money should we put into performance pay?

There is no single answer to the question of how much money is needed. Some P4P schemes have provided as little as $2 per visit and had an impact, while others offering bonuses of up to $10,000 had no effect.33,34 One can, however, identify a number of factors relevant to the decision about how large performance pay needs to be (Diagram).5


Factors Affecting the Necessary Size of Incentive

The main factors that affect the optimal magnitude of the incentive fall into two groups: local mediators of the cost of improvement and fundamental characteristics of the clinical condition or activity. Local mediators of the cost of improvement include other incentives in place (especially the general approach to payment), patient characteristics (such as education), organizational capabilities (such as information technology), market factors (such as availability of market resources), and provider characteristics (especially current level of performance). Fundamental characteristics of the clinical condition or activity include the feasibility and the national average cost of improvement.

Key issues include the following:

  • Characteristics of the clinical condition or treatment. Some changes are easier to achieve than others. It is easier to get patients to take flu shots than to quit smoking. Some interventions are less costly than others, even among screening tests. For instance, Pap smears are much less expensive than colonoscopies. Improving performance in areas with good feasibility and low cost should require smaller incentives than improving results in other areas.
  • Other incentives already in place. For example, if medical groups are capitated for their services, then incentives to increase screening tests would need to be larger than in a fee-for-service system in which providers already receive basic fees for the associated visits and procedures.
  • Organizational capabilities. Larger groups may have the resources to hire a dedicated asthma patient educator and ensure excellent communication between pulmonologists and primary care providers, while smaller groups and solo practices may find patient education and inter-provider communication more difficult.
  • Patient and market variables. Providers with highly educated patients traditionally experience better patient adherence and cooperation, which may affect their performance ratings.35 Rural diabetics may have a harder time getting eye exams than their urban counterparts because of a dearth of local ophthalmologists. Market share of the payer may also be a factor in determining the necessary size of the bonus, particularly if investments in infrastructure or training are needed to achieve the quality goal. A purchaser with large market share, like Medicare, may be able to promote change with a relatively smaller proportional bonus compared to a purchaser with small market share.
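
Two of these factors lend themselves to back-of-the-envelope arithmetic. The Python sketch below assumes that a provider will add a service only if the bonus covers the part of its cost not already paid, and that a one-time infrastructure investment must be recovered entirely from one payer's bonus. Both are simplifying assumptions, and the function names and dollar figures are hypothetical.

```python
def min_bonus_per_service(marginal_cost, existing_fee):
    """Smallest per-service bonus that covers the cost of one more service
    after whatever the existing payment method already pays for it.
    Under capitation the existing fee for the extra service is zero."""
    return max(0.0, marginal_cost - existing_fee)


def min_bonus_rate(fixed_investment, provider_revenue, payer_share):
    """Bonus needed, as a fraction of one payer's payments to a provider,
    to recover a one-time investment from that payer alone."""
    payer_revenue = provider_revenue * payer_share
    return fixed_investment / payer_revenue


# Hypothetical screening test costing $50 to deliver.
print(min_bonus_per_service(50.0, existing_fee=40.0))  # fee-for-service
print(min_bonus_per_service(50.0, existing_fee=0.0))   # capitated

# Hypothetical $20,000 investment by a provider with $2 million in revenue.
for share in (0.40, 0.05):
    rate = min_bonus_rate(20_000, 2_000_000, share)
    print(f"payer share {share:.0%}: bonus of {rate:.1%} of payer's payments")
```

The first function captures the capitation point (no existing fee means the bonus must cover the whole cost); the second reproduces the direction of the market-share point, since the smaller a payer's share of a provider's revenue, the larger the proportional bonus it must offer on its own.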

In light of all the uncertainty on this topic, it is not surprising that the P4P programs in place today—most of which are fairly new—typically place 5 percent or less of contracted revenues at risk for performance, although there is some indication that the amounts at risk are increasing.1, 36 In the case of hospital programs, the percentages are often lower. For example, the CMS/Premier demonstration involves a reward of 1 to 2 percent for top-performing hospitals.
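
A rank-based tier structure like the one used in hospital programs can be sketched as follows. The Python code assumes, for illustration, a 2 percent bonus for hospitals ranked in the top 10 percent and a 1 percent bonus for the next 10 percent, which approximates the reward tiers reported for the CMS/Premier demonstration; verify the actual program rules before relying on them. Penalty tiers are omitted, and the function name, scores, and payments are hypothetical.

```python
def tiered_bonuses(scores, payments, tiers=((0.10, 0.02), (0.20, 0.01))):
    """Bonus dollars per hospital based on its rank by score.

    `tiers` holds (cumulative rank fraction, bonus rate) pairs, best tier
    first: by default the top 10 percent earn 2 percent of their payments
    and the next 10 percent earn 1 percent. Tied scores keep input order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    bonuses = {}
    for position, hospital in enumerate(ranked):
        rank_fraction = (position + 1) / n  # share of hospitals at or above this rank
        rate = 0.0
        for cutoff, tier_rate in tiers:
            if rank_fraction <= cutoff:
                rate = tier_rate
                break
        bonuses[hospital] = payments[hospital] * rate
    return bonuses


scores = {f"H{i}": 60 + 3 * i for i in range(10)}  # hypothetical composite scores
payments = {h: 5_000_000 for h in scores}           # hypothetical annual payments
print(tiered_bonuses(scores, payments))
```

Passing a different `tiers` tuple lets a purchaser explore how widening or deepening the reward tiers changes total bonus outlays.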

Another perspective on how much to pay to improve performance can be found by considering shared savings where savings are anticipated from quality improvement. This approach has been used by purchasers such as the Alliance of Wisconsin and the Bridges to Excellence program. For example, if hospitals reduce complication rates among patients receiving a particular procedure and those avoided complications save the purchaser $10,000 in additional treatment costs, the hospital might receive 50 percent, or $5,000, of those savings.
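
The shared-savings arithmetic generalizes directly. In this Python sketch, which is a hypothetical illustration rather than the formula used by any named program, avoided complications are estimated from the drop in complication rate against a baseline, and the provider receives a fixed share of the resulting savings. In practice, how the baseline is set and whether rates are risk adjusted would matter a great deal.

```python
def shared_savings_payment(cases, baseline_rate, observed_rate,
                           cost_per_complication, provider_share=0.5):
    """Provider's share of the purchaser's savings from avoided complications.

    Avoided complications are estimated as the drop from a baseline rate to
    the observed rate, times the number of cases; no payment is made if the
    rate did not fall.
    """
    avoided = max(0.0, baseline_rate - observed_rate) * cases
    savings = avoided * cost_per_complication
    return savings * provider_share


# One avoided complication worth $10,000, split 50/50 (the example in the text).
print(shared_savings_payment(1, 1.0, 0.0, 10_000))  # prints: 5000.0
# Hypothetical: 200 procedures, complication rate falls from 8% to 5%.
print(shared_savings_payment(200, 0.08, 0.05, 10_000))
```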

Question 13. What measure characteristics make them attractive candidates for inclusion in an initial measure set?

Measure types span structure, process, and outcome and include technical (clinical) as well as interpersonal attributes of care. Developing a robust measure set is crucial to P4P success. Not surprisingly, surveys of providers indicate that performance measurement that lacks clinical face validity or sufficient scope and sophistication will be poorly received and actively resisted.12, 37

Table 2 lists characteristics to consider in evaluating candidate quality indicators. One major issue is whether the indicator generates information about a single condition (e.g., use of appropriate antibiotics in pneumonia) or is relevant to a broad population (e.g., rates of medication errors). Measures that apply to larger numbers of patients are attractive, but precise measurement definitions and standards are less often available for broad process or outcome measures of this type, and valid measurement may require adjustment for differences in the types of patients across providers (methods for which may not yet exist for some measures). Precisely defined, condition-specific measures, on the other hand, are simply not available for many diseases and treatments.
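
Adjustment for differences in patient mix is often done by indirect standardization: compare each provider's observed outcomes with the number expected given its patients' risk, then scale by an overall reference rate. The guide does not prescribe a method; this Python sketch shows one common approach and assumes patient-level expected probabilities are supplied by a separately developed risk model. The two providers and the reference rate in the usage lines are hypothetical.

```python
def risk_adjusted_rate(observed_outcomes, expected_probs, reference_rate):
    """Indirectly standardized rate: (observed / expected) x reference rate.

    observed_outcomes: 1 if the adverse outcome occurred for a patient, else 0.
    expected_probs: each patient's predicted probability of that outcome,
        taken from a risk model (assumed to exist; not shown here).
    """
    if len(observed_outcomes) != len(expected_probs):
        raise ValueError("need one expected probability per patient")
    observed = sum(observed_outcomes)
    expected = sum(expected_probs)
    return observed / expected * reference_rate


# Provider A treats sicker patients than provider B (hypothetical data).
a_outcomes, a_risk = [1] * 18 + [0] * 82, [0.20] * 100
b_outcomes, b_risk = [1] * 8 + [0] * 92, [0.05] * 100
reference = 0.10  # overall outcome rate across all providers (hypothetical)

for name, outcomes, risk in (("A", a_outcomes, a_risk), ("B", b_outcomes, b_risk)):
    raw = sum(outcomes) / len(outcomes)
    adjusted = risk_adjusted_rate(outcomes, risk, reference)
    print(f"{name}: raw {raw:.0%}, risk-adjusted {adjusted:.1%}")
```

Provider A has the higher raw rate but the lower adjusted rate, because it had fewer adverse outcomes than its patients' risk predicted; an unadjusted comparison would penalize the provider that takes on sicker patients.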


Table 2. Indicator Characteristics To Consider in Developing a Measure Set

1.    Does the indicator measure care that is a priority for quality improvement?
2.    Does the indicator apply to a single disease or across multiple patient groups?
3.    Does the indicator generate information about cost efficiency, health care processes, outcomes, or structure?
4.    Does the indicator reflect technical competency or patient experiences with care?
5.    Is the indicator actionable?
6.    Is there a valid source for the data needed to calculate the indicator? What is the cost of acquisition and validation of those data?
7.    Is the indicator nationally accepted or locally developed?

Other considerations include the following:

  • Providers generally prefer process measures, which assess whether the right clinical decision was made and the appropriate diagnostic test or treatment was used, rather than outcomes, which are more strongly influenced by patient factors beyond a provider’s control.12,18,38
  • Structural measures—such as the volume of procedures a provider performs or their capacity for computerized order entry—have been favored by some purchasers because they do not require collection of detailed clinical data and can be measured by survey. This approach largely avoids the issue of patient differences, but structural measures are often only weakly related to outcomes. In addition, this strategy runs counter to the idea that incentives should be established to encourage suppliers to find the most effective and efficient production systems on their own.
  • Some purchasers may wish to reward the reduction of disparities in the quality of care or access. Reductions in quality differences between groups are not appropriate measures on which to base rewards, because a difference can shrink simply because the quality of care for the better-served group declines. Instead, purchasers could provide incentives for improving care to the underserved group.
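
A small numerical check shows why a gap-based reward is problematic. In this Python sketch, with hypothetical quality rates, the gap between groups narrows by the same amount whether the underserved group improves or the better-served group declines, while a measure of improvement for the underserved group distinguishes the two cases. The function names are illustrative only.

```python
def gap(better_served, underserved):
    """Difference in quality between the better-served and underserved groups."""
    return better_served - underserved


def underserved_improvement(before, after):
    """Change in quality for the underserved group."""
    return after - before


baseline = {"better_served": 0.80, "underserved": 0.60}
scenarios = {
    # Care for the better-served group gets worse; nothing else changes.
    "decline": {"better_served": 0.70, "underserved": 0.60},
    # Care for the underserved group gets better; nothing else changes.
    "improvement": {"better_served": 0.80, "underserved": 0.70},
}

for name, rates in scenarios.items():
    gap_change = gap(**baseline) - gap(**rates)
    gain = underserved_improvement(baseline["underserved"], rates["underserved"])
    print(f"{name}: gap narrowed by {gap_change:.2f}, underserved gain {gain:.2f}")
```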

Decisions about measures require evaluation of sources of data. The main sources, in order of increasing expense of data collection, are:

  • Pre-existing administrative databases (generally created through the submission of claims and/or discharge abstracts) or data that have been collected for another purpose such as accreditation.
  • Provider surveys.
  • Patient surveys.
  • Medical record abstractions.

Each data source comes with its own set of strengths and limitations:

  • Administrative data are readily available and algorithms for using them to examine the quality of care are established, although providers may not believe those algorithms yield valid performance measures.39,40
  • Administrative data are a reasonably good source of process information, although less so in the hospital setting, in part because hospitals are typically paid a set fee per day or per discharge, so claims do not capture details about the individual therapies a patient received while admitted.
  • Administrative data yield fewer outcome measures than medical records and contain few of the variables perceived as necessary for risk adjustment of those outcomes.19, 41 However, an increasing number of health plans capture pharmacy claims and lab results in their electronic data systems, which strengthens a purchaser’s ability to judge quality of care through claims data.
  • In general, provider acceptance of the validity of the data is lowest for administrative data and highest for medical record data.12
  • Chart abstraction, done correctly, can address many of the limitations of administrative data, but it is expensive. In the future, information technologies may be adopted that greatly reduce the cost of collecting the data generally sought through chart abstraction, but implementation of electronic medical records with such capabilities has been slow.
  • Provider surveys are the most feasible way of collecting information on structural measures (e.g., whether a hospital has computerized order entry) but are limited by the reliability of self-report and the fact that standardized methods for auditing them are not yet available.
  • Patient (or family) surveys are the source for information about patients’ experiences, and there are validated survey measures that could be readily used for almost any provider type. Patients are less reliable sources for technical information about their own diagnoses and care.42,43
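
To illustrate how a process measure is built from administrative data, here is a minimal Python sketch of an HbA1c testing rate computed from claims. It leaves out the enrollment, look-back period, and exclusion rules that real specifications such as HEDIS include, and the code values, class, and function names are hypothetical placeholders. It also reflects the limitation noted for the hospital setting: a service that never appears as its own claim line cannot be counted.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    member_id: str
    code: str  # diagnosis or procedure code on the claim


# Hypothetical placeholder code sets; a real measure specification defines
# the exact diagnosis and procedure codes to use.
DIABETES_DX = {"DX-DIABETES"}
HBA1C_TEST = {"PX-HBA1C"}


def claims_based_rate(claims, denominator_codes, numerator_codes):
    """Share of members with a denominator code who also have a numerator code.

    Returns None when no member qualifies for the denominator.
    """
    denominator = {c.member_id for c in claims if c.code in denominator_codes}
    numerator = {
        c.member_id
        for c in claims
        if c.code in numerator_codes and c.member_id in denominator
    }
    return len(numerator) / len(denominator) if denominator else None


claims = [
    Claim("m1", "DX-DIABETES"), Claim("m1", "PX-HBA1C"),
    Claim("m2", "DX-DIABETES"),
    Claim("m3", "PX-HBA1C"),  # tested, but no diabetes diagnosis on file
]
print(claims_based_rate(claims, DIABETES_DX, HBA1C_TEST))  # prints: 0.5
```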

Another major tension in measure selection is the choice between using nationally adopted indicators and developing local measures. When feasible, it is clearly preferable to use measures endorsed by CMS, JCAHO, the National Committee for Quality Assurance (NCQA), the Hospital Quality Alliance, the Ambulatory Quality Alliance (AQA), or the National Quality Forum (NQF). However, the number and scope of measures available from these sources are limited. NQF has endorsed more indicators, but because the work of developing measurement specifications is ongoing, not all NQF-endorsed measures can be implemented at the current time.iv (Table 3 presents specific examples of various types of quality measures currently used by purchasers.)


Table 3. Types of Quality Measures, with Purchaser Examples and Specific Measure Used

Type of measure | Purchaser example and measure
Structure | Empire Blue Cross Blue Shield: Leapfrog Group measures including computerized physician order entry and staffing of intensive care units with intensivists.
Process | Integrated Healthcare Association, year 1: Hemoglobin (Hb)A1c testing, LDL cholesterol testing, childhood immunizations, cervical cancer screening, and mammography.
Health outcome | Premera Blue Cross of Washington State: HbA1c, LDL cholesterol, and blood pressure control, among other measures.
Patient experience | Integrated Healthcare Association, year 1: 40 percent of P4P is based on the following patient satisfaction measures: 1) satisfaction with specialty care, 2) timely access to care, 3) doctor-patient communication, and 4) overall ratings of care.
Locally developed measures | Hawaii Medical Services Association: Locally developed measure of surgical complications.
Nationally developed measures | Empire Blue Cross Blue Shield: Leapfrog Group measures.

Note: Some purchasers may use a mix of various types of measures.
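
Table 3 notes that 40 percent of the Integrated Healthcare Association's year-1 P4P was based on patient experience measures. A composite score that combines measure domains with fixed weights can be sketched in Python as below. The 40 percent weight comes from the table; the other domain names and weights (50 percent clinical, 10 percent information technology) are assumptions for illustration, as are the scores in the usage lines.

```python
def composite_score(domain_scores, weights):
    """Weighted average of domain scores, each on a 0-100 scale.

    `weights` maps domain names to weights that must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("domain weights must sum to 1")
    missing = set(weights) - set(domain_scores)
    if missing:
        raise ValueError(f"no score for domain(s): {sorted(missing)}")
    return sum(weight * domain_scores[domain] for domain, weight in weights.items())


# Patient experience weight from Table 3; the other weights are assumed.
WEIGHTS = {"clinical": 0.50, "patient_experience": 0.40, "it_capability": 0.10}

group_scores = {"clinical": 80, "patient_experience": 70, "it_capability": 100}
print(composite_score(group_scores, WEIGHTS))
```

Keeping the weights in one mapping makes it easy to see, and to change, how much of the payment rides on each type of measure.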

On the other hand, local development of measures may be advantageous for two reasons:

  1. Developing measures that are relevant to a local population and delivery system may be an effective means for engaging providers.
  2. There may be important local public health priorities for which nationally vetted measures do not exist.

"We do not yet have many quality areas with generally accepted measures established. Pioneers in this work will have to validate their own measures much of the time."

—James Mortimer, former President, Midwest Business Group on Health


iv. Summaries of NQF reports on issues in developing quality measurement specifications and recent NQF-endorsed consensus standards may be found at

