Methods Matter: Methodological Considerations in Generating Provider Performance Scores for Use in Public Reporting

Slide presentation from the AHRQ 2010 conference.

On September 27, 2010, Mark Friedberg and Cheryl Damberg made this presentation at the 2010 Annual Conference. Select to access the PowerPoint® presentation (310 KB).


Slide 1

Methods Matter: Methodological Considerations in Generating Provider Performance Scores for Use in Public Reporting

AHRQ Annual Meeting

Mark Friedberg, MD, MPP and Cheryl Damberg, PhD

September 27, 2010

Note: The RAND Health logo appears in the upper left-hand corner.
 

Slide 2

Background

  • Many communities are developing public reports to compare the performance of health care providers:
    • Examples: Chartered Value Exchanges (CVEs), Aligning Forces for Quality communities

A screen shot of Maine Health Management Coalition is shown.

Slide 3

Road to Producing Performance Reports

Image of a flowchart.

Description: The goals of CVE stakeholder A and stakeholder B lead to Negotiation. Negotiation leads to Decisions about Methods Issues. Available Data and the Focus of the RAND "Methods" White Paper also point to Decisions about Methods Issues. Decisions about Methods Issues lead to Performance Reports.

Slide 4

Motivation for the Methods White Paper

  • Comparative ratings of providers in public scorecards are influenced by earlier decisions about (the distinction is not always clear):
    • "Value judgments"
    • Methods
  • The RAND white paper identifies 23 methods decisions, and options for addressing them, that should be considered when generating provider performance scores for public reporting.

Slide 5

Format Used to Discuss Methods Decisions:
Example: How to Handle the Small Numbers Problem?

  • Why are small numbers important to consider?
    • Sample size affects reliability:
      • ...which affects the risk of misclassifying a provider's performance.
  • Identify alternative options for addressing the problem and discuss the advantages/disadvantages of each option:
    A. Report at a higher level of aggregation.
    B. Combine data across years.
    C. Use composite measures.
  • Illustrate with a CVE example:
    • Massachusetts Health Quality Partners combined options A and C.
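A minimal sketch of the small numbers problem, assuming a simple binomial measure with a hypothetical 80% pass rate. Pooling a second year of data (option B above) doubles the denominator and shrinks the standard error of the observed rate by a factor of roughly 1.4:

import math

def standard_error(rate, n):
    """Sampling uncertainty of an observed pass rate on n eligible patients."""
    return math.sqrt(rate * (1 - rate) / n)

rate = 0.80                          # hypothetical underlying pass rate
print(standard_error(rate, n=25))    # one year of data:  ~0.080
print(standard_error(rate, n=50))    # two years pooled:  ~0.057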

Slide 6

Decision Points Encountered in Producing a Performance Report

Image of a flowchart with the following steps:

  1. Negotiating consensus on “value judgments” of performance reporting:
    • What are the purposes of reporting?
  2. Determining the measures that will be used to evaluate providers:
    • Which measures?
    • How will measures be specified?
  3. Data sources and aggregation:
    • What kinds of data?
    • How will sources be combined?
  4. Checking data quality and completeness:
    • How will missing data be detected and handled?
  5. Computing provider scores:
    • How will data be attributed to providers?
    • How will performance scores be calculated?
  6. Creating performance reports:
    • How will performance be reported?
    • Will composite measures be used?

Slide 7

Today's Agenda: Review 4 Key Methods Issues in White Paper

  1. Risk of misclassifying provider performance:
    • Most misclassification issues are fundamentally about "unknowable unknowns": the information necessary to know "true" provider performance usually does not exist.
  2. Validity:
    • Systematic misclassification due to differences in patient characteristics (adjustment or stratification).
    • Analogy: Validity is influenced by whether you tend to bat against better pitchers than others do.
  3. Reliability:
    • Misclassification due to chance or noise in estimate.
    • Analogy: Reliability is influenced by how many times you've been at bat and how widely batting averages vary between players.
  4. Composite measures:
    • Analogy: A "triple-double" summary measure in the NBA, in which a player accumulates a double-digit total in 3 of 5 statistical categories (points, rebounds, assists, steals, and blocked shots) in a single game.

Slide 8

Methods Issue #1: Misclassification of Performance

  • All reports of provider performance classify providers by "categorizing" their performance.
  • Providers can be classified in various ways, such as:
    • Relative to each other.
    • Relative to a specified level of performance (e.g., above or below national average performance).
  • Provider rankings are a kind of classification system, since each rank is a class.
  • Reports that show confidence intervals also allow the end user of the information to classify the performance of a provider:
    • Typical comparison is whether the provider's performance is different from the mean performance of all providers.
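As a rough illustration of the confidence-interval comparison described above, the sketch below classifies a provider as above, below, or not detectably different from the all-provider mean; the rates, denominator, and normal-approximation interval are hypothetical simplifications:

import math

def classify(successes, n, overall_mean, z=1.96):
    """Compare one provider's rate to the all-provider mean using a 95% CI."""
    rate = successes / n
    se = math.sqrt(rate * (1 - rate) / n)      # normal approximation
    lower, upper = rate - z * se, rate + z * se
    if lower > overall_mean:
        return "above average"
    if upper < overall_mean:
        return "below average"
    return "no detectable difference"

# A small denominator produces a wide interval that straddles the mean.
print(classify(successes=48, n=60, overall_mean=0.75))   # "no detectable difference"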

Slide 9

Example of Classification Scheme

An image of Los Angeles Teacher Ratings for a 3rd grade teacher named Jessica Techama Mcferren is shown.

Slide 10

Issue #1: Misclassification of Performance

  • Misclassification refers to reporting a provider's performance in a way that does not reflect the provider's true performance:
    • Example: provider's performance may be reported as being in category "1" when true performance is in category "2".
  • Two types of error*:
    • False negative: high quality provider is labeled as ordinary.
    • False positive: ordinary provider is labeled as high performing.

* Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/

Slide 11

Issue #1: Misclassification of Performance

  • Misclassifying the performance of too many providers (and by too great an amount) may prevent the reports from having the best possible impact:
    • Patients may choose low-performing providers, incorrectly believing that they are high-performing.
    • Providers may prioritize the wrong areas for improvement:
      • Devoting scarce resources to areas in which they are truly doing fine.
      • Neglecting areas in which they need to improve.

Slide 12

Issue #1: Misclassification of Performance

  • Problem: We can't observe a provider's "true" performance:
    • For the purpose of public reporting, future performance is what really matters.
  • Two major sources of misclassification:
    • Systematic performance misclassification
      • A validity problem: When the performance being reported is determined by something other than what the performance is supposed to reflect:
        • Example: Differences in mortality rates between hospitals being determined by differences in patient mix, rather than the delivery of "right care".
    • Performance misclassification due to chance:
      • Random measurement error.
      • It is not possible to know exactly which providers are misclassified due to chance, but we can calculate the "risk" or probability that each provider's performance is misclassified.

Slide 13

Issue #1: Misclassification of Performance

  • Misclassification is related to:*
    • The reliability of a measure:
      • Which depends on sample size (which can vary from provider to provider).
      • And on variation between providers (so it is population dependent).
    • Number of cutpoints in the classification scheme.
    • How close the performance score is to the cutpoint.

* Source: Safran D. "Preparing Measures for High Stakes Use: Beyond Basic Psychometric Testing." AcademyHealth presentation, June 27, 2010.
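A rough numerical illustration of the last two points, assuming (as a simplification) that a provider's observed score is approximately normal around its true score; all numbers are hypothetical:

from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def misclassification_risk(true_score, cutpoint, se):
    """Chance the observed score lands on the wrong side of a single cutpoint."""
    z = abs(true_score - cutpoint) / se
    return 1 - normal_cdf(z)

# Same measurement error, different distances from a 0.80 cutpoint:
print(misclassification_risk(true_score=0.82, cutpoint=0.80, se=0.05))   # ~0.34
print(misclassification_risk(true_score=0.95, cutpoint=0.80, se=0.05))   # ~0.001

With more cutpoints, a score sits close to some boundary more often, which is why classification schemes with many categories carry higher misclassification risk.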

Slide 14

Issue #1: Misclassification of Performance

  • Examples of options for addressing misclassification discussed in white paper:
    1. Exclude providers from reporting for whom the risk of misclassification due to chance is too high.
    2. Exclude measures for which the risk of misclassification due to chance is too high for too many providers.
    3. Modify the classification system used in the performance report:
      • Report using fewer categories.
      • Change the thresholds for deciding categories.
      • Introduce a zone of uncertainty around performance cutpoints.
      • Report shrunken estimates.
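As one illustration of the last option, a shrunken (empirical Bayes-style) estimate pulls each provider's observed rate toward the overall mean in proportion to its reliability; the formula and numbers below are a simplified sketch, not the white paper's specification:

def shrunken_estimate(observed, n, overall_mean, var_between):
    """Reliability-weighted blend of a provider's observed rate and the mean."""
    var_within = observed * (1 - observed) / n            # sampling variance
    reliability = var_between / (var_between + var_within)
    return reliability * observed + (1 - reliability) * overall_mean

# A small-denominator provider is pulled strongly toward the mean;
# a large-denominator provider keeps most of its observed score.
print(shrunken_estimate(observed=0.95, n=20,  overall_mean=0.80, var_between=0.004))   # ~0.89
print(shrunken_estimate(observed=0.95, n=400, overall_mean=0.80, var_between=0.004))   # ~0.95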

Slide 15

Methods Issue #2: Validity

  • Validity—the extent to which the performance information means what it is supposed to mean, rather than meaning something else:
    • Ask yourself: Does the measurement measure what it claims to measure?
  • Consider whether these threats to validity exist:
    • Is the measure controllable by the provider?
    • Does patient behavior affect the measure?
    • Is the measure affected by differences in the patients being treated?
    • Is the measure determined by factors other than the provider?

Slide 16

Methods Issue #2: Validity

  • Lack of validity can lead to systematic misclassification of performance.
  • Potential threats to validity:
    • Statistical bias (i.e., omitted variable bias, such as differences in case mix).
    • Selection bias (e.g., patients for whom performance data are available are not representative of patients who will use the report).
    • Information bias (e.g., providers differ in the amount of missing data they have, such that lower performing providers have more missing data).
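One common way to probe the case-mix concern is an observed-to-expected (indirectly standardized) ratio; the sketch below uses hypothetical risk strata and expected rates, whereas real case-mix adjustment typically relies on a fitted risk model:

# Each record: one patient's risk stratum and outcome (1 = died). Hypothetical data.
patients = [
    {"stratum": "low",  "died": 0},
    {"stratum": "low",  "died": 0},
    {"stratum": "high", "died": 1},
    {"stratum": "high", "died": 0},
]
expected_rate = {"low": 0.02, "high": 0.30}   # hypothetical population rates per stratum

observed = sum(p["died"] for p in patients)
expected = sum(expected_rate[p["stratum"]] for p in patients)
print(round(observed / expected, 2))   # O/E ratio; >1 suggests worse than expected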

Slide 17

Methods Issue #3: Reliability*

  • A statistical concept that describes how well one can confidently distinguish the performance of one provider from another.
  • Measured as the ratio of the "signal" to the "noise":
    • The between-provider variation in performance is the "signal".
    • The within-provider measurement error is the "noise".
    • Measured on a 0.0 to 1.0 scale:
      • Zero = all variability is due to noise or measurement error.
      • 1.0 = all the variability is due to real differences in performance.

* Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/
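A minimal sketch of the signal-to-noise definition above, with hypothetical variance components:

def reliability(var_between, var_within):
    """Reliability = signal / (signal + noise)."""
    return var_between / (var_between + var_within)

var_between = 0.004   # "signal": spread of true performance across providers
print(reliability(var_between, var_within=0.016))   # noisy measure:   0.2
print(reliability(var_between, var_within=0.001))   # precise measure: 0.8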

Slide 18

Between-Provider Performance Variation

Lower between-provider variation (harder to tell who is best)

Higher between-provider variation (easier to tell who is best)

Slide 19

Different Levels of Measurement Error
(Uncertainty about the "true" average performance)

Higher measurement error (harder to tell who is best)
Lower measurement error (easier to tell who is best)

Slide 20

Methods Issue #3: Link between Reliability and Misclassification*

  • Reliability is a function of:
    • Provider-to-provider variation (which depends on the population).
    • Sample size:
      • Providers typically vary in their number of "measured events" (i.e., some providers have more information than others).
  • Higher reliability in a measure:
    • Means more signal, less noise.
    • Reduces likelihood that you will classify provider in "wrong" category.
  • Per Adams*: "Reliability assumes validity."

* Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/

Slide 21

Misclassification Risk: Various Factors Contribute to the Risk

Image of a flowchart described below.

Higher average error per observation and a lower number of observations lead to higher within-provider measurement error. Higher within-provider measurement error and lower between-provider variation in performance lead to lower reliability. Lower reliability and a classification scheme with more categories lead to higher misclassification risk.

Slide 22

Methods Issue #4: Composite Measures

  • Composite measures are "summary measures":
    • Combine data from 2 or more individual measures into a single measure:
      • Example: 4 separate preventive care measures may be combined into an overall preventive care composite.
    • Potential advantages are:
      • Fewer measures may be easier for patients to digest.
      • May increase reliability, thereby lowering risk of misclassification.
    • Key decision questions:
      • Will composites be used?
      • If used, which measures will be combined?
      • How will individual measures be combined (i.e., the construction of the composite)?

Slide 23

Methods Issue #4: Composite Measures

  • There are different types of composite measures and methods to construct composites.
  • Options for combining measures include:
    1. "Reflective" or "latent" composites:
      • Let the data decide which measures to include (e.g., via factor analytic methods).
    2. "Formative" composites:
      • Based on judgment as to what to include.
    3. Nationally-endorsed composites.
  • Options for constructing:
    • "All-or-none" methods (success on every measured service).
    • Weighted average methods (need to define the weights).
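A minimal sketch of the two construction options, using hypothetical patient-level data for three preventive services and illustrative weights:

# Whether each patient received each of three recommended services (hypothetical).
patients = [
    {"a1c_test": 1, "eye_exam": 1, "nephropathy": 1},
    {"a1c_test": 1, "eye_exam": 0, "nephropathy": 1},
    {"a1c_test": 1, "eye_exam": 1, "nephropathy": 0},
]
measures = ["a1c_test", "eye_exam", "nephropathy"]

# "All-or-none": a patient counts only if every service was delivered.
all_or_none = sum(all(p[m] for m in measures) for p in patients) / len(patients)

# Weighted average of the individual measure rates (the weights must be defined).
weights = {"a1c_test": 0.5, "eye_exam": 0.25, "nephropathy": 0.25}
rates = {m: sum(p[m] for p in patients) / len(patients) for m in measures}
weighted_average = sum(weights[m] * rates[m] for m in measures)

print(round(all_or_none, 2))        # 0.33 -- only one patient received all three services
print(round(weighted_average, 2))   # 0.83

The all-or-none score is typically lower and rewards complete care for each patient, while the weighted average depends on how the weights are chosen.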

Slide 24

Other Topics Addressed in White Paper: Creating Reports of Provider Performance

  1. Negotiating consensus on goals and "value judgments" of performance reporting:
    1. What are the purposes of publicly reporting provider performance?
    2. What will be the general format of performance reports?
    3. What will be the acceptable level of performance misclassification due to chance?
  2. Selecting the measures that will be used to evaluate provider performance:
    1. Which measures will be included in a performance report?
    2. How will the performance measures be specified?
    3. What patient populations will be included?

Slide 25

Other Topics Addressed in White Paper: Creating Reports of Provider Performance

  3. Identifying data sources and aggregating performance data:
    1. What kinds of data sources will be included?
    2. How will data sources be combined?
    3. How frequently will data be updated?
  4. Checking data quality and completeness:
    1. How will tests for missing data be performed?
    2. How will missing data be handled?
    3. How will accuracy of data interpretation be assessed?

Slide 26

Other Topics Addressed in White Paper: Creating Reports of Provider Performance

  5. Computing provider-level performance scores:
    1. How will performance data be attributed to providers?
    2. What are the options for handling outlier observations?
    3. Will case mix adjustment be performed, and if yes, how?
    4. What strategies will be used to limit the risk of misclassification due to chance?
  6. Creating performance reports:
    1. How will performance be reported?
      1. Single points in time, trends.
      2. Numerical scores.
      3. Categorizing performance.
    2. Will composite measures be used? If yes, how will measures be combined and the composite constructed?
    3. What final validity checks might improve the accuracy and acceptance of reports?
Slide 27

Concluding Thoughts

  • Groups of stakeholders engaging in measurement for public reporting may choose different options at each decision point.
  • Our goal was to illustrate the advantages and disadvantages of various options at each decision point:
    • Consultation with a statistician may yield tailored advice.
  • The menu of options for each decision point is not exhaustive:
    • Going through the options may stimulate discussion and negotiation.

Slide 28

To obtain a copy of the White Paper . . .

Current as of December 2010
Internet Citation: Methods Matter: Methodological Considerations in Generating Provider Performance Scores for Use in Public Reporting. December 2010. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/events/conference/2010/damberg/index.html