More Practical Metrics for Standardizing Health Outcomes in Effectiveness Research (Text Version)

Slide Presentation from the AHRQ 2009 Annual Conference



On September 15, 2009, John E. Ware, Jr. made this presentation at the 2009 Annual Conference.


Slide 1


 

More Practical Metrics for Standardizing Health Outcomes in Effectiveness Research

John E. Ware, Jr., PhD, Professor and Chief
Division of Measurement Sciences, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
Track A - Patient Reported Outcome Measurement and Comparative Effectiveness
Research to Reform: Achieving Health System Change
AHRQ 2009 Annual Conference, Bethesda, MD, September 13-16, 2009

Slide 2


 

What is the Relationship Between Health Care Expenditures & Outcomes?

Image: Line graph shows health outcome rising with expenditures for health care ($).

 

Slide 3


 

Health Insurance Experiment Revealed:
More Health Care is Not Always Better

Image: Line graph shows health outcome rising with expenditures for health care ($), then leveling off. The leveling off is described as "Flat of the Curve."

 

Slide 4


 

When the Same Outcome Costs More

Payers & Consumers: Want to Pay Less

Image: Line graph shows health outcome rising with expenditures for health care ($), then leveling off. On the section of the line that has leveled off are two bell curves and the captions, "Payers and Consumers: Want to Pay Less."

Slide 5


 

Who is Most Vulnerable with Aggressive Cost Containment?

  • Health Insurance Experiment (HIE) (1974-1981)
    Well, Well off, Young

Cost Containment

  • Most Vulnerable in the MOS:
    • Chronically Ill
    • Elderly
    • Poor
    • Non-white
  • Medical Outcomes Study (MOS) (1986-1990)

Expenditures for Health Care ($)

 

Slide 6


 

4-Year Physical Health Outcomes Favored FFS > HMO for Chronically-Ill Medicare in the MOS

Images: Two pie charts display the following statistics:

Fee for Service

  • 63% Same
  • 28% Worse
  • 9% Better

HMO

  • 54% Worse (if due to measurement error alone, the percentages better and worse would be only about 5%)
  • 37% Same
  • 9% Better

Source: Ware, Bayliss, Rogers et al., JAMA 1996; 276:1039-1047

 

Slide 7


 

When Outcomes Vary at the Same Price

Image: Line graph shows health outcome rising with expenditures for health care ($), then leveling off. At the point where the line levels off is a bell curve perpendicular to the level line and the caption, "Payers & Consumers Want the Best Outcomes."

 

Slide 8


 

To Compare Health Care Effectiveness
We Need Health Outcomes "Rulers"

Image: Line graph shows health outcome rising with expenditures for health care ($), then leveling off. Health Outcome is divided into three sections:

  • Better
  • Same
  • Worse

 

Slide 9


 

Continuum of Disease-specific and Generic Health Measures

(1) Clinical Markers
(2) Specific Symptoms
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

Adapted from: Wilson and Cleary, JAMA 1995; Ware, Annual Rev. Pub. Health 1995

 

Slide 10


 

Continuum of Disease-specific and Generic Health Measures

Spirometry
Image: a woman is shown using a spirometer; the parts of the machine are labeled

Shortness of Breath
Over the last 4 weeks I have had shortness of breath

  • Almost every day
  • Several days a week
  • A few days a month
  • Not at all
  
(1) Clinical Markers
(2) Specific Symptoms
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

Adapted from: Wilson and Cleary, JAMA 1995; Ware, Annual Rev. Pub. Health 1995

 

Slide 11


 

Continuum of Disease-specific and Generic Health Measures

Spirometry
Image: a woman is shown using a spirometer; the parts of the machine are labeled

Shortness of Breath
Over the last 4 weeks I have had shortness of breath

  • Almost every day
  • Several days a week
  • A few days a month
  • Not at all

Respiratory-specific
How much did your lung/respiratory problems limit your usual activities or enjoyment of everyday life?

  • Not at all
  • A little
  • Moderately
  • Extremely
 
 
(1) Clinical Markers
(2) Specific Symptoms
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

Adapted from: Wilson and Cleary, JAMA 1995; Ware, Annual Rev. Pub. Health 1995

Slide 12


 

Continuum of Disease-specific and Generic Health Measures

Spirometry
Image: a woman is shown using a spirometer; the parts of the machine are labeled

Shortness of Breath
Over the last 4 weeks I have had shortness of breath

  • Almost every day
  • Several days a week
  • A few days a month
  • Not at all

Respiratory-specific
How much did your lung/respiratory problems limit your usual activities or enjoyment of everyday life?

  • Not at all
  • A little
  • Moderately
  • Extremely

In general, would you say your health is...

  • Excellent
  • Very good
  • Good
  • Fair
  • Poor

(1) Clinical Markers
(2) Specific Symptoms
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

Adapted from: Wilson and Cleary, JAMA 1995; Ware, Annual Rev. Pub. Health 1995

Slide 13


 

There is More to the Continuum

Image: the table below is contained in the shape of an arrow pointing to the right.

(1) Clinical Markers
(2) Specific Symptoms
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

 

Slide 14


 

Prediction and Risk Management: PROs are among the Best Predictors

Image: the table below is contained in the shape of an arrow pointing to the following text:

Future health
Inpatient expenditures
Outpatient expenditures
Job loss
Response to treatment
Return to work
Work productivity
Mortality

(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation

Below the arrow is the following text: "Health-Related QOL (HR-QOL)."

Slide 15


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

 

Slide 16


 

Content of Widely-Used Patient-Reported Outcome Measures

                              Psychometric                                            Utility Related
Concepts                      SIP  HIE  NHP  COOP  DUKE  MOS FWBP  MOS SF-36  PROMIS  QWB  EURO-QOL  HUI  SF-6D
Physical functioning          x    x    x    x     x     x         x          x       x    x         x    x
Social functioning            x    x    x    x     x     x         x          x       x    x              x
Role functioning              x    x    x    x     x     x         x          x       x    x              x
Psychological distress        x    x    x    x     x     x         x          x            x         x    x
Health perceptions (general)  x    x    x    x     x     x         x          x
Pain (bodily)                 x    x    x    x     x     x         x          x            x         x
Energy/fatigue                x    x    x    x     x     x         x          x       x                   x
Psychological well-being           x               x     x
Sleep                                   x          x     x
Cognitive functioning                              x     x                                           x
Quality of life                    x         x           x
Reported health transition                   x           x

SIP = Sickness Impact Profiles (1976)
HIE = Health Insurance Experiment surveys (1979)
NHP = Nottingham Health Profile (1980)
QLI = Quality of Life Index (1981)
COOP = Dartmouth Function Charts (1987)
MOS FWBP = MOS Functioning and Well-Being Profile (1992)
MOS SF-36 = MOS 36-Item Short-Form Health Survey (1992)
PROMIS = Patient Reported Outcomes Measurement Information System
QWB = Quality of Well-Being Scale (1973)
EUROQOL = European Quality of Life Index (1990)
HUI = Health Utility Index (1996)
SF-6D = SF-36 Utility Index (Brazier, 2002)

Source: Adapted from Ware, 1995

 

Slide 17


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

 

Slide 18


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

 

Slide 19


 

A Practical Solution in 1999: Computerized Dynamic Health Assessment

Image: Graph showing that IRT/CAT will spawn a new generation of static tools.

Ware JE, Jr, et al. Med Care 2000;38:1173-82.

 

Slide 20


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

 

Slide 21


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

Slide 22


 

Practical Solution in 2000:
Cross-Calibration of Headache Pain Disability Measures

Theta (θ) [Best Possible Estimate]

Scales        20   30   40   50   60   70
HDI ↑         16   43   73   91   98  100
HIMQ ↓        74   53   31   17    8    2
MIDAS ↓       58   28    5    1    0    0
MSQ ↑         31   53   79   92   96   99
DYNHA-5 (+)   23   32   41   51   58   66

Note: Direction of scoring shown with arrows

Source: Ware, Bjorner & Kosinski, Medical Care 2000
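
A cross-calibration table like the one on this slide can be read as a two-step lookup: a score on one scale maps to a θ value, and that θ maps to the expected score on another scale. The following is a minimal sketch in Python, not the method of the cited paper. The numbers are the tabulated points from the slide; the linear interpolation between them and the function names (`score_to_theta`, `theta_to_score`, `translate`) are assumptions made for illustration.

```python
# Tabulated theta values and the score each scale shows at that theta,
# as read from the slide's table. Linear interpolation between the
# tabulated points is an assumption made for illustration only.
THETA = [20, 30, 40, 50, 60, 70]
SCALES = {
    "HDI": [16, 43, 73, 91, 98, 100],
    "HIMQ": [74, 53, 31, 17, 8, 2],
    "MIDAS": [58, 28, 5, 1, 0, 0],
    "MSQ": [31, 53, 79, 92, 96, 99],
    "DYNHA-5": [23, 32, 41, 51, 58, 66],
}


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be monotonic."""
    if xs[0] > xs[-1]:  # put xs in ascending order
        xs, ys = xs[::-1], ys[::-1]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value outside the tabulated range")
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 == x1:
            # Flat stretch (e.g. MIDAS at its floor): theta is not
            # identified there, so this segment is skipped.
            continue
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def score_to_theta(scale, score):
    """Map a scale score to theta."""
    return _interp(score, SCALES[scale], THETA)


def theta_to_score(scale, theta):
    """Map theta to the expected score on a scale."""
    return _interp(theta, THETA, SCALES[scale])


def translate(score, source, target):
    """Translate a score from one scale to another via theta."""
    return theta_to_score(target, score_to_theta(source, score))


print(translate(73, "HDI", "MSQ"))  # 79.0 (both sit at theta = 40)
```

Where a scale has reached its floor or ceiling (MIDAS shows 0 at both θ = 60 and θ = 70), the score no longer distinguishes θ values, so any translation from that region is only a bound.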
 

Slide 23


 

We Need the Health Equivalent of a Two-Sided Tape Measure

Image: A tape measure with centimeters on one side and inches on the other. A note reads, "52 centimeters = 20.5 inches."

And Public-Private Partnerships That Meet the Needs of Research and Business

 

Slide 24


 

What Do We Need for Comparative Effectiveness Research?

  • Outcomes that matter to patients
  • Practical measures
  • Coverage of a wide range
  • Greater precision
  • Comparability of scores
  • Ease of interpretation

Slide 25


 

PRO Validation Must be Comprehensive

Image: Five boxes contain the following text:

Causes

  • Diagnosis
  • Disease severity
  • Responders
  • Treatments

Gold Standard

Consequences

  • Work productivity
  • Costs of care
  • Mortality
  • Self-evaluated health

Measures In Question

Other Measures & Methods

Arrows point from "Causes" to "Measures In Question" to "Consequences."

Adapted from: Ware JE, Jr. and Keller SD: Interpreting general health measures, in: Quality of Life and Pharmacoeconomics in Clinical Trials. Philadelphia, PA: Lippincott-Raven Publishers; 1995: Chapter 47.

 

Slide 26


 

What Do Differences in Treatment Effectiveness Mean?

Treatment

  • 50% reduction in disease burden
  • 33% reduction in hospitalization
  • Substantial increase in work productivity
  • Subsequent cost savings

 

Slide 27


 

Matching Methods to Applications:
"Choosing the Right Horse for the Course"

  • Population monitoring
  • Group-level outcomes monitoring
  • Patient-level measurement/management

 

Slide 28


 

Matching Methods to Applications

Image: Graph of matching methods to applications.

  • Population monitoring
    Single Item
    • Most Functionally Impaired: Noisy Individual Classification
  • Group-Level Outcomes Monitoring
    Multi-Item Scale
  • Patient-Level Management
    "Item Pool" (CAT Dynamic)
    • Most Functionally Impaired: Very Accurate Individual Classification

 

Slide 29


 

Solutions

  • Improved psychometrics (Item response theory—IRT)
  • Computerized adaptive testing (CAT) software
  • The Internet (and other connectivity)

Business Week. November 26, 2001.

 

Slide 30


 

First, Construct Better Metrics

  • Comprehensive Item "Pools"
  • IRT Cross Calibration of Items

1980 "PF Ruler" >75% @ Ceiling
1990 "PF Ruler" >30% @ Ceiling
2008 "PF Ruler" <3% @ Ceiling
 

Note: Physical Functioning (PF)
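
The "% @ Ceiling" figures describe the share of respondents who score at the top of a ruler, where further differences in health cannot be seen. A minimal sketch of that calculation follows, assuming hypothetical data: the `true_levels` values and the two ceiling values (60 and 90) are invented for illustration and are not taken from any of the PF rulers named on the slide.

```python
def percent_at_ceiling(scores, max_score):
    """Percentage of respondents scoring at the top of the scale."""
    if not scores:
        raise ValueError("no scores given")
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return 100.0 * at_ceiling / len(scores)


def observe(levels, ceiling):
    """Scores a ruler would record if it cannot measure above `ceiling`."""
    return [min(level, ceiling) for level in levels]


# Hypothetical functioning levels for ten respondents.
true_levels = [35, 48, 52, 55, 61, 64, 68, 72, 77, 93]

narrow = observe(true_levels, 60)  # ruler that tops out early
wide = observe(true_levels, 90)    # ruler extended by a larger item pool

print(percent_at_ceiling(narrow, 60))  # 60.0
print(percent_at_ceiling(wide, 90))    # 10.0
```

The same respondents produce very different ceiling percentages depending only on how far the ruler extends, which is the point of the slide's comparison across years.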

 

Slide 31


 

Precision Varies Across “Static” and Dynamic Forms and Across Score Levels

Image: Graph of Static and Dynamic Forms, across score levels.
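
One way to read this slide in IRT terms, assuming the standard large-sample result for maximum-likelihood scoring (an assumption added here; the slide itself shows only the graph): the standard error of a score estimate is approximately the inverse square root of the test information, and the test information is the sum of the information contributed by the items actually administered.

```latex
\[
  \mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}},
  \qquad
  I(\theta) = \sum_{i \,\in\, \text{items administered}} I_i(\theta)
\]
```

Because each item's information I_i(θ) peaks near the level it targets, a static form is precise only where its fixed items are concentrated, while a dynamic form can choose items near each respondent's level.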

 

Slide 32


 

2nd Solution, Assess Health Dynamically

CAT
Patient scores here
CAT = Computerized Adaptive Testing

 

Slide 33


 

What are the Advantages of Dynamic Assessments?

  • More accurate risk screening
  • Reliable enough to monitor individual outcomes
  • Brevity of a short form—
    90% reduction in respondent burden
  • Elimination of "ceiling" & "floor" effects
  • Can be administered using various data collection technologies
  • Markedly reduced data collection costs
  • Monitor data quality in real time
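
The brevity and precision claimed above come from the adaptive loop itself: after each answer the score estimate is updated, and the next question is chosen to be the most informative at that estimate. The sketch below illustrates that loop and is not the software used in the studies cited in this presentation. It assumes a two-parameter logistic (2PL) model with yes/no items, a standard normal prior, and a hypothetical eight-item bank (`ITEM_BANK`); real PRO items are usually multi-category, and all parameter values here are invented.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b) for each item.
ITEM_BANK = [(1.2, -2.0), (1.0, -1.0), (1.5, -0.5), (1.3, 0.0),
             (1.1, 0.5), (1.4, 1.0), (1.0, 1.5), (1.2, 2.0)]
GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4


def p_endorse(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for item, answer in responses:
            p = p_endorse(t, *ITEM_BANK[item])
            w *= p if answer else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_fn, max_items=5):
    """Give up to max_items, always picking the most informative unused item."""
    responses = []
    theta, sd = eap_estimate(responses)
    while len(responses) < max_items:
        used = {i for i, _ in responses}
        best = max((i for i in range(len(ITEM_BANK)) if i not in used),
                   key=lambda i: information(theta, *ITEM_BANK[i]))
        responses.append((best, answer_fn(best)))
        theta, sd = eap_estimate(responses)
    return theta, sd, [i for i, _ in responses]


# Hypothetical respondent who endorses every item with difficulty below 0.8.
theta, sd, items = run_cat(lambda i: ITEM_BANK[i][1] < 0.8)
print(f"estimate={theta:.2f}, posterior SD={sd:.2f}, items asked={items}")
```

A production system would typically stop on a precision target rather than a fixed item count; the fixed count of five here simply mirrors the 5-item CAT forms mentioned in this presentation.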

 

Slide 34


 

Performance of 5-item CAT Scores Confirmed in NIH-Sponsored Studies

Images: A series of 6 graphs from studies of Mental Health, Headache Disability, Pediatric Disability, Chronic Kidney Disease, Diabetes Impact, and Post Acute Rehabilitation.

 

Slide 35


 

3rd Solution: The Internet

  • www.amIhealthy.com
  • www.asthmacontroltest.com

Reference (Headache Impact): Bayliss MS, Dewey JE, Cady R, et al. A study of the feasibility of Internet administration of a computerized health survey: The Headache Impact Test (HIT). Quality of Life Research 2003;12:953-961.

Reference (Asthma Control): Nathan RA, Sorkness CA, Kosinski M, et al. Development of the Asthma Control Test: A survey for assessing asthma control. Journal of Allergy and Clinical Immunology 2004;113:59-65.

 

Slide 36


 

Conclusions

  • Patient-reported outcomes (PROs) are very useful
  • Standardization of concepts & metrics is enabling comparisons across treatments & settings
  • Increasing widespread use proves that more practical tools will be adopted
  • Promising technological advances include: item response theory (IRT), computerized adaptive testing (CAT) and Internet-based data capture

Current as of December 2009
Internet Citation: More Practical Metrics for Standardizing Health Outcomes in Effectiveness Research (Text Version): Slide Presentation from the AHRQ 2009 Annual Conference. December 2009. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/events/conference/2009/ware/index.html