Understanding the Methods Used by the US Preventive Services Task Force in Developing Recommendations

Instructor Guide

Presentation describes the methods used by the US Preventive Services Task Force to develop recommendations.

Background. Clinicians are frequently confronted with conflicting screening and prevention recommendations from a variety of sources. One source, the US Preventive Services Task Force (USPSTF), is thought to set the standard for evidence-based recommendations, though the methods used by the USPSTF to arrive at recommendations are not well known. An understanding of their methodological approach is useful for clinicians and clinician educators faced with making choices about whose guidelines to follow.

Target setting and audience: The materials presented herein are devised for an instructor-led small-group setting. The materials are applicable for use at all levels of training: undergraduate medical education, graduate medical education, and continuing medical education. These materials are linked to a teaching module (PowerPoint® presentation) appropriate for a large-group lecture setting.

By the end of this session, students will have met the following educational objectives:

  1. Understand the major challenges in measuring and balancing benefits and harms of screening
  2. Understand why clinical recommendations for cancer screening may conflict
  3. Understand the methodology used by the US Preventive Services Task Force in developing recommendations related to breast cancer screening
  4. Understand the interpretation and application of the US Preventive Services Task Force breast cancer screening recommendations

Clinical scenario: You are the director of the Women's Clinic at the General Hospital and are revising your guidelines for breast cancer screening. You note that revised recommendations by the US Preventive Services Task Force (USPSTF) seem to conflict in important ways with your current guidelines (based on the American Cancer Society guidelines). You are unsure what to do. You decide to read the new recommendation and some supporting documents before making any revisions. Here they are:

Screening for breast cancer: an update for the U.S. Preventive Services Task Force. Nelson HD et al; U.S. Preventive Services Task Force. Ann Intern Med. 2009 Nov 17;151(10):727-37, W237-42. Available at: http://www.uspreventiveservicestaskforce.org/uspstf09/breastcancer/brcanup.htm  and http://www.annals.org/content/151/10/727.full.pdf+html

Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement. US Preventive Services Task Force. Ann Intern Med. 2009 Nov 17;151(10):716-26, W-236. Available at: http://www.uspreventiveservicestaskforce.org/uspstf09/breastcancer/brcanrs.htm  and http://www.annals.org/content/151/10/716.full.pdf+html

The estimated prep time for this small group is 90 minutes. The small group is scheduled for 120 minutes.

Instructors: text in bold italics does not appear in the student version of this module.

1. Clinical guidelines strive to balance benefits and harms. What are potential benefits and harms specific to breast cancer screening (as outlined in the USPSTF recommendation)? How might you rank these in terms of importance to population health? (10 minutes)

Some of these were discussed in the large group session. Potential benefits include: reduced cancer morbidity and mortality. Some students might mention “peace of mind” when testing is normal.

Potential harms include: discomfort, cost, inconvenience, and harms due to the test itself (e.g., radiation) and follow-up imaging; pain, cost, inconvenience, and morbidity due to invasive follow-up testing (e.g., biopsies) and ineffective treatments; inconvenience, cost, and anxiety due to prolonged surveillance among women with diagnoses of uncertain malignant potential (e.g., atypical hyperplasia); false-positive testing; and “overdiagnosis”.

The ranking of benefits and harms is challenging and likely varies by individuals' perspective and values. You may want to take a poll of students to see how they might rank certain benefits and harms. Pointing out disagreements among your students illustrates the basis from which controversies arise.

2. One cited harm of screening in the USPSTF recommendation is “overdiagnosis”. What does that mean specifically in the context of breast cancer screening? (10 minutes)

3. To assist the USPSTF in making its recommendation, a systematic review and meta-analysis were commissioned. You have read an abbreviated version of a much longer (95-page) review (Nelson et al). To determine the validity of a systematic review and meta-analysis for mammography screening, the following elements should be evaluated:

Element | Addressed?
Funding source disclosed | Yes (AHRQ)
Study protocol described | Yes
Search strategy described | Yes (in the full report)
Included and excluded papers listed | Yes (in the full report)
Publication bias addressed | Yes
Study quality assessed | Yes
Tests of heterogeneity performed | Yes
Sensitivity analyses performed | Yes (3 in full report)
Results reported such that analyses may be replicated | Yes (for women 39-49)

Look at the studies of mammography screening included in the meta-analysis (Appendix Table 1). What was the quality of the included studies? In what ways do the studies differ from each other? What are the challenges of combining studies in general and these studies in particular? How did the authors address limitations in combining results from studies that may have differed from each other in important ways? (10 minutes)

All studies included in the meta-analysis were judged as being of only “fair” quality. The studies were performed over a wide span of years (1963-1991), during which screening technologies, standards for interpretation, and treatments have all changed. Some studies randomized women to both mammography and clinical breast examination, so the independent effect of mammography is difficult to ascertain. Some studies used 1-view mammography and others used 2-view tests. One way the authors addressed the limitations in combining study results was by excluding some trials in sensitivity analyses (e.g., the HIP study was excluded).

4. In the Nelson report, the following summary information from mammography screening trials is given for women aged 39-49 years:

Graph showing summary information from mammography screening trials for women aged 39-49 years. Reference information, relative risk for breast cancer mortality and total events is provided.

Are any of the individual studies statistically significant? Does each study show a trend to benefit (based on the relative risk)? Which study do you think is weighted the least in the meta-analysis? The most? Do you think there is substantial heterogeneity? If so, which study do you think might contribute most to heterogeneity? (10 minutes)

5. The Age/Moss trial is the largest (over 150,000 women randomized) and was designed specifically to answer the question about mammography efficacy among women aged 39-41 years. Enrollees were randomized to being screened at ages 39-41 and annually thereafter until age 48 (i.e., for about 10 years) versus no screening. What can you conclude about the efficacy of mammography in women aged 39-49 on the basis of the reported results of the Age/Moss trial? What can you conclude about the efficacy of mammography in women aged 39-49 on the basis of the overall results of the meta-analysis? (10 minutes)

The Age/Moss trial (from the UK) showed on average a 17% decrease in risk of breast cancer mortality with mammography screening, but this decrease was not statistically significant (the 95% CI includes 1.0). One might draw several conclusions from this trial alone: that mammography is ineffective for women aged 39-49; that mammography is ineffective for women beginning screening at ages 39-41; that screening starting at 39-41 is ineffective in a ten-year time frame; or that it is indeed effective but the study was underpowered to detect an important difference (although over 150,000 women were enrolled). The results of the meta-analysis, on the other hand, support a clearer 15% decrease in breast cancer mortality for women who are first screened in the age range 40-49. Which results should one believe: a single large trial designed to answer a specific question or a meta-analysis of several trials? The answer to this question is debated, and there appears to be no “right” answer. Meta-analysis enthusiasts might say that having several smaller trials (with no substantial heterogeneity) that all show the same general effect is more robust, since the results are drawn from various populations. Some may use the Age/Moss trial as an argument against screening women aged 40-49. Of note, the USPSTF considered the 15% decrease in risk to be “real”.

6. The USPSTF commissioned a decision analytic model to assist in weighing benefits and harms. In the report accompanying the guideline, the following table was provided (http://www.uspreventiveservicestaskforce.org/uspstf09/breastcancer/brcanart.htm):

Table 4. Benefits and Harms Comparison of Different Starting and Stopping Ages Using the Exemplar Model [a]

Columns 3-5 are potential benefits (vs. no screening); columns 6-7 are potential harms (vs. no screening) [b]. All values other than the percentage of mortality reduction are per 1000 women.

Strategy | Average Screenings | Mortality Reduction (%) | Cancer Deaths Averted | Life-Years Gained | False-Positive Results | Unnecessary Biopsies

Comparison of different starting ages

Biennial screening
40-69 y | 13,865 | 16 [c] | 6.1 | 120 [c] | 1,250 | 88
45-69 y | 11,771 | 17 [c] | 6.2 | 116 [c] | 1,050 | 74
50-69 y | 8,944 | 15 | 5.4 | 99 | 780 | 55
55-69 y | 6,941 | 13 | 4.9 | 80 | 590 | 41
60-69 y | 4,246 | 9 | 3.4 | 52 | 340 | 24

Annual screening
40-69 y | 27,583 | 22 [c] | 8.3 | 164 [c] | 2,250 | 158
45-69 y | 22,623 | 22 [c] | 8.0 | 152 [c] | 1,800 | 126
50-69 y | 17,759 | 20 [c] | 7.3 | 132 [c] | 1,350 | 95
55-69 y | 13,003 | 16 [c] | 6.1 | 102 [c] | 950 | 67
60-69 y | 8,406 | 12 [c] | 4.6 | 69 [c] | 600 | 42

Comparison of different stopping ages

Biennial screening
50-69 y | 8,944 | 15 | 5.4 | 99 | 780 | 55
50-74 y | 11,109 | 20 | 7.5 | 121 | 940 | 66
50-79 y | 12,347 | 25 | 9.4 | 130 | 1,020 | 71
50-84 y | 13,836 | 26 | 9.6 | 138 | 1,130 | 79

Annual screening
50-69 y | 17,759 | 20 [c] | 7.3 | 132 [c] | 1,350 | 95
50-74 y | 21,357 | 26 [c] | 9.5 | 156 [c] | 1,570 | 110
50-79 y | 24,439 | 30 | 11.1 | 170 | 1,740 | 122
50-84 y | 26,913 | 33 | 12.2 | 178 | 1,880 | 132

[a] Results are from model S (Stanford University). Model S was chosen as an exemplar model to summarize the balance of benefits and harms associated with screening 1000 women under a particular screening strategy.
[b] Overdiagnosis is another significant harm associated with screening. However, given the uncertainty in the knowledge base about ductal carcinoma in situ and small invasive tumors, we felt that the absolute estimates are not reliable. In general, overdiagnosis increases with age across all age groups but increases more sharply for women who are screened in their 70s and 80s.
[c] Strategy is dominated by other strategies; the strategy that dominates may not be in this table.

How do various starting and stopping ages affect the balance of benefits and harms? How does biennial compared with annual screening affect the balance of benefits and harms? Compare biennial screening among women ages 40-69 years to biennial screening among women ages 50-69 years. What are the specific differences in benefits and harms? (20 minutes)

Try this in smaller groups of 3-4 students. In general, the more frequently and the longer one screens, the greater the benefit (in terms of life-years gained and deaths averted), but the harms (in terms of false-positive testing) also increase. Look at the third line of the table. The estimated mortality reduction is 15% for biennial screening of women aged 50-69; biennial screening from ages 40-69 (first line) is associated with a 16% reduction in mortality (1 additional percentage point) and 21 extra life-years gained per 1000 women, but false-positive results rise from 780 to 1250 per 1000 women (note that 1250 per 1000 women means that, on average, more than one false-positive test will occur per woman screened). There is no “best” measure of benefit; three different measures are given: percentage of mortality reduction, cancer deaths averted per 1000 women, and life-years gained per 1000 women. Two measures of harm are given, but there is no real “best” measure of harm either.
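For instructors who want to show the arithmetic behind this comparison, the following is a minimal Python sketch. It uses only the Table 4 values for biennial screening at ages 40-69 and 50-69; the dictionary layout and the helper name `incremental` are illustrative assumptions and are not part of the USPSTF materials. The ratio computed at the end is one possible way to summarize the trade-off, not a measure the USPSTF reports.

```python
# Values per 1000 women, copied from Table 4 (biennial screening, exemplar model S).
biennial = {
    "40-69": {"screenings": 13865, "deaths_averted": 6.1, "life_years": 120,
              "false_positives": 1250, "unnecessary_biopsies": 88},
    "50-69": {"screenings": 8944, "deaths_averted": 5.4, "life_years": 99,
              "false_positives": 780, "unnecessary_biopsies": 55},
}


def incremental(earlier_start, later_start):
    """Extra benefits and harms (per 1000 women) of starting at the earlier age."""
    return {key: round(earlier_start[key] - later_start[key], 1)
            for key in earlier_start}


extra = incremental(biennial["40-69"], biennial["50-69"])

# Illustrative summary ratio: extra false-positive results per extra death averted.
fp_per_death_averted = extra["false_positives"] / extra["deaths_averted"]

for name, value in extra.items():
    print(f"extra {name} per 1000 women: {value}")
print(f"extra false positives per extra death averted: {fp_per_death_averted:.0f}")
```

Students can repeat the same subtraction for any two rows of the table (for example, annual versus biennial screening at ages 50-69) and discuss whether a single ratio can capture the balance of benefits and harms.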

The students should be aware that there is generally no way to balance benefits and harms that is satisfactory to everyone, especially if it is a “close call”. There is no one right answer. Students may be frustrated by this, but these are real challenges faced by those making recommendations. Remind students that this challenge is analogous to clinical decision making on an individual patient level. We often operate under conditions of uncertainty and reasoned judgment is required.

7. Review the USPSTF recommendation. What does the USPSTF conclude about mammography screening? What is their justification? (10 minutes)

For women 40-49: “The decision to start regular, biennial screening mammography before the age of 50 years should be an individual one and take into account patient context, including the patient's values regarding specific benefits and harms. (Grade C recommendation)” Further, the recommendation states: “the USPSTF reasoned that the additional benefit gained by starting screening at age 40 years rather than at age 50 years is small, and that moderate harms from screening remain at any age. This leads to the C recommendation. The USPSTF notes that a ‘C’ grade is a recommendation against routine screening of women aged 40 to 49 years. The Task Force encourages individualized, informed decision making about when to start mammography screening.” The key word here is “routine”; the USPSTF felt that women should not be automatically given a requisition form and sent to get a mammogram when the balance of benefits and harms was so tenuous and potentially value-laden.

For women 50-74: “The USPSTF recommends biennial screening mammography for women aged 50 to 74 years. (Grade B recommendation).” and “For biennial screening mammography in women aged 50 to 74 years, there is moderate certainty that the net benefit is moderate.”

The justification is as follows: “The current USPSTF statement is also informed by… modeling studies…. The Task Force considered both ‘mortality’ and ‘life-years gained’ outcomes. In this case, given that the age groups (40 to 49 years and 50 to 59 years) are adjacent, the Task Force elected to emphasize the mortality outcomes from the modeling studies. Of the 8 screening strategies found most efficient, 6 start at age 50 years rather than age 40 years. The frontier curves for the mortality outcome show only small gains but larger numbers of mammograms required when screening is started at age 40 years versus age 50 years.”

Critics of the method used by the USPSTF to balance benefits and harms might say that no explicit value is placed on the specific benefits and harms, which makes it hard to reach a reasoned judgment about their balance. Cost-effectiveness analyses (and cost-utility analyses) may be more useful, but these can be opaque and complex and are not used by the USPSTF, in part for these reasons.

8. A 40-year-old woman presents to your clinic for a periodic health examination. She is healthy and has no risk factors for any particular diseases. She does not smoke, is sexually active, and is not pregnant. Using the electronic Preventive Services Selector, you note that screening for the following conditions gets an A or B recommendation: cervical cancer, hypertension, alcohol misuse, and obesity. Routine mammography is not recommended. She has read about the mammography controversy and wants to know more about the benefits and harms. Look at the following table from the Nelson report. These data were derived from the Breast Cancer Surveillance Consortium and include results from a single screening round.

 

Table showing results of breast cancer screening by age group.

What might you tell this patient about the likelihood of screen-detected breast cancer? False-positive testing? Additional imaging? Does every case of screen-detected breast cancer translate into a breast cancer death avoided? (10 minutes)

The table suggests the following likelihoods per 1000 women screened, per screening round: screen-detected breast cancer, 1.8 per 1000 women screened (or 0.18%); false-positive testing, 97.8 per 1000 women screened (or 9.8%); and additional imaging, 84.3 per 1000 women screened (or 8.4%). Not every detected case of breast cancer will translate into a breast cancer death avoided. Students should know that some indolent breast cancers that are detected by screening may never become clinically relevant. These cases constitute “overdiagnosis” due to screening. Conversely, some breast cancers detected by screening will be unresponsive to therapy and lead to death regardless of early detection and treatment.
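When counseling a patient, it can help to restate these rates in more than one form. The following is a small Python sketch, assuming only the three per-1000 figures quoted above; the helper names `as_percent` and `one_in` are hypothetical and chosen for illustration.

```python
# Per-1000 figures quoted above (women aged 40-49, single screening round).
per_1000 = {
    "screen-detected breast cancer": 1.8,
    "false-positive testing": 97.8,
    "additional imaging": 84.3,
}


def as_percent(rate_per_1000):
    """Convert a rate per 1000 women into a percentage."""
    return rate_per_1000 / 10.0


def one_in(rate_per_1000):
    """Express a rate per 1000 women as '1 in N women'."""
    return 1000.0 / rate_per_1000


for outcome, rate in per_1000.items():
    print(f"{outcome}: {rate} per 1000 = {as_percent(rate):.2f}% "
          f"= about 1 in {one_in(rate):.0f} women")

# False-positive results for each screen-detected cancer in one screening round.
fp_per_cancer = (per_1000["false-positive testing"]
                 / per_1000["screen-detected breast cancer"])
print(f"false-positive results per screen-detected cancer: {fp_per_cancer:.0f}")
```

These figures describe a single screening round; they say nothing about cumulative risk over repeated rounds, which would require additional data not given here.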

9. The USPSTF also recommends against clinicians teaching breast self-examination (BSE), since the harms outweigh the benefits (a “D” recommendation). The Nelson report summarizes the results of trials evaluating the benefit of teaching BSE on breast cancer mortality. Consider the results of the “good-quality” Shanghai trial. From the data presented, how certain are you that teaching BSE has no effect on breast cancer mortality? (5 minutes)

In the Shanghai trial of over 268,000 women, the RR for breast cancer mortality was 1.03 (95% CI 0.81-1.31). You may want to point out that, unlike the Age/Moss trial, which suggests a 17% decreased risk (RR 0.83, 95% CI 0.66-1.04), there is no suggestion of a benefit in the Shanghai trial (the RR is practically 1.0). The Russian trial supports this finding, but note that its outcome was total mortality.
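To make the contrast between the two trials concrete, the following Python sketch recovers an approximate standard error, z statistic, and two-sided p value from each reported relative risk and 95% confidence interval. It assumes that the log relative risk is approximately normally distributed and that the published interval is symmetric on the log scale; because the published figures are rounded, the results are approximations only. The function name `summarize_rr` is hypothetical.

```python
import math


def summarize_rr(rr, lower, upper, z_crit=1.96):
    """Approximate SE, z, and two-sided p from an RR and its 95% CI.

    Assumes log(RR) is roughly normal, so the CI is log(RR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    includes_one = lower <= 1.0 <= upper  # CI spans "no effect"
    return {"se": se, "z": z, "p": p, "includes_one": includes_one}


# RR and 95% CI values as quoted in the text above.
age_trial = summarize_rr(0.83, 0.66, 1.04)
shanghai_trial = summarize_rr(1.03, 0.81, 1.31)

for name, result in [("Age/Moss", age_trial), ("Shanghai BSE", shanghai_trial)]:
    print(f"{name}: z = {result['z']:.2f}, p = {result['p']:.2f}, "
          f"CI includes 1.0: {result['includes_one']}")
```

Both intervals include 1.0, so neither result is statistically significant, but the Age/Moss point estimate sits well below 1.0 while the Shanghai estimate sits almost exactly on it. This is the distinction between “a trend toward benefit that did not reach significance” and “no suggestion of benefit”.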

10. The USPSTF recommendations also state that there is insufficient evidence to recommend clinical breast examination (CBE) and an “I” statement was assigned. What does an “I” statement mean? What does the USPSTF suggest clinicians do when an “I” statement is assigned? (5 minutes)

The “I” statement indicates that the USPSTF concluded that “the current evidence is insufficient to assess the balance of benefits and harms of the service. Evidence is lacking, of poor quality, or conflicting, and the balance of benefits and harms cannot be determined.” and “If the service is offered, patients should understand the uncertainty about the balance of benefits and harms.” They also suggest reading the “clinical considerations” which address issues in 4 domains: potential preventable burden, potential harms, costs and current practice. For CBE, these are the following:

Potential Preventable Burden. The evidence for CBE, although indirect, suggests that CBE may detect a substantial proportion of cases of cancer if it is the only screening test available. In parts of the world where mammography is infeasible or unavailable (such as India), CBE is being investigated in this way.

Potential Harms. The potential harms of CBE are thought to be small but include false-positive test results, which lead to anxiety and breast cancer worry, as well as repeated visits and unwarranted imaging and biopsies.

Costs. The principal cost of CBE is the opportunity cost incurred by clinicians in the patient encounter.

Current Practice. Surveys suggest that the CBE technique used in the United States currently lacks a standard approach and reporting standards. Clinicians who are committed to spending the time on CBE would benefit their patients by considering the evidence in favor of a structured, standardized examination.

11. One important component of health care leadership is arriving at consensus opinions. Within your group, determine a strategy to arrive at a consensus opinion about an optimal breast cancer screening strategy for the Women's Clinic at the General Hospital. Consider the following questions: (20 minutes)

  • Should constrained resources (such as those in a public hospital) affect your decision?
  • Would you make a different decision if mammograms were entirely free?
  • Would you use formal cost analyses to assist in your decision?
  • What are potential barriers to incorporating the USPSTF recommendation for patients aged 40-49 years at the General Hospital?

There are no “right” answers. Students should be aware that decision analyses, outcomes tables, cost-effectiveness analyses, and cost-utility analyses lend unique perspectives, and policymakers may look at all of these to make decisions. Even if mammograms were free, students should still want to make sure that benefits and harms are well balanced (i.e., it is not all about costs). The USPSTF, of note, does not consider cost analyses in its deliberations.

Potential barriers to incorporating the USPSTF recommendation for shared decision-making for patients aged 40-49 years are many, including low health literacy, low numeracy, and language and/or cultural barriers. There may also be a lack of time for a thorough discussion, and clinicians may lack the necessary skills or access to resources such as decision support tools (e.g., decision aids).

Acknowledgement

These materials were developed under an agreement between the Health Research & Education Trust and George F. Sawaya, MD as part of the “Knowledge Transfer/Implementation - Outreach to Health Professionals” project, no. HHSA 290200900014C, sponsored by the Agency for Healthcare Research and Quality.

Dr. Sawaya is Professor of Obstetrics, Gynecology and Reproductive Sciences and Epidemiology and Biostatistics at the University of California, San Francisco. He was a member of the US Preventive Services Task Force from 2004 to 2008.

The information herein was current as of June 2011 as per the USPSTF website http://www.uspreventiveservicestaskforce.org/about.htm.

Page last reviewed August 2010
Internet Citation: Understanding the Methods Used by the US Preventive Services Task Force in Developing Recommendations: Instructor Guide. August 2010. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/cpi/centers/ockt/kt/webinars/tfmethods/tfmethinstr.html