Facing Challenging Situations When Grading Strength of Evidence (Text Version)
On September 20, 2011, Holger Schünemann and Nancy Berkman made this presentation at the 2011 Annual Conference.
Slide 1
Facing Challenging Situations When Grading Strength of Evidence
Presenters:
Holger Schünemann, MD, PhD, McMaster University
Nancy Berkman, PhD, RTI International
Slide 2
Process Overview
Image: A screen shot of a figure that presents a lifecycle of CER activities that span topic generation, evidence synthesis and implementation, dissemination and future research is shown.
Slide 3
Steps in AHRQ EPC Approach to Grading SOE
- Separately for RCT and observational study evidence, aggregated across studies, for each outcome.
- Score 4 required domains:
- Risk of bias.
- Consistency.
- Directness.
- Precision.
- Consider, and possibly score, 4 additional domains:
- Dose-response association.
- Plausible confounding.
- Strength of association.
- Publication bias.
- Combine into a single SOE grade.
Slide 4
Risk of Bias Domain Score
- Concerns both study design and study conduct for individual studies.
- Assesses the aggregate quality or risk of bias of studies separately for RCTs and observational studies and integrates those assessments into an overall risk of bias score.
- Scores: high, medium, or low:
- High risk of bias lowers SOE grade.
- Low risk of bias raises SOE grade.
Slide 5
Consistency Domain Score
- Degree of similarity in the effect sizes of different studies within the evidence base.
- Consistent: same direction of effect (same side of "no effect") and narrow range of effect sizes.
- Inconsistent: non-overlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc.
- Unknown or not applicable: single study, so consistency cannot be assessed.
Slide 6
Directness Domain Score
- Whether evidence reflects a single, direct link between the intervention of interest and the ultimate health outcome under consideration.
- Direct: single direct link between the intervention and health outcome.
- Indirect: evidence relies on:
- Surrogate or proxy outcomes.
- More than one body of evidence (no head-to-head studies).
Slide 7
Precision Domain Score
- Degree of certainty for estimate of effect with respect to a specific outcome.
- Precise: estimate allows a clinically useful decision.
- Imprecise: confidence interval is so wide that it could include clinically distinct (even conflicting) conclusions.
Slide 8
Additional "Discretionary" Domains
- Dose-response association (pattern of larger effect with greater exposure): present, not present, NA.
- Plausible confounding (confounding that works in the direction opposite to the observed effect and so "weakens" it): present, absent.
- Strength of association (effect so large that it cannot have occurred solely as a result of bias from confounders): strong, weak.
- Publication bias (not formally scored).
- Unlike GRADE, applicability is considered separately.
Slide 9
Integrating Domain Scores Into a SOE grade
- EPCs can use different approaches to incorporating multiple domains into an overall strength of evidence grade:
- GRADE algorithm.
- EPC's own weighting system.
- A qualitative approach.
- Evaluation needs to be made by (at least) 2 reviewers.
- Must document approach used.
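As a rough illustration of the first option, GRADE-style arithmetic can be sketched in Python. This is a sketch under stated assumptions, not the EPC or GRADE working group's implementation: the class and field names are hypothetical, the starting levels (high for RCTs, low for observational studies) and the one- or two-level adjustments follow the general GRADE convention, and the bottom category is labeled "very low" as in GRADE, where AHRQ would use "insufficient". An EPC using its own weighting system or a qualitative approach would not reduce the judgment to arithmetic like this.

```python
from dataclasses import dataclass

# GRADE categories from lowest to highest confidence.
LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class BodyOfEvidence:
    """Hypothetical container for one outcome's domain judgments."""
    design: str                     # "rct" or "observational"
    risk_of_bias: int = 0           # 0 none, -1 serious, -2 very serious
    inconsistency: int = 0          # 0, -1, or -2
    indirectness: int = 0           # 0, -1, or -2
    imprecision: int = 0            # 0, -1, or -2
    publication_bias: int = 0       # 0 or -1
    large_effect: int = 0           # 0, +1, or +2 (strength of association)
    dose_response: int = 0          # 0 or +1
    plausible_confounding: int = 0  # 0 or +1


def grade(body: BodyOfEvidence) -> str:
    """Combine domain judgments into a single grade (illustrative only)."""
    start = 3 if body.design == "rct" else 1  # RCTs start high, observational low
    rating_down = (body.risk_of_bias + body.inconsistency + body.indirectness
                   + body.imprecision + body.publication_bias)
    rating_up = body.large_effect + body.dose_response + body.plausible_confounding
    index = max(0, min(len(LEVELS) - 1, start + rating_down + rating_up))
    return LEVELS[index]


# One RCT body of evidence with serious risk of bias and serious imprecision.
print(grade(BodyOfEvidence("rct", risk_of_bias=-1, imprecision=-1)))
```

Whatever approach is used, the two reviewers' domain judgments and the reasons behind them still need to be documented; the arithmetic only makes the combination step transparent.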
Slide 10
AHRQ and GRADE Grading Categories
AHRQ | GRADE |
---|---|
High: High confidence that the evidence reflects the true effect. | High: Very confident that the true effect lies close to that of the estimate of the effect. |
Moderate: Moderate confidence that the evidence reflects the true estimate of effect. | Moderate: Moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect. |
Low: Low confidence that the evidence reflects the true effect. | Low: Limited confidence in the effect estimate; the true effect may be substantially different. |
Insufficient: Evidence either is unavailable or does not permit a conclusion. | Very Low: Very little confidence in the effect estimate; the true effect is likely to be substantially different. |
Slide 11
Challenge 1: CER of Benefits, 1 Study, No Meta-analysis or CIs
- Topic: Antidepressant medication response in the elderly.
- Evidence description: 1 fair quality RCT (N = 108). Outcome evaluated through 2 validated scales that are clinician administered.
- Scale 1: Results reported in bar graph only: (p = 0.03).
- Scale 2: Results reported in bar graph only: (p = 0.04).
Number of Studies; Subjects | Domain Score: Risk of Bias | Domain Score: Consistency | Domain Score: Directness | Domain Score: Precision | SOE Grade |
---|---|---|---|---|---|
RCT: 1; 108 | Medium | Unknown | Direct | ?? | ?? |
Slide 12
Challenge 1: Precision Score
- AHRQ/GRADE approach: Precise.
- AHRQ approach: Imprecise.
- GRADE approach: Imprecision Serious (-1).
- GRADE approach: Imprecision Very Serious (-2).
Slide 13
Challenge 1: Strength of Evidence Grade
- AHRQ/GRADE approach: High.
- AHRQ/GRADE approach: Moderate.
- AHRQ/GRADE approach: Low.
- AHRQ approach: Insufficient.
- GRADE approach: Very low.
Slide 14
Challenge (1)—Response
- Rules for precision:
- Based on CI, number of events, effect size:
- Not perfect but good guides.
- Judgment simple and possible for this example.
- Given only 108 people, downgrade for imprecision, possibly by two levels, unless the effect is huge (we need the effect size to make this evaluation).
Slide 15
Creating a New GRADEpro File
Image: GRADEprofiler software is shown.
Slide 16
Image: Screen shot of the GRADEprofiler software Defining a Health Care Question page is shown.
Slide 17
Optimal Information Size
- We suggest the following: if the total number of patients included in a systematic review is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial, consider rating down for imprecision. Authors have referred to this threshold as the "optimal information size" (OIS).
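A conventional sample size calculation of the kind referred to above can be sketched as follows. This is a minimal sketch, assuming a dichotomous outcome, a two-sided alpha of 0.05, power of 80% (beta of 0.2), equal arm sizes, and the common normal-approximation formula for comparing two proportions; the function name and the example inputs are hypothetical, and other sample size formulas would give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def optimal_information_size(control_event_rate, relative_risk_reduction,
                             alpha=0.05, beta=0.20):
    """Total patients (both arms) for one adequately powered two-arm trial.

    Uses n per arm = (z_alpha/2 + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2.
    """
    if not 0 < control_event_rate < 1:
        raise ValueError("control_event_rate must be between 0 and 1")
    if not 0 < relative_risk_reduction < 1:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    p_control = control_event_rate
    p_treated = control_event_rate * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return 2 * ceil(n_per_arm)


# Hypothetical inputs: 20% control event rate, 25% relative risk reduction.
ois = optimal_information_size(control_event_rate=0.20, relative_risk_reduction=0.25)
print(ois)
```

With these hypothetical inputs the threshold is on the order of 1,800 patients, so a review pooling only a few hundred patients would fall well short of it and rating down for imprecision would be worth considering.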
Slide 18
For Systematic Reviews
- If the 95% CI excludes a relative risk (RR) of 1.0 and the total number of events or patients exceeds the OIS criterion, precision is adequate. If the 95% CI includes appreciable benefit or harm (we suggest a RR of under 0.75 or over 1.25 as a rough guide) rating down for imprecision may be appropriate even if OIS criteria are met.
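The rule above can be written as a small decision function. This is a hedged sketch, not an official GRADE or AHRQ algorithm: the function name and return labels are hypothetical, the OIS is passed in as a precomputed number, and the code assumes one reading of the slide, namely that the 0.75 and 1.25 guide values are applied when the CI crosses a relative risk of 1.0.

```python
def rate_precision(ci_low, ci_high, total_patients, ois,
                   benefit_threshold=0.75, harm_threshold=1.25):
    """Suggest a precision judgment for a pooled relative risk and its 95% CI."""
    if not 0 < ci_low <= ci_high:
        raise ValueError("expected 0 < ci_low <= ci_high")
    meets_ois = total_patients > ois
    excludes_no_effect = ci_low > 1.0 or ci_high < 1.0
    if not excludes_no_effect:
        # CI crosses RR = 1.0: does it still allow appreciable benefit or harm?
        if ci_low < benefit_threshold or ci_high > harm_threshold:
            return "consider rating down"  # applies even if the OIS is met
    return "adequate" if meets_ois else "consider rating down"


# Hypothetical review: RR CI of 0.70 to 1.30 from 5,000 patients, OIS of 1,800.
print(rate_precision(0.70, 1.30, total_patients=5000, ois=1800))
```

The hypothetical example is rated down despite exceeding the OIS, because its interval is compatible with both appreciable benefit and appreciable harm.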
Slide 19
Figure 4: Optimal Information Size Given Alpha of 0.05 and Beta of 0.2 for Varying Control Event Rates and Relative Risks
Image: A line graph shows optimal information size given alpha of 0.05 and beta of 0.2 for varying control event rates and relative risks.
For any chosen line, the evidence meets the optimal information size criterion if the sample size is above the line.
Slide 20
Total Number of Events | Relative Risk Reduction | Implications for meeting OIS threshold |
---|---|---|
100 or less | ≤ 30% | Will almost never meet threshold whatever control event rate |
200 | 30% | Will meet threshold for control group risks of ~ 25% or greater |
200 | 25% | Will meet threshold for control group risks of ~ 50% or greater |
200 | 20% | Will meet threshold only for control group risks of ~ 80% or greater |
300 | ≥ 30% | Will meet threshold |
300 | 25% | Will meet threshold for control group risks of ~ 25% or greater |
300 | 20% | Will meet threshold for control group risks of ~ 60% or greater |
400 or more | ≥ 25% | Will meet threshold for any control group risks |
400 or more | 20% | Will meet threshold for control group risks of ~ 40% or greater |
Slide 21
Challenge 2: CER of Harms
Mixed Outcomes & Mixed Results from RCTs and Obs Studies
- Topic: Risk of suicidality from antidepressants.
- Evidence description:
- RCT: 1 fair quality study:
- Suicidal ideation worse with Drug B (p = 0.03).
- Case control: 1 fair quality study (N = 1300):
- Non-fatal suicidal behavior; Drug A (OR = 1.16); Drug B (OR = 1.29).
- Overlapping confidence intervals comparing each with Drug C.
- Nested case control: 1 good quality study (N = 10,000):
- Completed suicides in adjusted analyses (P = NS).
Slide 22
Challenge 2: CER of Harms
Mixed Outcomes & Mixed Results From RCTs and Obs Studies
Number of Studies; Subjects | Domain Score: Risk of Bias | Domain Score: Consistency | Domain Score: Directness | Domain Score: Precision | SOE Grade |
---|---|---|---|---|---|
RCT: 1; 90 | Medium | NA | ?? | Imprecise | ?? |
Case control: 2; 11,400 | Medium | Consistent | ?? | Imprecise | |
Slide 23
Challenge 2: Directness Score RCTs
- AHRQ/GRADE approach: Direct.
- AHRQ approach: Indirect.
- GRADE approach: Serious indirectness (-1).
- GRADE approach: Very serious indirectness (-2).
Slide 24
Challenge 2: Directness Score Observational Studies
- AHRQ/GRADE approach: Direct.
- AHRQ approach: Indirect.
- GRADE approach: Serious indirectness (-1).
- GRADE approach: Very serious indirectness (-2).
Slide 25
Challenge 2: Strength of Evidence Grade
- AHRQ/GRADE approach: High.
- AHRQ/GRADE approach: Moderate.
- AHRQ/GRADE approach: Low.
- AHRQ approach: Insufficient.
- GRADE approach: Very low.
Slide 26
Challenge (2)—Response
- Indirect comparison:
- Downgrade:
- Observational study can provide more direct evidence.
- Need to go through full framework to find that out.
Slide 27
Challenge 3: CER of Benefits, RCTs Found No Difference Between Treatments
- Topic: Medication response.
- Evidence description: 5 fair quality RCTs; number of participants ranges from 90 to 200; in each study, p = NS.
- Meta-analysis pooled risk ratio: 1.03 (95% CI, 0.92-1.16).
Number of Studies; Subjects | Domain Score: Risk of Bias | Domain Score: Consistency | Domain Score: Directness | Domain Score: Precision | SOE Grade |
---|---|---|---|---|---|
RCT: 5; 690 | Medium | Consistent | Direct | Precise | ?? |
Slide 28
Challenge 3: Strength of Evidence Grade
- AHRQ/GRADE approach: High.
- AHRQ/GRADE approach: Moderate.
- AHRQ/GRADE approach: Low.
- AHRQ approach: Insufficient.
- GRADE approach: Very low.
Slide 29
Challenge 3: Are the treatments equivalent for this outcome?
- Yes.
- No.
- Don't know.
Slide 30
Challenge (3)—Response
- Superiority, inferiority and non-inferiority depend on more than one outcome.
- Need to specify a threshold. If the threshold is met, the estimate is not imprecise; if it is not met, the estimate is imprecise.
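To make the dependence on the threshold concrete, here is a small sketch that compares a confidence interval for a ratio measure against a pre-specified pair of thresholds. The function name, the return labels, and both pairs of thresholds are assumptions chosen for illustration; only the pooled risk ratio and its 95% CI (1.03; 0.92 to 1.16) come from the Challenge 3 evidence description.

```python
def judge_against_thresholds(ci_low, ci_high, lower, upper):
    """Compare a ratio-measure CI with pre-specified decision thresholds."""
    if not (0 < lower < upper and 0 < ci_low <= ci_high):
        raise ValueError("expected 0 < lower < upper and 0 < ci_low <= ci_high")
    if lower < ci_low and ci_high < upper:
        return "not imprecise: CI within thresholds"
    if ci_high <= lower or ci_low >= upper:
        return "not imprecise: CI beyond a threshold"
    return "imprecise: CI crosses a threshold"


# Challenge 3 pooled risk ratio: 1.03 (95% CI, 0.92 to 1.16).
# Hypothetical wide thresholds (0.75 to 1.25) versus tighter ones (0.90 to 1.10).
print(judge_against_thresholds(0.92, 1.16, lower=0.75, upper=1.25))
print(judge_against_thresholds(0.92, 1.16, lower=0.90, upper=1.10))
```

The same interval is judged differently under the two hypothetical thresholds, which is why the threshold has to be specified before deciding whether the treatments can be called equivalent for this outcome.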
Slide 31
Figure 1, Rating Down for Imprecision in Guidelines: Thresholds are Key
Image: A graph shows mortality estimate and confidence interval and thresholds if side effects.
Slide 32
Challenge 4: CER of Serious Harms, Mixed Findings in RCTs and Observational Studies
- Topic: Serious infection from rheumatoid arthritis treatments.
- Evidence description:
- RCTs: 4 fair quality studies. Number of participants ranges from 80 to 531. Number of serious infections presented for each treatment, very rare event. In each study (p = NS).
- Retrospective cohort study 1: fair quality (N = 5,326). Hospitalization with a definite bacterial infection: higher for Treatment A. Adjusted HR = 1.94 (95% CI, 1.32 to 2.83).
- Retrospective cohort study 2: good quality/low risk of bias (N = 2,369). Adjusted rate of serious bacterial infection: RR = 1.0 (95% CI, 0.6 to 1.71).
Slide 33
Challenge 4: CER of Serious Harms, Mixed Findings in RCTs and Observational Studies
Number of Studies; Subjects | Domain Score: Risk of Bias | Domain Score: Consistency | Domain Score: Directness | Domain Score: Precision | SOE Grade |
---|---|---|---|---|---|
RCT: 4; 1,215 | Medium | Consistent | Direct | Precise | ?? |
Retrospective cohort: 2; 7,695 | ?? | Inconsistent | Direct | Imprecise | |
Slide 34
Challenge 4: Risk of Bias Score
- AHRQ/GRADE approach: Low risk of bias.
- AHRQ approach: Medium risk of bias.
- AHRQ approach: High risk of bias.
- GRADE approach: Serious risk of bias (-1).
- GRADE approach: Very serious risk of bias (-2).
Slide 35
Challenge 4: Strength of Evidence Grade
- AHRQ/GRADE approach: High.
- AHRQ/GRADE approach: Moderate.
- AHRQ/GRADE approach: Low.
- AHRQ approach: Insufficient.
- GRADE approach: Very low.
Slide 36
Challenge (4)—Response
- Sequential work:
- Use the evidence that is of higher quality:
- Mention observational evidence in footnote.
Slide 37
Challenge 5: Can You Use Less Stringent Criteria to Evaluate Risk of Bias If the Outcome Without Treatment is Likely to Result in Death?
- Topic: Use of hematopoietic stem cell transplantation (HSCT), also known as bone marrow transplantation.
- Low Risk of Bias modified to be: natural history (or severity) of disease made spontaneous remission highly unlikely or impossible.
- Evidence description:
- For single HSCT for Wolman's disease: In the natural history of this disease, death occurs by approximately 6 months of age. Of five cases reported in the evidence, three patients were alive at 4 to 11 years' followup, with normal function and attending school. The strength of the body of evidence is high.
Slide 38
Challenge 5: Do You Agree That It Would Be Appropriate to Use Less Stringent Criteria to Evaluate Risk of Bias Under These Circumstances?
- Yes.
- No.
- Don't know.
Slide 39
Challenge 5: Do You Agree That It Would Be Appropriate to Use Less Stringent Criteria to Evaluate Risk of Bias Under These Circumstances?
- One reviewer commented that, rather than modifying Risk of Bias criteria, "the SOE system does allow consideration of other factors through the 'optional domains' if applied correctly." These optional domains are:
- Dose-response association.
- Plausible confounding that would decrease observed effect.
- Strength of association (magnitude of effect).
- Publication bias.
- Do you agree?
Slide 40
Challenge (5)—Response
- Particular design features of extremely rigorous well-conducted observational studies may warrant consideration for rating up quality of evidence. For instance, a case-control study found that sigmoidoscopy was associated with a reduction in colon cancer mortality for lesions in range of the sigmoidoscope (OR 0.30, 95% CI 0.19 to 0.48), but not beyond the range of the sigmoidoscope (OR 0.96, 95% CI 0.61 to 1.50). Possible bias because of unmeasured confounders should have been very similar if not identical in the two situations, considerably raising confidence in the causal effect of the sigmoidoscopy.
Slide 41
Challenge (5)—Response
- Furthermore, when considering rating up the quality of evidence for magnitude of effect, factors relating to the magnitude are rapidity of treatment response, and the previous underlying trajectory of the condition. For example, we feel confident that hip replacement has a large effect not only because of the size of the treatment response, but because the natural history of hip osteoarthritis is a progressive deterioration that surgery rapidly and uniformly reverses. The rapidity of response compared to the known trajectory of the condition can also be considered (and calculated) as a large effect size.
- An additional factor mitigating the problem of rating up the quality because of a large effect is that indirect evidence usually provides further support for large treatment effects. For example, oral anticoagulation in mechanical heart valves has not been compared to placebo in an RCT, but evidence from observational studies suggests a large effect of oral anticoagulation in decreasing thromboembolic events. Supplementary indirect evidence from randomized trials that have demonstrated large reductions in the relative risk of thrombosis with anticoagulation in analogous conditions such as atrial fibrillation further increases our confidence in the beneficial effect of anticoagulation.
- Similarly, the effectiveness of antibiotic prophylaxis in a variety of other situations supports observational studies that suggest that antibiotic prophylaxis results in an 89% relative risk reduction in meningococcal disease in contacts of patients who have suffered the illness.
- Another situation allows an inference of a strong association without a formal comparative study. Consider the question of the impact of routine colonoscopy versus no screening for colon cancer on the rate of perforation associated with colonoscopy. Here, a large series of representative patients undergoing colonoscopy will provide high quality evidence on the risk of perforation associated with colonoscopy. When control rates are near 0 (i.e., we are certain that the incidence of spontaneous colon perforation in patients not undergoing colonoscopy is very low), case series of representative patients (one might call these cohort studies of affected patients if they include large numbers of patients) can provide high quality evidence of adverse effects associated with an intervention, thereby allowing us to infer a strong association from even a limited number of events. One should not confuse the situation highlighted in the previous example with isolated case reports of associations between exposures and rare adverse outcomes (as have, for instance, been reported with vaccine exposure).
Slide 42
Challenge 6: Challenges in Using GRADEpro
- "I find it challenging to use GRADEpro to grade the body of evidence for non-RCTs and unpooled data."
- Comments?
Slide 43
Challenge (6)
- Response:
- GRADEpro is updated for observational studies.
- Unpooled data: headcount as last resort, can still make qualitative judgments as long as transparent (e.g. inconsistency, imprecision).
Slide 44
Challenge 7
Current grading schemes are not amenable to healthcare quality improvement studies because:
- They may only distinguish between RCTs and "all other" types of studies.
- They may not distinguish quality of studies within RCTs and other types of study designs.
- They do not have a way to appropriately grade external validity, which is critically important in QI studies.
- Comments?
Slide 45
Challenge 7—Response
- They may only distinguish between RCTs and "all other" types of studies:
- GRADE makes explicit judgments necessary about the confidence in estimates of effects for any study design. Randomization is just one of the criteria early on in the process as it is the key method to protect against bias.
- They may not distinguish quality of studies within RCTs and other types of study designs.
- GRADE's explicit judgments do make this distinction.
- They do not have a way to appropriately grade external validity, which is critically important in QI studies.
- Judgments about directness do accomplish that (PICO) where P includes the setting.
Slide 46
More Information
Holger Schünemann
Chair and Professor
Department of Clinical Epi and Biostatistics
McMaster University
schuneh@mcmaster.ca
Nancy Berkman
Senior Health Policy Research Analyst
Program on Healthcare Quality and Outcomes
berkmanschuenemann@rti.org