Considering Sensitivity and Positive Predictive Value in Comparing the Performance of Trigger Systems for Iatrogenic Adverse Events
Jonathan R. Nebeker, M.S., M.D.a
Gregory J. Stoddard, M.S.b
Amy Rosen, Ph.D.c
Trigger systems are typically evaluated for their accuracy in identifying iatrogenic adverse events by examining their positive predictive value (PPV). PPV is an important metric for the performance of a trigger system, since it provides the adverse-event yield of triggered alerts. Hence, it is a measure of efficiency. PPV is also relatively easy to estimate. It requires review of a small sample of patients relative to what would be required for other important performance characteristics. Many authors compare the performance of triggers solely on the basis of PPV. They make comparisons between PPV of triggers targeting different events or similar events in different settings. However, comparison of trigger accuracy based on PPV alone is highly problematic. This brief paper addresses three issues in measuring the performance of a trigger system: the limitations of PPV alone, the need for estimating sensitivity, and the difficulty in assessing sensitivity.
To facilitate discussion, some relevant test characteristics of a binary trigger for detecting a binary event, or disease state, are shown in Table 1.
Table 1. Classification table of trigger results by event status

                 Trigger = 1    Trigger = 0
    Event = 1         a              b
    Event = 0         c              d

Sensitivity = true positive fraction = TPF = P[Trigger = 1 | Event = 1] = a/(a+b)
False negative fraction = FNF = 1 – Sensitivity = P[Trigger = 0 | Event = 1] = b/(a+b)
Positive predictive value = PPV = P[Event = 1 | Trigger = 1] = a/(a+c)
Prevalence = P[Event = 1] = (a+b)/(a+b+c+d)
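The definitions in Table 1 can be computed directly from the four cell counts. The sketch below is a minimal illustration; the `TwoByTwo` class, the `trigger_metrics` function, and the example counts are all hypothetical and are not data from this paper. It also reports specificity, P[Trigger = 0 | Event = 0] = d/(c+d), which the text refers to but Table 1 does not list.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts laid out as in Table 1 (hypothetical helper)."""
    a: int  # Trigger = 1, Event = 1 (true positives)
    b: int  # Trigger = 0, Event = 1 (false negatives)
    c: int  # Trigger = 1, Event = 0 (false positives)
    d: int  # Trigger = 0, Event = 0 (true negatives)


def trigger_metrics(t: TwoByTwo) -> dict:
    """Return the Table 1 characteristics plus specificity."""
    n = t.a + t.b + t.c + t.d
    return {
        "sensitivity": t.a / (t.a + t.b),
        "fnf": t.b / (t.a + t.b),          # equals 1 - sensitivity
        "ppv": t.a / (t.a + t.c),
        "specificity": t.d / (t.c + t.d),
        "prevalence": (t.a + t.b) / n,
    }


# Hypothetical review of 1,000 patients, 40 of whom had the event.
m = trigger_metrics(TwoByTwo(a=30, b=10, c=48, d=912))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the trigger has a sensitivity of 0.75 and a specificity of 0.95, yet a PPV of only about 0.38, because events are uncommon (prevalence 0.04).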
The Limitations of PPV
There are two limitations in using PPV. First, although PPV gives the likelihood that a positive trigger flags a true event, it provides no information on how many events the trigger succeeds in flagging or fails to flag. Second, PPV is largely a function of event prevalence.B1 A low PPV may be due to poor trigger performance, low event prevalence, or a combination of the two. This dependence on prevalence can make comparisons among triggers, or of one trigger across different times and settings, misleading. Figure 1 illustrates how prevalence affects PPV. The figure shows curves for three possible values of sensitivity, each with high specificity, which is typical of trigger applications. Because sensitivity and specificity are fixed along each curve, any increase in PPV along a curve results solely from increasing prevalence. Note also how much PPV changes over a small change in prevalence. PPV is most sensitive to prevalence when prevalence is low, and low prevalence is typical of many types of iatrogenic adverse events.B2
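The dependence of PPV on prevalence follows from Bayes' theorem: PPV = Se·p / (Se·p + (1 − Sp)(1 − p)), where Se is sensitivity, Sp is specificity, and p is prevalence. The sketch below holds sensitivity and specificity fixed and varies only prevalence. The values used (sensitivity 0.5, 0.7, and 0.9; specificity 0.95) are illustrative assumptions, not the parameters behind Figure 1.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV from Bayes' theorem for a trigger with fixed sensitivity and specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SPECIFICITY = 0.95  # assumed "high specificity"
prevalences = [0.01, 0.02, 0.05, 0.10, 0.20]

for se in (0.5, 0.7, 0.9):  # three assumed sensitivity curves
    row = ", ".join(f"p={p:.2f}: {ppv(se, SPECIFICITY, p):.2f}" for p in prevalences)
    print(f"sensitivity {se}: {row}")
```

Under these assumptions, with sensitivity 0.9, PPV rises from about 0.15 at 1 percent prevalence to about 0.27 at 2 percent, even though the trigger itself has not changed.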
The Need for Estimating Sensitivity
There are three advantages to using sensitivity as a performance characteristic of trigger systems. First, sensitivity is independent of prevalence and thus provides a consistent measure of performance in different settings and times. This metric may be used to compare the accuracy of trigger systems.
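The prevalence independence of sensitivity can be checked with expected (noise-free) counts. The sketch below applies one hypothetical trigger (sensitivity 0.80, specificity 0.95) to two hypothetical settings of 10,000 patients that differ only in event prevalence; `expected_table` is an illustrative helper, not a function from this paper.

```python
def expected_table(n: int, sensitivity: float, specificity: float, prevalence: float):
    """Expected Table 1 cells (a, b, c, d) for a cohort of n patients."""
    events = n * prevalence
    non_events = n - events
    a = sensitivity * events      # events flagged
    b = events - a                # events missed
    d = specificity * non_events  # non-events correctly not flagged
    c = non_events - d            # non-events flagged
    return a, b, c, d


results = {}
for setting, prev in (("low-prevalence setting", 0.02), ("high-prevalence setting", 0.10)):
    a, b, c, d = expected_table(10_000, 0.80, 0.95, prev)
    results[setting] = (a / (a + b), a / (a + c))  # (sensitivity, PPV)
    print(f"{setting}: sensitivity={a / (a + b):.2f}, PPV={a / (a + c):.2f}")
```

Sensitivity is 0.80 in both settings, while PPV moves from about 0.25 to 0.64 purely because prevalence changed.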
Second, sensitivity provides clinically significant information about the fraction of targeted events hit (true positive fraction) and missed (1 – sensitivity, the false negative fraction). For triggers intended to guide interventions for individual patients, sensitivity indicates what proportion of patients experiencing the targeted event the trigger will flag, and therefore what proportion it will miss.
Finally, sensitivity provides important information about the suitability of a trigger system for rate estimation. In a dichotomous system (e.g., events either happen or do not), overall accuracy is a prevalence-weighted average of sensitivity and specificity; their simple average, which does not depend on prevalence, is often reported as balanced accuracy. The more accurate a system is, the better it can estimate the true rate of an event.B3 Conversely, trigger systems without sensitivity or accuracy estimates cannot be relied on for rate estimation.
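One way to see why rate estimation depends on sensitivity is that the fraction of patients a trigger flags mixes true and false positives: flagged rate = Se·p + (1 − Sp)(1 − p). The Rogan-Gladen estimator, a standard epidemiologic correction that this paper does not itself describe, inverts that relation to recover the event rate, but only when sensitivity and specificity are known. The sketch below uses hypothetical values (true rate 0.03, sensitivity 0.70, specificity 0.95); the function names are illustrative.

```python
def observed_positive_rate(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of patients flagged: true positives plus false positives."""
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


def rogan_gladen(observed_rate: float, sensitivity: float, specificity: float) -> float:
    """Corrected event rate; requires sensitivity + specificity > 1."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("trigger must perform better than chance")
    estimate = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


true_rate = 0.03     # hypothetical true event rate
se, sp = 0.70, 0.95  # hypothetical trigger characteristics
flagged = observed_positive_rate(se, sp, true_rate)
corrected = rogan_gladen(flagged, se, sp)
print(f"flagged rate {flagged:.4f}, corrected rate {corrected:.4f}")
```

Under these assumptions the raw flagged rate (about 7 percent) is more than double the true rate of 3 percent; the correction recovers 3 percent only because sensitivity and specificity were supplied.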
The Difficulty in Assessing Sensitivity
For estimates of sensitivity, a reasonably narrow confidence interval (CI), considered "informative," is desired; a very wide CI indicates an uninformative estimate. Figure 2 shows how CI widths vary with prevalence. It was derived by populating all cells of the 2 × 2 table (Table 1) through random sampling. Note that CIs for sensitivity are unacceptably wide at low prevalence, even for large sample sizes. For sample sizes of at least n = 500, somewhat informative CIs can be obtained if prevalence is as low as 2 percent. However, for a sample size of n = 250, prevalence needs to be 10 percent to achieve the same level of precision. Note also that, at low prevalence, CIs for PPV do not widen as dramatically as they do for sensitivity. Of course, much narrower CIs for PPV will result if the sample is restricted to trigger-positive cases; such a sample, however, contains no missed events (cell b) and so cannot be used to estimate sensitivity.
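The reason sensitivity CIs widen at low prevalence is that, in a random sample, only patients who had the event inform sensitivity, and there are roughly n × prevalence of them. The sketch below computes a Wilson score interval for sensitivity from those expected counts. It assumes a true sensitivity of 0.80 and uses the Wilson method for illustration; it is not the procedure used to build Figure 2 and will not reproduce its exact values. `wilson_interval` and `sensitivity_ci_width` are hypothetical helper names.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% with z = 1.96)."""
    if trials == 0:
        return 0.0, 1.0  # no events reviewed: sensitivity is unconstrained
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sensitivity_ci_width(n: int, prevalence: float, sensitivity: float = 0.80) -> float:
    """CI width for sensitivity when n patients are sampled at random.

    Only the roughly n * prevalence patients with events inform sensitivity.
    """
    events = round(n * prevalence)
    true_pos = round(sensitivity * events)
    low, high = wilson_interval(true_pos, events)
    return high - low


N = 500  # hypothetical number of patients reviewed
for prev in (0.02, 0.05, 0.10, 0.20):
    print(f"prevalence={prev:.2f}: events={round(N * prev)}, "
          f"CI width={sensitivity_ci_width(N, prev):.2f}")
```

Under these assumptions a random sample of 500 patients at 2 percent prevalence contains only about 10 events, giving a Wilson interval roughly 0.45 wide; the interval narrows steadily as prevalence, and hence the number of reviewed events, grows.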
PPV is an important performance metric of trigger systems, but it alone cannot be used to compare performance of triggers unless the underlying prevalence of events is known. Sensitivity provides more clinically relevant information than PPV and can be used to estimate the accuracy of a trigger system. However, when using a random sample of subjects from the population, a large sample of patients must be reviewed to achieve a moderately precise estimate of sensitivity.
B1. Pepe MS. The statistical evaluation of medical tests for classification and prediction. New York: Oxford University Press; 2003.
B2. Hougland P, Xu W, Pickard S, et al. Performance of International Classification of Diseases, 9th Revision, Clinical Modification codes as an adverse drug event surveillance system. Med Care 2006 Jul;44(7):629-36.
B3. Nebeker JR, Yarnold PR, Soltysik RC, et al. Developing indicators of inpatient adverse drug events through nonlinear analysis using administrative data. Med Care 2007 Oct;45(10 Supp 2):S81-8.
a VA Salt Lake City Geriatrics Research, Education, and Clinical Center (GRECC); VA Salt Lake City Informatics, Decision Enhancement, and Surveillance (IDEAS) Center; Department of Medicine, University of Utah; and Intermountain Institute for Healthcare Delivery.
b VA Salt Lake City Informatics, Decision Enhancement, and Surveillance (IDEAS) Center; Department of Medicine, University of Utah.
c VA Center for Health Quality Outcomes and Economic Research (CHQOER) and School of Public Health, Boston University.
Note: The views in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.