Appendix V. JAMIA Draft Manuscript

Screening for Surgical Site Infections by Applying Classification Trees to Electronic Data

Michael A. Rubin, MD, PhD1, 2, Makoto Jones, MD1, 2, Jefrey L. Huntington, MPH3, Lynn Guy, MBA4, M. Josh Durfee, MSPH5, James F. Lloyd, BS1, Christopher Nielson, MD6, 7, Heather Gilmartin, MSN, RN, CIC4, R. Scott Evans, MS, PhD1, 2, Walter L. Biffl, MD5, Lucy A. Savitz, PhD, MBA3, and Connie Savor Price, MD5, 8

1 VA Salt Lake City Health Care System, Salt Lake City, UT; 2University of Utah, Salt Lake City, UT; 3Intermountain Healthcare, Salt Lake City, UT; 4Vail Valley Medical Center, Vail, CO; 5Denver Health and Hospital, Denver, CO; 6VA Reno Medical Center, Reno, NV; 7Veterans Health Administration Office of Patient Care Services; 8University of Colorado School of Medicine, Denver, CO

Word Count: 3,725
Key words: xxx

Corresponding Author:
Dr. Michael A. Rubin
George E. Whalen Dept. of Veterans Affairs Medical Center
500 Foothill Drive
Salt Lake City, UT 84148
Tel: (801) 582-1565
Fax: (801) 584-5556
E-mail: Michael.Rubin2@va.gov

Abstract

Objectives: Automated systems for surgical site infection (SSI) surveillance have been developed, but rarely tested for generalizability. We tested an approach similar to one employed at Intermountain Healthcare, where sensitive automated systems flag charts for subsequent human adjudication.

Methods: We developed three electronic algorithms to detect deep and organ-space SSI after coronary artery bypass grafting, total hip and knee arthroplasties, and herniorrhaphies using a sample of nationwide National Surgical Quality Improvement Program (NSQIP) data from Veterans Affairs hospitals. The algorithms were then tested against NSQIP data, as well as data from hospitals from three other systems: Intermountain Healthcare, Denver Health, and Vail Valley Medical Center.

Results: Algorithm performance varied because of differences in the data collected and stored in each system. Algorithms developed using recursive partitioning were over-fit despite 10-fold cross validation.

Conclusion: The development of generalizable algorithms necessitates careful consideration of the data readily available at most healthcare systems.

Introduction

Healthcare systems with electronic health records (EHRs) may improve the efficiency of their surgical site infection (SSI) surveillance activities (i.e., time spent to find a positive case) and improve case-finding reliability by leveraging electronic data. Although many potential approaches exist, the system long employed by Intermountain Healthcare (IH) uses electronic algorithms to screen potential cases and populate more manageable queues of charts for an Infection Preventionist (IP) to subsequently review [1]. This approach can capitalize on the IP's superior specificity (i.e., ability to discern the presence of a true SSI) and may significantly reduce their work burden. When IH initially implemented this scheme, few facilities had the data infrastructure or capacity to replicate their system, but as more facilities employ sophisticated EHRs, more may now be able to implement similar human-adjudicated electronic surveillance strategies. The purpose of our work was to develop an SSI surveillance tool that detects downstream manifestations of SSI as indicated in electronic data, and to implement and test this tool at four disparate healthcare organizations.

Background and Significance

The purpose of traditional manual surveillance is at least two-fold: (1) to improve situational awareness, and (2) to accurately detect trends and differences across times or locations. For the former, it is most useful to have a high sensitivity; for the latter, it is most useful to have a high specificity. IPs have been favored over automated systems for this task because of their adaptability and clinical judgment about the presence or absence of SSI. Continuing to solely use IPs in this role may appear ideal, but because of increasing time demands, they often cannot devote adequate time to all of their responsibilities [2-4]. Also, the fact that they can and do adapt potentially leads to issues concerning intra- and inter-rater comparability and reliability.

On the other hand, completely automated electronic systems can review charts rapidly and without concern for adaptation. There is some evidence to suggest that, in some situations, automated systems may be the instrument of choice [5]. However, these systems can be extremely sensitive to artifacts of data manipulation or changes in clinical practice. Also, automated algorithms are usually limited to using structured data and cannot utilize the same body of information as manual review, such as the information contained within text notes. As a result, the specificity of these systems is typically inferior to manual review.

We decided to employ a hybrid, human-adjudicated approach. The rationale for the combination of the two may be illustrated by invoking signal detection theory. Reviewers (in this case, IPs) distinguish between the presence or absence of disease by assessing the data elements in the patient chart. In signal detection theory, these data are called signal. The reviewer has two important characteristics: the discriminability index and criterion. The discriminability index is a measure of how well the reviewer perceives the differences in signal between the diseased and non-diseased states, while the criterion is the threshold at which the reviewer interprets signal as disease. If the criterion is lowered, then sensitivity improves and specificity declines; if the criterion is raised, then the reverse is true.

The only way to improve sensitivity and specificity simultaneously is to improve discriminability. However, a human reviewer's discriminability index is unlikely to change rapidly, while criterion might. An automated system's discriminability index is typically lower than a human reviewer's, but it can review a large number of cases far more rapidly. Its criterion usually does not change unless the data have changed. With this framework, we chose to build a two-tiered system: the first tier is run by the automated system, which removes charts where the signal is weak enough to safely exclude; the second tier involves human review on the more difficult cases, where their superior discriminability index can be efficiently applied. We hypothesized that this system would lead to comparable results between healthcare systems and considerable time savings during surveillance activities.
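This trade-off can be made concrete with the standard equal-variance Gaussian signal detection model. The sketch below is purely illustrative (the model, parameter values, and variable names are our assumptions, not anything estimated in the study): non-SSI charts produce signal scores distributed N(0, 1), SSI charts N(d', 1), and a chart is flagged when its score exceeds the criterion c.

```python
# Illustrative equal-variance Gaussian signal detection model
# (assumed parameters; not estimated from study data).
from statistics import NormalDist

def sens_spec(d_prime, criterion):
    """Sensitivity = P(score > c | SSI); specificity = P(score <= c | no SSI)."""
    noise = NormalDist(0.0, 1.0)        # score distribution for non-SSI charts
    signal = NormalDist(d_prime, 1.0)   # score distribution for SSI charts
    return 1.0 - signal.cdf(criterion), noise.cdf(criterion)

# Lowering the criterion raises sensitivity and lowers specificity...
lax = sens_spec(d_prime=1.5, criterion=0.25)
strict = sens_spec(d_prime=1.5, criterion=1.25)

# ...while a higher discriminability index improves sensitivity at a
# fixed criterion without giving up specificity.
base = sens_spec(d_prime=1.5, criterion=0.75)
sharper = sens_spec(d_prime=2.5, criterion=0.75)
```

In this framing, the automated first tier applies a deliberately low criterion so that few SSI are missed, and the IP's higher discriminability is reserved for the charts that survive it.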

Methods

Study sites and cohort. Our study involved four participating centers: the VA Salt Lake City Health Care System (VA SLCHCS), Denver Health (DH), Vail Valley Medical Center (VVMC), and Intermountain Healthcare (IH). The population of interest was all patients who underwent coronary artery bypass grafting (CABG), total hip arthroplasty (THA), total knee arthroplasty (TKA), and abdominal and inguinal herniorrhaphy. To develop, train, and test our electronic algorithms, we used National Surgical Quality Improvement Program (NSQIP) data from VA hospitals (also referred to as VASQIP data) on the outcomes of patients undergoing these procedures from January 1, 2007 through December 31, 2009. We supplemented these data with VA enterprise-wide microbiology, laboratory, admission/discharge/transfer, bar code medication administration, and vitals data from one week prior to 30 days after the surgical procedure. Similar external test data sets were developed for each participating center.

Each of the centers had different pre-existing strategies for SSI surveillance. DH and VVMC generally follow National Healthcare Safety Network (NHSN) guidelines and perform traditional manual surveillance. While centers opportunistically recorded post-discharge, prosthesis-related infections up to a year after surgery, they did not systematically follow up on patients beyond 30 days post-operatively. IH previously pioneered electronically supported, human-adjudicated surveillance systems and uses this modality routinely [6]. The VA uses NSQIP for surveillance, with rules similar to (but not entirely the same as) NHSN. Each of the facilities pulled the results of routine surveillance based on their own methodologies into databases residing on their own systems. Each of these data sets served as a reference standard representing the status quo. As such, accuracy and reliability statistics between the centers were not directly comparable, but represent the performance of our algorithms compared with the various systems already in place. Table 1 shows the procedure and SSI data gathered from the four centers.

Electronic data. We performed a literature review using Medline to identify data elements that were likely to inform a diagnosis of SSI. We selected articles that pertained to the manifestations of SSI (as opposed to risks for SSI) that were potentially included in electronic records. We allowed “snowballing” of related articles during review, but excluded articles that employed primary data collection. The identified data elements were: leukocyte count, leukocyte differential, fever, procalcitonin, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), microbiology results, and antimicrobial administration [7-20]. A significant number of articles also incorporated claims data into algorithms [9, 10, 12-16, 21]. Unfortunately, claims data are often not available at the time of IP case review, so we excluded them from the algorithm. Not all of the remaining elements were included in the final algorithm. For instance, although we initially planned to include fever in the algorithm, DH did not record these data for the whole study period, so fever was excluded. Procalcitonin testing was also not commonly available.

Algorithm training and testing data. We began by identifying candidate surgeries among VASQIP data from 2007 through 2009. As VASQIP surgeries are identified by CPT code and not by ICD-9, we built a map between the two for the four target surgeries, using Unified Medical Language System (UMLS) Metathesaurus concepts to bridge the two vocabularies. We then reviewed the children of these concepts and identified codes that described the types of procedures included in the ICD-9 list.

After the necessary CPT codes were identified, they were used to identify candidate surgeries among all VA hospitals. VASQIP surveillance is the principal method of SSI accounting at the VA; as such, surveillance is not performed on all surgeries, but rather on a subset. During our study time frame, 71,102 of our target procedures were performed and reviewed in the VASQIP system. This set was divided randomly into two equal sets, one for training and one for testing. The characteristics of the training data set are shown in Table 2.

The VASQIP data included whether a superficial, deep, or organ-space SSI was identified within 30 days of the surgical procedure. For simplicity, we condensed this information into a dichotomous variable indicating the presence or absence of any SSI type. These data were then linked to potential manifestations of disease. We included electronic markers between post-operative days 4 and 30 because earlier data might indicate that the patient was already infected at the time of operation. We then investigated the relationship of leukocyte count, temperature, the sending of a culture, the administration of a systemic antibiotic (inpatient or outpatient), hospital readmission (to ICU or acute care wards), ESR, and CRP to the presence of SSI. Maximum values during the eligible time frame were used for laboratory values and vitals.

To improve the specificity of the electronic algorithms, we mapped microbiology samples and specimens to a single type. Each type was categorized as to whether it could be consistent with each of the surgeries of interest. For example, a wound swab was considered to be compatible with SSI from any of the candidate surgeries, while a urine specimen was considered to be incompatible. For the algorithm, we used whether a culture was sent from a site consistent with SSI.
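The specimen-to-surgery compatibility mapping can be pictured as a simple lookup table. The sketch below is hypothetical (the specimen names, surgery labels, and function name are illustrative inventions, not the study's actual map):

```python
# Hypothetical specimen-compatibility map: which specimen types could be
# consistent with an SSI from each target surgery. Entries are illustrative.
SPECIMEN_COMPATIBLE = {
    "wound swab":     {"CABG", "THA", "TKA", "herniorrhaphy"},  # any target surgery
    "sternal tissue": {"CABG"},
    "joint fluid":    {"THA", "TKA"},
    "urine":          set(),  # considered incompatible with SSI from these surgeries
}

def culture_consistent_with_ssi(specimen, surgery):
    """True when a culture from this specimen could reflect an SSI
    after the given surgery; unmapped specimens default to incompatible."""
    return surgery in SPECIMEN_COMPATIBLE.get(specimen, set())
```

The algorithm input then becomes a single boolean: was any culture sent from a compatible site during the eligible window.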

We anticipated that some data elements would be more informative than others and that some of them would also be collinear with each other. We calculated pairwise Pearson's correlation coefficients between all of our data elements and the presence of SSI and found that there were very few strong correlations.
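A collinearity screen of this kind amounts to a single correlation-matrix computation. The sketch below uses synthetic data (the variable names, prevalence, and effect sizes are assumptions for illustration, not the study's data):

```python
# Pairwise Pearson correlations between candidate data elements and the
# SSI label, on synthetic data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
ssi = rng.random(n) < 0.05                       # ~5% assumed prevalence
wbc_max = 8 + 4 * ssi + rng.normal(0, 2, n)      # max post-op leukocyte count
culture_sent = (rng.random(n) < 0.1) | ssi       # any compatible culture sent
features = np.column_stack([wbc_max, culture_sent, ssi]).astype(float)

# Correlation matrix; the last row/column holds each element's
# correlation with the SSI outcome.
corr = np.corrcoef(features, rowvar=False)
```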

Algorithm development. Various strategies have been used for algorithm development. We targeted algorithms with high sensitivity and high negative predictive value that could increase the efficiency of chart review by excluding a large fraction of negative charts. To accomplish the latter while not impeding the former, we investigated methods that would allow interactions between variables. Classification and regression tree (CART) analysis, also called recursive partitioning, lends itself to the formulation of interacting rules and has been used previously in algorithms to detect SSI [13]. This method is somewhat limited in that it does not analyze interactions along the entire range of variables. Another issue is that laboratory elements are often missing post-operatively. Random forest strategies may have advantages when dealing with sets where much of the data are missing, but we felt that for user acceptability it was important to have simple, understandable rules.

We used the rpart function in R for recursive partitioning to develop our algorithms. We initially trained the algorithm to detect all types of SSI, but because of the lack of sensitivity and the inefficiency of searching for superficial SSI (sSSI), we subsequently trained it to target only deep (dSSI) and organ-space (oSSI) infections. We specified a classification tree and a loss matrix to penalize false negatives; the loss matrix was weighted by the inverse of the prevalence of dSSI and oSSI in the set. The maximum tree depth was limited to three, and a branch required a minimum of three cases before a split was permitted. Any tree that changed the complexity parameter (cp) by more than 0.001 was investigated. We attempted to prune each tree at the cp that minimized the relative cross-validation error, but when the difference was small and the algorithm was not sensitive enough, we accepted values with more splits and slightly higher relative cross-validation errors.
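An analogous classification tree can be sketched in Python with scikit-learn. This is not the study's actual rpart configuration: the synthetic data, feature definitions, and the use of class weights as a stand-in for rpart's loss matrix are all assumptions made for illustration.

```python
# Sketch of a cost-sensitive classification tree analogous to the rpart
# setup (synthetic data; class_weight approximates the loss matrix).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 5000
y = (rng.random(n) < 0.02).astype(int)            # ~2% assumed dSSI/oSSI prevalence
X = np.column_stack([
    8 + 5 * y + rng.normal(0, 2, n),              # max post-op leukocyte count
    (rng.random(n) < 0.1) | (y == 1),             # compatible culture sent
    (rng.random(n) < 0.2) | (y == 1),             # systemic antibiotic given
]).astype(float)

# Penalize false negatives by weighting the positive class by the inverse
# of its prevalence, analogous to the inverse-prevalence loss matrix.
prevalence = y.mean()
tree = DecisionTreeClassifier(
    max_depth=3,              # analog of the depth-3 limit
    min_samples_split=3,      # analog of the minimum-split size
    class_weight={0: 1.0, 1: 1.0 / prevalence},
)
tree.fit(X, y)
preds = tree.predict(X)
```

A shallow, heavily weighted tree like this trades specificity for the high sensitivity a screening tier requires, while remaining a small set of human-readable rules.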

In addition to the rpart algorithm, we also created an “inclusive” algorithm using the presence of any high-normal laboratory value, as well as a "simple" algorithm that looked only for post-operative cultures and antimicrobials. The specific rules for all three algorithms are shown in Table 3.

The VASQIP data were randomly divided into two equal-sized sets for validation because a second set of data was not prospectively collected. Data from VA SLCHCS were excluded because they would later be used in the analysis of our four principal centers. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated by comparing the output of the algorithms against the testing set.
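These four statistics follow directly from the confusion-matrix counts. A minimal sketch (the function name and example counts are illustrative, not study figures):

```python
# Screening accuracy statistics from confusion-matrix counts.
def screening_stats(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # flagged fraction of true SSI
        "specificity": tn / (tn + fp),  # excluded fraction of non-SSI charts
        "ppv": tp / (tp + fp),          # SSI fraction among flagged charts
        "npv": tn / (tn + fn),          # non-SSI fraction among excluded charts
    }

# Example: 90 of 100 SSI flagged; 930 of 1,000 non-SSI correctly excluded.
stats = screening_stats(tp=90, fp=70, fn=10, tn=930)
```

Note that the expected number of charts an IP must review to find one SSI is simply the reciprocal of the PPV, which is how the chart-review burden figures in the Results can be derived.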

Each hospital was then sent the data elements necessary for the final algorithm. Actual code scripts were also sent to facilitate algorithm implementation; however, tailoring and adjustments were made to accommodate different data structures at each facility, as others have done before [22].

Results

Algorithm Training and Testing Performance

The algorithms' performance on the VASQIP training set can be seen in Tables 4 and 5. The sensitivity was as low as 92.9% for herniorrhaphies and TKA. To improve sensitivity, we attempted to penalize false negatives beyond the inverse prevalence, but the rpart function would not return acceptable algorithms. The “inclusive” and “simple” algorithms were implemented as well, and while there were some gains in sensitivity, particularly with respect to sSSI (which we were no longer targeting), this came at the expense of needing to review approximately one-third of the charts.

After the recursive partitioning algorithm was developed, its overall sensitivity (for deep and organ space SSI) was 93.8% (95%CI 88.5-97.1) and its specificity was 93.0% (95%CI 92.7-93.3) for all four surgical procedures compared to the VASQIP training data set. Its positive and negative predictive values were 5.2% (95%CI 4.3-6.1) and 99.99% (95%CI 99.9-100), respectively. Thus, we anticipated that, when an IP reviewed procedures identified by the algorithms, this person would, on average, review 18.9 charts to find each SSI using the recursive partitioning algorithm, 79.3 charts using the “inclusive” algorithm, and 246.9 if all charts were reviewed.

When the algorithms were applied to the VASQIP test data set, the overall sensitivity and specificity of the rpart algorithm were 73.1% and 92.9%, respectively, with a PPV of 3.9% and an NPV of 99.9%. Unfortunately, the sensitivity was well below that seen on the training set. However, the performance of the inclusive and simple algorithms remained stable, as seen when comparing the results in Tables 5 and 6.

External Validation Results

We applied our electronic algorithm to all surgical procedures that met our pre-specified criteria at each principal hospital. The results are shown in Table 7. Overall, the sensitivity was 40%, the specificity was 93.8%, the PPV was 1.7%, and the NPV was 99.8%.

During implementation, poor performance was noted at VVMC. At most other facilities, the absence of an antibiotic prescription after surgery meant either that no antibiotic was given or that the prescription data were missing from the record. At VVMC, no electronic antibiotic prescription data were available, so all were missing. We therefore coded the post-operative antibiotic field as '-1' and altered the algorithm so that '-1' values caused it to err on the side of flagging cases as positive. No cases were picked up with these changes.

All reports of sensitivity, especially when divided by procedure as in Table 8, must be interpreted with caution because of wide confidence intervals. For example, at VVMC the algorithm found 0 of 3 SSI. Even if the true sensitivity were as high as 60%, these data would be observed more than 5% of the time.
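The arithmetic behind this caution is a direct binomial calculation: with a true sensitivity of 60%, the probability of detecting 0 of 3 SSI is 0.4³ = 0.064, which exceeds 0.05. (The helper function below is our own illustrative sketch.)

```python
# Probability of finding 0 of 3 SSI if the true sensitivity were 60%.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_zero_of_three = binom_pmf(0, 3, 0.60)  # = 0.4**3 = 0.064
# Since 6.4% > 5%, a true sensitivity of 60% remains consistent with
# observing zero detections in three cases.
```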

To investigate reasons for false alarms by the algorithm at the various sites, we reviewed all positives identified by the algorithm, as well as positives identified by routine surveillance. At DH, the study reviewer agreed with all of the cases identified as positive by routine surveillance. Four surgeries were noted to have incorrect ICD-9 codes, indicating that they should not have been included. The study reviewer also identified one superficial SSI and one deep SSI queued by the algorithm but not in routine surveillance records. At VA SLCHCS, four additional deep and organ-space SSI were identified beyond those identified by routine surveillance. At VVMC, all algorithm-identified cases were false positives. At IH, the study reviewer agreed with all positive cases identified by routine surveillance but with none identified by the algorithm, except for 2 cases where there appeared to have been errors with identifiers.

False negatives were reviewed at each center to determine the reasons for low sensitivity. At DH, two of the false negatives represented problems with the data pull. One SSI was assigned to the wrong hip replacement in the historical data set; the hip replacement with the infection was not in the data set. Another procedure identified as having an SSI was actually a hysterectomy. Three surgeries were missed because the SSI occurred more than 30 days post-operatively. One SSI was missed because laboratory results were only available from the outpatient setting. One SSI could only have been picked up from emergency department notes. Only two SSI could have been picked up by electronic data, but were missed due to the algorithm's threshold criteria.

At VA SLCHCS, only two SSI were missed; both occurred in total hip arthroplasties with onset of infection more than 30 days post-operatively. At VVMC, one surgery was treated in the outpatient setting and another was treated at an outside facility. The last infection developed 11 months after surgery and was thus well outside the 30-day window.

At IH, 11 of 16 false negatives occurred because the algorithm missed important information in the notes and microbiology; all data necessary to make an SSI diagnosis occurred after discharge from the initial surgery. In 2 cases, the reviewer felt the cases were ambiguous; in another 2, the reviewer disagreed that the cases were SSI. In 1 case, the reviewer felt that the case was a sSSI rather than a dSSI or oSSI.

Discussion

Our objective was to generate algorithms that would feature high sensitivity and a low number of charts needed to review per SSI found; however, we found that our recursive partitioning algorithm had low sensitivity in the testing set and even poorer performance when tested at outside hospitals. Our simpler algorithms were more robust, which suggests that the recursive partitioning algorithm was over-fit both to the sample data and to the VA itself. Performance was quite variable between facilities. Future algorithms could be improved by extending the surveillance period and by incorporating more information.

When comparing SSI rates between facilities, algorithm diagnostic accuracy and reliability must be carefully considered. Usually, routine prospective surveillance or some augmentation of it is used as a reference standard. Routine manual, prospective surveillance is estimated to have a sensitivity ranging from 30% to more than 90%, with most estimates in the 70-80% range [7, 16, 23-25]. Additionally, the reliability of manual healthcare-associated infection and surgical site infection surveillance has been reported to be less than ideal [16, 26-29]. Any comparisons to such standards must take this into account.

Electronic algorithms are frequently reported to have sensitivities in excess of 80% [19, 20]. Only some of them have been applied to multiple hospitals [7, 13, 16], and none of them report individual hospital validation results among hospitals as heterogeneous as the principal hospitals of our study. Although our recursive partitioning algorithm had high sensitivity on the VASQIP training set, it was only 73.1% on the VASQIP test set. The pooled sensitivity at the four principal hospitals was 40%. These results contrast with the high performance seen in other published literature. Specificities and predictive values were relatively stable between our training and testing sets.

The differences in sensitivities that we see in the recursive partitioning algorithm suggest two levels of over-fitting: first, over-fitting to the training data set; and second, over-fitting to the VA system. One study in the literature used the same method to develop algorithms and reported high sensitivities [13]; however, these algorithms were not applied to external data. The two “common-sense” algorithms we developed (i.e., the inclusive and simple algorithms) demonstrated high sensitivity in both the VASQIP training and testing sets. Since they were not derived from the training set, they were not over-fit to it. We expected the sensitivity of these algorithms to be high because of the success of previously devised algorithms, and because we surmised that it was unlikely that patients with either deep or organ-space SSI would have received neither antibiotic therapy nor any culture testing for etiologic microorganisms. However, when these algorithms were tested against other hospitals, sensitivity and PPV varied. At VA SLCHCS, no improvement in sensitivity over the recursive partitioning algorithm was observed, perhaps due to small numbers. At IH, a very large number of false positives were generated; this appears largely due to the very frequent use of antimicrobials during the post-operative period at this center. At DH, the simple algorithm fared poorly, while the inclusive algorithm fared better, perhaps because a large number of outpatient antimicrobials may not be captured by their system. This underscores our concern that even more robust “common-sense” algorithms that include elements successful at other institutions [7, 13, 16] still did not generalize well because of institutional differences in data collection and clinical practice.

No one algorithm appeared to be appropriate for implementation at all hospitals. The inclusive rule may be sufficiently sensitive for most hospitals, but in some cases may not be much more efficient than reviewing all surgeries, as was the case at IH. The simple rule was not sensitive enough in hospitals where all prescriptions (inpatient and outpatient) are unlikely to be captured.

The strengths of our study include drawing from VASQIP data to amass a reasonable number of SSI for training. Also, validating on a held-out half of the VASQIP data an algorithm that was already derived with 10-fold cross-validation, together with external implementation at other hospitals, presents a more realistic picture of algorithm accuracy and its variability. Limitations include using routine manual surveillance from each facility as the reference standard and having small numbers of SSI at our four centers.

In the future, improving sensitivity while keeping the number of charts needed to review low can only be accomplished by improving the algorithm's ability to distinguish between SSI and other abnormal conditions. This could be accomplished by using procedures more robust to sparse data for algorithm development, incorporating dynamic thresholds for laboratory values and vitals, and enriching the input data by using natural language processing to extract information from text notes. Any electronic algorithm used to compare SSI rates at different centers should undergo extensive testing before operational use.

Acknowledgements

This work was supported using resources and facilities at VA SLCHCS, IH, DH, and VVMC with funding support from the Agency for Healthcare Research and Quality (HHSA290200600020i-8), the VA Informatics and Computing Infrastructure (VINCI; VA HSR HIR 08-204) and the Consortium for Healthcare Informatics Research (CHIR; VA HSR HIR 08-374). We also gratefully acknowledge the VA Surgical Quality Improvement Program (VASQIP) and VA Patient Care Services (PCS) for providing data and support for this project.

References

1. Evans RS, Larsen RA, Burke JP, et al. Computer surveillance of hospital-acquired infections and antibiotic use. JAMA. 1986;256(8):1007-11.
2. Goldrick B, Larson E. Assessment of infection control programs in Maryland skilled-nursing long-term care facilities. Am J Infect Control. 1994;22(2):83-9.
3. Goldrick BA. The Certification Board of Infection Control and Epidemiology white paper: the value of certification for infection control professionals. Am J Infect Control. 2007;35(3):150-6.
4. Stevenson KB, Murphy CL, Samore MH, et al. Assessing the status of infection control programs in small rural hospitals in the western United States. Am J Infect Control. 2004;32(5):255-61.
5. Rubin MA, Mayer J, Greene T, et al. An agent-based model for evaluating surveillance methods for catheter-related bloodstream infection. AMIA Annu Symp Proc. 2008:631-5.
6. Evans RS, Abouzelof RH, Taylor CW, et al. Computer surveillance of hospital-acquired infections: a 25 year update. AMIA Annu Symp Proc. 2009 Nov 14;2009:178-82.
7. Bolon MK, Hooper D, Stevenson KB, et al. Improved surveillance for surgical site infections after orthopedic implantation procedures: extending applications for automated data. Clin Infect Dis. 2009;48(9):1223-9.
8. Hirschhorn LR, Currier JS, Platt R. Electronic surveillance of antibiotic exposure and coded discharge diagnoses as indicators of postoperative infection and other quality assurance measures. Infect Control Hosp Epidemiol. 1993;14(1):21-8.
9. Huang SS, Livingston JM, Rawson NS, et al. Developing algorithms for healthcare insurers to systematically monitor surgical site infection rates. BMC Med Res Methodol. 2007;7:20.
10. Miner AL, Sands KE, Yokoe DS, et al. Enhanced identification of postoperative infections among outpatients. Emerg Infect Dis. 2004;10(11):1931-7.
11. Petherick ES, Dalton JE, Moore PJ, et al. Methods for identifying surgical wound infection after discharge from hospital: a systematic review. BMC Infect Dis. 2006;6:170.
12. Platt R, Yokoe DS, Sands KE. Automated methods for surveillance of surgical site infections. Emerg Infect Dis. 2001;7(2):212-6.
13. Sands K, Vineyard G, Livingston J, et al. Efficient identification of postdischarge surgical site infections: use of automated pharmacy dispensing information, administrative data, and medical record information. J Infect Dis. 1999;179(2):434-41.
14. Spolaore P, Pellizzer G, Fedeli U, et al. Linkage of microbiology reports and hospital discharge diagnoses for surveillance of surgical site infections. J Hosp Infect. 2005;60(4):317-20.
15. Stevenson KB, Khan Y, Dickman J, et al. Administrative coding data, compared with CDC/NHSN criteria, are poor indicators of health care-associated infections. Am J Infect Control. 2008;36(3):155-64.
16. Yokoe DS, Noskin GA, Cunningham SM, et al. Enhanced identification of postoperative infections among inpatients. Emerg Infect Dis. 2004;10(11):1924-30.
17. Yokoe DS, Platt R. Surveillance for surgical site infections: the uses of antibiotic exposure. Infect Control Hosp Epidemiol. 1994;15(11):717-23.
18. Yokoe DS, Shapiro M, Simchen E, et al. Use of antibiotic exposure to detect postoperative infections. Infect Control Hosp Epidemiol. 1998;19(5):317-22.
19. Leal J, Laupland KB. Validity of electronic surveillance systems: a systematic review. J Hosp Infect. 2008;69(3):220-9.
20. Leth RA, Moller JK. Surveillance of hospital-acquired infections based on electronic hospital registries. J Hosp Infect. 2006;62(1):71-9.
21. Sands KE, Yokoe DS, Hooper DC, et al. Detection of postoperative surgical-site infections: comparison of health plan-based surveillance with hospital-based programs. Infect Control Hosp Epidemiol. 2003;24(10):741-3.
22. Borlawsky T, Hota B, Lin MY, et al. Development of a reference information model and knowledgebase for electronic bloodstream infection detection. AMIA Annu Symp Proc. 2008:56-60.
23. Cardo DM, Falk PS, Mayhall CG. Validation of surgical wound surveillance. Infect Control Hosp Epidemiol. 1993;14(4):211-5.
24. Mannien J, van der Zeeuw AE, Wille JC, et al. Validation of surgical site infection surveillance in the Netherlands. Infect Control Hosp Epidemiol. 2007;28(1):36-41.
25. Rosenthal R, Weber WP, Marti WR, et al. Surveillance of surgical site infections by surgeons: biased underreporting or useful epidemiological data? J Hosp Infect. 2010;75(3):178-82.
26. Allami MK, Jamil W, Fourie B, et al. Superficial incisional infection in arthroplasty of the lower limb. Interobserver reliability of the current diagnostic criteria. J Bone Joint Surg Br. 2005;87(9):1267-71.
27. Gastmeier P, Kampf G, Hauer T, et al. Experience with two validation methods in a prevalence survey on nosocomial infections. Infect Control Hosp Epidemiol. 1998;19(9):668-73.
28. Mayer J, Howell J, Green T, et al. Assessing inter-rater reliability (IRR) of surveillance decisions by infection preventionists (IPs). Fifth Decennial International Conference on Healthcare-Associated Infections. Atlanta, Georgia; 2010.
29. Trick WE, Zagorski BM, Tokars JI, et al. Computer algorithms to detect bloodstream infections. Emerg Infect Dis. 2004;10(9):1612-20.

Page last reviewed December 2012
Internet Citation: Appendix V. JAMIA Draft Manuscript. December 2012. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/research/findings/final-reports/ssi/ssiapv.html