Appendix A. Pilot Study for the Hospital Survey on Patient Safety Culture: A Summary of Reliability and Validity Findings
Introduction and Background
Sponsored by the Medical Errors Workgroup of the Quality Interagency Coordination Task Force (QuIC) and funded by the Agency for Healthcare Research and Quality (AHRQ contract no. 290-96-0004), this summary describes the development of the Hospital Survey on Patient Safety Culture and presents the results of a psychometric analysis designed to determine the reliability and validity of the survey. The goal of this project was to develop a reliable, public-use safety culture instrument that hospitals could administer on their own to assess patient safety culture from the perspective of their employees and staff.
This summary presents survey pilot data gathered from 1,437 hospital staff in 21 United States hospitals. The goal of the psychometric analysis was a concise and refined survey instrument, based on an earlier draft instrument and revised through the identification of conceptually meaningful, independent, and reliable safety culture dimensions, with 3 to 5 survey items measuring each dimension. The psychometric analysis consisted of a number of analytic techniques, including:
- Item analysis.
- Content analysis.
- Exploratory and confirmatory factor analyses.
- Reliability analysis.
- Composite score construction.
- Correlational analysis.
- Analysis of variance.
The researchers conducted a number of preliminary activities to inform the development of the Hospital Survey on Patient Safety Culture. First, they reviewed the literature in areas related to safety management and accidents in the nuclear and manufacturing industries, employee health and safety, organizational climate and culture, safety climate and culture, and medical error and event reporting. The researchers also gathered examples of existing safety climate and culture instruments, including published and unpublished instruments and those available on the Internet.
Psychometric analyses also were conducted on 2 existing health care safety culture surveys: one developed and administered by Westat for the Medical Event Reporting System for Transfusion Medicine (MERS-TM) and another developed and administered by the Veterans Health Administration (VHA). The 100-item MERS-TM safety culture survey data set consisted of 945 staff from 53 hospital transfusion services across the United States and Canada. The 120-item VHA Patient Safety Questionnaire (FY 2000) data set consisted of 6,161 staff from 160 VHA hospitals nationwide. The data sets were analyzed independently, and the psychometric analyses were written as technical reports delivered to AHRQ (Burr, Sorra, Nieva & Famolaro, 2002; Sorra & Nieva, 2002). The results from these technical reports had a significant influence on the safety culture dimensions and types of items that were included in the pilot version of the Hospital Survey on Patient Safety Culture.
Key dimensions of hospital safety culture were identified for inclusion in the survey, based on the literature review, examination of existing published and unpublished safety culture instruments, and the psychometric analyses from the MERS-TM and VHA safety culture surveys. Items then were developed to measure those dimensions. The items were written with the goal of obtaining a staff-level perspective on patient safety in hospital settings. Respondents were asked to think about their own units because they would know the culture of their unit better than the hospital as a whole. The investigators, however, did include a short section at the end of the survey that focused specifically on hospital-wide safety issues.
Cognitive Testing and External Review of the Survey
Cognitive testing is a developmental procedure in which individuals similar to the targeted respondents are asked to complete a questionnaire and provide comments or "think aloud" while answering the questions. The interviewer frequently asks respondents questions as they work through the questionnaire to assess how they comprehend and interpret the terms and items, to determine how they arrive at their answers, and to identify problems with the items or instructions. Cognitive interviews were conducted by telephone with diverse hospital staff, including:
- A nurse manager.
- A risk manager.
- A department clerk.
- A dietician.
- A food services employee.
- A respiratory therapist.
- A pharmacist.
- A pathologist.
- Nurses, residents and physicians from different U.S. hospitals.
The investigators also solicited reviews of the draft instrument from other researchers familiar with safety culture measurement, along with input from a hospital system administrator, a group of physicians, and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO). Changes were made to the survey dimensions and items following cognitive testing and the external survey review, resulting in a revised pilot survey comprising 79 items measuring 14 dimensions of safety culture.
Draft Pilot Survey
The draft pilot survey contained items that, for the most part, used 5-point Likert response scales of agreement ("Strongly disagree" to "Strongly agree") or frequency ("Never" to "Always"). The items in the draft pilot survey included 2 single-item outcome measures used as validity checks and 14 multiple-item dimensions or scales of patient safety: 2 overall patient safety outcome scales designed to assess validity and 12 safety culture dimensions.
The pilot survey administration sample included 21 hospitals across 6 U.S. states. The investigators collected their own data in 10 hospitals. Additional data from 1 Veterans Health Administration (VHA) hospital and 10 Georgia hospitals were forwarded to the researchers by the VHA and the Emory Center on Health Outcomes and Quality, in close cooperation with the Georgia Hospital Association. The sample of hospitals was selected to vary by geographic region, teaching status, and hospital size (Table 1), to ensure that the pilot survey administration contained a diverse sample. In addition, 2 facilities were for-profit hospitals, 1 facility was a veterans' hospital, and 1 was a geriatric hospital.
Table 1. Teaching Status and Bed Size of the 21 Pilot Hospitals
[Table contents were not recoverable from the source. Columns: Hospital Type, Number of Beds (categories include < 300 beds and > 500 beds).]
For the 10 hospitals in which the investigators collected data, packets were delivered containing a cover letter, the survey, a postage-paid envelope for returning completed surveys directly to the investigators, and a reply postcard. Contact persons at each hospital distributed the survey packets through the internal hospital mail system (with the exception of one hospital in which the surveys were mailed to employees' homes). The surveys were mailed to the homes of hospital employees included in the sample for the remaining 11 hospitals.
Data collection involved the following distribution steps to maximize response rates:
- First survey.
- First reminder postcard.
- Second survey.
- Second reminder postcard.
For 6 hospitals, a prenotification letter was sent on hospital letterhead, signed by the hospital president, chief operating officer (COO), chief executive officer (CEO), or equivalent.
Sample and Response Statistics
Criteria for sample selection varied somewhat from one hospital group to another. Six hospitals each selected a sample of about 100 staff, and purposive sampling was used (rather than random sampling) to ensure that an adequate variety of job classifications and hospital units would be represented. The selected hospital staff included those with direct patient contact, as well as those without patient contact. The researchers also recommended the inclusion of only those physicians who spend the majority of their work time in the hospital (e.g., emergency department physicians, radiologists, hospitalists, pathologists, etc.).
Only nurses and pharmacists were selected in 4 other hospitals, and these staff were randomly chosen. All staff were included in another hospital (a census). Staff in another group of 10 hospitals were selected from 4 specific departments:
- General medicine.
- General surgery.
- Intensive/critical care.
- Ancillary services.
A random sample of 100 staff from each unit was selected. For smaller hospitals in this group, all staff from these departments were selected, so the sample may have included fewer than 100 staff per department.
A total of 4,983 surveys were administered across the 21 hospitals, with 1,437 responses received at the time the data set was compiled. This resulted in a 29-percent overall response rate. Response statistics are summarized below:
- Distribution through internal hospital mail systems (11 hospitals):
45% response rate (711 responses out of 1,575 surveys).
Note: One site in this group mailed the surveys to the employees' home addresses.
- Distribution to employees' homes through the U.S. Postal Service (10 hospitals):
21% response rate (726 responses out of 3,408 surveys).
- Average response rate within each hospital: 37%.
- Average number of respondents per hospital: 68.
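The response rates above can be reproduced from the reported counts. The following is a minimal sketch using only the figures given in this summary; the group labels and the `response_rate` helper are illustrative names, not part of the original analysis.

```python
# Counts taken from the response statistics reported above.
groups = {
    "internal hospital mail": {"responses": 711, "surveys": 1575},
    "home mailing (USPS)": {"responses": 726, "surveys": 3408},
}


def response_rate(responses, surveys):
    """Return the response rate as a percentage."""
    return 100.0 * responses / surveys


# Rate for each distribution method.
rates = {name: response_rate(**counts) for name, counts in groups.items()}

# Pooled (overall) rate across both distribution methods.
total_responses = sum(counts["responses"] for counts in groups.values())
total_surveys = sum(counts["surveys"] for counts in groups.values())
overall_rate = response_rate(total_responses, total_surveys)

for name, rate in rates.items():
    print(f"{name}: {rate:.0f}%")
print(f"overall: {overall_rate:.0f}% ({total_responses} of {total_surveys})")
```

The pooled rate weights each hospital by the number of surveys it distributed. The 37% average within-hospital rate is presumably an unweighted mean of per-hospital rates, which would explain why it differs from the pooled 29%.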
In anticipation of confidentiality concerns and the privacy of each individual's responses, the survey included few demographic questions. Most respondents were female (81%) and most (84%) typically had direct interaction or contact with patients. The average age of the respondents was 43 years old. They had worked an average of 10 years in their hospital, and the average tenure in their specific hospital unit or work area was 7 years. The largest percentage of respondents worked in intensive care units (18%), followed by surgery (15%), other (14%), and medicine (nonsurgical) (12%).
Analyses and Results
Several analyses were conducted on the responses to the items in the Hospital Survey on Patient Safety Culture. The goal of the combined analytic efforts was a shorter, revised survey instrument, based on conceptually meaningful, independent, and reliable safety culture dimensions, with three to five items measuring each dimension. Individual item analysis first was conducted, in an effort to identify and eliminate those items that were highly skewed or had high amounts of missing data.
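The item screening step can be sketched as follows. This summary does not state the cutoffs used for "highly skewed" or "high amounts of missing data", so the thresholds `MAX_MISSING` and `MAX_ABS_SKEW` below are hypothetical, as are the example responses.

```python
from statistics import mean, pstdev

# Hypothetical cutoffs; the source does not report the values actually used.
MAX_MISSING = 0.10
MAX_ABS_SKEW = 1.0


def missing_rate(responses):
    """Share of respondents who left the item blank (None)."""
    return sum(r is None for r in responses) / len(responses)


def skewness(responses):
    """Population skewness of the answered responses."""
    values = [r for r in responses if r is not None]
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0  # no spread, so no skew to measure
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def flag_item(responses):
    """Return True if the item is a candidate for removal."""
    return (missing_rate(responses) > MAX_MISSING
            or abs(skewness(responses)) > MAX_ABS_SKEW)


# Made-up responses on a 1-5 scale: nearly everyone answers 5.
ceiling_item = [5, 5, 5, 5, 5, 5, 5, 4, 5, 1]
balanced_item = [1, 2, 3, 4, 5]
print(flag_item(ceiling_item), flag_item(balanced_item))
```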
Exploratory and Confirmatory Factor Analyses
Since it is possible that safety culture could simply be a single, unidimensional concept, an exploratory factor analysis was conducted initially to explore the dimensionality of the survey data. Principal components extraction was used, along with varimax rotation, to maximize the independence of the factors. The exploratory factor analysis results confirmed the existence of multiple factors or dimensions and provided evidence that many of the a priori item groupings did, in fact, fall into distinct factors. The analysis revealed 14 factors with eigenvalues greater than or equal to 1.0. The total variance explained by the 14 components or factors was 64.5 percent, with almost all items loading highly on only one factor (with a factor loading greater than or equal to .40).
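The retention rule described above (keep factors with eigenvalues of 1.0 or more, then report the share of total variance they explain) can be illustrated with a short sketch. The eigenvalues below are hypothetical and are not the pilot results; the sketch also does not perform the principal components extraction or the varimax rotation itself.

```python
def retained_eigenvalues(eigenvalues, threshold=1.0):
    """Return the eigenvalues at or above the threshold, largest first."""
    return sorted((ev for ev in eigenvalues if ev >= threshold), reverse=True)


def variance_explained(eigenvalues, retained):
    """Share of total variance carried by the retained components.

    For a correlation matrix, the eigenvalues sum to the number of items.
    """
    return sum(retained) / sum(eigenvalues)


# Hypothetical eigenvalues for a 6-item correlation matrix (they sum to 6).
eigenvalues = [2.4, 1.5, 1.0, 0.5, 0.4, 0.2]
retained = retained_eigenvalues(eigenvalues)
share = variance_explained(eigenvalues, retained)
print(f"{len(retained)} components retained, {share:.1%} of variance explained")
```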
To further examine the dimensionality of the survey, and taking into consideration the a priori safety culture dimensions, a confirmatory factor analysis (CFA) then was performed. CFA is used when an a priori factor structure is posited, because CFA tests the fit of a model that proposes a specific number of factors and specifies the items that measure or load onto each of the factors. Since the Hospital Survey on Patient Safety Culture was developed by first identifying safety culture dimensions and then creating items to measure those dimensions, an a priori factor structure was posited and a CFA was conducted to determine how well the posited structure conforms to the data. An initial confirmatory factor model then was created based on the exploratory factor analysis and a content analysis of the safety culture dimensions and items. The CFA work was done using the SAS Institute's software for calculating covariance analysis of linear structural equations (CALIS), in conjunction with the maximum likelihood method of parameter estimation.
After analyzing several confirmatory factor models (and dropping items each time to eliminate problematic issues), the investigators arrived at a final confirmatory factor model with a good fit to the data. This was verified by a number of different model fit indices. The final confirmatory factor model features 12 dimensions—two outcome dimensions and 10 safety culture dimensions—with 3 or 4 items measuring each dimension, for a total of 42 items.
Overall model fit indices were examined closely. Each of the following model fit statistics met the criterion for good fit, with a value of .90 or above:
- The comparative fit index (CFI).
- The goodness-of-fit index (GFI).
- The adjusted GFI (AGFI).
- The normed fit index (NFI).
- The non-normed fit index (NNFI).
The closer each of these indices is to 1.00, the better the fit of the model to the data. The root-mean-square error of approximation (RMSEA), a measure of the discrepancy per degree of freedom for the model or the degree of unexplained variance, was .04. An RMSEA of .05 or lower indicates a good model fit; the closer it is to zero, the better the fit of the model to the data.
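For reference, a commonly used definition of the RMSEA is given below. This summary does not report the model chi-square or degrees of freedom, so the symbols are generic: \chi^2 is the model chi-square, df its degrees of freedom, and N the sample size. Some software uses N rather than N - 1 in the denominator, and the exact form used in the pilot analysis is not stated.

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^2 - df,\; 0\right)}{df\,(N - 1)}}
```

Under this definition, a model whose chi-square does not exceed its degrees of freedom has an RMSEA of zero, which is consistent with the interpretation that values closer to zero indicate better fit.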
Internal consistency reliabilities were examined for each of the 12 final safety culture dimensions identified in the confirmatory factor model. Since items were worded in both positive and negative directions, negatively worded items first were reverse coded so that a higher score would indicate a more positive response in all cases. Each of the 12 safety culture dimensions that make up the survey was found to have an acceptable reliability (defined as a Cronbach's alpha greater than or equal to .60), with reliability coefficients ranging from .63 to .84.
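The reliability calculation can be sketched as follows, using the standard formula for Cronbach's alpha: k / (k - 1) times (1 minus the sum of the item variances divided by the variance of the total score). The response data are made up for illustration, and the assumption that the third item is negatively worded is hypothetical.

```python
from statistics import variance


def reverse_code(score, low=1, high=5):
    """Flip a score on a low-to-high scale (on a 1-5 scale, 1 becomes 5)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent).

    Scores must already be reverse coded where needed.
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses from 5 respondents to a 3-item dimension.
# Assumption: the third item is negatively worded.
raw = [[4, 5, 2], [3, 3, 3], [5, 4, 1], [2, 2, 4], [4, 4, 2]]
coded = [[a, b, reverse_code(c)] for a, b, c in raw]

alpha = cronbach_alpha(coded)
print(f"alpha = {alpha:.2f}, acceptable = {alpha >= 0.60}")
```

The made-up data are deliberately consistent, so the resulting alpha is well above the .60 criterion; real survey data would typically yield lower values, as in the .63 to .84 range reported above.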
Validity Analysis: Composite Scores and Intercorrelations
Composite scores were created for the 12 safety culture dimensions by obtaining the mean of the responses to items in each dimension (after any necessary reverse coding). A composite score was calculated for each respondent, relative to each of the 12 safety culture dimensions. Since all the items used 5-point response scales, composite scores ranged from 1.0 to 5.0 (scored so that 1 = a low score and 5 = a high score). After calculating the composite scores, the safety culture dimensions then were correlated with one another.
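A composite score for one respondent on one dimension is simply the mean of that respondent's item scores. The sketch below assumes the scores have already been reverse coded where necessary; skipping unanswered items (represented as `None`) is an assumption, since this summary does not say how missing responses were handled.

```python
from statistics import mean


def composite_score(item_scores):
    """Mean of the answered items in a dimension, or None if none were answered.

    Assumption: unanswered items (None) are skipped rather than imputed.
    """
    answered = [score for score in item_scores if score is not None]
    return mean(answered) if answered else None


# Made-up, already reverse-coded responses on a 1-5 scale.
print(composite_score([4, 5, 4]))        # all three items answered
print(composite_score([3, None, 4]))     # one item left blank
print(composite_score([None, None]))     # nothing answered
```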
The construct validity of each safety culture dimension would be reflected in composite scores moderately related to one another, indicated by correlations between .20 and .40. Correlations of less than .20 would indicate that 2 safety culture dimensions were related weakly. Exceptionally high correlations (.85 or above) would likely indicate that the dimensions measure essentially the same concept, and these dimensions possibly could be combined and some items eliminated. Correlations between the safety culture composites or scales ranged from .23 (between Nonpunitive Response to Error and either Staffing or Frequency of Event Reporting) to .60 (between Hospital Management Support for Patient Safety and Overall Perceptions of Safety). These intercorrelations all fall within the expected moderate to high range. That none were exceptionally high indicates that no 2 safety culture dimensions appeared to measure the same construct.
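The interpretation rules in the preceding paragraph can be written down as a small sketch. The thresholds (.20 and .85) come from the text; the `pearson_r` helper, the labels, and the example scores are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))


def interpret(r):
    """Classify a correlation using the thresholds stated in the text."""
    if abs(r) >= 0.85:
        return "possibly the same concept"
    if abs(r) < 0.20:
        return "weakly related"
    return "moderately to highly related"


# The lowest and highest composite intercorrelations reported above.
for r in (0.23, 0.60):
    print(r, interpret(r))

# Made-up composite scores for two dimensions across 5 respondents.
dimension_a = [3.0, 4.0, 2.5, 4.5, 3.5]
dimension_b = [3.5, 3.0, 3.0, 4.0, 4.5]
print(round(pearson_r(dimension_a, dimension_b), 2))
```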
Correlations were calculated for the 12 safety culture dimensions and the 4 outcome variables (2 of the safety culture dimensions are considered outcome variables: Overall Perceptions of Safety and Frequency of Event Reporting). The highest intercorrelation was .66 (p < .001), calculated for the outcome measures of Overall Perceptions of Safety and Patient Safety Grade. This high correlation provides evidence of the validity of the Overall Perceptions scale, in that it has a strong relation to the respondents' single-item assessment of their unit's grade on patient safety (A = Excellent, B = Very Good, C = Acceptable, D = Poor, and E = Failing). The second highest intercorrelation was between Overall Perceptions of Safety and Hospital Management Support for Patient Safety (r = .60, p < .001). This finding points to the important role that hospital management plays in the advancement of patient safety issues. Staff gave their units higher patient safety marks when they felt that hospital management actively supported safety.
The highest correlation associated with the Frequency of Event Reporting dimension was with Feedback and Communication About Error (r = .48, p < .001). Surprisingly, Nonpunitive Response to Error had the weakest relationship with the Frequency of Event Reporting (r = .23, p < .001). Hospital staff indicated that events are reported more frequently when there is an open line of communication involving errors, and when they are given feedback regarding changes implemented as a result of event reports. These correlations suggest that increased event reporting is more likely to be achieved through the advancement of communication and feedback than through the creation of a nonpunitive culture.
Finally, all but 2 of the correlations between the Number of Events Reported within the last year and the safety culture dimensions were nonsignificant and very low, almost zero in most cases. One explanation for the lack of relationships with this 1-item outcome variable is that more than half of all respondents reported no events in the last 12 months. Forty-five percent reported 10 or fewer events. The lack of variability and the highly skewed nature of the reported event numbers resulted in an absence of linear relationships with the other safety culture dimensions. For now, the best use for this 1-item measure of reported events is as a change indicator, to see if staff report more events over time.
Analysis of Variance: Differences Across Hospitals
One final analysis, a 1-way analysis of variance (ANOVA), was conducted on each of the 12 safety culture dimensions and on the 2 single-item outcome measures (Number of Events Reported and Patient Safety Grade) to determine the extent to which composite scores on these safety culture scales are differentiated across hospitals. An ANOVA by hospital examines whether response variability on the safety culture dimensions is greater between hospitals than within hospitals. In other words, it addresses whether hospitals differ on each of the safety culture dimensions. The ANOVAs on all 12 composites were statistically significant, supporting the hypothesis that different hospitals have different composite scores on the safety culture outcome variables and dimensions. Since hospitals have different actual levels of patient safety, some should score high and some should score low on the safety culture dimensions, which is what the results indicate and what good scales would reflect.
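The between-hospital versus within-hospital comparison can be sketched with a hand-computed F statistic. The composite scores below are made up and use three hypothetical hospitals; the pilot analysis itself covered 21 hospitals, and its F values are not reported in this summary.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of composite scores per hospital.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Variability of hospital means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of respondents around their own hospital's mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up composite scores for three hypothetical hospitals.
hospitals = [
    [3.0, 3.5, 4.0],
    [2.0, 2.5, 3.0],
    [4.0, 4.5, 5.0],
]
f_stat, df_between, df_within = one_way_anova_f(hospitals)
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

A larger F means that hospital means differ more than would be expected from the spread of responses within hospitals. Judging statistical significance additionally requires a p-value from the F distribution, which a statistics package (for example, `scipy.stats.f_oneway`) can supply.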
Conclusion
Westat was tasked with developing an employee survey to assess the culture of patient safety in hospital settings. The development of the survey was based on a literature review, examination of existing published and unpublished safety culture instruments, and psychometric analyses conducted on 2 existing safety culture surveys.
The draft survey was piloted in 21 hospitals, and the pilot data were analyzed to refine the instrument and determine its psychometric properties. In the process of refining the instrument, 26 of the originally piloted items were dropped. Based on the psychometric analyses, the final Hospital Survey on Patient Safety Culture includes 12 dimensions and 42 items, plus additional background questions. All of the psychometric analyses—from the CFA results and reliabilities to the intercorrelations among the dimensions and the analysis of variance results—provide solid evidence supporting the final dimensions and items that were retained.
All dimensions were shown to have acceptable levels of reliability (defined as Cronbach's alpha equal to or greater than .60). The safety culture dimensions included in the final survey are shown below (reliabilities are in parentheses):
- Two outcome dimensions (multiple-item scales):
  - Overall perceptions of safety (.74).
  - Frequency of event reporting (.84).
- Ten safety culture dimensions (multiple-item scales):
  - Supervisor/manager expectations and actions promoting patient safety (.75).
  - Organizational learning: continuous improvement (.76).
  - Teamwork within units (.83).
  - Communication openness (.72).
  - Feedback and communication about error (.78).
  - Nonpunitive response to error (.79).
  - Staffing (.63).
  - Hospital management support for patient safety (.83).
  - Teamwork across hospital units (.80).
  - Hospital handoffs and transitions (.80).
References
Burr M, Sorra J, Nieva VF, et al. Analysis of the Veterans Administration (VA) National Center for Patient Safety (NCPS) FY 2000 Patient Safety Questionnaire. Technical report. Westat: Rockville, MD; 2002.
McKnight S, Lee C. Patient safety attitudes. Paper presented at the Summit on Effective Practices to Improve Patient Safety, Washington, DC; September 5-7, 2001.
Nieva VF, Sorra J. Safety culture assessment: A tool for improving patient safety in health care organizations. Qual Saf Health Care 2003;12(Suppl 2):ii17-ii23.
Sorra JS, Nieva VF, Schreiber G, et al. MERS-TM Hospital Transfusion Service Safety Culture Survey. Unpublished survey developed by Westat under contract to Columbia University, supported by a grant from the National Heart, Lung, and Blood Institute (NHLBI # R01 HL53772-06); 2001.
Appendix B. Safety Culture Assessment: A Tool for Improving Patient Safety in Healthcare Organizations
The HTML version of this article is available online at: http://qhc.bmjjournals.com/cgi/content/full/12/suppl_2/ii17.
Reprinted with the permission of BMJ Publishing Group, London (UK) from: Quality and Safety in Health Care 2003; 12(Suppl II):ii17-ii23.