Operational Measurement of Diagnostic Safety: State of the Science

Learning From Known Incidents and Reports

No single data source will capture the full range of diagnostic safety concerns. Valuable information can be gleaned from even limited data sources so long as those who use the data remain mindful of its limitations for a given purpose.⁴⁰ For instance, many routinely recorded discrete events lend themselves to retrospective analysis of diagnostic safety. Most healthcare organizations have incident reporting systems, although reporting has included few diagnostic events.⁴¹

There is also an opportunity to leverage peer review programs to improve diagnostic self-assessment, feedback, and improvement.⁴² Similarly, autopsy reports,⁴³ diagnostic discrepancies at admission versus discharge,^44,45 escalations of care,^46,47 and malpractice claims^48-51 may be reviewed with special attention to opportunities to improve diagnosis. These data sources may not shed light on the frequency or scope of a problem, but they can help raise awareness of the impact and harm of diagnostic errors and, in some cases, specific opportunities for improvement.

Voluntary reports solicited specifically from clinicians who make diagnoses are another potentially useful source of data on diagnostic safety. For example, reports from clinicians who have witnessed diagnostic error have the advantage of rich detail that, at least in some cases, may offer insight into ways to prevent or mitigate future errors. However, no standardized mechanisms exist to report diagnostic errors. Despite widespread efforts to enable providers to report errors,^17,52 clinicians find reporting tools onerous and are often unaware of errors they make.⁵³

It has also become clear that a local champion and quality improvement team support are needed to sustain reporting behavior. At present, few facilities or organizations allocate “protected time” that is essential for clinicians to report, analyze, and learn from safety events. Some of these challenges could be overcome by having frontline clinicians report events briefly and allowing organizational safety teams (which include other clinicians) to analyze them. Still, voluntary reporting alone cannot address the multitude of complex diagnostic safety concerns, and reporting can only be one aspect of a comprehensive measurement strategy.⁵⁴

Patients are often underexplored sources of information, and many organizations already conduct patient surveys, interviews, and reviews of patient complaints to learn about safety risks.^55-58 Prior work in other areas of patient safety (e.g., medication errors, infection control) has examined the potential of engaging patients proactively to monitor safety risks and problems.^59,60 With subsequent development, similar mechanisms could be used to monitor diagnosis-related issues.

One limitation of current patient reporting systems is the lack of validated patient-reported questions or methods to detect diagnostic safety concerns. Barriers to patient engagement in safety initiatives, including low health literacy, lack of wider acceptance of safety monitoring as part of the patient role, provider expectations and attitudes, and communication differences, will also need to be addressed to make the most of such efforts.^59,61,62 Real-time adverse event and “near-miss” reporting systems, akin to those intended for use by clinicians, are another potential mechanism to collect patient-reported data on diagnostic safety.⁶³

Learning From Existing Large Datasets

Whereas direct reports from patients and clinicians may offer unique insights, HCOs may also be ready to use large datasets to uncover trends in diagnostic performance and events that are not otherwise easily identified. Administrative and billing data are widely available in most modern HCOs and have been proposed as one source of data for detecting missed opportunities for accurate and timely diagnosis.^64-66

For example, diagnosis codes assigned at successive clinical encounters may be used as a proxy for the evolution of a clinical diagnosis; significant discrepancies between them may prompt a search for reasons.⁴⁴ Symptom-disease dyads, such as abdominal pain followed by appendicitis a few days later or dizziness followed by stroke, are examples of this approach.^66,67
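A symptom-disease dyad search of this kind can be sketched in a few lines of Python. This is a minimal illustration, not a validated method: the `Encounter` record, the `find_dyads` function, the plain-text diagnosis labels (standing in for coded diagnoses), and the 30-day look-back window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Encounter:
    patient_id: str
    visit_date: date
    diagnosis: str  # simplified label standing in for a diagnosis code (assumption)


def find_dyads(encounters, symptom, disease, window_days):
    """Return (patient_id, symptom_date, disease_date) for every disease
    encounter preceded by a symptom encounter within window_days."""
    window = timedelta(days=window_days)
    by_patient = {}
    for enc in encounters:
        by_patient.setdefault(enc.patient_id, []).append(enc)

    matches = []
    for pid, visits in by_patient.items():
        symptom_dates = sorted(v.visit_date for v in visits if v.diagnosis == symptom)
        for v in sorted(visits, key=lambda x: x.visit_date):
            if v.diagnosis != disease:
                continue
            # Symptom visits strictly before the disease visit and inside the window.
            prior = [d for d in symptom_dates if timedelta(0) < v.visit_date - d <= window]
            if prior:
                matches.append((pid, prior[-1], v.visit_date))
    return matches


encounters = [
    Encounter("A", date(2024, 3, 1), "dizziness"),
    Encounter("A", date(2024, 3, 4), "stroke"),      # 3 days after dizziness: flagged
    Encounter("B", date(2024, 3, 1), "dizziness"),
    Encounter("B", date(2024, 6, 1), "stroke"),      # outside the 30-day window
    Encounter("C", date(2024, 3, 2), "stroke"),      # no preceding symptom visit
]
flagged = find_dyads(encounters, "dizziness", "stroke", window_days=30)
```

Records flagged this way are candidates for review only; whether a missed opportunity actually occurred still has to be determined from the clinical record.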

This strategy may be intuitive to patient safety leaders because several other safety measurement methods are also based on diagnosis codes extracted from administrative datasets (e.g., patient safety indicators).^68,69 However, unlike specific safety events that can be coded with good sensitivity (e.g., retention of foreign objects during procedures), administrative data are not sufficiently sensitive to detect diagnostic errors. Moreover, administrative data lack relevant clinical details about diagnostic processes that can be improved. Administrative data sources are mainly useful insofar as they can be used to identify patterns or specific cohorts of patients to further review for presence of diagnostic error, based on diagnosis or other relevant characteristics that may be considered risk factors or of special interest for diagnostic safety improvement.

Medical records can be a rich source of data, as they contain clinical details and reflect the patient’s longitudinal care journey. Although medical record reviews are considered valuable and sometimes even gold standard for detecting diagnostic errors, it is often not clear which records to review. Reviewing records at random can be burdensome and resource intensive. However, more selective methods can identify a high-yield subset (e.g., reviewing records of patients diagnosed with specific clinical conditions at high risk of being missed, such as colorectal cancer or spinal epidural abscess).^70,71 Such selective methods can be more efficient compared with voluntary reporting or nonselective or random manual review.

Another way to select records is through the “trigger” approach, which aims to “alert patient safety personnel to possible adverse events so they can review the medical record to determine if an actual or potential adverse event has occurred.”^72-75 EHRs and clinical data warehouses make it possible to identify signals suggestive of missed diagnosis prior to detailed reviews. Using electronic algorithms, or “e-triggers,” that mine vast amounts of clinical and administrative data to identify these signals, HCOs can search EHRs on a scale that would be untenable with manual or random search methods.^74,76,77

For example, algorithms could identify patients with a certain type of abnormal test result (denominator) and identify which results have still not been acted upon after a certain length of time (numerator).⁷⁸ This type of algorithm is possible because the data about an abnormal result (e.g., abnormal hemoglobin value and microcytic anemia) and the followup action needed (e.g., colonoscopy for the 65-year-old) are (or should be) coded in the EHR. Similarly, certain patterns, such as unexpected hospitalizations after a primary care visit, can be identified more accurately if corresponding clinical data are available.¹⁹
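The numerator and denominator logic described above might look like the following Python sketch. The `AbnormalResult` record, the `unacted_results` function, and the 60-day threshold are hypothetical choices for illustration; a real e-trigger would need locally defined result types, follow-up actions, and timeframes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AbnormalResult:
    patient_id: str
    result_date: date
    followup_date: Optional[date]  # None when no follow-up action is recorded


def unacted_results(results, as_of, max_days):
    """Abnormal results with no recorded follow-up action more than
    max_days after the result date, evaluated on the as_of date."""
    return [
        r for r in results
        if r.followup_date is None and (as_of - r.result_date).days > max_days
    ]


results = [
    AbnormalResult("A", date(2024, 1, 2), date(2024, 1, 20)),  # acted upon
    AbnormalResult("B", date(2024, 1, 5), None),               # 87 days, no action
    AbnormalResult("C", date(2024, 3, 25), None),              # only 7 days old
]

denominator = results  # all patients with the abnormal result of interest
numerator = unacted_results(results, as_of=date(2024, 4, 1), max_days=60)
rate = len(numerator) / len(denominator)
```

In this sketch a recent result with no follow-up yet (patient C) stays in the denominator but is not counted as unacted, since the allowed time has not elapsed.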

To enhance the yield of record reviews, e-triggers can be developed to alert personnel to potential patient safety events and enable targeted review of high-risk patients.³⁵ The Institute for Healthcare Improvement’s Global Trigger Tools,⁷⁹ which include both manual and electronic trigger tools to detect inpatient events,^80-82 are widely used but were not specifically designed to detect diagnostic errors. More targeted e-triggers for diagnostic safety measurement and monitoring can be developed and integrated within existing patient safety surveillance systems in the future and may enhance the yield of record reviews.^46,83-86 Other examples of possible e-triggers that are either in development or approaching wider testing and implementation are described in Table 2.

Table 2. Examples of Potential Safer Dx e-Triggers Mapped to Diagnostic Process Dimensions of the Safer Dx Framework¹⁴ (adapted from Murphy DR, et al.³⁵)

Safer Dx Diagnostic Process	Safer Dx Trigger Example	Potential Diagnostic Error
Patient-provider encounter	Emergency department or primary care visit followed by unplanned hospitalization	Missed red flag findings or incorrect diagnosis during initial office visit
	ED/PC visit within 72 hours after ED or hospital discharge	Missed red flag findings during initial ED/PC or hospital visit
	Unexpected transfer from hospital general floor to ICU within 24 hours of ED admission	Missed red flag findings during admission
Performance and interpretation of diagnostic tests	Amended imaging report	Missed findings on initial read or lack of communication of amended findings
Followup and tracking of diagnostic information	Abnormal test result with no timely followup action	Abnormal test result missed
Referral-related factors	Urgent specialty referral followed by discontinued referral within 7 days	Delay in diagnosis from lack of specialty expertise
Patient-related factors	Poor rating on patient experience scores post ED/PC visit	Patient report of communication barriers related to missed diagnosis
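
As one illustration of how a row of Table 2 could be expressed in code, the sketch below flags ED/PC visits that occur within 72 hours after an ED or hospital discharge. The function name `return_visit_triggers` and the tuple layout of the inputs are assumptions for this example; they do not reflect any particular EHR or published trigger specification.

```python
from datetime import datetime, timedelta


def return_visit_triggers(discharges, visits, window=timedelta(hours=72)):
    """discharges and visits are lists of (patient_id, datetime) tuples.
    Return (patient_id, discharge_time, visit_time) for each visit that
    starts within `window` after a discharge for the same patient."""
    flagged = []
    for pid, discharged_at in discharges:
        for visit_pid, visit_at in visits:
            if visit_pid != pid:
                continue
            if timedelta(0) < visit_at - discharged_at <= window:
                flagged.append((pid, discharged_at, visit_at))
    return flagged


discharges = [
    ("A", datetime(2024, 5, 1, 10, 0)),
    ("B", datetime(2024, 5, 1, 10, 0)),
]
visits = [
    ("A", datetime(2024, 5, 3, 9, 0)),   # 47 hours after discharge: flagged
    ("B", datetime(2024, 5, 6, 10, 0)),  # 5 days after discharge: not flagged
]
triggered = return_visit_triggers(discharges, visits)
```

The nested loop is adequate for a small example; at the scale of a clinical data warehouse the same rule would more likely be written as a join in the warehouse's query language.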

Both selective and e-trigger enhanced reviews are advantageous because they can allow development of potential measures for wider testing and adoption. For instance, measures of diagnostic test result followup may focus specifically on abnormal test results that are suggestive of serious or time-sensitive diagnoses, such as cancer.⁸⁷ Examples of diagnostic safety measures could include the proportion of documented “red-flag” symptoms or test results that receive timely followup, or the proportion of patients with cancer newly diagnosed within 60 days of first presentation of known red flags.¹⁶ Depending on an HCO’s priorities, safety leaders could consider additional development, testing, and potential implementation of measure concepts proposed by NQF and other researchers.^8,16,88
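
The second measure concept above, the proportion of patients diagnosed within 60 days of the first red flag, reduces to simple date arithmetic once the two dates are known for each patient. The sketch below assumes those dates have already been established by record review; the function name `timely_diagnosis_proportion` and the sample dates are invented for illustration.

```python
from datetime import date


def timely_diagnosis_proportion(cases, max_days=60):
    """cases is a list of (first_red_flag_date, diagnosis_date) pairs.
    Return the share diagnosed within max_days of the first red flag,
    or None when there are no cases to measure."""
    if not cases:
        return None
    timely = sum(
        1 for red_flag, diagnosed in cases
        if 0 <= (diagnosed - red_flag).days <= max_days
    )
    return timely / len(cases)


cases = [
    (date(2024, 1, 10), date(2024, 2, 1)),   # 22 days
    (date(2024, 1, 10), date(2024, 4, 20)),  # 101 days: not timely
    (date(2024, 2, 1), date(2024, 3, 31)),   # 59 days
    (date(2024, 3, 1), date(2024, 3, 1)),    # same day
]
proportion = timely_diagnosis_proportion(cases)
```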

Other mechanisms for mining clinical data repositories, such as natural language processing (NLP) algorithms,⁸⁹ are too early in development but may be a useful complement to future e-triggers for detection of diagnostic safety events. Whereas e-triggers leverage structured data to identify possible safety concerns, a substantial proportion of data in EHRs is unstructured and therefore inaccessible to e-triggers. NLP and machine-learning techniques could help analyze and interpret large volumes of unstructured textual data, such as those in clinical narratives and free-text fields, and reduce the burden of medical record reviews by selecting records with potentially highest yield. While research has examined the predictive validity of NLP algorithms for detection of safety incidents and adverse events,^90-92 to date this methodology has not been applied to or validated for measurement of diagnostic safety.

Synthesizing Data and Enhancing Confidence in Measurement

Determining the presence of diagnostic error is complex and requires additional evaluation for missed opportunities.⁹³ A binary classification (e.g., error or no error) may be insufficient for cases involving greater uncertainty, which call for more graded assessment approaches reflecting varying degrees of confidence in the determination of error.^20,94
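
One way to record a graded rather than binary determination is to have each reviewer rate confidence on an ordinal scale and then summarize the ratings. The sketch below is purely illustrative: the 1 to 6 scale, the thresholds, and the `summarize_ratings` function are hypothetical and are not taken from any published instrument.

```python
from statistics import median


def summarize_ratings(ratings, error_threshold=4, max_spread=2):
    """ratings: reviewer confidence that a diagnostic error occurred, on a
    hypothetical scale from 1 (almost certainly no error) to 6 (almost
    certainly an error). Returns a summary instead of a yes/no label."""
    if not ratings:
        raise ValueError("at least one rating is required")
    mid = median(ratings)
    spread = max(ratings) - min(ratings)
    return {
        "median": mid,
        "likely_error": mid >= error_threshold,
        # Wide disagreement suggests the case needs discussion or a further review.
        "needs_adjudication": spread > max_spread,
    }


agreed = summarize_ratings([5, 6, 5])
disputed = summarize_ratings([2, 6])
```

Keeping the median and the spread together preserves the uncertainty that a single error or no-error label would hide.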

Depending on the measurement method, a thorough content review may be sufficient to identify missed opportunities for diagnosis, contributing factors, and harm or impact. However, systematic approaches to surveillance and measurement of diagnostic safety often warrant the use of structured data collection instruments^95,96 that assess diagnostic errors using objective criteria, as well as taxonomies to classify process breakdowns. For example, the Revised Safer Dx Instrument⁹⁷ is a validated tool that can be used to do an initial screen for presence or absence of diagnostic error in a case. It helps users identify potential diagnostic errors in a standardized way for further analysis and safety improvement efforts.

Structured assessment instruments can also be used to provide clinicians with data for feedback and reflection that they otherwise may not receive.⁹⁸ Furthermore, process breakdowns can be analyzed using approaches such as the Diagnostic Error Evaluation and Research taxonomy⁹⁹ or the Safer Dx Process breakdown taxonomy,⁹⁷ both of which help to identify where in the diagnostic process a problem occurred.

Other factors related to diagnostic error, such as the presence of patient harm (e.g., clear evidence of harm versus “near-misses”), preventability, and actionability, may also be important to define in advance so that the selected measurement strategy aligns with the learning and improvement goals. Nevertheless, the science of understanding the complex mix of cognitive and sociotechnical contributory factors and implementing effective solutions based on these data still needs additional development.^100-102


Special Considerations for Measurement of Diagnostic Safety (continued)

Table of Contents

Learning From Known Incidents and Reports

Learning From Existing Large Datasets

Table 2. Examples of Potential Safer Dx e-Triggers Mapped to Diagnostic Process Dimensions of the Safer Dx Framework¹⁴ (adapted from Murphy DR, et al.³⁵)

Synthesizing Data and Enhancing Confidence in Measurement
