Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement

5. Improving Data Collection across the Health Care System

While a range of health and health care entities collect data, the data do not flow among these entities in a cohesive or standardized way. Entities within the health care system face challenges when collecting race, ethnicity, and language data from patients, enrollees, members, and respondents. Explicitly expressing the rationale for the data collection and training staff, organizational leadership, and the public to appreciate the need to use valid collection mechanisms may improve the situation. Nevertheless, some entities face health information technology (Health IT) constraints and internal resistance. Indirect estimation techniques, when used with an understanding of the probabilistic nature of the data, can supplement direct data collection efforts.

Addressing health and health care disparities requires the full involvement of organizations that have an existing infrastructure for quality measurement and improvement. Although hospitals, community health centers (CHCs), physician practices, health plans, and local, state, and federal agencies can all play key roles by incorporating race, ethnicity, and language data into existing data collection and quality reporting efforts, each faces opportunities and challenges in attempting to achieve this objective.

To identify the next steps toward improving data collection, it is helpful to understand these opportunities and challenges in the context of current practices. In some instances, the opportunities and challenges are unique to each type of organization; in others, they are common to all organizations and include:

  • How to ask patients and enrollees questions about race, ethnicity, and language and communication needs.
  • How to train staff to elicit this information in a respectful and efficient manner.
  • How to address the discomfort of registration/admission staff (hospitals and clinics) or call center staff (health plans) about requesting this information.
  • How to address potential patient or enrollee pushback respectfully.
  • How to address system-level issues, such as changes in patient registration screens and data flow.

Previous chapters have provided a framework for eliciting, categorizing, and coding data on race, ethnicity, and language need. This chapter considers strategies that can be applied by various entities to improve the collection of these data and facilitate subsequent reporting of stratified quality measures. It begins by examining current practices and issues related to collecting and sharing data across the health care system. Next is a discussion of steps that can be taken to address these issues and improve data collection processes. This is followed by a review of methods that can be used to derive race and ethnicity data through indirect estimation when obtaining data directly from many patients or enrollees is not possible.

Collecting and Sharing Data across the Health Care System

Health care involves a diverse set of public and private data collection systems, including health surveys, administrative enrollment and billing records, and medical records, used by various entities, including hospitals, CHCs, physicians, and health plans. Data on race, ethnicity, and language are collected, to some extent, by all these entities, suggesting the potential of each to contribute information on patients or enrollees. The flow of data illustrated in Figure 5-1 does not even fully reflect the complexity of the relationships involved or the disparate data requests within the health care system. Currently, fragmentation of data flow occurs because of silos of data collection (NRC, 2009).

No one of the entities in Figure 5-1 has the capability by itself to gather data on race, ethnicity, and language for the entire population of patients, nor does any single entity currently collect all health data on individual patients. One way to increase the usefulness of data is to integrate them with data from other sources (NRC, 2009). Thus there is a need for better integration and sharing of race, ethnicity, and language data within and across health care entities and even (in the absence of suitable information technology [IT] processes) within a single entity.

It should be noted that a substantial fraction of the U.S. population does not have a regular relationship with a provider who integrates their care (i.e., a medical home) (Beal et al., 2007). For some, a usual source of care is the emergency department (ED), a situation that complicates the capture and use of race, ethnicity, and language data and their integration with quality measurement. While health plans insure a large portion of the U.S. population, their direct contact tends to be minimal, even during enrollment. Hospitals, which tend to have more developed data collection systems, serve only a small fraction of the country's population. As a result, no one setting within the health care system can capture data on race, ethnicity, and language for every individual.

Health information technology (Health IT) may have the potential to improve the collection and exchange of self-reported race, ethnicity, and language data, as these data could be included, for example, in an individual's personal health record (PHR) and then utilized in electronic health record (EHR) and other data systems. There is little reliable evidence, though, on the adoption rates of EHRs (Jha et al., 2009). While substantial resources were devoted to this technology in the American Recovery and Reinvestment Act of 2009, it will take time to develop the infrastructure necessary to fully implement and support Health IT (Blumenthal, 2009). Thus, the consideration of other avenues of data collection and exchange is essential to the subcommittee's task.

Until data are better integrated across entities, some redundancy will remain in the collection of race, ethnicity, and language data from patients and enrollees, and equivalently stratified data will remain unavailable for comparison purposes unless entities adopt a nationally standardized approach. Methods should be considered for incorporating these data into currently operational data flows, with careful attention to concerns regarding efficiency and patient privacy.


Hospitals

Because hospitals tend to have information systems for data collection and reporting, staff who are used to collecting registration and admissions data, and an organizational culture that is familiar with the tools of quality improvement, they are relatively well positioned to collect patients' demographic data. In addition, hospitals have a history of collecting race data. With the passage of the Civil Rights Act of 1964 and Medicare legislation in 1965, there was a legislative mandate for equal access to and desegregation of hospitals (Reynolds, 1997). Therefore it is not surprising that more than 89 percent of hospitals report collecting race and ethnicity data, and 79 percent report collecting data on primary language (AHA, 2008).


This culture of data collection has limitations, however. Historically, the data were never intended for quality improvement purposes, but to allow analysis to ensure compliance with civil rights provisions. Additionally, hospital data collection practices are less than systematic as the categories collected vary by hospital, and hospitals obtain the information in various ways (e.g., self-report and observer report) (Regenstein and Sickler, 2006; Romano et al., 2003; Siegel et al., 2007). Furthermore, compared with the number of people who are insured or visit an ambulatory care provider, a relatively small number of people are hospitalized in any one year (Figure 5-2). Thus, while hospitals are an important component of the health care system and represent a major percentage of health care expenditures, they are only one element of the system for collecting and reporting race, ethnicity, and language data.

Hospitals also face challenges associated with collecting accurate data and using these data for quality improvement and reduction of disparities. A 2006 National Public Health and Hospitals Institute (NPHHI) survey asked hospitals that collected race and ethnicity data whether they used the data to assess and compare quality of care, utilization of health services, health outcomes, or patient satisfaction across their different patient populations. Fewer than one in five hospitals that collected these data used them for any of these purposes (Regenstein and Sickler, 2006). Additionally, only half of hospitals that collected data on primary language maintained a database of patients' primary languages that they could track over time (Hasnain-Wynia et al., 2006).

Many of the above challenges can be attributed largely to the many staff and departments or units that need to be engaged in the process to ensure systematic data collection and use. Hospitals have multiple pathways (inpatient, outpatient, ED, urgent care) through which patients enter the system. For example, the ED is the source of 45 percent of all hospital admissions (Healthcare Financial Management Association, 2007).

Systems changes can involve training a large number (possibly hundreds) of hospital registration/admission staff (many of whom may be off site) and modifying practice management and EHR systems to ensure that proper and consistent data fields are in place across multiple departments and units that serve as patient entry points. Ideally, these systems would be made interoperable through the development of interfaces that would make it possible to relay the data across different systems.

A Robert Wood Johnson Foundation initiative to reduce disparities in cardiac care required participating hospitals to systematically collect race, ethnicity, and language data and use the data to stratify quality measures. The ten hospitals in the collaborative initially cited the data collection requirement as one of the greatest challenges of the program, yet once they focused their efforts on these goals, they were able to bring together key stakeholders within each institution, implement needed IT changes, and train staff. As a result, they successfully began data collection within a relatively short time (Siegel et al., 2008). Other hospitals not part of this initiative are also successfully collecting race, ethnicity, and language data and linking them to quality measures (Weinick et al., 2008). Data collected at the hospital level are useful both for assessing the quality of hospital-provided services and, if shared with other entities, for facilitating analyses of quality across multiple settings. Box 5-1 provides an example of a statewide initiative to collect standardized race, ethnicity, and language data.

Community Health Centers

CHCs are front-line providers of care for underserved and disadvantaged groups (Taylor, 2004) and therefore are good settings for implementing quality improvement strategies aimed at reducing racial and ethnic disparities in care. Yet while CHCs serve diverse patient populations and, as organizations, understand the importance of demographic data for improving the quality of care, the accuracy of the race, ethnicity, and language data they collect may be limited (Maizlish and Herrera, 2006). More than 87 percent of surveyed CHCs reported inquiring about a patient's need for language services, and 73 percent reported recording this information in the patient record (Gallegos et al., 2008); less is known, however, about the extent to which CHCs consistently collect patient race and ethnicity data beyond the basic Office of Management and Budget (OMB) categories included in their national Uniform Data System (HRSA, 2009).

Box 5-1. Statewide Race and Ethnicity Data Collection: Massachusetts

In January 2007, all Massachusetts hospitals were required to begin collecting race and ethnicity data from every patient with an inpatient stay, an observation unit stay, or an emergency department visit. These data are included in the electronic discharge data each hospital submits to the state's Division of Health Care Finance and Policy. As part of this effort, a standardized set of reporting categories was created and train-the-trainer sessions were held across the state. A report on this initiative notes:

"The new efforts in Massachusetts are unique in the constellation of requirements and approaches being implemented in the state today. First, all acute care hospitals are required to collect these data, and a recommended data collection tool has been developed jointly by the city [Boston] and Commonwealth to standardize efforts across hospitals. Second, the tool and the required categories in which hospitals must provide patient-level discharge data to the [state] include an exceptionally detailed list of ethnicities, with 31 reporting categories that include 144 ethnicities or countries of origin. Third, the collaboration between the City of Boston, the Commonwealth of Massachusetts, and hospitals has been crucial to turning policy attention to reducing disparities in the quality of health care."

Acute care hospitals are required to report the basic OMB race categories along with 31 ethnicity categories: Asian Indian, Cambodian, Chinese, Filipino, Japanese, Korean, Laotian, Vietnamese, African American, African, Dominican, Haitian, European, Portuguese, Eastern European, Russian, Middle Eastern (or North African), Caribbean Island, American, Brazilian, Cape Verdean, Central American (not otherwise specified), Colombian, Cuban, Guatemalan, Honduran, Mexican (Mexican, Mexican American, Chicano), Puerto Rican, Salvadoran, South American (not otherwise specified), and Other Ethnicity.

Sources: Massachusetts Executive Office of Health and Human Services, 2009; Weinick et al., 2007, 2008.

Like hospitals, CHCs face challenges to collecting data, such as the need to train staff, the need to modify existing Health IT systems, and the need to ensure interoperability between the practice management systems where demographic data are collected and recorded and the EHR systems where the demographic data can be linked to clinical data for quality improvement purposes. In 2006, only 26 percent of surveyed CHCs reported some EHR functionality, yet 60 percent reported plans for installing a new EHR system or replacing the current system (Shields et al., 2007). Collection of demographic data can also increase the burden of data entry for staff, particularly for those CHCs that still use paper forms to collect these data from patients (Chin et al., 2008).
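The linkage described above, in which demographic data recorded in a practice management system are joined to clinical data in an EHR system so that a quality measure can be stratified, can be illustrated with a minimal sketch. All record layouts, field names, and values below are hypothetical and are not drawn from any system discussed in this chapter; patients without a demographic record are counted as "Unknown" rather than dropped, which is one possible design choice among several.

```python
from collections import defaultdict

# Hypothetical demographic records, as a practice management system might hold them.
demographics = {
    "p1": {"race": "Asian", "language": "Vietnamese"},
    "p2": {"race": "White", "language": "English"},
    "p3": {"race": "Asian", "language": "English"},
}

# Hypothetical clinical records from an EHR: did each patient meet a quality measure?
clinical = [
    {"patient_id": "p1", "measure_met": True},
    {"patient_id": "p2", "measure_met": False},
    {"patient_id": "p3", "measure_met": False},
    {"patient_id": "p4", "measure_met": True},  # no demographic record on file
]


def stratify(clinical_rows, demographic_index, field):
    """Return {group: (met, total)} for one quality measure, stratified by `field`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [met, total]
    for row in clinical_rows:
        demo = demographic_index.get(row["patient_id"])
        # Keep unlinked patients visible instead of silently excluding them.
        group = demo[field] if demo and demo.get(field) else "Unknown"
        counts[group][1] += 1
        if row["measure_met"]:
            counts[group][0] += 1
    return {group: (met, total) for group, (met, total) in counts.items()}


by_race = stratify(clinical, demographics, "race")
```

Reporting the "Unknown" stratum alongside the others makes the extent of missing demographic data visible whenever stratified results are reviewed.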

Limited resources (both financial and human) and a high-need patient population present ongoing challenges to CHCs in their data collection and quality improvement efforts (Box 5-2). Because 40 percent of CHCs' patient populations are uninsured and because CHCs generally have a poor payer mix (Manatt Health Solutions and RSM McGladrey, 2007; National Association of Community Health Centers, 2006), they gain relatively less revenue than private physician practices from quality improvement interventions that lead to the delivery of more services (Chin et al., 2008). Even with increases in federal funding, CHCs struggle to meet the rising demand for care along with demands to increase quality reporting, reduce disparities, and develop EHR systems (Hurley et al., 2007).

Box 5-2. Collecting and Using Data: The Alliance of Chicago Community Health Services

The Alliance of Chicago Community Health Services developed a customized EHR system to provide decision support for clinicians and link clinical performance measures with key patient characteristics to identify disparities in performance and inform quality improvement efforts. The alliance of four CHCs across 32 clinical sites implemented the centralized EHR system in 2005-2006. The system is hosted in a secure facility, allowing its data to be accessed by providers via the Internet. The aggregate data allow CHCs to look at trends across populations and compare outcomes by different communities, different CHCs, or different demographic groups. The system integrates patient race and ethnicity data, which are collected and stored in the practice management system, with clinical data stored in the EHR system.

The processes of development and implementation required reconsiderations of workflow design, customization, and decision support. For example, implementation required analyzing and redesigning hundreds of clinical workflow patterns in busy CHCs and developing the right strategies for training staff. Additionally, some CHCs were collecting race and ethnicity data using paper forms and then transferring the data first into practice management systems and then into EHR systems for linkage with quality data. Lack of standardization for quality measures and data specifications made some of the tasks even more difficult. The standard ultimately decided upon for collection was the OMB standard categories. Now that the systems are in place, it is possible for clinics to move forward with collecting more granular data. The Alliance is now serving as a model for CHC systems in New York, California, and Detroit.

Sources: De Milto, 2009; Kmetik, 2009; Rachman, 2007.

Physician and Group Practices

The structure and capabilities of primary and specialty care entities vary tremendously, ranging from large groups or health centers with highly structured staff and advanced information systems to solo physician practices with correspondingly small staff. The ability and motivation of these entities to collect and effectively use race, ethnicity, and language data consequently also vary given the investments in Health IT systems and staff training required for these functions. At the same time, these settings have direct contact with patients, ideally as part of an ongoing caregiving relationship. Thus, they are well suited to explaining the reasons for collecting these data, as well as using the data to assess health care needs and patterns of disparities. Physician practices, however, are less likely than hospitals or CHCs to collect race, ethnicity, and language data from patients (Nerenz et al., 2004).

Medical groups may believe either that it is unnecessary to collect these data or that collecting them would offend patients (Nerenz and Darling, 2004). Physician practices may not see the utility of the data and may believe that they should not bear the burden of collecting the data and linking them to quality measures (Mutha et al., 2008). A number of physicians and practice managers interviewed in 2007 thought it was illegal to collect these data, and many did not understand how the data would be used (Hasnain-Wynia, 2007). However, most of the interviewees (physicians, nurse managers, and practice managers) indicated that they thought it would not be problematic to collect these data from their patients if they could explain why the data were being collected and how they would be used (Box 5-3). Indeed, Henry Ford Medical Group has collected race and ethnicity data for more than twenty years, and the Palo Alto Medical Foundation, a multispecialty provider group with several clinics, has recently begun to collect race and ethnicity data for use in analyses of disparities (Palaniappan et al., 2009).

Primary care sites typically do not have structured information available about care provided at other locations, so their ability to analyze data on quality of care by race, ethnicity, and language is generally limited to measures involving routine prevention and primary care. Physician practices with EHR systems tend to use the system for administrative rather than quality improvement purposes (Shields et al., 2007), but EHR systems can be tailored to link quality measures and demographic data (Kmetik, 2009). Data on race, ethnicity, and language need collected in these settings could be useful throughout the health care system if mechanisms were in place for sharing the data with other entities (e.g., health plans) that have an ongoing obligation and the infrastructure to analyze quality-of-care data, stratify those data by race, ethnicity, and language need, and examine episodes of care and care coordination.

Box 5-3. Collecting Data in Small Physician Practices

The National Committee for Quality Assurance (NCQA) launched a quality improvement demonstration program for small physician practices serving minority populations. With funding from The California Endowment, NCQA provided grants and technical assistance to small practices (five physicians or fewer). The goal of the project was to learn what types of resources and tools these practices need in order to conduct and sustain quality improvement activities, especially in serving disadvantaged populations. After the project, participants reported a greater appreciation for the importance of collecting race and ethnicity data, although few practices began to do so systematically. Before the project, needs assessment surveys showed that only 15 percent of physicians had a "written standard identifying and prominently displaying in the medical record the language preferred by the patient."

While few of the practices began formal data collection, staff at most practices expressed an understanding of the value of this information. The project also improved the participants' understanding of the legal issues related to collecting data from patients on race, ethnicity, and language need. For example, one physician reported, "You guys have taught me that it is not illegal to identify race. That's such a batted about issue, but it is not against HIPAA regulations to identify race and culture and language in the medical chart." However, practical barriers to data collection remained. One challenge faced by practices was the lack of standardized fields in EHR systems. Practices that sought to collect data usually created their own method for documenting race and ethnicity.

Source: NCQA, 2009.

Multispecialty group practices, which provide a range of primary care, specialty care, inpatient care, and other services, may be in a strong position to collect race, ethnicity, and language data because they have regular contact with large numbers of patients over long periods of time, can place the data collection in the context of improvement of care rather than administration of health insurance benefits, and typically have the necessary staff and other forms of infrastructure (e.g., a shared EHR system at all care sites). A single EHR system may facilitate the sharing of race, ethnicity, and language data across sites and levels of care, assuming that the data are present and available in the system.

Health Plans

Health plans, including Medicaid managed care and Medicare Advantage plans, have the capabilities necessary to systematically compile and manage race, ethnicity, and language data, and thus have roles to play in quality improvement (Rosenthal et al., 2009). Plans, though, may have limited opportunities for direct contact during which the data can be collected and the need for the data explained. While there are multiple points at which the data can be collected (e.g., disease management programs, member surveys, enrollment), a principal occasion for contact is during enrollment, when fears about discriminatory use of the data may be greatest. California, Maryland, New Hampshire, New Jersey, New York, and Pennsylvania prohibit insurers from requesting an applicant's race, ethnicity, religion, ancestry, or national origin in applications, but the states do allow insurers to request such information from individuals once enrolled (AHIP, 2009). There are no legal impediments to collecting these data after enrollment.

As many individuals enroll in plans through their place of employment, employers provide one avenue for the collection of race, ethnicity, and language need data. It is possible in principle for individuals to self-identify during open enrollment in a health plan, with the individual's employer conveying the enrollee's race and ethnicity data to the plan through an electronic enrollment transaction. The plan could then use these data for quality improvement interventions and measurement. In fact, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) 834 enrollment standard provides for the transmittal of race and ethnicity data. However, the HIPAA Transactions Rule applies only to health plans, health care clearinghouses, and certain health care providers. Thus, while race and Hispanic ethnicity may be captured in the enrollment transaction and plans are required to accept the standard transaction if it is sent to them, employers rarely use the standard and are not required to do so. As a result, this avenue of data collection is not currently operational, although pending legislation encouraging the use of electronic enrollment transaction standards may make it more common in the future.

A study conducted by America's Health Insurance Plans (AHIP) found that 54 percent of plans collected race and ethnicity data, and 56 percent collected primary language data. The National Health Plan Collaborative (NHPC), a public-private partnership to improve quality of care and reduce disparities, focused on collecting demographic data on enrollees. NHPC viewed direct data collection as the gold standard since this method supports interventions and direct outreach to individuals, but NHPC members realized that obtaining data through direct methods can take years to achieve in a health plan setting (Lurie, 2009). Likewise, the limited success of Aetna with data collection (Box 5-4) after several years of concerted effort suggests that the upper limit of data collection by health plans with presently known direct methods may be far below the level necessary for identifying disparities in quality of care through stratified analysis, for example, of Healthcare Effectiveness Data and Information Set (HEDIS) data.

While the use of racial, ethnic, and language identifiers for coverage, benefit determination, and underwriting is prohibited, the collection of these data for improving quality and reducing health care disparities is both permitted and encouraged. Low participation by plan members in reporting race, ethnicity, and language data may be indicative of low trust of the industry (Coltin, 2009). Despite informing members of how data will be used, plans may also face internal legal concerns about taking on unnecessary liability through threats of legal action due to misperceptions regarding the purposes of collection.

Box 5-4. Successful Collection of Data by a Health Plan: Aetna

Aetna was the first national, commercial plan to start collecting race and ethnicity data for all of its members. In 2002, Aetna began directly collecting these data using electronic and paper enrollment forms. Multiple mechanisms are now used to capture race, ethnicity, and language data. The data may be updated at any point of contact, including at enrollment, when members speak to customer service or patient management representatives, and when members access an online member portal. Since 2002, more than 60 million Aetna members have provided race, ethnicity, and/or primary language information. As of 2009, Aetna had collected this information from more than 6 million members, representing approximately 30-35 percent cumulative coverage of race, ethnicity, and language data for its currently enrolled population. Aetna's experience with direct collection suggests that plans can collect this information without provoking a negative public reaction.

Sources: NCQA, 2006; Personal communication, W. Rawlins, Aetna, May 3, 2009.


Surveys

Federal and state health agencies administer surveys that are primary sources for estimating the health of a population and current and future needs for health care services (Ezzati-Rice and Curtin, 2001; Mays et al., 2004). For example, a number of studies reviewed in Chapter 2 employed surveys such as the National Health Interview Survey (NHIS), the National Latino and Asian American Survey (NLAAS), and the California Health Interview Survey (CHIS). Surveys can capture data not included in administrative and utilization data, notably data on the uninsured and reports on financial and nonfinancial barriers to seeking care. Other surveys, such as the Consumer Assessment of Healthcare Providers and Systems (CAHPS®), are designed to assess plans, hospitals, and medical groups and capture respondents' self-reported race and ethnicity. These surveys are resources for quality measurement and improvement. While some can be linked to specific health care delivery sites, most are not, so they tend to be a data collection system that is parallel to, rather than integrated with, care delivery.

A fundamental feature of surveys, whether self-administered by mail or interviewer-administered in person or by phone, is that a respondent's race, ethnicity, and language need are self-identified and not ascribed by the interviewer. However, cues from the interviewer, a respondent's suspicion of lack of confidentiality, or the social and political context can influence a respondent's answer (Craemer, 2009; Foley et al., 2005). Moreover, conducting surveys of representative population-based samples in diverse settings requires an assessment of the need for in-language interviews (Ponce et al., 2006), balanced by the costs associated with high-quality translations and trained bilingual interviewers. For surveys conducted in multiple languages (e.g., the CHIS is conducted in English, Spanish, Cantonese, Mandarin, Vietnamese, and Korean), the language of the interview conveys, to some extent, the respondent's language preference in communicating health information.

Surveys are charged with obtaining stable estimates for population groups defined not only by race, ethnicity, and language, but also by geography and other demographic characteristics. Cost, logistical issues, and protection of respondents' confidentiality constrain the granularity of reportable race and ethnicity estimates (Madans, 2009). To ensure usable data on population groups, the NHIS oversamples Blacks, Asians, and Hispanics (Madans, 2009), but lower coverage is provided for smaller groups, such as Native Hawaiian or Other Pacific Islanders (NHOPI), in the NHIS (e.g., there were fewer than 10 Samoan respondents in NHIS 2007).

Oversampling is a viable strategy to increase coverage of smaller populations. Yet oversampling incurs costs associated with the rarity of the population and the expense of the survey modality (e.g., the marginal cost of adding one more Samoan respondent would be greater for in-person household interviews than for telephone interviews). Other issues relate to the clustering of a population in a designated area (if area-based oversampling is used) and the specificity and sensitivity of surname lists (if list-assisted oversampling is used). Information on granular ethnicities may also be gleaned from surveys with an explicit focus on specific ethnic groups (e.g., NLAAS) and on subregions (e.g., CHIS).

Another strategy for estimating the health and health care needs of ethnic groups is to combine years of survey data (Barnes et al., 2008; Freeman and Lethbridge-Cejku, 2006; Kagawa-Singer and Pourat, 2000). Some of the findings on variations within and among population groups reported in Chapter 2 were generated from pooled analyses of the NHIS sample to increase the size of the samples. Pooling, however, may not work for the smallest population groups; for example, it would take at least 8 years of NHIS data to obtain the sample size needed for reportable estimates on the NHOPI population. Over such a long time span, significant changes can compromise the validity and relevance of such estimates for health care policy and planning purposes. Where pooling is useful, standardized measures of demographic variables would improve the quality of the pooled data. Given the limitations of survey sampling, administrative databases offer the potential to collect data on higher numbers of smaller ethnic groups and make statistically reliable analytic comparisons across groups (e.g., a hospital administrative database versus a sample of hospital patients).
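The arithmetic behind pooling can be sketched in a few lines. The sketch assumes that each survey year contributes roughly the same number of respondents from the group of interest and that no respondent appears in more than one year; the target sample size and yearly count used below are hypothetical and are not NHIS figures.

```python
import math


def years_to_pool(target_n, respondents_per_year):
    """Smallest whole number of survey years whose pooled sample reaches target_n.

    Assumes a constant yearly count and independent samples across years.
    """
    if respondents_per_year <= 0:
        raise ValueError("respondents_per_year must be positive")
    return math.ceil(target_n / respondents_per_year)


# Hypothetical inputs for illustration only.
years = years_to_pool(target_n=100, respondents_per_year=15)
```

The smaller the yearly count for a group, the longer the pooling window becomes, which is the source of the validity and relevance concerns noted above.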

Improving Data Collection Processes

The above discussion of the challenges faced by various health and health care entities highlights how important overcoming Health IT constraints and minimizing respondent and organizational resistance are to data capture and quality. Integration of data systems has the potential to streamline collection processes so that data can be reported easily, and an individual will not need to self-identify race, ethnicity, and language need during every health encounter. Until such integration is achieved, enhancing legacy Health IT systems, implementing staff training, and educating patients and communities about the reasons for and importance of collecting these data can help improve data collection processes.

The collection of race, ethnicity, and language need data by various entities within the health care system raises the possibility that conflicting data may, in some instances, be assigned to a single individual. An individual may self-identify in one clinical setting according to a limited set of choices, whereas another setting may offer more detailed, specific response options, or the individual's race may have been observed rather than requested and then recorded by an intake worker. There is value in developing a hierarchy of accuracy by which conflicting data can be adjudicated. As previously discussed in this report, OMB prefers self-reported data, and researchers view self-report as the "gold standard" (Higgins and Taylor, 2009; OMB, 1997; Wei et al., 2006). Other methods of collecting these data (e.g., observer report) have been found to be inaccurate compared with self-reported data, resulting in undercounts of certain population groups (Buescher et al., 2005; Hahn et al., 1996; West et al., 2005; Williams, 1998). Thus, in this hierarchy of accuracy, self-report can be understood as being of superior validity. The subcommittee is aware of few systems in which race and ethnicity data are collected in more than one way and compared against self-report for validation. Therefore, the subcommittee cannot make generalizations about which sources or systems are likely to be of superior validity, other than commenting that self-report is preferred over observer-report.

The Health Level 7 (HL7) standards allow for data to be attributed as observer report or self-report, which may facilitate the resolution of conflicting data. There is no solid evidence in favor of the quality of data from any one locus of data collection (e.g., a health plan or hospital), except to the extent that location is correlated with data collection methods. If a provider, for example, collects these data through self-report and hospital records involve observer assignment, then favoring the self-reported data from the provider setting would make sense if the data were linked and conflicting data were found.

Not all data systems capture the method through which the data were collected, and some systems do not allow for data overrides. The interoperability of data systems may, for example, prohibit a provider from updating data on a patient that were provided by the patient's health plan. Thus, while self-reported data should trump indirectly estimated data or data from an unknown source, ways of facilitating this process logistically warrant further investigation. Data overriding should be used with caution, as overriding high-quality data with poor-quality data reduces the value of the data for analytic processes.
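
One way to picture this adjudication rule is the minimal sketch below. It assumes that each record carries a collection-method attribute and uses only two tiers: self-report, and everything else. The record fields, method labels, and function names are hypothetical and do not come from HL7 or any particular system.

```python
from dataclasses import dataclass

SELF_REPORT = "self_report"  # hypothetical label for the collection method


@dataclass(frozen=True)
class DemographicRecord:
    value: str   # recorded race or ethnicity category
    method: str  # e.g., "self_report", "observer_report", "unknown"
    source: str  # e.g., "hospital", "health_plan"


def rank(record: DemographicRecord) -> int:
    """Two-tier ranking: self-report outranks every other method.
    No ordering among the other methods is assumed."""
    return 1 if record.method == SELF_REPORT else 0


def adjudicate(current: DemographicRecord,
               incoming: DemographicRecord) -> DemographicRecord:
    """Keep the current record unless the incoming one ranks strictly
    higher, so good data are never overridden by weaker data. Ties,
    including two conflicting self-reports, keep the current record and
    would need review outside this sketch."""
    return incoming if rank(incoming) > rank(current) else current


hospital = DemographicRecord("White", "observer_report", "hospital")
practice = DemographicRecord("Hispanic or Latino", SELF_REPORT, "physician_practice")
print(adjudicate(hospital, practice).source)  # physician_practice
print(adjudicate(practice, hospital).source)  # physician_practice
```

A system that does not store the collection method cannot apply such a rule, which is why capturing the method alongside the value matters.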

Enhancing Legacy Health IT Systems

The varied and limited capacities of legacy Health IT systems challenge the collection, storage, and sharing of race, ethnicity, and language data. A single hospital, for example, may use different patient registration systems, which may not have the capacity to communicate with one another. Often, these systems operate unidirectionally, meaning that a system may be able to send or receive information but be unable to do both. Thus, a central system may be able to send data on a patient's race, ethnicity, and language to affiliated outpatient settings, but data collected in outpatient settings may not flow back to the central system (Hasnain-Wynia et al., 2004). Additionally, some quality data are derived from billing or other sources, requiring further linkages.

In ambulatory care settings (both CHCs and physician practices), race, ethnicity, and language need data are usually collected during the patient registration process and stored in practice management systems. However, clinical performance data may be captured in another system, meaning that race, ethnicity, and language data in the practice management system need to be imported into the EHR system to produce quality measures stratified by these variables. Practice management systems and EHR systems therefore need to be interoperable.
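
The linkage described above can be sketched as follows. This is a minimal illustration, assuming both systems share a patient identifier. The data layout, category labels, and function name are hypothetical and do not reflect any vendor's export format.

```python
from collections import defaultdict


def stratified_rates(demographics, ehr_results):
    """Join demographic categories to clinical results and stratify.

    demographics: dict mapping patient_id -> category, as exported from a
                  practice management system (hypothetical layout).
    ehr_results:  iterable of (patient_id, met_measure) pairs from an EHR.
    Returns a dict mapping category -> (numerator, denominator, rate).
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [numerator, denominator]
    for patient_id, met_measure in ehr_results:
        # Patients with no demographic record are reported, not dropped.
        group = demographics.get(patient_id, "Unknown")
        counts[group][1] += 1
        if met_measure:
            counts[group][0] += 1
    return {g: (n, d, n / d) for g, (n, d) in counts.items()}


# Hypothetical data for illustration only.
demographics = {"p1": "Group A", "p2": "Group A", "p3": "Group B"}
ehr_results = [("p1", True), ("p2", False), ("p3", True), ("p4", True)]
for group, (num, den, rate) in sorted(stratified_rates(demographics, ehr_results).items()):
    print(f"{group}: {num}/{den} = {rate:.2f}")
```

Reporting an "Unknown" stratum makes gaps in demographic data visible instead of silently excluding those patients from the measure.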

As technology vendors have adopted standardized communication protocols such as HL7, interoperability has improved for exchange of data such as race and ethnicity (HL7, 2009). Such standards are not universally accepted, however, so some Health IT components can communicate without modification, while others require upgrading to ensure that race, ethnicity, and language data can be collected, stored, and shared. While transitioning from legacy Health IT systems to newer systems is challenging, especially in physician practices (Zandieh et al., 2008), the American Recovery and Reinvestment Act of 2009 provides stimuli for moving forward with national standard Health IT systems.

Most hospitals have the capacity to make changes in their Health IT systems, patient registration screens, and fields in house, but some hospitals must go through a corporate office to make these changes. The engagement and support of a hospital's IT department are important to the success of such efforts.

Implementing Staff Training

Staff of hospitals, physician practices, and health plans have expressed concern about asking patients, enrollees, or members to provide information about their race, ethnicity, and language need (Hasnain-Wynia, 2007). Staff may believe, for example, that patients might be confused or offended by such a request. Furthermore, staff may be concerned about the time-sensitive nature of modern clinical practice and want to ensure that these questions can be asked efficiently.

To ensure that these data are collected accurately and consistently, health care organizations need to invest in training all levels of staff. This may include incorporating the usefulness of these data for detecting and addressing health care needs into the training of health professionals, administrative staff, and hospital and health plan leadership. For example, those responsible for directly asking patients or enrollees for this information can receive front-line training to learn about the importance of collecting these data; how they will be used; how they should be collected; and how concerns of patients, enrollees, and members can be addressed (Hasnain-Wynia et al., 2004, 2006, 2007; Regenstein and Sickler, 2006). Without such training, staff who have direct contact with patients but do not understand the greater accuracy of directly reported data may record their own observations of an individual's race and/or ethnicity instead of asking.

Specific training points to be emphasized will depend on the context and on how the data are being collected and utilized. For example, because health plan staff do not have face-to-face contact with enrollees, demographic information is often gathered through telephone encounters. Telephone training may also be needed for staff of hospitals, CHCs, and physician practices because preregistration by telephone may occur before hospital admission or ambulatory care appointments. Contra Costa Health Plan monitored the frequency with which staff were asking for these data and implemented performance metrics to ensure staff compliance. Generally, providers have face-to-face contact with patients and may find response rates are better during that time. Therefore, staff training at clinical sites may need to emphasize elements of face-to-face communication. The Health Research & Educational Trust (HRET) Disparities Toolkit, which has been endorsed by the National Quality Forum (NQF), offers a matrix for addressing patient reluctance under different scenarios (Hasnain-Wynia et al., 2007; NQF, 2008). In the absence of adequate staff training, the way questions requesting these data are asked may introduce response bias.

Before embarking on formally training staff to collect data, each entity needs to assess its data collection practices and delineate what is being done currently and what will change. The changes need to be clearly communicated during staff training sessions. Despite differences among health care settings, standardizing specific components of data collection within each organization will facilitate staff training processes. Suggestions to this end are presented in Box 5-5.

Box 5-5. Standardizing Direct Data Collection

  • Who: information should always be asked of patients or their caretakers and should never be gathered by observation alone
  • When: information should be collected upon admission or patient registration to ensure that appropriate fields are completed when the patient begins treatment, or for plans, when the individual enrolls (as permitted by state law)
  • What:
    • Questions about the OMB race and Hispanic ethnicity categories (one- or two-question format permitted)
    • A question about granular ethnicity with locally relevant response categories selected from a national standard set
    • A question to determine English-language proficiency
    • A question about language preference needed for effective communication
  • Where: data should be stored in a standard format for easy linking to clinical data
  • How: patient concerns should be addressed when the information is being obtained, and staff should receive ongoing training and evaluation
Page last reviewed May 2018
Page originally created September 2012
Internet Citation: 5. Improving Data Collection across the Health Care System. Content last reviewed May 2018. Agency for Healthcare Research and Quality, Rockville, MD. https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata5.html