Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement

6. Implementation

The subcommittee has proposed a standardized framework for the collection of race, ethnicity, and language data for use in efforts to improve the quality of health care. This framework combines the Office of Management and Budget (OMB) race and Hispanic ethnicity categories with categories for granular ethnicity and language need selected at the local level from national standard sets. Widespread adoption of this framework would ensure consistent categories for comparative analysis and facilitate data sharing across organizations and geographic areas. The U.S. Department of Health and Human Services (HHS) is a prime locus of the subcommittee's recommendations for implementation of these improvements because of its focus on resolving disparities in health and health care and its history of promoting the collection of race, ethnicity, and language data to ensure compliance with applicable statutes and regulations. Other federal agencies that deliver health care, states, accreditation and standards-setting organizations, and professional medical groups all have roles to play in ensuring adoption and utilization.

The race and Hispanic ethnicity categories included in the Office of Management and Budget (OMB) 1977 Directive and its subsequent 1997 revisions stemmed primarily from a need to monitor civil rights, voting access, and changing population dynamics (OMB, 1997), and not from the perspective of health care quality improvement.

The subcommittee's task is to delineate standardized categories for the collection of race, ethnicity, and language data to serve the latter purpose. Standardization of any demographic variable or quality indicator helps ensure more comparable and reliable data for analytic comparisons and for sharing across organizational boundaries. Additionally, when there is communication across information systems and consistency in defined categories, once a person has provided his/her race, ethnicity, and language data, these data would not have to be elicited repeatedly during each health-related encounter, reducing the collection burden on both staff and individual patients. Recognizing the need for more detailed data on race, ethnicity, and language to support improvements in health and the quality of health care, the subcommittee recommends combining the use of granular ethnicity categories with the broad OMB categories, as well as an assessment of a patient's language need (whether a person's spoken English proficiency is less than "very well," and what his/her preferred spoken language is for effective communication during health-related encounters). Quality measurement and interventions will be enhanced by having these data at the individual patient level (Nerenz and Darling, 2004).

In this chapter, the subcommittee offers recommendations for implementing standardization of race, ethnicity, and language need so that these data will be available to inform health care quality improvement endeavors. In accordance with the subcommittee's statement of task, the recommendations offered in Chapters 3 through 5 for gathering these data are intended "for those entities wishing to assess and report on quality of care across these categories." The subcommittee's recommendations, however, will likely have greater influence if they are adopted as HHS standards, required in federally funded programs, and incorporated into industry standards for electronic health record (EHR) systems and other forms of health information technology (Health IT). Additionally, states, standards-setting organizations (e.g., the Joint Commission and the National Committee for Quality Assurance [NCQA]), and professional medical bodies have a role to play in fostering the adoption and use of standardized race, ethnicity, and language data for quality improvement purposes.

HHS Action

HHS is a prime locus of the subcommittee's recommendations for standardization and implementation because of its focus on health care quality and the elimination of disparities in health and health care in policy and through its funded programs, as well as its history in promoting the collection of race, ethnicity, and language data to ensure compliance with applicable statutes and regulations (AHRQ, 2008a, 2008b; HHS, 2003, 2009e). Additionally, HHS is responsible for implementation of health information technology provisions of the American Recovery and Reinvestment Act of 2009 (ARRA) (HHS, 2009d). Although broad application of the EHR will take a number of years (Blumenthal, 2009), the need for race, ethnicity, and language data is immediate, so that efforts to identify and address health care disparities can proceed and targeted actions can be taken to raise the overall quality of care in the nation. The EHR is a tool with the potential to reduce repetitive collection and to facilitate the linkage of demographics to some quality measures. The data collection issues for other current Health IT systems do not differ significantly from those involved in future EHR applications, so providers should begin to put in place now the processes for the capture and sharing of race, ethnicity, and language data.

Framework for the Collection of Race, Ethnicity, and Language Variables 

The framework for the collection of data on race, Hispanic ethnicity, granular ethnicity, and language variables proposed by the subcommittee and detailed in Chapters 3 through 5 is summarized in Figure 6-1. Templates for national lists of granular ethnicity and language categories are provided in Appendixes E and I, respectively. These templates can serve as building blocks upon which HHS can develop and maintain comprehensive national standard lists of granular ethnicities and languages based on the experiences of participants in health care delivery and quality improvement. The subcommittee does not specify a preset number of granular ethnicities or languages that all entities must collect; instead, in the previous chapters, it affirms the importance of selecting locally relevant categories from these lists, with an opportunity for self-identification through an open-ended "Other, please specify: —" response option.

Entities may also want to design their information systems to track whether a person has "declined" to provide an answer, or whether the ethnicity is "unknown" (e.g., in the case of an adopted child) or "unavailable" (e.g., no direct contact has occurred to elicit information); these are not response categories offered to patients but codes used for tracking. Additionally, some information systems and EHR systems have the capability to record whether information is directly "self-reported" by patients (the preferred approach) or is "observer-reported" (e.g., as is necessary when a person arrives unconscious in an emergency room). It would be most useful if these terms were also standardized across collection systems.
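As a minimal sketch of how such tracking might be represented, the following Python fragment pairs each collected value with a collection status and a report source. The class and field names are hypothetical and are not drawn from any HHS, HL7, or EHR vendor specification; the rule for when follow-up is needed is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CollectionStatus(Enum):
    """Tracking codes for staff and systems; not response options shown to patients."""
    PROVIDED = "provided"
    DECLINED = "declined"
    UNKNOWN = "unknown"          # e.g., an adopted child
    UNAVAILABLE = "unavailable"  # e.g., no direct contact has occurred


class ReportSource(Enum):
    SELF_REPORTED = "self-reported"          # preferred approach
    OBSERVER_REPORTED = "observer-reported"  # e.g., patient arrives unconscious


@dataclass
class DemographicItem:
    """One collected variable (hypothetical structure) with its tracking metadata."""
    value: Optional[str]
    status: CollectionStatus
    source: Optional[ReportSource] = None

    def __post_init__(self) -> None:
        # A value must be present if and only if the item was actually provided.
        if (self.status is CollectionStatus.PROVIDED) != (self.value is not None):
            raise ValueError("value must be set if and only if status is PROVIDED")
        if self.status is CollectionStatus.PROVIDED and self.source is None:
            raise ValueError("a provided value needs a report source")

    def needs_follow_up(self) -> bool:
        """Assumed rule: revisit items never elicited or only observer-reported."""
        if self.status is CollectionStatus.UNAVAILABLE:
            return True
        return self.source is ReportSource.OBSERVER_REPORTED


ethnicity = DemographicItem("Vietnamese", CollectionStatus.PROVIDED,
                            ReportSource.SELF_REPORTED)
not_yet_asked = DemographicItem(None, CollectionStatus.UNAVAILABLE)
```

Keeping status and source separate from the value means that a "declined" response is never confused with a missing one, and that observer-reported entries can be confirmed with the patient at a later encounter.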

Standard lists of categories of granular ethnicity and languages will need to be formalized from the category templates offered by the subcommittee for race and ethnicity (Appendix E) and for languages (Appendix I). As noted in Chapter 3, within HHS, for example, there are different category sets in use: the Public Health Information Network (PHIN) uses the Centers for Disease Control and Prevention (CDC)/Health Level 7 (HL7) Race and Ethnicity Code Set 1.0 (CDC, 2009), whereas the Surveillance, Epidemiology and End Results (SEER) Program uses its own Coding and Staging Manual that does not always correspond with the CDC/HL7 Code Set (Johnson and Adamo, 2008). Likewise, states such as Massachusetts and Wisconsin have developed expanded sets of ethnicity categories and different rollup schemes for aggregation and reporting (Taylor-Clark et al., 2009; Wisconsin Cancer Reporting System, 2008). Some health plans, including Kaiser Permanente and Contra Costa Health Plan, also have their own granular ethnicity, spoken language, and written language categories (see Appendixes G and H, respectively). However, none of the current sets alone provides a complete set for the nation as a whole. Additionally, the subcommittee focuses its attention on a rollup scheme from granular ethnicities to the OMB race and Hispanic ethnicity categories; the subcommittee chose not to define mid-level aggregations between granular ethnicity and the OMB level, but HHS may wish to consider such mid-level aggregations of ethnicity. The Massachusetts Superset, for example, rolls up granular ethnicities to larger groupings of ethnicities.

HHS should develop national standard sets of granular ethnicity and language categories with a responsive updating process and associated coding, so that each state or entity would be relieved of having to develop its own category sets and coding schemes. Data would then have a greater likelihood of being compatible across entities. Although HHS will likely build on the CDC/HL7 Code Set for race and ethnicity, the national set's use extends to emerging requirements for EHRs and other applications beyond the CDC PHIN. Thus, the subcommittee believes that development of the granular ethnicity category set and associated codes may need to be elevated to a more cross-cutting entity, such as the Office of the National Coordinator for Health Information Technology (ONC) or the Office of the Assistant Secretary for Planning and Evaluation (ASPE). The subcommittee does not specify the location of this activity, but leaves it to the discretion of the Secretary. The CDC/HL7 Code Set does not include languages.

Coding for Interoperability

HHS will need to work with HL7, a clinical and administrative data standards-setting organization for EHRs (HL7, 2009), to update the five-digit unique numerical codes in the existing CDC/HL7 Code Set (CDC, 2000). Additionally, interoperability standards may have implications for the number of fields available in EHRs to accommodate multiple questions on ethnicity and language variables as recommended in the subcommittee's framework, as well as other details analysts may wish to have, such as whether a response is self-reported by a patient, observer-based, or based on an indirect estimation. For language coding, HHS will have to develop or adopt a set of unique codes for languages analogous to the CDC/HL7 codes for race and ethnicity (CDC, 2000). While the Census Bureau and the maintenance agencies and registration authorities for the International Organization for Standardization (ISO) each produce language lists that contain most of the same categories, they have distinctive coding practices. Additionally, as discussed in Chapter 4, the Census Bureau list uses the same code for multiple related languages, while the ISO list has unique codes for each language (see Appendix I). To the extent that patients who are not English proficient need language assistance services in distinct languages in order to facilitate understanding during patient-provider interactions, a care provider's ability to track specific languages would be enhanced by unique coding for distinct languages. HHS will need to consult with these entities to establish unique coding. While the subcommittee has identified approximately 600 languages in use in the United States, fewer (perhaps 300) will be encountered in a health care context.
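The practical consequence of shared versus unique language codes can be illustrated with a short Python sketch. The three-letter identifiers in the unique table (cmn, yue, vie) are ISO 639-3 codes; the "shared" table is an invented stand-in for any scheme that assigns one code to several related languages and does not reproduce actual Census Bureau codes. The function name and patient list are hypothetical.

```python
from collections import Counter

# Unique codes: ISO 639-3 identifiers, one per language.
UNIQUE_CODES = {"Mandarin": "cmn", "Cantonese": "yue", "Vietnamese": "vie"}

# Shared codes: HYPOTHETICAL values standing in for a list that gives
# related languages the same code.
SHARED_CODES = {"Mandarin": "CHI", "Cantonese": "CHI", "Vietnamese": "VIE"}


def interpreter_demand(preferred_languages, code_table):
    """Count patients per language code, as an interpreter-scheduling report might."""
    return Counter(code_table[language] for language in preferred_languages)


# Made-up preferred spoken languages for four patients.
patients = ["Mandarin", "Cantonese", "Cantonese", "Vietnamese"]

by_unique = interpreter_demand(patients, UNIQUE_CODES)
by_shared = interpreter_demand(patients, SHARED_CODES)
```

Under the unique table the report distinguishes one Mandarin speaker from two Cantonese speakers; under the shared table all three collapse into a single count, and the provider can no longer tell which interpreters are needed.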

Regular Updating

A process for input on categories from the public and federally funded direct health care delivery and insurance programs (e.g., hospitals, clinics, health plans, community health centers, Medicaid programs) would help ensure that the initial category lists for granular ethnicities and languages are as comprehensive as necessary for use in the health care environment. Once standard national lists have been established, an ongoing process should be in place for responding within a reasonable time to questions about how to code specific groups if they are not on the initial lists. A designated component within HHS should update the category and code lists annually and be available to answer any questions related to rollup of individual ethnicities to broader OMB categories to ensure nationwide consistency in practice. It is expected that only a handful of categories will emerge yearly after comprehensive initial lists of ethnicity and languages are developed, so that updating the list by a few categories will not be onerous. Although annual updating may be necessary in the initial years of implementation, over time it may become apparent that annual updates are not necessary, and another timeframe could be adopted. A local entity would not have to ask permission to use a specific category if it is not yet on updated national lists; rather, an entity could use its own provisional code until one was available at the national level.
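One way a local entity might manage provisional codes is sketched below in Python. The registry class, the "LOCAL-" prefix, and the "N-" national codes are all hypothetical conventions invented for this illustration; no national code list or update mechanism is specified in this report.

```python
class CategoryRegistry:
    """Hypothetical local registry: national codes plus locally issued provisional codes."""

    def __init__(self, national_codes):
        self._national = dict(national_codes)
        self._provisional = {}
        self._next_local = 1  # monotonic counter so provisional codes are never reused

    def code_for(self, category):
        """Return the national code if one exists; otherwise issue or reuse a provisional code."""
        if category in self._national:
            return self._national[category]
        if category not in self._provisional:
            self._provisional[category] = f"LOCAL-{self._next_local:04d}"
            self._next_local += 1
        return self._provisional[category]

    def apply_national_update(self, new_codes):
        """Merge an updated national list and return {provisional code: national code}
        for every locally coded category that now has a national code."""
        self._national.update(new_codes)
        recode = {}
        for category in list(self._provisional):
            if category in self._national:
                recode[self._provisional.pop(category)] = self._national[category]
        return recode


registry = CategoryRegistry({"Hmong": "N-001"})     # "N-001" is a made-up code
first = registry.code_for("Karen")                  # not yet on the national list
recode_map = registry.apply_national_update({"Karen": "N-002"})
```

The returned recode map lets stored records be converted from the provisional code to the national one after each update, so locally collected detail is not lost while waiting for the national list to catch up.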

Currently, updating of the CDC/HL7 categories and unique codes is tied to redeployment of the Census. An update every 10 years is not frequent enough to capture new immigrant groups, their languages, or emerging findings about disparities in health care. The Census Bureau could provide updated ancestry-based ethnicity and language categories more frequently from the ongoing American Community Survey. As health care entities in communities across the nation collect data and begin to adapt to the use of standard categories and code sets, it is likely that they will encounter individuals, sooner even than the Census Bureau, who self-identify with a category that is not already listed. Thus, there will be a need for routine technical guidance, especially during the first few years of adoption of this report's recommendations.

Recommendation 6-1a: HHS should develop and make available national standard lists of granular ethnicity categories and spoken and written languages, with accompanying unique codes and rules for rollup procedures.

  • HHS should adopt a process for routine updating of those lists and procedures as necessary. Sign languages should be included in national lists of spoken languages and Braille in lists of written languages.
  • HHS should ensure that any national hierarchy used to roll up granular ethnicity categories to the broad OMB race and Hispanic ethnicity categories takes into account responses that do not correspond to one of the OMB categories.
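A rollup procedure of the kind described in Recommendation 6-1a might be sketched as follows in Python. The table is a three-entry illustration, not an official hierarchy; the type and function names are hypothetical. The OMB category labels follow the 1997 OMB standards, and responses absent from the table are deliberately kept in their own bucket rather than forced into an OMB category.

```python
from typing import NamedTuple, Optional


class Rollup(NamedTuple):
    omb_race: Optional[str]   # None when no OMB race category is implied
    hispanic: Optional[bool]  # None when Hispanic ethnicity is not implied either way


# Illustrative entries only; a national list would be far larger.
ROLLUP_TABLE = {
    "Vietnamese": Rollup("Asian", None),
    "Samoan": Rollup("Native Hawaiian or Other Pacific Islander", None),
    "Mexican": Rollup(None, True),  # rolls up to Hispanic ethnicity, not to a race
}

NO_ROLLUP = Rollup(None, None)


def roll_up(granular_ethnicity):
    """Look up the OMB rollup; unlisted responses get NO_ROLLUP instead of a guess."""
    return ROLLUP_TABLE.get(granular_ethnicity, NO_ROLLUP)


def summarize_by_race(responses):
    """Tally responses by OMB race, with a separate bucket for those with no OMB race."""
    counts = {}
    for response in responses:
        key = roll_up(response).omb_race or "No corresponding OMB race category"
        counts[key] = counts.get(key, 0) + 1
    return counts


tally = summarize_by_race(["Vietnamese", "Mexican", "Unlisted write-in", "Vietnamese"])
```

Retaining the original granular response alongside the rollup result means an unlisted write-in can be recoded later if the national hierarchy adds it.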

Electronic Health Records

The American Recovery and Reinvestment Act of 2009 (ARRA) provides opportunities for the inclusion of race, ethnicity, and language categories in standards for EHRs, thereby influencing which demographic data will be available for use when quality improvement data are stratified. ARRA authorizes and provides resources for the Office of the National Coordinator for Health Information Technology (ONC). The Coordinator is to guide the "development of a nationwide health information technology infrastructure that allows for the electronic use and exchange of information" for purposes that include quality improvement and reduction of disparities in health and health care, public health activities, clinical and health services research on quality, guidance for medical decisions at the time and place of care, and prevention and management of chronic diseases. The Coordinator is to assess how information technology or its absence affects communities with known health disparities and/or a high proportion of individuals at risk of poor health because of a lack of insurance and inadequate health care capacity, thus limiting their access to health care.

Of particular interest to the subcommittee is the provision of ARRA to "ensure the comprehensive collection of patient demographic data, including, at a minimum, race, ethnicity, primary language, and gender information." The act directs the Coordinator to consult with the National Committee on Vital and Health Statistics (NCVHS), whose mission is to improve information on population health. In the past, NCVHS had concluded that survey data on race, ethnicity, and language needed to be improved because broad categories such as Asian and Hispanic mask significant differentials in health status, access to health care, and service utilization (NCVHS, 2005). The subcommittee agrees with this assessment based on its review of studies in Chapter 2.

One goal stated within ARRA is an EHR for each person in the United States by 2014. An EHR is defined by ONC as:

A real-time patient health record with access to evidence-based decision support tools that can be used to aid clinicians in decision-making. The EHR can automate and streamline a clinician's workflow, ensuring that all clinical information is communicated. It can also prevent delays in response that result in gaps in care. The EHR can also support the collection of data for uses other than clinical care, such as billing, quality management, outcome reporting, and public health disease surveillance and reporting. (HHS, 2009b)

Proposed regulations on implementation of EHR under ARRA are due by the end of 2009 (HHS, 2009a). The subcommittee's recommended variables and categories for collection should be incorporated into each individual EHR, greatly expanding the availability of such data tied to information on health and health care for quality assessment purposes. Having the standards adopted by the other components of the health care industry, including the makers of information technology systems, would help ensure that a sufficient set of data fields is available to accommodate each element recommended for collection by the subcommittee. ONC is consulting with standards-setting organizations such as the Health Information Technology Standards Panel (HITSP) and the Certification Commission for Healthcare Information Technology (CCHIT) on harmonizing industry specifications and certification criteria.

Recommendation 6-1b: HHS and the Office of the National Coordinator for Health Information Technology (ONC) should adopt as standards for including in electronic health records the variables of race, Hispanic ethnicity, granular ethnicity, and language need identified in this report.

Recommendation 6-1c: HHS and ONC should develop standards for electronic data transmission among health care providers and plans that support data exchange and possible aggregation of race, Hispanic ethnicity, granular ethnicity, and language need data across entities to minimize redundancy in data collection.

Incentive Programs

The collection of data on race, ethnicity, and language and use of these data to foster elimination of disparities in quality of care can be an element of either public or private pay-for-performance systems. In general, such systems reward providers for activities that purchasers deem desirable. A variety of such systems are in place; some provide incentives for specific structural features (e.g., presence of EHRs), some for a set of process-of-care activities (e.g., use of appropriate antibiotics for surgical patients), some for improved patient outcomes (e.g., in-hospital mortality rates), and some simply for the collection and reporting of quality data (Chien, 2007; Chien et al., 2007). As these systems continue to evolve over time, they can incorporate the collection and use of data on race, ethnicity, and language for quality improvement or the achievement of specific goals for reducing disparities as criteria for incentive payments.

Medicare Physician Quality Reporting Initiative (PQRI)

The Medicare PQRI establishes incentive payments for physicians who report on quality measures for Medicare beneficiaries (CMS, 2009). The Medicare Improvements for Patients and Providers Act of 2008 (MIPPA) has extended PQRI but not its funding indefinitely, increased the measure set to 153 individual measures, and added a whole array of different reporting options that interface with both registries and EHRs. For 2009, quality measurement groups include preventive care, diabetes, end stage renal disease, chronic kidney disease, back pain, coronary artery bypass graft surgery, rheumatoid arthritis, and perioperative care (McGann, 2009).

Monitoring for Unintended Consequences

Performance incentive programs can have positive or negative effects on disparities in health and health care, but tend not to be designed with reduction of disparities in mind (Chien et al., 2007). Data from the National Healthcare Disparities and National Healthcare Quality Reports show that even as quality of care improves overall on specific measures, disparities persist (AHRQ, 2008a, 2008b). Monitoring of program effects along the dimensions of race, ethnicity, and language is desirable to forestall greater widening of gaps in care and to understand the effects of incentive programs on underresourced primary care safety net providers (Rust and Cooper, 2007; Williams, 2009).

The subcommittee does not take a stand for or against incentive payments in Health IT programs. Rather, the subcommittee is recommending that, when such programs exist, it would be appropriate to include the collection of race, ethnicity, and language data as one activity for which positive incentives should be offered.

Recommendation 6-1d: The Centers for Medicare and Medicaid Services (CMS), as well as others sponsoring payment incentive programs, should ensure that the awarding of such incentives takes into account collection of the recommended data on race, Hispanic ethnicity, granular ethnicity, and language need so these data can be used to identify and address disparities in care.

Recipients of Federal Funds

Health care entities have indicated that they have been reluctant to make changes to their systems until there is a standardized categorization approach for race, ethnicity, and language need (Bilheimer and Sisk, 2008; Lurie et al., 2005, 2008; NCQA, 2009; NRC, 2003; Siegel et al., 2007, 2008). This report addresses that barrier. An earlier report by the National Research Council, Eliminating Disparities: Measurement and Data Needs, stresses HHS's critical role in implementing change.

The federal government's authority to mandate the nature of data collection is limited, except in large federal health care delivery systems, through the purchasing power of programs such as Medicare, or for recipients of other federal funding mechanisms. HHS administers programs supporting the health care delivery system to provide care to persons at risk of receiving suboptimal care, and these programs present opportunities to influence the quality of care delivered to millions of Americans. For example, at least 100 million of the 300 million people in the country are served by just three programs administered by HHS: Medicare, Medicaid, and community health centers. Ensuring the quality of care to its programmatic participants is an HHS priority, and HHS leadership can make a difference in the adoption of this report's recommendations as it responds to recent legislation to ensure the use of race, ethnicity, and language data in assessing quality of care and building a national health information network (HHS, 2009c).

In earlier chapters, the legal basis for the collection of race, ethnicity, and language data has been established. HHS's 1997 inclusion policy mandates the collection of race and Hispanic ethnicity data for most of its programmatic applications (HHS Data Council, 1999). The policy encourages the inclusion of more detailed race and ethnicity categories than the OMB categories provide, but does not specify additional categories for uniform national use across all HHS programs or define a national standard set from which local programs could select. However, a need for more detailed population information has been apparent, and different entities within HHS have developed their own sets (e.g., PHIN and SEER) to foster the collection of comparative categories for use within their respective programs, but not necessarily across different types of programs. The subcommittee also believes the OMB race and Hispanic ethnicity categories are necessary but insufficient for identification of health care needs and elimination of disparities (see Chapter 2). Those categories are broad and may mask differences in receipt of appropriate care, and their sole use can end up being inefficient when interventions need only be targeted to a smaller portion of the broad category (for instance, only to populations of Vietnamese ancestry and not all people of Asian ancestry).

Besides ARRA, a new legislative effort that would require collection of race, ethnicity, and language data for use in quality reporting is section 185 of MIPPA. Medicare's plan for implementing this requirement has not yet been fully realized (McGann, 2009; Reilly, 2009b); in a report to Congress due in January 2010, CMS will address approaches to fulfilling the legislative mandate. CMS already uses a variety of direct and indirect methods in its analytic portfolio. Section 187 of MIPPA requires the Office of the Inspector General to examine implementation of culturally and linguistically appropriate services by Medicare providers and plans. In 2000, HHS released National Standards on Culturally and Linguistically Appropriate Services (CLAS) in an effort to influence all health care organizations and individual providers "to make their practices more culturally and linguistically accessible" (Office of Minority Health, 2007). The CLAS standards note the importance of using demographic data to understand and plan for the needs of the community served (standard 11); collecting data on the individual patient's race, ethnicity, and spoken and written language within both individual health records and organizational management information systems (standard 10); and using these data to monitor the cultural and linguistic responsiveness of organizations (standard 9) (Office of Minority Health, 2007). Additionally, section 201(b) of the Children's Health Insurance Program Reauthorization Act of 2009 (CHIPRA) provides an enhanced federal match for states to be used for language assistance services (interpretation and translation) for children in both CHIP and Medicaid programs. Knowledge of the language needs of people with limited English proficiency within the service population, not just knowledge of languages spoken at home, would be of significant use in understanding state program needs for language assistance. Previously, only about a dozen states and the District of Columbia participated in the matching program under Medicaid (Youdelman, 2007).

HHS's adoption of the subcommittee recommendations for its own programs would promote standardization. It is understood that changing information systems can be an expensive and time-consuming endeavor, and there will be a need for technical assistance and the application of additional resources. But the nation is now seeing the convergence of more nimble technology and efforts to build a stronger information infrastructure, along with federal economic stimulus funds for Health IT. Local programs often already collect more detailed data than the OMB categories in order to serve their populations, but these data are lost in aggregation in response to minimal reporting requirements. For others that do not yet have the capability to collect the specified data directly, methods are available for indirectly estimating race, ethnicity, and language need and applying these to quality metrics (see Chapter 5). Thus, efforts to identify differential needs and disparities need not be delayed.

The subcommittee's task was to recommend standardization of race, ethnicity, and language data for use in health care quality improvement. Thus, the following recommendation focuses on the HHS programs that deliver health care services, pay for health care services through insurance mechanisms, or administer surveys that increase the knowledge base on health care needs and outcomes. The Secretary, however, may find it useful to extend the standardized approach of this report to other HHS health-related programs, such as public health surveillance activities or surveys solely about health rather than also including health care issues.

Recommendation 6-1e: HHS should issue guidance that recipients of HHS funding (e.g., Medicare, the Children's Health Insurance Program [CHIP], Medicaid, community health centers) include data on race, Hispanic ethnicity, granular ethnicity, and language need in individual health records so these data can be used to stratify quality performance metrics, organize quality improvement and disparity reduction initiatives, and report on progress.
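The stratification of a quality performance metric by these variables can be illustrated with a brief Python sketch. The record layout, field names, and function are hypothetical, and the four records are made-up values chosen only to show how a broad category can mask differences between granular groups.

```python
from collections import defaultdict


def stratified_rates(records, stratify_by):
    """Return the share of patients meeting a quality measure within each stratum.

    records: iterable of dicts with a boolean 'met_measure' plus demographic fields.
    stratify_by: name of the demographic field to group on.
    """
    totals = defaultdict(lambda: [0, 0])  # stratum -> [met, eligible]
    for record in records:
        stratum = record.get(stratify_by) or "Not recorded"
        totals[stratum][1] += 1
        if record["met_measure"]:
            totals[stratum][0] += 1
    return {stratum: met / eligible for stratum, (met, eligible) in totals.items()}


# Made-up records for illustration only.
records = [
    {"omb_race": "Asian", "granular_ethnicity": "Chinese", "met_measure": True},
    {"omb_race": "Asian", "granular_ethnicity": "Chinese", "met_measure": True},
    {"omb_race": "Asian", "granular_ethnicity": "Vietnamese", "met_measure": False},
    {"omb_race": "Asian", "granular_ethnicity": "Vietnamese", "met_measure": True},
]

by_race = stratified_rates(records, "omb_race")
by_ethnicity = stratified_rates(records, "granular_ethnicity")
```

In this toy data the single rate for the broad OMB category (0.75) conceals a gap between the two granular groups (1.0 versus 0.5), which is the kind of difference the recommended data are meant to surface.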

Coordination Across Federal Health Care Delivery Systems

The Department of Veterans Affairs (VA) medical system is noted for its use of EHRs, and its experience with quality improvement illustrates the potential of using EHRs throughout the nation's health care system. Realizing the full potential involves being able to stratify quality data by race, ethnicity, and language need. Having quality-of-care information from large federal delivery systems such as the Department of Veterans Affairs, the Department of Defense (DOD), and other federally funded programs, such as community health centers, stratified by the same variables and categories recommended in this report would provide rich sources for comparative analysis. Precedents for coordinating mechanisms for quality purposes exist. For example, ARRA authorizes a Federal Coordinating Council for Comparative Effectiveness Research to assist HHS, the VA, DOD, and other federal agencies in promoting the use of clinical registries, clinical data networks, and other EHRs to produce and obtain data on health outcomes (Rosenbaum et al., 2009). Such a council might serve as a mechanism for coordinating the standard collection of race, ethnicity, and language data among these agencies as part of their promotion of sources for quality data and development of quality metrics.

Recommendation 6-2: HHS, the Department of Veterans Affairs, and the Department of Defense should coordinate their efforts to ensure that all federally funded health care delivery systems collect the variables of race, Hispanic ethnicity, granular ethnicity, and language need as outlined in this report, and include these data in the health records of individuals for use in stratifying quality performance metrics, organizing quality improvement and disparity reduction initiatives, and reporting on progress.

Page last reviewed May 2018
Page originally created September 2012
Internet Citation: 6. Implementation. Content last reviewed May 2018. Agency for Healthcare Research and Quality, Rockville, MD. https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata6.html