Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement

3. Defining Categorization Needs for Race and Ethnicity Data

The collection of data in the Office of Management and Budget (OMB) race and Hispanic ethnicity categories is improving across a variety of health care entities, but not all entities yet collect or report data using these categories. Moreover, disparities within the broad groups represented by these categories support the case for collection of granular ethnicity data beyond the OMB categories. Given variations in locally relevant populations, no single national set of additional ethnicity categories is best for all entities that collect these data. Collection of data in the OMB race and Hispanic ethnicity categories, supplemented by more granular ethnicity data, is recommended, with tailoring of the latter through locally relevant categories chosen from a standardized national set. In most cases, rolling up the data on granular ethnicities to the OMB categories will be possible, but care is needed because certain ethnicities do not correspond with any one race. However, when questions about race and granular ethnicity are both answered, rollup is not necessary.
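The rollup logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standardized procedure: the `ROLLUP` table and the function name `omb_race` are hypothetical, and the handful of ethnicities shown are examples only.

```python
from typing import Optional

# Hypothetical rollup table: granular ethnicity -> OMB race category.
# None marks ethnicities that do not correspond with any one race.
ROLLUP = {
    "Chinese": "Asian",
    "Vietnamese": "Asian",
    "Filipino": "Asian",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
    "Cape Verdean": None,
    "Guyanese": None,
}


def omb_race(granular: str, reported_race: Optional[str] = None) -> Optional[str]:
    """Return an OMB race category for a record, or None if none can be assigned."""
    # When the race question was answered directly, no rollup is needed.
    if reported_race:
        return reported_race
    # Unlisted or ambiguous ethnicities stay unassigned rather than being guessed.
    return ROLLUP.get(granular)
```

Leaving ambiguous ethnicities unassigned, rather than forcing them into a race, is one way to exercise the care the text calls for; an entity could equally route such records to manual review.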

Collecting and maintaining demographic data in medical records and enrollment files allows for analyses stratified by race and ethnicity to identify needed improvements in health care, and for identification of individuals or population groups that might be the focus of interventions designed to address health care needs. The resultant analyses can be used, for example, to plan specific features of interventions (e.g., the use of culturally relevant content in outreach communications about preventive services) and to compare the quality of care being provided by various entities serving similar populations. The primary reason for standardizing categories for the variables of race and ethnicity is to enable consistent comparison or aggregation of the data across multiple entities (e.g., state-level analyses of providers under Medicaid or a health plan's analysis of disparities in multiple states where it is operating). At the same time, standardized categories must enable persons to self-identify with the categories and increase the utility of the data to the entity collecting them.


Both federal and state agencies (e.g., the Social Security Administration and state Medicaid programs) classify individuals by their race or ethnicity to obtain useful information for health and health care purposes (Mays et al., 2003). Other entities, such as health plans, health professionals, hospitals, community health centers, nursing homes, funeral directors, public health departments, and the public, play roles in categorizing, collecting, reporting, and using these data for quality improvement purposes. Coordinating efforts of these stakeholders to ensure accurate collection and reporting of uniformly categorized race and ethnicity data could lead to more powerful analyses of aggregated data (Sequist and Schneider, 2006). While progress has been made in the past few years to incorporate the existing national standard set of categories promulgated by OMB (Table 3-1) into the collection and presentation of data, many data collection efforts still do not fully employ these basic standard categories.

Not all health and health care entities are required to collect data on race and ethnicity, but for those that do, the OMB categories are the minimum that a federal agency or recipient of federal funds must include in its categorization and reporting. The OMB standards have acknowledged imperfections, though. The categories are often, as shown by the literature review in Chapter 2, too broad for effectively identifying and targeting disparities in health and health care. Additionally, a substantial portion of Hispanics do not relate to the race options, leading to many Hispanics being reported in Census data as "Some other race" because they do not choose any of the five OMB race categories (del Pinal et al., 2007; NRC, 2006; OMB, 1997a). While OMB allows two formats for the race and Hispanic ethnicity questions (one combining both race and Hispanic ethnicity in a single question and the other asking about them in two separate questions, with the Hispanic ethnicity question being asked first; see Table 3-1), OMB explicitly prefers the latter two-question format (OMB, 1997b). As discussed later in the chapter, the format used may have implications for Hispanic response rates (Baker et al., 2006; Laws and Heckscher, 2002; Taylor-Clark, 2009).

This chapter examines approaches to categorizing race and ethnicity by (1) reviewing the current state of standardized collection of race and ethnicity data, with a focus on the sufficiency of the OMB categories and their uptake in various areas of health care data collection; (2) examining the utility of the continued use of the current OMB categories; and (3) considering how the OMB race and Hispanic ethnicity categories can be combined with locally tailored, more detailed ethnicity categories selected from a national standard set, with standardized coding and rollup procedures, to capture important variations among ethnic groups. The chapter concludes by exploring approaches to eliciting responses on race, Hispanic ethnicity, and granular ethnicity, and reviewing models for data collection.

Current State of Standardized Collection of Race and Ethnicity Data

As previously noted, a variety of entities, many of which fall under the purview of the Department of Health and Human Services' (HHS') 1997 inclusion policy, collect race and ethnicity data for a variety of purposes. The HHS inclusion policy mandates the collection of at least OMB race and Hispanic ethnicity data in specific circumstances, such as in administrative records, surveys, research projects, and contract proposals associated with direct federal service programs. While the policy does not state which specific categories should be collected in addition to the OMB categories, it encourages the collection and reporting of subgroup data (HHS Data Council, 1999).


Exploring the current state of data categorization provides insight into the challenges faced by health- and health care-related entities in categorizing and collecting the data. Table 3-2 shows the categories used by various federally funded health surveys, state birth records, and cancer registries. Many of these data sources are national-level collection systems designed, among other purposes, to make comparisons across time, providers, and geographic areas (Madans, 2009). These surveys collect race and Hispanic ethnicity data in the six categories specified by OMB and, usually, a common set of 9 to 12 additional ethnicity categories. For example, the National Health Interview Survey (NHIS), National Survey on Drug Use and Health (NSDUH), and Medical Expenditure Panel Survey (MEPS) all include the OMB categories plus Mexican, Cuban, Puerto Rican, Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese categories, among others. These categories generally correspond to the 15 response check-off boxes included in Census 2000, Census 2010, and intercensal American Community Survey (ACS) questions on race and ethnicity (Table 3-2).

Despite HHS' inclusion policy, some HHS agencies have not collected even the minimum OMB categories (e.g., Medicare enrollment files). In general, HHS-funded or -sponsored surveys collect the minimum OMB categories, and often additional categories, but not all categories are necessarily reported or analyzed because of small sample sizes. As specific stratifying variables are applied to survey data (e.g., receipt of diabetes care services by age and race), the pool of applicable respondents gets smaller, which may leave too few cases in small racial or ethnic groups for analysis. In contrast to surveys, most national administrative datasets are case-rich, meaning they may contain enough data to allow for analyses of even small ethnic groups. For example, the Medicare databases contain a large number of cases and thereby could play an important role in stratifying data by race and ethnicity.
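The effect of stratification on cell sizes can be made concrete with a short Python sketch. The threshold `MIN_CELL_SIZE` and the field names used in the usage note are hypothetical; the value 30 is an illustrative assumption, not a published reporting standard.

```python
from collections import Counter

MIN_CELL_SIZE = 30  # hypothetical minimum number of cases needed to report a stratum


def stratum_counts(records, keys):
    """Count respondents in each stratum defined by the given record fields."""
    return Counter(tuple(record[key] for key in keys) for record in records)


def reportable(counts, min_size=MIN_CELL_SIZE):
    """Keep only the strata that contain at least `min_size` cases."""
    return {stratum: n for stratum, n in counts.items() if n >= min_size}
```

Calling `stratum_counts(records, ["race"])` and then `stratum_counts(records, ["race", "age_group"])` on the same survey records shows how each added stratifying variable splits the same respondents into more and smaller cells, so fewer strata survive the `reportable` filter.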

Race and Ethnicity Categorization in Medicare Data

Medicare, a large source of quality improvement data, has limited race and ethnicity data in the enrollment files for its 44.8 million beneficiaries. Because of the history of how race and ethnicity data have been captured (Reilly, 2009), the available race and ethnicity data are often of low accuracy and quality (Bilheimer and Sisk, 2008; Bonito et al., 2008; Eicheldinger and Bonito, 2008; Ford and Kelly, 2005; U.S. House Committee on Ways and Means Subcommittee on Health, 2008). Analyses of Medicare administrative enrollment data found that while the validity of individual data on race and ethnicity was high for Whites and Blacks (the sensitivity was 97 and 96 percent, respectively), only 52 percent of Asian, 33 percent of Hispanic or Latino, and 33 percent of American Indian or Alaska Native beneficiaries were correctly identified (McBean, 2006). Medicare has historically relied on the race and ethnicity data individuals provided when they applied for a Social Security number (SSN). Before 1980, the SSN application form limited respondents to choosing Black, White, or Other. Since most people age 65 and older today received an SSN prior to 1980, their racial and ethnic identifiers were limited to these responses unless the individual changed enrollment to a specific health plan. The current SSN application combines race and ethnicity into a single question and includes only five of the six OMB categories.1 Consequently, Medicare data have been of limited use in studying differences in patterns of care for populations identified by the OMB categories (Bilheimer and Sisk, 2008; Bonito et al., 2008; Eicheldinger and Bonito, 2008; Ford and Kelly, 2005; U.S. House Committee on Ways and Means Subcommittee on Health, 2008).
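For readers less familiar with the term, the sensitivity figures cited above can be read as the share of people in a group whom the administrative record codes correctly. The Python sketch below assumes that self-report is the reference standard and that each record is available as a (self-reported, administrative) pair; both the function and the data layout are illustrative assumptions rather than the method used in the cited analyses.

```python
def sensitivity(pairs, group):
    """Share of people self-reporting `group` whom the administrative data also code as `group`.

    `pairs` is an iterable of (self_reported, administrative) category pairs.
    """
    in_group = [(reported, coded) for reported, coded in pairs if reported == group]
    if not in_group:
        raise ValueError(f"no self-reported members of {group!r}")
    # Count the records where the administrative code agrees with self-report.
    correct = sum(1 for _, coded in in_group if coded == group)
    return correct / len(in_group)
```

Under this reading, a sensitivity of 52 percent for Asian beneficiaries means that roughly half of those who would self-identify as Asian appear under some other category in the enrollment files.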

The limitations of the Medicare data for race and Hispanic ethnicity have been acknowledged by Centers for Medicare and Medicaid Services (CMS) officials, and CMS is actively working to improve its coding of race and ethnicity data by working with the Social Security Administration (SSA) to ensure the capture of data according to the OMB minimum standards (Reilly, 2009). CMS has also explored a variety of indirect estimation techniques to improve analyses of race and ethnicity differentials among individuals currently in the Medicare data system (Bonito et al., 2008; Wei et al., 2006).2 A 2009 white paper by the U.S. Senate Finance Committee presented proposals to improve patient care and health delivery. One proposal would require CMS to develop a comprehensive database by expanding existing data sources, data sharing, and matching across federal and state claims and payment data, including data from HHS; SSA; the Departments of Veterans Affairs (VA), Defense (DOD), and Justice (DOJ); and the Federal Employees Health Benefit Program (FEHBP) (U.S. Senate Finance Committee, 2009). The results of this and other proposals to revise payment systems and policies in the Medicare program remain to be seen. Under the Medicare Improvements for Patients and Providers Act of 2008,3 CMS is required to address quality reporting by race and ethnicity. A report by CMS detailing its proposed actions is due to Congress in January 2010.

Race and Ethnicity Categorization in State-Administered Programs

Much, but not all, of the collection of standardized data at the state level is done under federally funded programs, including Medicaid and the Children's Health Insurance Program (CHIP). Other state data collection systems, such as hospital discharge data systems and cancer registries, aim to use race and ethnicity data categories that are consistent with nationally collected denominator data (Friedman et al., 2000; Laws and Heckscher, 2002). States face difficulties, though, in consistently collecting accurate and reliable data that are uniformly classified.

Medicaid and CHIP

The Children's Health Insurance Program Reauthorization Act of 2009 (Public Law 111-3, 111th Cong., 1st sess.), signed into law on February 4, 2009, stipulates the development, by January 2011, of quality measures designed to identify and eliminate racial and ethnic disparities in child health and health care. This legislation has the potential to improve measurement of disparities for children in federally funded programs as it specifies that "data required for such measures is [sic] collected and reported in a standard format that permits comparison of quality and data at a State, plan, and provider level." A national standard set of race and ethnicity categories is necessary to stratify and compare these quality metrics across the Nation.


Although states are mandated to submit Medicaid claims data electronically to CMS, there are anomalies in the submitted data (CMS, 2009). For example, in 2003, race and Hispanic ethnicity were listed as "unknown" for more than 20 percent of enrollees in New York, Rhode Island, and Vermont (McAlpine et al., 2007). A 2004 survey noted that while the majority of states were collecting self-reported race and Hispanic ethnicity from their Medicaid and CHIP beneficiaries, most commonly during the enrollment process (Llanos and Palmer, 2006), few states were collecting the six OMB minimum categories (Palmer, 2004). Many states were including Hispanic as an option in the race question instead of asking a separate question about ethnicity (McAlpine et al., 2007); as noted earlier, OMB permits this format but explicitly prefers the two-question format. The subcommittee's research indicates that some progress has been made in the past six years on the collection of Medicaid data using the OMB standards. The subcommittee examined state Medicaid and CHIP application forms and found improved standardization, most notably in collecting the Asian and Native Hawaiian or Other Pacific Islander (NHOPI) categories (Table 3-3).

Vital Statistics Data

Failure to use standard categories and nonreporting or misreporting of data complicate efforts to calculate national and state birth, mortality, and morbidity rates by the OMB race and Hispanic ethnicity categories or for more detailed categories. The National Vital Statistics System (NVSS), hospital discharge data, and state registries provide data needed to calculate these rates, but the data may not be collected and reported according to the OMB categories or may be of poor quality. While the standard birth, death, and fetal death certificates now include the OMB categories plus 13 other categories,5 not all jurisdictions have adopted these standard certificates. The categories collected on the standard death certificate are included in Table 3-2. As of April 1, 2009, 32 jurisdictions (56 percent) had adopted the 2003 standard birth and death certificates, and 22 jurisdictions (39 percent) had adopted the 2003 standard fetal death report. The percentage of these vital events covered by the states that have adopted the 2003 standard certificates is higher, however, because they are states with larger populations.6

Death certificates provide the numerator for calculating death rates, while Census data provide the denominator. A deceased individual's race and ethnicity are often identified by the funeral director relying on his or her own observation, which is often inaccurate, particularly for racial and ethnic groups with a large number of multiracial individuals (Arias et al., 2008; Durch and Madans, 2001). For example, an individual who may self-identify as White and American Indian or Alaska Native may be categorized as only White by a funeral director, resulting in undercounting of deaths in the American Indian or Alaska Native population. Misclassification on death certificates produces a substantial net underestimate of mortality rates for Hispanic, Asian, American Indian or Alaska Native, and NHOPI populations (Arias et al., 2008; Durch and Madans, 2001). An assessment of the quality of death rates found them to be understated by 11 percent for both Asians and Pacific Islanders and about 21 percent for American Indians and Alaska Natives (Rosenberg et al., 1999).
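The arithmetic behind this underestimate can be sketched in Python. The numbers in the usage note are hypothetical, and reading "understated by 21 percent" as the shortfall of recorded deaths relative to true deaths is an assumption made for illustration, not necessarily the definition used in the cited assessment.

```python
def death_rate_per_100k(deaths, population):
    """Death rate: death-certificate numerator over Census denominator, per 100,000."""
    return deaths / population * 100_000


def percent_understated(recorded_deaths, true_deaths):
    """Percent by which the recorded death count falls short of the true count."""
    return (true_deaths - recorded_deaths) / true_deaths * 100
```

As a hypothetical example, suppose a group with a Census population of 200,000 has 1,000 deaths, of which 210 are recorded under another race on the death certificate. The recorded rate is then `death_rate_per_100k(790, 200_000)`, about 395 per 100,000, rather than 500 per 100,000. Because the denominator is unchanged, the rate is understated by the same 21 percent as the numerator.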

Hospital Discharge Data

Hospital discharge records sometimes lack race and ethnicity information (Gold et al., 2008; Schoenman et al., 2005) because hospitals either are not required to collect and report this information or choose not to do so (Romano et al., 2003). As of May 2009, at least 39 states included some race and ethnicity data in their discharge data reporting requirements. These data fields, however, are often added without additional resources to support complete and consistent reporting. Consequently, collection and coding practices vary, and data quality may be poor.7

Forty states voluntarily participate in the Healthcare Cost and Utilization Project (HCUP) databases, but only 31 of these provide HCUP with race and ethnicity data. Of these 31 states, several do not report data using the minimum OMB race and Hispanic ethnicity categories, and others report the data in different categories that HCUP must recode to allow multistate and national-state comparisons (Box 3-1) (AHRQ, 2006).

Box 3-1. Race and Ethnicity Categories in the HCUP Databases

The Healthcare Cost and Utilization Project (HCUP), a family of health care databases sponsored by the Agency for Healthcare Research and Quality (AHRQ), relies on the voluntary participation of 40 states to submit hospital discharge data. HCUP databases contain clinical and nonclinical information, including patient demographics, diagnoses, procedures, discharge status, and charges for all patients, regardless of payer (e.g., persons covered by Medicare, Medicaid, and private insurance, as well as no insurance). Two HCUP data elements contain source-specific information about the race and ethnicity of the patient: "race" retains information on the race of the patient as provided by the data source, and "Hispanic" retains information on Hispanic ethnicity as provided by the data source.

Only 31 of the 40 participating states provide race and ethnicity data to HCUP. Some states report on all the OMB standard categories (e.g., Arizona, Missouri), some states (e.g., Hawaii, Massachusetts, New Jersey) collect more detailed ethnicity data, and some states do not report on the minimum OMB categories (e.g., Arkansas, North Carolina, Utah). HCUP recodes the data into the race and Hispanic ethnicity categories by which it analyzes and stratifies data: White, Black, Hispanic, Asian or Pacific Islander, Native American, and Other. These categories are similar to the OMB standards but do not fully mirror them.

Sources: AHRQ, 2006; Fraser and Andrews, 2009.
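The kind of recoding described in Box 3-1 can be sketched as a lookup from source-specific values to the six HCUP analytic categories. The source values in `SOURCE_RACE_MAP`, the function name `recode`, and the rule that a reported Hispanic ethnicity takes precedence over race are all assumptions made for this illustration; they are not taken from HCUP documentation.

```python
# The six analytic categories named in Box 3-1.
HCUP_CATEGORIES = (
    "White", "Black", "Hispanic",
    "Asian or Pacific Islander", "Native American", "Other",
)

# Hypothetical source-specific race values mapped to the HCUP categories.
SOURCE_RACE_MAP = {
    "white": "White",
    "black or african american": "Black",
    "asian": "Asian or Pacific Islander",
    "native hawaiian or other pacific islander": "Asian or Pacific Islander",
    "american indian or alaska native": "Native American",
}


def recode(source_race, source_hispanic):
    """Map one state's race and Hispanic values to a single HCUP-style category."""
    # Assumption for illustration: reported Hispanic ethnicity takes precedence over race.
    if source_hispanic:
        return "Hispanic"
    if source_race is None:
        return None  # nothing reported; left missing rather than coded as "Other"
    return SOURCE_RACE_MAP.get(source_race.strip().lower(), "Other")
```

Note that the Asian and Native Hawaiian or Other Pacific Islander values collapse into one category here, which is one concrete way in which the recoded categories differ from the OMB standards.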

Cancer Registries

State cancer registries collect, classify, consolidate, and link information on new cancer cases from hospital reports, medical records, pathology reports, hospital discharge data, and death certificates (CDC, 2009). Cancer registries operate in 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Islands, providing surveillance capabilities for identifying patterns, trends, and variation in disease burden and care among racial and ethnic groups. Difficulties may arise, however, in coding race and ethnicity from such disparate sources including, for example, the hand-written observations of physicians (Izquierdo and Schoenbach, 2000).

The National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) Program coding manual includes two of the OMB categories directly (White and Black) and more granular ethnicity categories that correspond to the other OMB standard categories; for example, instead of a broad Hispanic ethnicity category, SEER asks more specifically whether a person is Puerto Rican or Cuban (see Table 3-2 for the categories coded by SEER). Altogether there are 34 categories. Because SEER stratifies the data whenever possible by more discrete groups, registries are instructed to categorize a patient's ancestry by one of the 34 categories (Johnson and Adamo, 2008). SEER supplements and improves the data it receives from states by, for example, linking with the Indian Health Service to improve American Indian or Alaska Native data (Box 3-2). SEER also uses an indirect estimation algorithm based on Spanish surnames and birthplace to improve Hispanic classification, and an algorithm based on surnames and birthplace to improve data on Asian and NHOPI ethnic groups (Edwards, 2009).

Review of the State of Standardization

This review of categories currently used in various data collection activities highlights that there are substantial efforts nationally, by a number of states, and by various health care organizations to collect race and Hispanic ethnicity data according to the OMB standards. However, not all of these efforts have yet achieved that level of categorization, and national surveys, nationally standardized birth and death certificates, and cancer registries have found it useful to use more fine-grained categorizations beyond the basic OMB categories. Efforts to standardize categorization and collection will eliminate some of the problems with comparability among data collected by disparate systems.

Box 3-2. The Use of Data Linkages to Improve Data Coverage and Quality in Cancer Registries

The American Indian or Alaska Native population makes up just over one percent of the U.S. population and is dispersed throughout the country. This complicates the collection and aggregation of data on cancer incidence, an especially important task because unique circumstances of culture, locale, history, and health care produce unusual patterns of cancer occurrence among American Indian or Alaska Native populations (Cobb et al., 2008). Alaska Natives, for example, have rates of lung, colon, and breast cancer five times higher than those of Southwestern Indians.

Studies have demonstrated that many American Indians or Alaska Natives are misclassified as another race in cancer registry data, and dividing these numerators by population denominators from the Bureau of the Census has the effect of underestimating cancer rates for American Indians or Alaska Natives. To address this problem, SEER cancer registries (which cover 26 percent of the total U.S. population and 42 percent of the American Indian or Alaska Native population) have been linked with Indian Health Service (IHS) beneficiary records using LinkPlus, a probabilistic linkage software program developed by the Centers for Disease Control and Prevention (CDC), to identify records representing the same individual in the IHS and registry databases (Espey et al., 2008).

Continued Use of the OMB Categories

The OMB race and Hispanic ethnicity categories were deemed to represent the country's broad population groups most necessary or useful for a variety of reporting and analytic purposes not specific to health care. The 1997 Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity were developed over a 4-year period during which an interagency taskforce weighed public input, expert testimony, and other evidence to consider whether and how to modify OMB's 1977 standards (OMB, 1977, 1997b). OMB has no plans to change its current standards (Wallman, 2009).

Chapter 2 documented important variations in health and health care that may be masked when data are analyzed using only the OMB race and Hispanic ethnicity categories. Notwithstanding this limitation, a large body of studies has revealed disparities in health and health care among the groups represented by those categories. Thus, use of the OMB categories yields important data for quality improvement analyses and reporting efforts (AHRQ, 2008; Cohen, 2008; Flores and Tomany-Korman, 2008; IOM, 2008; Kaiser Family Foundation, 2009). Additionally, because OMB-level reporting is required by various federal agencies and recipients of federal funds, the OMB categories serve as a denominator for many comparisons related to health and health care. Thus, the OMB categories are useful for high-level analysis, reporting, and policy intervention (e.g., in the National Healthcare Disparities Report), as well as more local uses. If all entities were to collect race and ethnicity data using the OMB categories, the process of combining or comparing data across reporting entities (e.g., hospitals in states contributing to HCUP or health plans' Healthcare Effectiveness Data and Information Set [HEDIS] data stratified by race and ethnicity) would be greatly facilitated. While the OMB categories do not define more specific subgroups and do not address how to include all difficult-to-categorize groups, they provide a useful common minimum platform for analyzing disparities in health care.

Past Activities to Improve the Collection of Data in the OMB Categories

One assumption underlying self-identified race and ethnicity data collection is that the categories and designations are recognized and accepted by the populations questioned (CDC, 1993; Lin and Kelsey, 2000). Improving the likelihood that respondents can identify with the races and ethnicities offered as response options is therefore essential to the quality of the data collected. Challenges in capturing accurate and reliable OMB-level data include the lack of detailed categories to which individuals can relate and the format of the questions used to elicit Hispanic ethnicity.

Categorizing Diverse Populations

A wide range of cultures, languages, and health-related behaviors are encompassed by each of the six OMB race and Hispanic ethnicity categories. For example, the Asian category blurs ancestry distinctions and vast cultural and geographic diversity (Holup et al., 2007). As a result, the Asian race identification may not resonate with all individuals of Pakistani, Vietnamese, or Filipino descent, for example, who might prefer to self-identify according to their ancestry (Box 3-3) (Laws and Heckscher, 2002).


Similarly, the Black or African American, White, American Indian or Alaska Native, and NHOPI populations consist of heterogeneous groups and persons within these groups may not identify with the broader race categories (Bailey, 2001; Mays et al., 2003). The Census Bureau has recognized that check-off boxes that represent more detailed categories in addition to the broad OMB categories resonate better with respondents. The Census includes several ancestry options on the Hispanic origin question and several Asian and NHOPI ancestries on the race question (Figure 3-1). Additionally, the inclusion of space to write in a free-text response permits individuals who do not identify with any of the provided check-off boxes to self-identify.


In Census 2000, about 15.4 million respondents were classified in the "Some other race" alone category, which was added to the OMB categories; this represents 5.5 percent of the total U.S. population.8 The 2005 Omnibus Appropriations Bill, at the urging of Congressman Jose E. Serrano (D-NY), directed that any collection of Census data on race identification must include "Some other race" as a response category. In previous censuses, the Census Bureau had sought and received OMB approval to include "Some other race" as a response category (U.S. Census Bureau, 2002b). More than 97 percent of those who chose this category were Hispanic (Rothenberg, 2006), and the remaining write-in responses included a range of answers, such as German and Guyanese. As Table 3-4 illustrates, 42.2 percent of the 35.2 million Hispanic respondents identified with the response category "Some other race." High rates of reporting "Some other race" on the Census may indicate that Dominicans, for example, are uncomfortable with saying "I am Black," or "I am White," and instead prefer to identify with a separate, distinct group (Bailey, 2001).9

Hispanics (discussed below) dwarf the other ethnicities in the "Some other race" category by virtue of their numbers, but individuals of other ethnicities, such as Cape Verdeans and Guyanese, also often do not self-identify with any of the OMB race and Hispanic ethnicity categories (Hernandez-Ramdwar, 1997; Laws and Heckscher, 2002; Model and Fisher, 2008). Consequently, these individuals, as well as many people of Filipino descent, among others, may not respond to the race question or may check "Some other race" if the option is available. The subcommittee concludes that making this option available in addition to the OMB categories would allow individuals who do not identify with one of the OMB race categories to respond (Recommendation 3-1 below).

Box 3-3. The Challenge of Categorizing Filipino Respondents

The Philippines consists of over 7,000 islands set in the western Pacific Ocean. The OMB standards define persons of Filipino descent as Asian. To evaluate Asian subgroup responses to race and ethnicity inquiries, Holup and colleagues (2007) asked a subset of adults participating in the Hemochromatosis and Iron Overload Screening Study to complete both the OMB-minimum and the expanded race and ethnicity measure used in the National Health Interview Survey (NHIS). The expanded measure used in the NHIS includes response categories for Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, and Other Asian. While 89 percent of single-heritage Filipinos marked Asian in the OMB-minimum categorization, the remaining 11 percent marked primarily NHOPI. Filipinos have also been known to categorize themselves as Spanish (Mays et al., 2003), Pacific Islander, Asian American, or, if multiracial, White (Yu and Liu, 1992). Holup and colleagues note that while OMB's decision to separate the Asian and Pacific Islander category in the 1997 OMB revisions was a positive step, specification or provision of definitions when using the minimum OMB categories is "prudent."

Format of the Race and Hispanic Ethnicity Questions

One of the principal challenges in capturing race and ethnicity data for purposes of improving health care is determining how best to capture the Hispanic or Latino population, a population comprising groups that vary widely in their characteristics (McKenney and Bennett, 1994; NRC, 2006). Many Hispanic individuals, including persons of Mexican, Puerto Rican, and Cuban heritage, prefer to self-identify using their specific ancestry as opposed to the general category Hispanic or Latino (Bowman, 1994; Gimenez, 1989; Hayes-Bautista and Chapa, 1987). The term "Hispanic" may not resonate with immigrants, in particular, because it is not used outside the United States (NRC, 2006). Many Hispanics choose "Some other race" instead of the OMB race options when given the opportunity to do so, or refuse to answer the race question when it is asked (Hasnain-Wynia et al., 2008). In a study of birth certificate data, for example, approximately two-thirds of the 15,074 mothers of Hispanic ethnicity reported their race as "Some other race" (Buescher et al., 2005). Research indicates that children of immigrants may be even more likely than their parents to self-identify as "Some other race" (NRC, 2006; Portes and Rumbaut, 2001).

As previously stated, the OMB standards encourage, "whenever feasible," the separation of questions on race and Hispanic ethnicity, a distinction stemming from a 1976 law requiring documentation of the size and growth of the Hispanic population.10 Some research prior to the 1997 OMB revisions indicated that the separate, two-question format in which Hispanic ethnicity is elicited before race11 best identifies an OMB race category for as many Hispanic individuals as possible and allows analyses of combined race and Hispanic ethnicity categories (e.g., Hispanic Black and non-Hispanic Black). The two-question format may capture important health differences among groups. A 2006 study, for example, found that non-Hispanic Blacks have higher risks of developing coronary disease (5.8 percent) than Hispanic Blacks (4.7 percent, P = 0.017) (Lancaster et al., 2006). Additionally, a yet-to-be-released study of data from the NHIS indicates that Hispanic Blacks have a different health services and health status profile from that of either Hispanics or Blacks (Austin et al., 2009). However, the need for the dual categorization of Hispanic ethnicity and race for health care improvement purposes is not well studied.
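The combined categories that the two-question format makes possible can be illustrated with a small Python sketch. The function name `combined_category` and the label wording are hypothetical choices for this example, not a prescribed coding scheme.

```python
def combined_category(is_hispanic, races):
    """Combine the answers to the two separate questions into one analytic label.

    `is_hispanic` is the answer to the Hispanic ethnicity question;
    `races` is the list of race categories selected (possibly empty).
    """
    prefix = "Hispanic" if is_hispanic else "Non-Hispanic"
    if not races:
        # The race question went unanswered; keep the ethnicity information.
        return f"{prefix}, race not reported"
    # Sort so that multiple selections always produce the same label.
    return f"{prefix} {' and '.join(sorted(races))}"
```

Because the two answers are stored separately, labels such as "Hispanic Black" and "Non-Hispanic Black" can be derived at analysis time, whereas a single combined question that yields only "Hispanic" cannot be split this way afterward.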

At the same time, some research suggests that Hispanic respondents better identify with questions on race and Hispanic ethnicity when a one-question instead of a two-question format is used (Baker et al., 2006; Laws and Heckscher, 2002; Taylor-Clark, 2009). For example, the Census Bureau's 1996 Racial and Ethnic Targeted Test (RAETT), which was administered to a sample of households in preparation for Census 2000, experimented with combining race and Hispanic ethnicity into a single question. Nonresponse to the one-question format was significantly lower than nonresponse to the two-question format. However, in the one-question format, many people who had identified as Hispanic and White or Black in the two-question format changed their response to only Hispanic, despite being permitted to "Select one or more" categories (Bennett et al., 1997).12 Yet while conventional wisdom indicates that the combined format maximizes response among Hispanics (Hirschman et al., 2000; OMB, 1997a; Tucker et al., 1996; U.S. Census Bureau, 1996a), survey research has been inconclusive regarding the best way to capture information on race and Hispanic ethnicity among this population. Continued testing of a combined-question format during the 2010 Census may reveal additional information on this issue (Humes, 2009; NRC, 2009).

Legislative efforts are under way to increase the options on the Census 2020 forms to include Caribbean, Dominican, and other populations. In the first session of the 111th Congress, Representative Charles Rangel (D-NY) and Senator Kirsten Gillibrand (D-NY) introduced bills HR 1504 and SB 1084, respectively, to require that in Census questionnaires, a check-off box be included so that respondents may indicate Dominican ethnicity. Also in the first session, Representative Yvette D. Clarke (D-NY) and Senator Charles Schumer (D-NY) introduced bills HR 2071 and SB 1083, respectively, to include a Caribbean check-off box on all future Census forms. These efforts indicate a continued call for more detailed ethnicity data. The need for more detailed data and concerns about Hispanic response may require OMB to review its standards. Most important, the subcommittee concludes there is a need for an assessment of the extent to which lack of identification with the OMB categories interferes with accurate data collection for use in quality improvement efforts (Recommendation 3-3).

Identification of Multiracial Individuals

The 1997 OMB standards require that respondents be allowed to report more than one race and recommend "Mark one or more" and "Select one or more" as the included instruction. Approximately 2.4 percent of the country's population (6.8 million persons) reported multiple races in Census 2000 (U.S. Census Bureau, 2000); this percentage can be expected to increase in the coming years (Edmonston et al., 2000). The largest percentage of multirace responses is from Hispanics; in Census 2000, Hispanics were more than three times as likely as non-Hispanics to self-identify with multiple race responses (NRC, 2006). As a result, like the "Some other race" category, multirace reporting is expected to increase with the growth of the Hispanic population. Additionally, in some areas of the country, the proportion of the population self-identifying as multiracial is substantial. In Census 2000, there were 14 states where the multiracial population was above the nationwide average of 2.4 percent. For example, the multiracial population in Hawaii was 21 percent of the state's population, followed at a distance by Alaska at 5.4 percent (Jones and Smith, 2001).

In analysis and reporting, organizations often collapse reported multiracial combinations into an aggregate "more than one race" or a "multiracial" category because the sample sizes for the individual combinations are usually too small for analysis. The Census' 1996 RAETT found that the option to "Select one or more" captures the same number of individuals as a single, multiracial/biracial category (Hirschman et al., 2000). The former instruction, though, allows for the identification of specific races, whereas the latter does not. Where possible, information on specific combinations of races and ethnicities should be preserved so the data can be aggregated over enough reporting units or periods to provide more informative analyses and the basis for targeted interventions. A single category labeled "multiracial" or "more than one race" may mask valuable information that could be used in analyses. More accurate analyses may require detail on each category selected by a respondent.

Some health information technology (HIT) systems are unable to support the collection and reporting of data in a "Select one or more" manner.13 The six OMB categories can be selected in 64 possible combinations. OMB guidance stipulates that civil rights enforcement agencies must include the four "double-race" combinations most frequently reported. The U.S. Department of Housing and Urban Development, for example, tabulates respondents by the five OMB race categories and four specific multiple-race combinations:

  • American Indian or Alaska Native and White.
  • Asian and White.
  • Black or African American and White.
  • American Indian or Alaska Native and Black or African American.

A sampling of the local service population or an examination of applicable Census data could reveal the most common combinations that an organization might want to capture if its information system does not allow all combinations under the "Select one or more" option.
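The arithmetic behind the number of combinations, and the suggestion to find the most common combinations in a local population, can be sketched in Python as follows. Six categories allow 2^6 = 64 subsets, one of which is the empty selection, so 63 are non-empty. The sketch assumes that the sixth category is "Some other race"; the category labels, function names, and sample responses are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations

# Assumption: five OMB race categories plus "Some other race" as the sixth.
CATEGORIES = [
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
    "Some other race",
]


def all_selections(categories):
    """Yield every non-empty set of categories a respondent could select."""
    for size in range(1, len(categories) + 1):
        for combo in combinations(categories, size):
            yield frozenset(combo)


n_nonempty = len(list(all_selections(CATEGORIES)))  # 2**6 - 1
n_with_empty = n_nonempty + 1                       # 2**6, counting "none selected"


def most_common_combinations(responses, top_n):
    """Rank multiple-race combinations by how often they occur in a sample."""
    counts = Counter(frozenset(r) for r in responses if len(r) > 1)
    return counts.most_common(top_n)


# Hypothetical sample drawn from a local service population.
sample = [
    {"Asian", "White"},
    {"Asian", "White"},
    {"White"},
    {"Black or African American", "White"},
]
top = most_common_combinations(sample, 1)
print(n_nonempty, n_with_empty, top)
```

An organization whose system can store only a few combination codes could run a tabulation of this kind on its own data to decide which combinations to retain.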

Counting multiracial individuals as members of each individual race they select (e.g., counting individuals who self-identify as Black and Native Hawaiian in both the Black and NHOPI categories) may double-count respondents and inflate the number of respondents in denominator data. Therefore, this practice may come "at the expense of misstating disparities in the health of specific racial/ethnic groups" (Mays et al., 2003, p. 89), especially among populations in which the ratio of responses involving multiple races to a single race is high (e.g., American Indian or Alaska Native and NHOPI populations). On the other hand, this practice allows analyses to include all those who identify with a specific group.
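The effect on denominators can be seen in a small Python sketch that tabulates the same responses two ways: counting each person in every race selected, and counting each person once with an aggregate multiracial category. The function names, the short category labels, and the sample responses are assumptions made for this example.

```python
from collections import Counter


def count_alone_or_in_combination(responses):
    """Count each respondent once in every race category he or she selected."""
    counts = Counter()
    for selected in responses:
        counts.update(set(selected))
    return counts


def count_alone_with_multiracial(responses):
    """Count each respondent exactly once, pooling multiple-race responses."""
    counts = Counter()
    for selected in responses:
        selected = set(selected)
        if len(selected) == 1:
            counts[next(iter(selected))] += 1
        elif len(selected) > 1:
            counts["More than one race"] += 1
    return counts


# Hypothetical responses from four individuals.
responses = [{"Black"}, {"Black", "NHOPI"}, {"NHOPI"}, {"White"}]

inclusive = count_alone_or_in_combination(responses)
exclusive = count_alone_with_multiracial(responses)
print(sum(inclusive.values()), sum(exclusive.values()))
```

In this sample the inclusive tabulation sums to five for four respondents, because the person who selected both Black and NHOPI appears in two categories, while the exclusive tabulation sums to four but no longer shows which races that person selected.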

To avoid double-counting, prioritization schemes, commonly referred to as trumping rules, recategorize multiracial individuals into a single race category and facilitate comparison of the data with data from systems that allow only single-race categories. For example, OMB guidelines stipulate that when addressing civil rights claims, "responses that combine one minority race and white are allocated to the minority race" (OMB, 2000).
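A prioritization rule of the kind quoted above can be sketched in Python as follows. The sketch implements only the one rule quoted from the OMB guidance (one minority race combined with White is allocated to the minority race); returning None for every other multiple-race combination is an assumption of this example, not part of the guidance, and the function name and short labels are hypothetical.

```python
def allocate_single_race(selected):
    """Allocate a response to one race under the single rule quoted from OMB (2000)."""
    selected = set(selected)
    if len(selected) == 1:
        # Single-race responses are kept as reported.
        return next(iter(selected))
    if len(selected) == 2 and "White" in selected:
        # One minority race plus White is allocated to the minority race.
        (minority,) = selected - {"White"}
        return minority
    # Assumption: combinations not covered by the quoted rule are left unallocated.
    return None


print(allocate_single_race({"Asian", "White"}))
print(allocate_single_race({"Asian", "Black"}))
```

Applying such a rule at reporting time, rather than at collection, keeps the original selections available if a different aggregation is needed later.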

Prioritization schemes reflect a lack of consideration of multiracial respondents' preference, aversion, or indifference to identifying primarily with one race. The NHIS and the California Health Interview Survey (CHIS) ask respondents who report more than one race whether there is a category with which they most identify, providing an opportunity to categorize individuals in a way that most closely matches their preferred self-identification. Those responses then can be used to inform the assigning of multiracial individuals to single-race categories in a manner more informative than arbitrary prioritization schemes (Holup et al., 2007). However, while many multiracial individuals identify with one race (Mays et al., 2003), some multiracial individuals may hesitate to choose one racial identity over another. Asking such a question also requires the collection and coding of data on an additional variable, which may be burdensome for some data systems. The subcommittee concludes that retaining specific combinations or codes for more common combinations in data systems allows for more thorough analysis and reporting. Different ways of aggregating multiracial categories may be appropriate for different purposes; therefore, the subcommittee does not endorse any single analytic approach but concludes that, whenever possible, each race an individual selects on a collection form should be available for analysis.
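One way to use a follow-up "most identify with" answer when data must be passed to a single-race system is sketched below in Python. The function name, the fallback to an aggregate "More than one race" label when no preference is given, and the "Unknown" label for an empty response are assumptions made for this example; they do not represent the NHIS or CHIS procedures.

```python
from typing import Optional, Set


def assign_for_single_race_system(selected: Set[str],
                                  most_identifies_with: Optional[str]) -> str:
    """Pick one label for a single-race system, honoring a stated preference."""
    if len(selected) == 0:
        return "Unknown"
    if len(selected) == 1:
        return next(iter(selected))
    # Use the follow-up answer only if it names one of the races actually selected.
    if most_identifies_with in selected:
        return most_identifies_with
    # Assumption: with no usable preference, fall back to an aggregate label.
    return "More than one race"


print(assign_for_single_race_system({"Asian", "White"}, "Asian"))
print(assign_for_single_race_system({"Asian", "White"}, None))
```

The derived label is intended for export only; the full set of selected races would still be stored so that each race remains available for analysis.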

Page last reviewed May 2018
Page originally created September 2012
Internet Citation: 3. Defining Categorization Needs for Race and Ethnicity Data. Content last reviewed May 2018. Agency for Healthcare Research and Quality, Rockville, MD. https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata3.html