Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement

E. Subcommittee Template: Developing a National Standard Set of Granular Ethnicity Categories and a Rollup Scheme

The Subcommittee on Standardized Collection of Race/Ethnicity Data for Healthcare Quality Improvement recommends using the Office of Management and Budget (OMB) race and Hispanic ethnicity categories (Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, American Indian or Alaska Native, and Hispanic or Latino) and using a national standard set of granular ethnicity categories based on ancestry. Not all entities collecting data will include the comprehensive list of granular ethnicity categories in their databases or on their data collection instruments, as the categories most important for collection by a health plan in Boston might differ from the categories important for collection by a health system in rural Missouri. These entities can select any number of locally relevant categories from the national standard set to present as pre-specified categories for check-off of responses; local lists should also be capable of identifying other ethnicities for all who wish to self-identify by including an open-ended choice of "Other, please specify: __." The subcommittee believes it is important to supplement the OMB categories by collecting granular ethnicity data and to retain these data in data systems so that more detailed analysis and reporting are possible than with the current OMB categories. The number of categories any entity finds necessary for analysis will vary according to the composition of the population being served or studied, whether the size of subgroups is sufficiently large to make statistically reliable comparisons, and whether the pattern of differences experienced by subgroups identifies distinct needs that are not already revealed by data aggregated into broader categories.

The national standard set must be comprehensive of the nation's population to ensure the capture of even small, geographically isolated population groups that would potentially be important in specific locales for analyses and the provision of culturally and linguistically appropriate care. Furthermore, the set must be responsive to changing demographic trends and thus must be regularly updated.

Development of the Template 

The subcommittee did not identify a single existing category list believed to be comprehensive enough to serve as a national, standard set. For example, as discussed in Chapter 3, the Centers for Disease Control and Prevention (CDC)/Health Level 7 (HL7) Race and Ethnicity Code Set 1.0 does not include all relevant granular ethnicities. It does not, for instance, include Somali or Russian. The Massachusetts Superset was developed partially because of these noted absences in the CDC/HL7 Code Set and includes granular ethnicities that are locally relevant to the Commonwealth of Massachusetts. Demographic distributions confirm that there may be ethnic groups present across the country that may not have a large presence in Massachusetts (e.g., Navajo, which may be of importance in Arizona). Thus, the subcommittee concluded that the Massachusetts Superset provides an ample, but not complete, set of granular ethnicity categories. Similarly, the Kaiser Permanente Granular Ethnicity Code Set was determined to be representative of many, but perhaps not all, granular ethnicities.

To capture all of the granular ethnicities represented in the United States, the subcommittee reviewed the Census Bureau's Ancestry Code List. The Census Ancestry Code List is compiled from responses to the Census' open-ended ancestry question, which allows respondents to write in their lineage or ancestry.1 Thus, the list includes a myriad of granular ethnicity categories, ranging from Hausa, an ethnic group in northern Nigeria, to more general responses of European and American.

The CDC/HL7 Code Set, Massachusetts Superset, Census Ancestry Code List, and Kaiser Permanente Granular Ethnicity Code Set interchangeably use country or place names to indicate ethnicities (e.g., Singapore to represent Singaporean). The subcommittee revised the list to represent categories with ethnicities as opposed to places, whenever possible; this is reflected in the subcommittee's template (Table E-1).

The CDC/HL7 Code Set includes an extensive list of American Indian or Alaska Native categories and codes. Thus, the CDC/HL7 Code Set may serve as the template from which entities can choose locally relevant tribal categories and codes. The Census Ancestry Code list does not include American Indian or Alaska Native tribes. The Massachusetts Superset and the Kaiser Permanente Granular Ethnicity Code Set both include limited lists of locally relevant tribes.

Adaptation of the Template to a National Standard List

The subcommittee presents a cumulative list of granular ethnicity categories from different sources (Table E-1) that may serve as a template from which the Department of Health and Human Services (HHS) should develop a national standard list of granular ethnicity categories with accompanying unique codes (Recommendation 6-1a). Some of these granular ethnicities have already been assigned permanent five-digit unique numerical codes by CDC/HL7. The remaining granular ethnicities included in the subcommittee template also need permanent five-digit unique numerical codes.

To indicate which categories and codes may be similar, the Public Use Microdata Sample File (PUMS) considers some Census ancestry codes to have "corresponding detailed ancestry codes" (e.g., Hausa may be said to correspond with Nigeria).2 The subcommittee concluded that because of the large number of very specific ethnicities included on the Census Ancestry Code List, some ethnicities would be best presented as corresponding with others. Corresponding ethnicities are indicated in Table E-1 using indents. When HHS is developing codes for the granular ethnicity categories included in this template (per Recommendation 6-1a), corresponding ethnicities may have the same codes (i.e., one or more granular ethnicity categories may have the same code).
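The idea of corresponding ethnicities sharing a code can be sketched as a simple lookup. The Python sketch below is illustrative only: the category name "Nigerian", the mapping CORRESPONDS_TO, the function code_for, and the placeholder code "TMP-001" are assumptions made for this example. They are not codes or names assigned by HHS, CDC/HL7, or the Census Bureau.

```python
# Hypothetical sketch: a corresponding (indented) ethnicity resolves to the
# same code as the category it corresponds with.
# "TMP-001" is a placeholder, NOT a real CDC/HL7 or HHS code.

# Assumed correspondence, based on the Hausa/Nigeria example in the text.
CORRESPONDS_TO = {"Hausa": "Nigerian"}

# Assumed code table keyed by the top-level granular ethnicity.
CODES = {"Nigerian": "TMP-001"}


def code_for(ethnicity):
    """Return the code for an ethnicity, following a correspondence if any.

    Returns None when no code has been assigned.
    """
    top_level = CORRESPONDS_TO.get(ethnicity, ethnicity)
    return CODES.get(top_level)


hausa_code = code_for("Hausa")
nigerian_code = code_for("Nigerian")
```

Under these assumptions, a record storing "Hausa" keeps its more specific label while still resolving to the same code as "Nigerian", so the granular response is not lost when codes are shared.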

Rollup to the OMB Race and Hispanic Ethnicity Categories

Locally tailored quality improvement activities may target granular ethnicity groups without needing to relate those groups to a single OMB race category. Collecting race, Hispanic ethnicity, and granular ethnicity data separately, as the subcommittee recommends, allows reporting of the OMB categories when necessary without requiring rollup of the granular ethnicities, provided that individuals respond to all the questions asked. Nonetheless, the subcommittee recognizes that data collected under some circumstances (e.g., a reporting request for OMB-level data where only granular ethnicity is collected) cannot be used or compared with data collected using the OMB race and Hispanic ethnicity categories without the use of a rollup scheme to link granular ethnicities to the OMB categories. To examine both the feasibility and limitations of such schemes, the subcommittee mapped in Table E-1 granular ethnicity responses collected from the ancestry question on Census 2000 to the OMB race and Hispanic ethnicity categories. Table E-1 uses the existing CDC/HL7 rollup scheme as a basis; the subcommittee tested the assumptions of those OMB category assignments with responses to the Census race and Hispanic ethnicity questions to determine if 90 percent of respondents giving a specific ancestry response identified with the category to which the CDC/HL7 rollup scheme assigns them.

For most granular ethnicity categories, 90 percent or more of respondents to Census 2000 did self-identify with the OMB category to which the CDC/HL7 rollup would assign them. However, Appendix F identifies a number of granular categories that do not meet the 90 percent threshold, and thus would have "no determinate OMB race classification" if this threshold were adopted. An analyst wanting to roll up the categories in Appendix F to an OMB race group or Hispanic origin would have to defer to existing OMB and Census definitions based on geographic ancestry (see Table 1-1 in Chapter 1 of the report). While many granular ethnicities can be mapped to the OMB Hispanic ethnicity category based on the existing CDC/HL7 rollup, none of the granular ethnicities associated with the Hispanic ethnicity category can be assigned to an OMB race category with greater than 90 percent certainty. In addition, high percentages of persons who report an American Indian or Alaska Native ancestry have been known to identify as White, multiracial, or "Some Other Race" (see discussion of American Indian or Alaska Natives in Chapter 2). Similarly, substantial portions of respondents who report a Pacific Islander ancestry identify with a race other than the Native Hawaiian or Other Pacific Islander race category. The subcommittee has left the tribal groups and Pacific Islander groups in the American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander race categories, respectively, rather than moving them to a "no determinate OMB race classification." There were insufficient data to apply the 90 percent rule to all the individual subcategories under those headings.
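The 90 percent test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the subcommittee's actual procedure: the function name rollup_assignment, the labels "Ancestry A" and "Ancestry B," and all counts are hypothetical and are not Census 2000 figures.

```python
# Hypothetical sketch of the 90 percent threshold test for a rollup scheme.
# All ancestry labels and counts below are made up for illustration.

NO_DETERMINATE = "No determinate OMB race classification"


def rollup_assignment(proposed_category, race_counts, threshold_pct=90):
    """Keep the proposed OMB category only if at least threshold_pct percent
    of respondents reporting this ancestry self-identified with it."""
    total = sum(race_counts.values())
    if total == 0:
        # No responses to test against, so no assignment can be supported.
        return NO_DETERMINATE
    agreeing = race_counts.get(proposed_category, 0)
    # Integer comparison avoids floating-point issues at the boundary.
    if agreeing * 100 >= threshold_pct * total:
        return proposed_category
    return NO_DETERMINATE


# (proposed category from an existing rollup, self-identified race counts)
hypothetical_counts = {
    "Ancestry A": ("White", {"White": 950, "Some Other Race": 50}),
    "Ancestry B": ("Black or African American",
                   {"Black or African American": 700, "White": 300}),
}

assignments = {
    ancestry: rollup_assignment(category, counts)
    for ancestry, (category, counts) in hypothetical_counts.items()
}
```

In this made-up example, "Ancestry A" (95 percent agreement) keeps its proposed category, while "Ancestry B" (70 percent agreement) falls into the no-determinate group. Even for "Ancestry A," 5 percent of respondents would be misclassified, so the result is suitable only for group-level analysis.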

Analysts should understand that making an assignment using a 90 percent (or any other percent) threshold or an assignment based solely on geography incurs a higher probability that the rollup assignment misclassifies individuals based upon how they would self-identify their race when asked directly about their race. The rates of misclassification, even for granular ethnicities meeting a 90 percent threshold, underscore the fact that rollup schemes only provide probabilistic assignments useful for analysis at the group or population level, and should never be used to assign an actual race to an individual's medical record.

Entities may, in some instances, want to aggregate granular ethnicity categories into broader ethnicity categories for analysis or to meet reporting requirements (e.g., aggregating all western European granular ethnicities into a broad "Western European" category). However, the granular ethnicity data should be retained in data systems when the data are shared and for use in future analysis, reporting, and service provision. The subcommittee notes that the Census ancestry code list groups ethnicities partially by geography (e.g., Western European [sans Spanish], South Asian, Sub-Saharan African) and partially by Hispanic ethnicity (e.g., Spanish, Central and South American, and West Indian).3 The Massachusetts Superset includes 31 broader ethnicity categories and 140 sub-ethnicity categories. The sub-ethnicity categories can be aggregated to the broader ethnicity categories as needed for reporting and analysis. The subcommittee concluded, though, that these mid-level groups should not necessarily collapse into the OMB race categories.
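Aggregation for reporting while retaining the granular data might look like the following Python sketch. The mapping BROAD_CATEGORY, the field names, and the sample records are assumptions made for illustration; they are not drawn from the Massachusetts Superset or the Census ancestry code list.

```python
from collections import Counter

# Hypothetical sketch: derive a broader ethnicity category for reporting
# without overwriting the granular ethnicity stored on each record.

# Assumed, illustrative mapping from granular to broader category.
BROAD_CATEGORY = {
    "French": "Western European",
    "German": "Western European",
}


def add_broad_category(records):
    """Return copies of the records with a derived 'broad_ethnicity' field.

    The original 'granular_ethnicity' value is kept on every record.
    """
    return [
        {**record,
         "broad_ethnicity": BROAD_CATEGORY.get(record["granular_ethnicity"],
                                               "Unmapped")}
        for record in records
    ]


# Made-up sample records.
records = [
    {"id": 1, "granular_ethnicity": "French"},
    {"id": 2, "granular_ethnicity": "German"},
    {"id": 3, "granular_ethnicity": "Somali"},
]

enriched = add_broad_category(records)
report = Counter(r["broad_ethnicity"] for r in enriched)
```

Because the broader category is added as a separate derived field, the same records can later be regrouped under a different scheme or analyzed at the granular level.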

The list of granular ethnicities presented below provides a baseline template for a national standard set of granular ethnicity categories. An entity can decide, based on local circumstances, whether to use 10 or 100 categories from the template for collection and/or analysis. If the entity sees an increase in the use of the "Other, please specify: _____" option, it may consider adding categories to its local list. If an organization chooses not to have a preset list of categories, it will need to compile responses according to the template to ensure comparability with data collected by other entities.
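One way an entity might compile open-ended responses against the template, and notice when a category is worth adding to its local list, is sketched below in Python. The template subset, the local list, the write-in responses, and the min_count cutoff are all hypothetical; the sketch uses simple case-insensitive matching, which a real system would likely need to extend to handle misspellings and synonyms.

```python
from collections import Counter

# Hypothetical sketch: match "Other, please specify" write-ins to template
# categories and flag frequent ones that are missing from the local list.

# Assumed subset of the template, keyed by a normalized (lowercase) form.
TEMPLATE = {"somali": "Somali", "russian": "Russian", "navajo": "Navajo"}

# Assumed categories already pre-specified on the local collection form.
LOCAL_LIST = {"Russian"}


def compile_write_ins(write_ins):
    """Split write-ins into counts of template matches and unmatched text."""
    matched = Counter()
    unmatched = []
    for text in write_ins:
        key = text.strip().lower()
        if key in TEMPLATE:
            matched[TEMPLATE[key]] += 1
        else:
            unmatched.append(text)
    return matched, unmatched


def candidates_for_local_list(matched, min_count):
    """Template categories written in at least min_count times that are
    not yet offered as pre-specified choices."""
    return sorted(
        category for category, n in matched.items()
        if n >= min_count and category not in LOCAL_LIST
    )


# Made-up write-in responses.
write_ins = [" Somali", "somali", "SOMALI", "Russian", "Xyz"]
matched, unmatched = compile_write_ins(write_ins)
suggested = candidates_for_local_list(matched, min_count=3)
```

Unmatched responses are kept rather than discarded so they can be reviewed by hand and, where appropriate, proposed as additions when the national set is updated.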


1 The CDC/HL7 Code Set was developed using write-in responses to the Census questions on race and Hispanic ethnicity, not responses to the Census ancestry question. The Census ancestry list is more comprehensive than the list used to develop the CDC/HL7 Code Set.
2 U.S. Census Bureau. 2007. ACS 1-year PUMS code lists: Ancestry codes. https://www.census.gov/data/tables/2000/dec/phc-t-43.html
3 U.S. Census Bureau. 2001. Ancestry code list. http://factfinder.census.gov/metadoc/ancestry.pdf (accessed June 18, 2009).

Page last reviewed October 2018
Page originally created September 2012
Internet Citation: E. Subcommittee Template: Developing a National Standard Set of Granular Ethnicity Categories and a Rollup Scheme. Content last reviewed October 2018. Agency for Healthcare Research and Quality, Rockville, MD. https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldataape.html