Chapter 3: Defining Categorization Needs for Race and Ethnicity Data

Race, Ethnicity, and Language Data: Standardization for Health Care Qu

Need for Locally Relevant Granular Ethnicity Categories

As noted earlier, the Office of Management and Budget (OMB) categories, when used alone, can mask important within-group variations in quality of care (Blendon et al., 2007; Jerant et al., 2008; Read et al., 2005; Shah and Carrasquillo, 2006). While the OMB standards include only two ethnicity categories (Hispanic and not Hispanic), many other ethnicities exist. Assessing and reducing disparities within the broad race and Hispanic ethnicity categories requires ethnicity data at a greater level of detail than is mandated by the OMB standards.

The subcommittee evaluated the necessary level of ethnicity detail beyond Hispanic ethnicity and considered whether it should include national origin, place of birth, and ancestry. The Supreme Court has interpreted national origin to refer to "the country where a person was born, or, more broadly, the country from which his or her ancestors came."14 Under this broader reading, a person may identify with a national origin if he or she shares physical, cultural, or linguistic characteristics with the group. Some respondents, however, may understand the term to mean only country of birth. Therefore, the subcommittee concludes that ancestry, which the Census Bureau defines as "a person's ethnic origin or descent, 'roots,' or heritage, or the place of birth of the person or the person's parents or ancestors before their arrival in the United States," is the ethnicity concept that best encompasses the detail needed in health care settings (U.S. Census Bureau, 2008). To distinguish the definition of ethnicity adopted by OMB (i.e., Hispanic ethnicity) from this more encompassing definition, the subcommittee refers to the latter concept as granular ethnicity.

Importance of Flexibility in Choosing Locally Relevant Categories

The subcommittee considered whether to recommend the OMB race and Hispanic ethnicity categories plus a uniform set of 10 to 15 additional ethnicity categories (i.e., an "OMB Plus" set similar to the categories used in the national surveys outlined in Table 3-2). Demographic distributions show, however, that a uniform set beyond the OMB categories would include groups that are not relevant to all communities. The subcommittee concludes that, to better understand and serve local populations, the categories collected and analyzed need to reflect the population served accurately. Thus, a fixed "OMB Plus" set of categories would be less desirable than local selection of ethnicity categories in addition to the OMB categories.

Ethnicity data must be specific and appropriate to the communities in which health care providers operate (Bilheimer and Sisk, 2008). Clustering of racial and ethnic groups in specific communities, such as a relatively large population of White persons of French descent in Maine or a large population of White persons of Armenian descent in Southern California, requires the use of locally relevant granular ethnicity categories. Figure 3-2 shows the county-level distribution of the country's Asian population, revealing higher concentrations of Asians both in broad geographic regions (e.g., the West Coast and Northeast Corridor) and within specific counties or metropolitan areas (e.g., Collin County, Texas; Atlanta, Georgia). In areas with larger and more diverse Asian populations, detailed categories are more useful for data collection than a single broad category. Even in Minnesota, which has a roughly average concentration of Asians (3.5 percent), the broad OMB Asian category masks the fact that a large portion of the state's Asians are Hmong, an important consideration for locally tailored health care interventions. Similarly, a health care provider may serve a large number of persons from an ethnic group whose significant presence is masked in the aggregate OMB categories, even in county-level data.

Ethnicity Categories on Data Collection Instruments

Health care entities must determine an approach to collecting granular ethnicity data that allows all individuals to self-identify if they wish and that is also feasible, given that the population of their service area may include hundreds of granular ethnicities. Individual self-identification enables entities to learn the composition of their service population so they can decide which ethnicity categories will yield the most responses on data collection instruments; the resulting data can also be used in analyses to determine where to target interventions. Additionally, such individualized data collection can preserve small subgroup identities that might be of interest for analytic studies at the state, health plan, or national level (assuming the specific identifiers are preserved during data transfer) but that might be too small to reveal any group-specific quality issues at the local level (e.g., higher cancer mortality among persons of Samoan descent). Of course, aggregating data across entities in this way presumes that categories are standardized across entities.

Presenting respondents with a list of hundreds of categories (Appendix E) poses logistical challenges. Models exist, however, for collecting data on highly diverse populations. Kaiser Permanente, for example, collects data using approximately 260 granular ethnicity categories through a separate question, in addition to collecting the OMB minimum categories (Appendix G). Similarly, Contra Costa Health Plan uses 133 ethnicity categories (Appendix H). Both entities manage these lengthy lists with software applications that recognize keystrokes and present the most pertinent categories on screen; the Contra Costa software first identifies the 15 most frequently encountered ethnicities. Both organizations ask about granular ethnicity after a single question soliciting the OMB race and Hispanic ethnicity categories.
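The keystroke-driven lookup these organizations use is not described in detail here, so the following Python sketch only illustrates the general idea under stated assumptions: filter a long category list by the characters typed so far and rank matches so that the most frequently chosen categories appear first. The category excerpt, the selection counts, and the function name suggest are all hypothetical.

```python
from collections import Counter

# Hypothetical excerpt of a long granular ethnicity list; a real list
# (e.g., one drawn from a national standard set) would hold hundreds of entries.
CATEGORIES = [
    "Afghan", "African American", "Armenian", "Asian Indian", "Brazilian",
    "Cambodian", "Cape Verdean", "Chinese", "Filipino", "Hmong",
    "Korean", "Mexican", "Russian", "Samoan", "Somali", "Vietnamese",
]

# Hypothetical counts of how often each category has been chosen locally.
PAST_SELECTIONS = Counter(
    {"Mexican": 120, "Filipino": 45, "Chinese": 40, "Armenian": 30, "Russian": 12}
)


def suggest(typed: str, limit: int = 5) -> list[str]:
    """Return up to `limit` categories to display for the keystrokes typed so far.

    With no input, show the most frequently chosen categories. Otherwise keep
    categories containing the typed text (case-insensitive), listing prefix
    matches first, then more frequently chosen categories, then alphabetically.
    """
    text = typed.strip().lower()
    if not text:
        return [name for name, _ in PAST_SELECTIONS.most_common(limit)]
    matches = [c for c in CATEGORIES if text in c.lower()]
    # Counter returns 0 for categories never chosen, without adding them.
    matches.sort(key=lambda c: (not c.lower().startswith(text), -PAST_SELECTIONS[c], c))
    return matches[:limit]


print(suggest(""))    # most frequent choices before any typing
print(suggest("m"))   # "Mexican" (a prefix match) first, then other matches
print(suggest("ko"))  # ['Korean']
```

Ranking by local frequency mirrors the idea of surfacing the most frequently encountered ethnicities first; substring matching lets a user find "African American" by typing "amer" as well as "afr".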

Respondents may find the task of self-identification from a lengthy list daunting or unreasonable when faced with a paper-based form. Likewise, it would not be feasible for staff to read through such lengthy lists when collecting the data by phone, for example, during preregistration for hospitalization. Instead, some health care entities ask patients to provide a response to an open-ended question and present no preselected response options, while others provide patients and staff with a short list of categories, often accompanied by an "Other, please specify:___" option. This latter response option is also open-ended, meaning individuals or staff can write in a self-identification if it is not included on the local list of response categories. Similarly, state or national surveys could have a limited list of categories, but also present the open-ended response option.

There are advantages and disadvantages to both open-ended and closed-ended question formats. For example, questions that list examples or check-off boxes may bias respondents toward the given response options (Chesnut et al., 2007). Census research has found higher response rates for the ethnicities listed as examples, suggesting that this question format may skew responses (Cresce et al., 2004; del Pinal et al., 2007). Traditionally, closed-ended questions have been used to elicit race and Hispanic ethnicity data. Open-ended questions, however, may have advantages for some entities collecting granular ethnicity data; for instance, this format reduces the space needed on paper data collection forms or electronic screens. On the other hand, when open-ended data are collected from hundreds of thousands of enrollees or survey respondents, the data are difficult to use unless resources are devoted to coding the responses into standardized categories. Another difficulty with open-ended questions is that respondents may leave the item blank. Census studies have indicated that this may result from perceived redundancy when the open-ended ancestry question follows questions on race and Hispanic ethnicity (del Pinal, 2004; Martin et al., 1990). Open-ended questions often provide examples so respondents know what type of response is desired; for example, the Medi-Cal instruction sheet lists nine examples of ethnicity (e.g., Hispanic, Cambodian, Asian Indian).
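Coding open-ended responses into standardized categories is largely a lookup problem once free text is normalized. The Python sketch below assumes a hypothetical synonym table (a real one would map to codes on a national standard list); anything it cannot match is routed to a manual-review queue, and blank items are counted separately because nonresponse is itself worth tracking.

```python
import re
import unicodedata

# Hypothetical synonym table: normalized write-in text -> standard category.
# A production table would map to codes on a national standard list.
SYNONYMS = {
    "somali": "Somali",
    "somalian": "Somali",
    "cape verdean": "Cape Verdean",
    "cape verdian": "Cape Verdean",  # common misspelling
    "hmong": "Hmong",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^a-z ]+", " ", no_accents.lower())
    return " ".join(letters_only.split())


def code_write_ins(responses):
    """Code free-text answers; return (coded pairs, manual-review list, blank count)."""
    coded, review, blanks = [], [], 0
    for raw in responses:
        key = normalize(raw)
        if not key:
            blanks += 1  # item left blank
        elif key in SYNONYMS:
            coded.append((raw, SYNONYMS[key]))
        else:
            review.append(raw)  # needs a human coder or a new synonym entry
    return coded, review, blanks


coded, review, blanks = code_write_ins(["Somali", "Cape-Verdian", "", "Samoan"])
print(coded, review, blanks)
```

Growing the synonym table from the review queue over time is what keeps the coding burden manageable as volumes rise.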

The subcommittee finds no evidence from a health care quality improvement standpoint to support requiring that individuals be allowed multiple responses to a question about granular ethnicity (i.e., "Select one or more"). Additionally, the subcommittee acknowledges the potential HIT challenges of recording multiple granular ethnicity responses. For entities collecting race and Hispanic ethnicity data according to the OMB standards, allowing individuals to "Select one or more" is feasible and indeed required by OMB, because these few categories yield a manageable number of combinations (the five OMB race categories alone yield 31). However, the number of possible combinations from a list of several hundred granular ethnicities may increase the analytic burden, and multiple-ethnicity combinations will result in small cell sizes and thus may not be useful for identifying patterns of care in all circumstances. Furthermore, response variation, which occurs when individuals intentionally or inadvertently make inconsistent choices over time (Snipp, 1989), increases when individuals have a greater number of choices with which to self-identify (Snipp, 2003). Kaiser Permanente's initiative to capture race, Hispanic ethnicity, and granular ethnicity does not currently allow multiple granular ethnicity responses because of collection and analytic considerations. However, there may be some communities where combinations of ethnicities occur regularly and where health entities would find these combinations useful to collect.
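To make the combinatorial point concrete, here is a small Python sketch. If respondents may "Select one or more" of n categories, the number of distinct selections is 2^n - 1; capping selections at k categories gives the sum of C(n, i) for i = 1 to k. The list size of 260 reflects the approximate Kaiser Permanente figure cited in this chapter; the cap at two selections is a hypothetical policy used only for illustration.

```python
from math import comb


def select_one_or_more(n: int) -> int:
    """Distinct non-empty selections from n categories: 2**n - 1."""
    return 2 ** n - 1


def select_up_to(n: int, k: int) -> int:
    """Distinct selections of 1 to k categories from n: sum of C(n, i)."""
    return sum(comb(n, i) for i in range(1, k + 1))


n = 260  # roughly the size of a large granular ethnicity list
print(select_up_to(n, 1))               # 260 single-ethnicity cells
print(select_up_to(n, 2))               # 33930 cells once pairs are allowed
print(len(str(select_one_or_more(n))))  # 79 (digits in the unrestricted count)
```

Even permitting only pairs multiplies the number of possible cells by more than a hundredfold, which is why most combination cells would be too small to analyze.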

Definition of a Standard National Set with Local Choices 

To ensure standardized collection of race and ethnicity data, locally relevant choices of response categories should be selected from a national standard set, with appropriate coding to facilitate sharing of the data. The national standard set of categories needs to be comprehensive enough to capture changing demographic trends, geographically isolated subgroups, and groups relevant to the provision of culturally and linguistically appropriate care. While several organizations provide lists of granular ethnicities (Table 3-5), none of these include all of the granular ethnicity categories required for a national set. Merging these sets, as is done in Appendix E, provides a starting point from which a national standard set could be developed. These sets are further explored in this section to identify the strengths and weaknesses of each.

The Centers for Disease Control and Prevention (CDC)/Health Level 7 (HL7) Race and Ethnicity Code Set 1.0 was developed to clarify the relationship of granular ethnicities to the broad OMB categories and to facilitate data exchange and analysis. In formulating this set, CDC worked with HL7 and X12, the leading standards-setting organizations for clinical data exchange and for administrative transactions, respectively. The CDC/HL7 Code Set, introduced in 2000, incorporates ethnicity categories derived from write-in responses to the Census questions on race and Hispanic ethnicity rather than from responses to the Census ancestry question. Each ethnicity is assigned a permanent five-digit unique numerical code as well as a hierarchical code that associates it with a race or Hispanic ethnicity category.
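The value of pairing a permanent unique code with a separate hierarchical code is that rollup becomes a matter of truncating the hierarchical path, while the unique code stays stable even if the hierarchy is revised. The Python sketch below illustrates that design with invented code values and a dotted-path format; both are assumptions, not the actual CDC/HL7 formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthnicityCode:
    unique_code: str        # permanent identifier, never reassigned
    hierarchical_code: str  # dotted path; first segment names the OMB-level group
    label: str


# Hypothetical entries for illustration; these are NOT actual CDC/HL7 values.
CODE_SET = [
    EthnicityCode("90001", "R2", "Asian"),
    EthnicityCode("90002", "R2.01", "Korean"),
    EthnicityCode("90003", "R2.02", "Hmong"),
    EthnicityCode("90004", "E1", "Hispanic or Latino"),
    EthnicityCode("90005", "E1.01", "Mexican"),
]
BY_UNIQUE = {c.unique_code: c for c in CODE_SET}
BY_PATH = {c.hierarchical_code: c for c in CODE_SET}


def broad_category(unique_code: str) -> str:
    """Roll an entry up to its top-level group by truncating its hierarchical code."""
    top = BY_UNIQUE[unique_code].hierarchical_code.split(".")[0]
    return BY_PATH[top].label


print(broad_category("90003"))  # Asian
print(broad_category("90005"))  # Hispanic or Latino
```

Stored data reference only the unique code, so moving an ethnicity to a different branch changes its rollup without invalidating historical records.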

The CDC/HL7 Code Set, which has been under the jurisdiction of the National Center for Public Health Informatics, will be updated based on Census 2010 write-ins on the race and Hispanic ethnicity questions (personal communication, S. Ganesan, Centers for Disease Control and Prevention, June 3, 2009). Because new categories are derived from these write-ins, however, a group beyond those already specified on the Census form (Figure 3-1) can be added only if respondents give free-text responses on the lines provided under Hispanic or Latino, Asian, American Indian or Alaska Native, and "Some other race." Thus, for example, the granular ethnicities of African immigrants who simply check "Black or African American" may not be represented in the CDC/HL7 Code Set. The current ethnicity list notably does not include groups such as Somalis, Russians, Cape Verdeans, or Brazilians.

The U.S. Census Bureau, in addition to cataloging write-in responses to the questions on race and Hispanic ethnicity, asks a separate ancestry question in which respondents write in their ancestry or ethnic origin; thus, a person might identify with an individual country (e.g., French), a region within a country (e.g., Corsican or Breton), or a broader category (e.g., European).16 The separate ancestry question was included only on the Census "long form," which was sent to one in six households. The American Community Survey (ACS), an annual survey sent to a sample of households, has replaced the Census "long form" and includes a question about ancestry. The Census Bureau maintains lists of write-in responses with corresponding three-digit numerical codes for its questions on race, Hispanic origin, and ancestry. The codes differ across these lists, although the lists share many of the same categories. For example, 101 is the code for White on the Census Race Code List, the code for "Not Spanish/Hispanic" on the Hispanic or Latino Origin Code List, and the code for Azerbaijani on the Census Ancestry Code List (U.S. Census Bureau, 2002a). Likewise, Korean is coded as 620 on the Census Race Code List and 750 on the Census Ancestry Code List.
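Because the Census code lists reuse the same three-digit numbers for different concepts, a code is meaningful only together with the list it came from. A minimal Python sketch, assuming hypothetical list names ("race", "hispanic_origin", "ancestry"), keys every lookup on the pair (list, code); the entries themselves are the examples given in the text.

```python
# Examples from the text: the same three-digit code means different things
# in different Census code lists. The list names are hypothetical labels.
CENSUS_CODES = {
    ("race", "101"): "White",
    ("hispanic_origin", "101"): "Not Spanish/Hispanic",
    ("ancestry", "101"): "Azerbaijani",
    ("race", "620"): "Korean",
    ("ancestry", "750"): "Korean",
}


def decode(code_list: str, code: str) -> str:
    """Resolve a code only within the list it came from."""
    try:
        return CENSUS_CODES[(code_list, code)]
    except KeyError:
        raise KeyError(f"code {code} is not in the {code_list} list") from None


def codes_for(label: str) -> list:
    """Find every (list, code) pair carrying a given label."""
    return sorted(key for key, value in CENSUS_CODES.items() if value == label)


print(decode("race", "101"))      # White
print(decode("ancestry", "101"))  # Azerbaijani
print(codes_for("Korean"))        # [('ancestry', '750'), ('race', '620')]
```

Storing the code system alongside every code avoids the silent errors that would follow from merging data coded against different lists.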

The Massachusetts Division of Health Care Finance and Policy and the Massachusetts Quality and Cost Council mandated that the state's acute care hospitals and health plans, respectively, report uniform race and ethnicity data (Weinick et al., 2007). These requirements spurred development of an ethnicity categorization and coding list by the Brookings Institution. Entities responsible for the list's development considered recommending the CDC/HL7 Code Set but found it did not accurately capture all relevant population groups.17 The category and coding list developed by the Brookings Institution includes 31 ethnicity categories and additional "sub-ethnicities" that are not required for reporting but that an organization can collect, if useful. Acute care hospitals and health plans are required to report (i.e., have the fields and categories available in their HIT systems) the basic OMB race categories along with the 31 ethnicity categories (Massachusetts Executive Office of Health and Human Services, 2009a, 2009b). When an organization collects any of the "sub-ethnicity" categories, it is required to roll that category up to one of the 31 broader ethnicity categories for reporting. The Massachusetts Superset, which is intended to serve as a guide for health plans and hospitals when they collect granular ethnicity beyond the 31 required categories, includes most of the CDC/HL7 categories and 87 additional categories representing African nations (e.g., Sudanese, Somali), synonyms for existing CDC categories (e.g., La Raza, Chicano), Middle Eastern nations (e.g., Saudi Arabian, Jordanian), and other ethnicities (e.g., Cape Verdean, Brazilian, Guyanese) (Taylor-Clark et al., 2009).
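The Massachusetts reporting rule, under which any collected sub-ethnicity must be rolled up to one of the required broader categories, can be sketched as a two-step lookup. The category names and mappings in this Python example are illustrative placeholders, not the actual Massachusetts lists, and raising an error for an unmapped value is one possible design choice.

```python
from collections import Counter

# Illustrative placeholders, not the actual Massachusetts categories.
REPORTING_CATEGORIES = {"Brazilian", "Cape Verdean", "Other African", "Other Asian"}
SUB_ETHNICITY_ROLLUP = {
    "Somali": "Other African",
    "Sudanese": "Other African",
    "Hmong": "Other Asian",
}


def reporting_category(collected: str) -> str:
    """Map a collected value to the category required for reporting."""
    if collected in REPORTING_CATEGORIES:
        return collected
    if collected in SUB_ETHNICITY_ROLLUP:
        return SUB_ETHNICITY_ROLLUP[collected]
    # Design choice: fail loudly so an unmapped sub-ethnicity is fixed, not dropped.
    raise ValueError(f"no reporting rollup defined for {collected!r}")


def report_counts(collected_values) -> Counter:
    """Count records per reporting category."""
    return Counter(reporting_category(v) for v in collected_values)


print(report_counts(["Somali", "Sudanese", "Brazilian", "Hmong"]))
```

Keeping the collected sub-ethnicity in the source data and applying the rollup only at reporting time preserves the detail for local use.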

Similarly, Contra Costa Health Plan and the Wisconsin Cancer Reporting System (WCRS) developed their own categorization and coding schemes (Tiutin, 2009; Wisconsin Cancer Reporting System, 2008). Contra Costa's code set is based on the CDC/HL7 Code Set but includes nine additional granular ethnicities, including American and Russian, which are two of Contra Costa's top 15 response categories but are not included in the CDC/HL7 Code Set (Appendix H).

In 2004, Kaiser Permanente began collecting member race and ethnicity data using the OMB categories and a limited number of detailed ethnicity groups. After implementation, Kaiser determined a need for more granular ethnicity categories to allow for better self-identification and analyses of health care data. As a result, Kaiser developed a list of granular ethnicities that could be used for self-reporting separately from the OMB race and Hispanic ethnicity categories. The code set includes 268 categories, and continual review is planned to ensure alignment with immigration trends and relevance to health care (Kaiser Permanente, 2009). Appendix G provides more detail on Kaiser Permanente's collection of data on race, ethnicity, and language need.

"Unavailable," "declined," and "unknown" codes, variations of which appear in the HRET Toolkit's suggested format, the Massachusetts Superset, the Contra Costa Health Plan code list, and the Kaiser Permanente code list, are frequently used in survey analysis. These codes are not presented as response options; rather, they are recorded by registration or eligibility clerks or by surveyors, for example, so that data systems can track the number of persons from whom the organization has attempted to collect race and ethnicity data. The subcommittee suggests that such categories be provided for individuals who have not responded (unavailable), refuse to answer (declined), or do not know (unknown). The "unavailable" category signals to data collectors that the respondent has not yet provided the information, so it should be solicited at a future contact with that individual. In contrast, the "declined" category indicates that the individual should not be asked again. In some instances, the "unknown" category serves as a response option, for example, for a respondent who is adopted and does not know his or her race and ethnicity (Taylor-Clark, 2009).

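The distinction between the three codes matters operationally: only "unavailable" should prompt staff to ask again. The Python sketch below encodes that rule and tallies how many records hold a self-report versus each code. Treating "unknown" as final and using None for a record where no attempt was made are assumptions the text does not specify.

```python
from collections import Counter
from enum import Enum


class NonResponse(Enum):
    UNAVAILABLE = "unavailable"  # not yet provided: solicit at a later contact
    DECLINED = "declined"        # refused: do not ask again
    UNKNOWN = "unknown"          # respondent does not know (e.g., adopted)


def should_ask_again(value) -> bool:
    """True if staff should solicit race and ethnicity at the next contact.

    `value` is a self-reported category (str), a NonResponse code, or None when
    nothing has been recorded. Treating UNKNOWN as final is an assumption.
    """
    return value is None or value is NonResponse.UNAVAILABLE


def collection_summary(values) -> Counter:
    """Tally self-reports, each non-response code, and records never attempted."""
    tally = Counter()
    for value in values:
        if isinstance(value, NonResponse):
            tally[value.value] += 1
        elif value is None:
            tally["not attempted"] += 1
        else:
            tally["self-reported"] += 1
    return tally


print(should_ask_again(NonResponse.DECLINED))  # False
print(collection_summary(["Hmong", NonResponse.UNAVAILABLE, None]))
```
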
Selection of Local Granular Ethnicity Categories

The list of granular ethnicities in Appendix E provides a baseline template for a national standard set of granular ethnicity categories. An entity can decide, based on local circumstances, whether to use 10 or 100 categories from the national standard list for collection and/or analysis. If the entity sees an increase in the use of the "Other, please specify:___" option, it should consider adding categories to its local list. If an organization chooses not to have a preset list of categories, it will need to compile responses according to the national standard list to ensure comparability with data collected by other entities.

Determining which locally relevant categories to include may initially require subjective judgments about subgroups believed to be present in large numbers. However, some organizations may not realize the diversity of their service population and thus may not understand the need to collect the OMB categories and granular ethnicity data (Box 3-4). Rather than relying on such judgments alone, entities can determine specific, locally relevant categories using population estimates from geographic-based Census data, school enrollment data that identify newer and growing populations in service areas, indirect estimation techniques, or surveys. Even constructing a survey, however, may require some knowledge of persons in the service area; Anthem Blue Cross, for example, used a mailed survey to solicit the race and ethnicity of its California members but focused on the six OMB race and Hispanic ethnicity categories and 61 additional ethnicity categories considered most pertinent to its enrollees.18 Because all granular ethnicity lists should also include an "Other, please specify:____" option, the write-in responses can help organizations evaluate and, as necessary, expand the granular ethnicity response options provided. If an organization receives numerous write-in responses of "Russian," for example, it may consider adding a Russian response option.
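Monitoring "Other, please specify" write-ins can be partly automated by counting repeated write-ins that are not yet on the local list. In this Python sketch the threshold of 25 write-ins is a hypothetical value; an entity would set its own, and a production version would normalize spelling variants more thoroughly than simple trimming and case folding.

```python
from collections import Counter


def candidates_to_add(write_ins, local_list, min_count=25):
    """List write-in ethnicities frequent enough to consider as new options.

    Comparison is case-insensitive after trimming whitespace; min_count is a
    hypothetical threshold. Returns (write-in, count) pairs, most frequent first.
    """
    listed = {name.casefold() for name in local_list}
    counts = Counter(w.strip().casefold() for w in write_ins if w.strip())
    frequent = [(name, n) for name, n in counts.items()
                if n >= min_count and name not in listed]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


write_ins = ["Russian"] * 30 + ["russian "] * 5 + ["Samoan"] * 3
print(candidates_to_add(write_ins, local_list=["Mexican", "Hmong"]))  # [('russian', 35)]
```

Running this periodically turns the write-in field into an early signal of newer or growing populations in the service area.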

Box 3-4. Realizing the Necessity of Collecting Data:
The University of Mississippi Medical Center

When told they were to begin collecting race, ethnicity, and language data from patients, employees at University of Mississippi Medical Center (UMMC) almost uniformly predicted that patients would believe the information would be used to segregate services and that collecting it would create racial tensions. In fact, the director in charge of implementing the data collection was convinced that UMMC and the organizations funding and administering the data collection initiative (The Robert Wood Johnson Foundation and The George Washington University, through an Expecting Success project) were "taking gasoline and pouring it on a blazing fire."

The registration department initially thought registration staff were already asking for the patient's race. After discussing this with staff, the director found that they were not asking patients but were looking at them to determine their race. Staff told management that patients might be offended or become indignant if asked for the information. Observer reports indicated approximately 180 Hispanic patients registered at UMMC per year. So what was the point of collecting additional race and ethnicity data for a reasonably homogeneous patient population?

With funding and support from Expecting Success, UMMC implemented a staff training program to ensure that patients would be asked directly about their race, ethnicity, and language need. Within months of implementation, UMMC learned that it was registering approximately 600 Hispanic individuals per month (approximately 1.5 percent of the 40,000 individuals registered per month) and that the patient population was less homogeneous than initially believed. Approximately 500 patients per month were from subgroups the medical center did not even realize existed in its service area (e.g., Japanese and Russian). UMMC found that between 3 and 4 percent of the population preferred to talk to a physician in a language other than English. UMMC now has three full-time Spanish interpreters (where it previously had none) and has switched vendors to ensure its interpreter phone system can handle the types and numbers of interpreter services required. In-house physicians and researchers have begun to use the race, ethnicity, and language data to stratify quality measures.

Source: Personal communication with Richard Pride, UMMC, June 3, 2009.

A variety of entities participate in the health care system. While each has a role to play in capturing race and ethnicity data, not all currently collect these data, and those that do may not use uniform methods or categories. Other entities already collect and report detailed data in ways that comply with the OMB standards and produce data useful to local and national quality improvement efforts. The subcommittee's task is to provide standardized categories "for entities wishing to assess and report on quality of care." The subcommittee aims to accomplish this while imposing the least possible data collection burden and without hindering the progress and processes of entities already collecting detailed data.

The subcommittee focuses its recommendations on care delivery sites and public and private insurers, as these health care entities are involved in measuring and improving quality, as well as on data collection activities that provide information about equity in care, care outcomes, quality of care, or utilization of care (e.g., health surveys asking about health care). Some public health activities involve delivery of care, but others do not. Because vital statistics and other public health surveillance systems are organized and supported for purposes beyond health care quality improvement, these collection activities may require different considerations. All entities related to health and health care, though, are encouraged to collect race, Hispanic ethnicity, and granular ethnicity data in accordance with the subcommittee's recommendations.

The subcommittee considered a stepwise approach to collecting race and ethnicity data, where entities would first emphasize collecting the data according to the OMB standards and then gradually implement granular ethnicity data collection over time. However, as discussed in Chapter 2, granular ethnicity data are useful for improving health care quality in many settings, and thus the collection of these data should not be considered a secondary aim in those settings. While the subcommittee recognizes that full implementation of its recommendations may require HIT and process changes for some entities (Chapter 5), race, Hispanic ethnicity, and granular ethnicity data are all necessary to effectively and efficiently target health care quality improvement to groups that are at risk of suboptimal care. 

Recommendation 3-1: An entity collecting data from individuals for purposes related to health and health care should:
  • Collect data on granular ethnicity using categories that are applicable to the populations it serves or studies. Categories should be selected from a national standard list (see Recommendation 6-1a) on the basis of health and health care quality issues, evidence or likelihood of disparities, or size of subgroups within the population. The selection of categories should also be informed by analysis of relevant data (e.g., Census data) on the service or study population. In addition, an open-ended response option of "Other, please specify:___" should be provided for persons whose granular ethnicity is not listed as a response option.
  • Elicit categorical responses consistent with the current OMB standard race and Hispanic ethnicity categories, with the addition of a response option of "Some other race" for persons who do not identify with the OMB race categories.
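The selection criteria in Recommendation 3-1 (subgroup size, known quality issues or disparities, and a required write-in option) can be expressed as a simple filter over the national standard list. This Python sketch assumes the entity has population estimates for its service area, for example from Census data, and uses a hypothetical size threshold of 0.5 percent; flagged groups are kept regardless of size. The function name and all figures in the usage line are hypothetical.

```python
OTHER = "Other, please specify:___"


def select_local_categories(national_list, estimates, service_population,
                            flagged=frozenset(), min_share=0.005):
    """Choose local response options from the national standard list.

    national_list: categories on the national standard list.
    estimates: estimated number of people per category in the service area.
    service_population: estimated total population served.
    flagged: categories with known quality issues or disparities, kept regardless of size.
    min_share: hypothetical minimum share of the service population.
    """
    cutoff = min_share * service_population
    chosen = [c for c in national_list
              if c in flagged or estimates.get(c, 0) >= cutoff]
    chosen.sort(key=lambda c: (-estimates.get(c, 0), c))  # largest groups first
    return chosen + [OTHER]  # write-in option is always offered


national = ["Hmong", "Mexican", "Russian", "Samoan", "Somali"]
estimates = {"Mexican": 5000, "Hmong": 3000, "Somali": 900, "Russian": 200, "Samoan": 40}  # hypothetical
print(select_local_categories(national, estimates, 100_000, flagged={"Samoan"}))
```

Because only categories already on the national list can be chosen, locally tailored lists remain comparable across entities.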

Consistent Rollup of Granular Ethnicity to OMB Categories

While systems for rolling granular ethnicity categories up to broader categories have been developed by CDC/HL7 and the Commonwealth of Massachusetts, among others, no agreed-upon rollup strategy for granular ethnicities has been determined or reviewed for its applicability nationwide and across the health care system. For example, the Massachusetts Superset aggregates its set of granular ethnicities to 31 mid-level categories, whereas the CDC/HL7 Code Set aggregates its ethnicity categories only to the OMB race and Hispanic ethnicity categories. A process for rolling granular ethnicity categories up to the OMB categories is key to reconciling two potentially conflicting objectives: consistency and standardization in analysis and reporting on the one hand, and data collection tailored to local circumstances on the other. Rollup procedures will be needed only when a person provides a granular ethnicity response without selecting an OMB race or Hispanic ethnicity category, or when only granular ethnicities are collected; the subcommittee, however, prefers that granular ethnicity be collected separately from OMB race and Hispanic ethnicity. The subcommittee chose not to define mid-level aggregations between granular ethnicity and the OMB categories.
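The conditions under which rollup is needed can be written as a fallback rule: a self-reported OMB race always takes precedence, and a rollup is consulted only when it is missing and a granular ethnicity is present. The Python sketch below also returns the derived value together with its source, intended for an analytic dataset rather than the health record. The rollup table and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical rollup table; None marks a group with no determinate OMB race.
ROLLUP = {"Korean": "Asian", "Hmong": "Asian", "Brazilian": None}


@dataclass(frozen=True)
class PatientRecord:
    omb_race: Optional[str]            # self-reported; may be missing
    granular_ethnicity: Optional[str]  # self-reported; may be missing


def race_for_analysis(rec: PatientRecord) -> Tuple[Optional[str], str]:
    """Return (race, source) for an analytic file; the record itself is unchanged."""
    if rec.omb_race:
        return rec.omb_race, "self-report"  # direct self-report always wins
    if rec.granular_ethnicity is None:
        return None, "missing"
    if rec.granular_ethnicity not in ROLLUP:
        return None, "unmapped"
    race = ROLLUP[rec.granular_ethnicity]
    if race is None:
        return None, "no determinate OMB race classification"
    return race, "rollup"


print(race_for_analysis(PatientRecord("White", "Brazilian")))  # ('White', 'self-report')
print(race_for_analysis(PatientRecord(None, "Korean")))        # ('Asian', 'rollup')
```

Carrying the source label lets analysts report how many assignments were derived rather than self-reported.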

Rollup Issues 

The CDC/HL7 Code Set was designed in a hierarchical fashion such that each ethnicity category corresponds to one of the OMB race or Hispanic ethnicity categories (Figure 3-3). This rollup scheme can be used when reporting is required to conform to the OMB categories or when an analyst needs a consistent set of minimum categories to make comparisons across systems reporting race and ethnicity at different levels of detail. For the vast majority of individuals, mapping from ethnicity to race categories is not problematic. As discussed in Chapter 1, however, ethnicity and race are two different concepts. Individuals who self-identify as Brazilian may also identify as White, Black, or some combination of races, or may see themselves as falling into no category beyond Brazilian. As a result, a rollup scheme that assumes all respondents who self-identify as Brazilian are White could wrongly assign a race to a number of individuals.

Figure 3-3 highlights some problems with current CDC rollup procedures. For example, Brazilians may not be considered Hispanic because they speak Portuguese rather than Spanish. Additionally, several national origins correspond to two or more major racial populations. For instance, the population of Madagascar is of mixed African, Malayo-Indonesian, and Arab ancestry. Rolling up Madagascan to Asian, as the CDC rollup scheme recommends, would therefore misclassify Africans of Madagascan descent as Asian. Rollup schemes are further complicated by misclassifications introduced by the use of geographic boundaries. While the CDC rollup scheme considers Afghanistan to be Middle Eastern and consequently categorizes Afghans as White, the Census ancestry list classifies Afghanistan as an Asian country. Additionally, the WCRS coding manual notes that descriptions of religious affiliation should be "used with caution" when determining corresponding races.19

The above discussion highlights some of the difficulties inherent in rolling up certain ethnicities because (1) ethnicities can include two or more major racial populations, (2) the geographic boundaries used to distinguish major groups in different classification schemes are arbitrary, and (3) many individuals may not associate with a specific race for cultural or other reasons. Thus, an individual's race cannot always be presumed from his or her ethnicity. For this reason, the rollup assignment of a self-reported ethnicity to an OMB category should not be placed in an individual's health record or supersede a person's direct self-report. Analysts should understand that any assignment made using a 90 percent (or any other) threshold, or based solely on geography, risks misclassifying individuals relative to how they would self-identify their race. The rates of misclassification, even for granular ethnicities meeting a 90 percent threshold, underscore the fact that rollup schemes provide only probabilistic assignments useful for analysis at the group or population level.

Granular Ethnicities with an Indeterminate Race or Hispanic Ethnicity Classification

Various methods are used to distinguish ethnic groups that cannot be rolled up to a specific race category. For example, in Census 2010, the Census Bureau will use OMB's geographic definitions when it reclassifies ethnic responses in the race question to an OMB race category (e.g., all entries reflecting a sub-Saharan African nation will be counted as "Black"). In Census 2000, the Census Bureau applied a 90 percent rule to reclassify write-in responses on the race question according to the OMB race categories (del Pinal et al., 2007).20 Single-ancestry responses were cross-tabulated by race responses, and if 90 percent or more of respondents in a specific ancestry group selected a particular race, that race was assigned to respondents who gave that ethnic response in the race question.

To determine whether groups included on the CDC, Census, Massachusetts, and WCRS category lists can be rolled up to a specific OMB race category with some degree of certainty, the subcommittee evaluated Census 2000 Public Use Microdata Samples (PUMS) data using the methodology of the Census Bureau's 90 percent rule. The subcommittee cross-tabulated write-in responses on ancestry with the "alone or in combination with one or more other races" variable for each OMB race group. If fewer than 90 percent of respondents of a specific ancestry group selected an OMB race either alone or in combination with another race, the ancestry group was identified as problematic for rolling up. The subcommittee did not have sufficient data on some granular ethnicity groups to apply the 90 percent rule to every ancestry subgroup (Appendix F). The subcommittee found that some granular ethnicities could not be rolled up to an OMB race category with greater than 90 percent certainty; these difficult-to-categorize granular ethnicity groups are included in Appendix F.

The subcommittee suggests that ethnicities that do not meet the 90 percent threshold be classified as "no determinate OMB race classification." This classification differs from the "Some other race" category: "Some other race" is a response option chosen by individuals who do not identify with a specific OMB race category, whereas "no determinate OMB race classification" would identify entire ethnic groups whose members cannot be assumed to belong to one specific racial group. None of the granular ethnicities associated with the Hispanic ethnicity category can be assigned to an OMB race category with greater than 90 percent certainty. In addition, granular ethnicities associated with the non-Spanish-speaking countries of Central and South America (Guyana, Suriname, Brazil, and Belize) cannot easily be rolled up to the OMB Hispanic ethnicity category; these granular ethnicities should also be considered "no determinate OMB race classification" because they do not meet the 90 percent rule. Appendix F highlights additional difficult-to-categorize granular ethnicity groups, including persons of Moroccan, Brazilian, Cape Verdean, Dominican, Guyanese, and South African descent.
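One way to keep the group-level "no determinate" label distinct from an individual's "Some other race" answer is to store it only in the rollup table for ethnic groups, never in the person's own response. The sketch below assumes a simple two-field table; its entries are drawn only from examples named in the text and are not the subcommittee's Appendix E template. The class, constant, and function names are hypothetical.

```python
from dataclasses import dataclass

NO_DETERMINATE = "no determinate OMB classification"


@dataclass(frozen=True)
class Rollup:
    race: str      # an OMB race category, or NO_DETERMINATE
    hispanic: str  # "Hispanic or Latino", "Not Hispanic or Latino", or NO_DETERMINATE


# Illustrative entries based only on the examples in the text; this is not
# the subcommittee's Appendix E template. "Some other race" never appears
# here: it is an answer an individual gives, not a property of an ethnic group.
ROLLUP_TABLE = {
    "Dominican": Rollup(race=NO_DETERMINATE, hispanic="Hispanic or Latino"),
    "Brazilian": Rollup(race=NO_DETERMINATE, hispanic=NO_DETERMINATE),
    "Guyanese": Rollup(race=NO_DETERMINATE, hispanic=NO_DETERMINATE),
}


def groups_without_race_rollup(table):
    """List the granular ethnicities that cannot be rolled up to an OMB race."""
    return sorted(name for name, r in table.items() if r.race == NO_DETERMINATE)


print(groups_without_race_rollup(ROLLUP_TABLE))
# ['Brazilian', 'Dominican', 'Guyanese']
```

Keeping race and Hispanic ethnicity as separate fields lets a group such as Dominican roll up on one dimension (Hispanic ethnicity) while remaining indeterminate on the other (race).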

Rollup Schemes

For interventions aimed at quality improvement and reduction of disparities at the local level, mapping granular ethnicities to the OMB race categories may be unnecessary. Locally tailored quality improvement activities may target subgroups without needing to relate those subgroups to a single OMB race category. Collecting race, Hispanic ethnicity, and granular ethnicity data separately allows reporting of the OMB categories when necessary without requiring rollup of the granular ethnicities, provided that individuals respond to all the questions asked.
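Because race, Hispanic ethnicity, and granular ethnicity are collected as separate answers, each can be tabulated on its own without any mapping between them. The sketch below assumes a hypothetical record layout and placeholder group names; the "More than one race" reporting category and the "Missing" bucket for unanswered questions are illustrative conventions, not requirements stated in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    omb_races: frozenset              # OMB races the person selected (may be empty)
    hispanic: Optional[bool]          # answer to the Hispanic ethnicity question
    granular_ethnicity: Optional[str]


def omb_race_counts(responses):
    """Tabulate the race question directly; granular ethnicity is not used."""
    counts = Counter()
    for r in responses:
        if not r.omb_races:
            counts["Missing"] += 1
        elif len(r.omb_races) == 1:
            counts[next(iter(r.omb_races))] += 1
        else:
            counts["More than one race"] += 1  # assumed reporting convention
    return counts


def granular_counts(responses):
    """Tabulate granular ethnicity for locally targeted quality improvement."""
    return Counter(r.granular_ethnicity or "Missing" for r in responses)


# Hypothetical respondents with a placeholder granular group.
people = [
    Response(frozenset({"Asian"}), False, "Group E"),
    Response(frozenset({"Asian", "White"}), False, "Group E"),
    Response(frozenset(), True, None),
]
print(omb_race_counts(people))
print(granular_counts(people))
```

The third respondent illustrates the caveat in the text: when someone skips a question, the separately collected answers cannot fill the gap without a rollup.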

Nonetheless, the subcommittee recognizes that some circumstances will require the use of a rollup scheme to link granular ethnicities to broader categories to allow comparison or data aggregation. The Massachusetts Superset was developed to guide health plans toward a uniform set of ethnicities; this set avoids rolling up granular ethnicities to races and instead aggregates granular ethnicities into broader groups of ethnicities. Such an ethnicity rollup scheme is useful when the sample of a granular ethnicity group is too small for analysis and needs to be aggregated with others.
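An ethnicity-to-ethnicity rollup of the kind described for the Massachusetts Superset can be sketched as pooling small granular groups into a broader ethnicity group. The minimum cell size of 30, the two-level hierarchy, the placeholder group names, and the counts below are all assumptions for illustration; the actual Massachusetts groupings are not reproduced here.

```python
from collections import Counter


def aggregate_small_groups(counts, parent_of, min_n=30):
    """Report each granular ethnicity with at least min_n respondents on its
    own; pool smaller ones into their broader ethnicity group."""
    pooled = Counter()
    for group, n in counts.items():
        pooled[group if n >= min_n else parent_of[group]] += n
    return pooled


# Hypothetical counts and hierarchy (placeholder names, invented numbers).
counts = {"A1": 120, "A2": 12, "A3": 9, "B1": 8}
parent_of = {"A1": "Broader group A", "A2": "Broader group A",
             "A3": "Broader group A", "B1": "Broader group B"}
print(dict(aggregate_small_groups(counts, parent_of)))
# {'A1': 120, 'Broader group A': 21, 'Broader group B': 8}
```

Note that a pooled group (here "Broader group B") can still fall below the minimum, so an analyst may need a further level of aggregation or suppression.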

The subcommittee merged several ethnicity lists into a template of granular ethnicity categories, which are mapped to the OMB race and Hispanic ethnicity categories (Appendix E). National agreement needs to be reached on a rollup scheme that recognizes that not every ethnicity maps to an OMB race category; some respondents will therefore have "no determinate OMB classification." Chapter 6 addresses the locus of responsibility for developing a national standard set of ethnicity categories and a national rollup scheme.

Recommendation 3-2: Any entity collecting data from individuals for purposes related to health and health care should collect granular ethnicity data in addition to data in the OMB race and Hispanic ethnicity categories and should select the granular ethnicity categories to be used from a national standard set. When respondents do not self-identify as one of the OMB race categories or do not respond to the Hispanic ethnicity question, a national scheme should be used to roll up the granular ethnicity categories to the applicable broad OMB race and Hispanic ethnicity categories to the extent feasible.
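The fallback logic in Recommendation 3-2 can be sketched as follows: a self-reported OMB race (or Hispanic ethnicity answer) is always used when present, and the granular ethnicity is rolled up only when it is missing, and only when the rollup is determinate. The function names, table format, and placeholder ethnicities are hypothetical; a real system would use the national rollup scheme the recommendation calls for. Consistent with the caution that rollup assignments should not supersede self-report or be placed in the health record, the result is meant for a derived analytic field.

```python
from typing import Optional

OMB_RACES = {
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
}
NO_DETERMINATE = "no determinate OMB classification"


def analytic_race(self_reported: Optional[str], granular: Optional[str],
                  race_rollup: dict) -> Optional[str]:
    """Return an OMB race for group-level analysis. A self-reported OMB race
    always wins; otherwise fall back to the rollup of the granular ethnicity."""
    if self_reported in OMB_RACES:
        return self_reported
    mapped = race_rollup.get(granular, NO_DETERMINATE)
    # Indeterminate or unmapped ethnicities are left unassigned.
    return mapped if mapped in OMB_RACES else None


def analytic_hispanic(self_reported: Optional[bool], granular: Optional[str],
                      hispanic_rollup: dict) -> Optional[bool]:
    """Same fallback logic for the Hispanic ethnicity question."""
    if self_reported is not None:
        return self_reported
    return hispanic_rollup.get(granular)  # None when indeterminate or unmapped


# Hypothetical rollup tables with placeholder ethnicity names.
race_rollup = {"Ethnicity X": "Asian", "Ethnicity Y": NO_DETERMINATE}
hispanic_rollup = {"Ethnicity X": False, "Ethnicity Z": True}

print(analytic_race("White", "Ethnicity X", race_rollup))            # White
print(analytic_race("Some other race", "Ethnicity X", race_rollup))  # Asian
print(analytic_race(None, "Ethnicity Y", race_rollup))               # None
print(analytic_hispanic(None, "Ethnicity Z", hispanic_rollup))       # True
```

Returning None for indeterminate groups, rather than a best guess, implements "to the extent feasible": those respondents remain unassigned in OMB-level reports instead of being misclassified.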
Page last reviewed March 2010
Internet Citation: Chapter 3: Defining Categorization Needs for Race and Ethnicity Data: Race, Ethnicity, and Language Data: Standardization for Health Care Qu. March 2010. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata3a.html