3. Defining Categorization Needs for Race and Ethnicity Data (continued)
Need for Locally Relevant Granular Ethnicity Categories
As noted earlier, the Office of Management and Budget (OMB) categories, when used alone, can mask important within-group variations in quality
of care (Blendon et al., 2007; Jerant et al., 2008; Read et al., 2005; Shah and Carrasquillo, 2006). While the
OMB standards include only two ethnicity categories (Hispanic and not Hispanic), many other ethnicities exist.
Assessing and reducing disparities within the broad race and Hispanic ethnicity categories requires ethnicity data
at a greater level of detail than is mandated by the OMB standards.
The subcommittee evaluated the necessary level of ethnicity detail beyond Hispanic ethnicity and considered
whether it should include national origin, place of birth, and ancestry. The Supreme Court has interpreted
national origin to refer to "the country where a person was born, or, more broadly, the country from which his or
her ancestors came."14 Thus, a person may identify with a national origin if he or she shares physical, cultural,
or linguistic characteristics with the group. To some respondents, however, this term may indicate only country of
birth. The subcommittee therefore concludes that ancestry, which the Census Bureau defines as "a
person's ethnic origin or descent, 'roots,' or heritage, or the place of birth of the person or the person's parents
or ancestors before their arrival in the United States," is the ethnicity concept that best encompasses the detail
needed in health care settings (U.S. Census Bureau, 2008). To distinguish the definition of ethnicity adopted
by OMB (i.e., Hispanic ethnicity) from this more encompassing definition, the subcommittee refers to the latter
concept as granular ethnicity.
Importance of Flexibility in Choosing Locally Relevant Categories
The subcommittee considered whether to recommend the OMB race and Hispanic ethnicity categories plus
a uniform set of 10 to 15 additional ethnicity categories (i.e., an "OMB Plus" set similar to the categories used in
national surveys outlined in Table 3-2). Demographic distributions confirm, however, that a uniform set beyond
the OMB categories would include groups not relevant to all communities. The subcommittee concludes that, to
better understand and serve local populations, the categories collected and analyzed need to
accurately reflect the population served. Thus, a fixed "OMB Plus" set of categories would be less desirable than
local selection of ethnicity categories in addition to the OMB categories.
Ethnicity data must be specific and appropriate to the communities in which health care providers operate
(Bilheimer and Sisk, 2008). Clustering of racial and ethnic groups in specific communities, such as a relatively large population of White persons of French descent in Maine or a large population of White persons of Armenian
descent in Southern California, requires the use of locally relevant granular ethnicity categories. Figure 3-2 shows
the county-level distribution of the country's Asian population, revealing that there are higher concentrations of
Asians in broad geographic regions (e.g., the West Coast and Northeast Corridor), as well as clustered within
specific counties or metropolitan areas (e.g., Collin County, Texas; Atlanta, Georgia). In areas with larger and
more diverse Asian populations, discrete categorizations are more useful than a single broad category for data
collection. Even in the state of Minnesota, which has a roughly average concentration of Asians (3.5 percent),
the broad OMB Asian category masks the fact that a large portion of Asians in the state are Hmong, an important
consideration for locally tailored health care interventions. Similarly, a health care provider may care for a large
number of persons who belong to an ethnic group whose significant presence is masked even by county-level data
in the aggregate OMB categories.
Ethnicity Categories on Data Collection Instruments
Health care entities must determine an approach to collecting granular ethnicity data that allows all individuals,
if they desire, to self-identify and at the same time is feasible, given that the population of their service area may
include hundreds of granular ethnicities. Individual self-identification enables entities to learn about the composition
of their service population so they can decide which ethnicity categories will yield the most responses on
data collection instruments; the resulting data can also be used in analyses to identify where to target interventions.
Additionally, such individualized data collection has the potential benefit of preserving small subgroup identities
that might be of interest for analytic studies (assuming preservation of the specific identifiers during data transfer)
at the state, health plan, or national level but that might prove too small to reveal any group-specific quality issues
at the local level (e.g., higher cancer mortality among persons of Samoan descent). Of course, such aggregation
presumes standardization of categories across entities.
Presenting respondents with a list of hundreds of categories (Appendix E) poses logistical challenges.
Models exist for the collection of data on highly diverse populations; Kaiser Permanente, for example, collects data
using approximately 260 categories of granular ethnicity through a separate question in addition to collecting the
OMB minimum categories (Appendix G). Similarly, Contra Costa Health Plan uses 133 ethnicity categories
(Appendix H). Both entities manage these lengthy lists with software that responds to keystrokes by
presenting the most pertinent categories on screen. The Contra Costa software first identifies the
15 most frequently encountered ethnicities. Both of these organizations ask about granular ethnicity after asking
a single question to solicit the OMB race and Hispanic ethnicity categories.
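The keystroke-driven lists described above can be sketched roughly as follows. This is not the Kaiser Permanente or Contra Costa software; the function name `suggest`, the prefix-matching rule, and the sample counts are assumptions for illustration. With nothing typed, the sketch shows the most frequently used categories (15 by default, mirroring the Contra Costa behavior described above); as staff type, it narrows to categories beginning with those characters, most-used first.

```python
from collections import Counter
from typing import List


def suggest(typed: str, categories: List[str], usage: Counter,
            top_n: int = 15, limit: int = 10) -> List[str]:
    """Return the categories to display for the characters typed so far (hypothetical helper)."""
    if not typed.strip():
        # Nothing typed yet: show the most frequently used categories on the list.
        ranked = sorted(categories, key=lambda c: (-usage[c], c))
        return ranked[:top_n]
    prefix = typed.strip().casefold()
    # Case-insensitive prefix match, then most-used first, ties broken alphabetically.
    matches = [c for c in categories if c.casefold().startswith(prefix)]
    matches.sort(key=lambda c: (-usage[c], c))
    return matches[:limit]


# Illustrative list and usage counts (made up, not drawn from any real entity).
categories = ["Cambodian", "Cameroonian", "Cape Verdean", "Chinese",
              "Hmong", "Korean", "Russian", "Somali"]
usage = Counter({"Chinese": 40, "Hmong": 25, "Cambodian": 12, "Cape Verdean": 3})

print(suggest("", categories, usage, top_n=2))
print(suggest("ca", categories, usage))
```

Ranking by past use keeps the most common local answers one or two keystrokes away, while the full list remains reachable by typing.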
Respondents may find the task of self-identification from a lengthy list daunting or unreasonable when faced
with a paper-based form. Likewise, it would not be feasible for staff to read through such lengthy lists when collecting
the data by phone, for example, during preregistration for hospitalization. Instead, some health care entities
ask patients to provide a response to an open-ended question and present no preselected response options, while
others provide patients and staff with a short list of categories, often accompanied by an "Other, please specify:___"
option. This latter response option is also open-ended, meaning individuals or staff can write in a self-identification
if it is not included on the local list of response categories. Similarly, state or national surveys could have a
limited list of categories, but also present the open-ended response option.
There are advantages and disadvantages to both open-ended and closed-ended question formats. For example,
questions that list examples or check-off boxes may bias respondents to the given response options (Chesnut et al.,
2007). Census research has indicated higher response rates for the ethnicities listed as examples, indicating that
this question format may skew responses (Cresce et al., 2004; del Pinal et al., 2007). Traditionally, closed-ended
questions have been used to elicit race and Hispanic ethnicity data. But open-ended questions may have advantages
for some entities collecting granular ethnicity data, including that this format reduces the amount of space
needed on paper data collection forms or electronic screens. However, collecting open-format data for hundreds
of thousands of enrollees or respondents on a survey can make it difficult to use the data unless resources are
devoted to coding those responses according to standardized categories. One of the difficulties with open-ended
questions is that respondents may leave the item blank. Census studies have indicated that this may be the result
of perceived redundancy when the open-ended ancestry question follows questions on race and Hispanic ethnicity
(del Pinal, 2004; Martin et al., 1990). Open-ended questions often provide examples so respondents know what
type of response is desired; for example, the Medi-Cal instruction sheet includes a list of nine examples of ethnicity
(e.g., Hispanic, Cambodian, Asian Indian).
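Coding open-ended responses to standardized categories, as discussed above, typically starts with normalizing the free text and looking it up in a synonym table, with anything unmatched routed to a person for manual coding. The sketch below assumes a tiny hypothetical synonym table and invented codes (`E001` and so on); it is not any entity's actual coding scheme.

```python
import re
from typing import Optional

# Hypothetical standard codes and synonym table, for illustration only.
STANDARD = {"E001": "Cambodian", "E002": "Mexican", "E003": "Somali"}
SYNONYMS = {
    "cambodian": "E001",
    "mexican": "E002",
    "mexican american": "E002",
    "somali": "E003",
    "somalian": "E003",
}


def normalize(text: str) -> str:
    """Casefold, turn hyphens into spaces, drop other punctuation, collapse whitespace."""
    text = text.casefold().replace("-", " ")
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def code_response(text: str) -> Optional[str]:
    """Return a standard code for a free-text answer, or None to route it to manual coding."""
    return SYNONYMS.get(normalize(text))


for answer in ["  Mexican-American ", "Somali.", "Cape Verdean"]:
    code = code_response(answer)
    label = STANDARD[code] if code else "needs manual review"
    print(repr(answer), "->", label)
```

The synonym table is where the coding resources mentioned in the text go: each new manually coded write-in can be added so that later occurrences are coded automatically.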
The subcommittee finds no evidence, from a health care quality improvement standpoint, to support requiring
multiple responses to a question about granular ethnicity (i.e., "Select one or more")
for each individual. Additionally, the subcommittee acknowledges the potential HIT challenges of having multiple
granular ethnicity responses. It is feasible and indeed required by OMB that entities collecting race and Hispanic
ethnicity data according to the OMB standards allow individuals to "Select one or more," and these few categories
can yield 64 combinations. However, the number of possible combinations from a list of several hundred granular
ethnicities may increase the analytic burden, and multiple ethnicity combinations will result in small cell sizes
and thus may not be useful for identifying patterns of care in all circumstances. Furthermore, response variation,
which occurs when individuals intentionally or inadvertently make inconsistent choices over time (Snipp,
1989), increases when individuals have a greater number of choices with which to self-identify (Snipp, 2003).
Kaiser Permanente's initiative to capture race, Hispanic ethnicity, and granular ethnicity does not currently allow
multiple granular ethnicity responses because of collection and analytic considerations. However, there may be
some communities where combinations of ethnicities may regularly occur, and health entities would find these
combinations useful to collect.
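The combinatorial point above can be made concrete. Allowing "Select one or more" from n categories permits every non-empty subset, or 2^n - 1 distinct answers; the exact total cited for the OMB categories depends on which categories are counted (for example, whether "Some other race" or Hispanic ethnicity is included). The sketch compares a handful of categories with a 268-item list, roughly the size of the Kaiser Permanente set described in this section.

```python
from math import comb


def nonempty_combinations(n: int) -> int:
    """Distinct 'select one or more' answers from n categories (all non-empty subsets)."""
    return 2 ** n - 1


def pairs(n: int) -> int:
    """Distinct answers naming exactly two of n categories."""
    return comb(n, 2)


print(nonempty_combinations(5))             # 31: five OMB race categories
print(nonempty_combinations(6))             # 63: adding one more category
print(pairs(268))                           # 35778: two-ethnicity answers alone
print(len(str(nonempty_combinations(268))))  # 81: digits in the full count
```

Even restricting respondents to two granular ethnicities yields tens of thousands of possible cells, which is why small cell sizes become the practical obstacle.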
Definition of a Standard National Set with Local Choices
To ensure standardized collection of race and ethnicity data, locally relevant choices of response categories
should be selected from a national standard set, with appropriate coding to facilitate sharing of the data. The
national standard set of categories needs to be comprehensive enough to capture changing demographic trends,
geographically isolated subgroups, and groups relevant to the provision of culturally and linguistically appropriate
care. While several organizations provide lists of granular ethnicities (Table 3-5), none of these include
all of the granular ethnicity categories required for a national set. Merging these sets, as is done in Appendix E,
provides a starting point from which a national standard set could be developed. These sets are further explored
in this section to identify the strengths and weaknesses of each.
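Merging several category lists into a single superset, as done in Appendix E, can be sketched as a case-insensitive union that records which source lists contain each category. The list names (`list_a` and so on) and their contents below are hypothetical excerpts for illustration, not the actual contents of any list in Table 3-5.

```python
from typing import Dict, List, Set


def merge_lists(sources: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Union several category lists, mapping each category to the sources that contain it.

    Categories are compared case-insensitively; the first spelling seen is kept.
    """
    merged: Dict[str, Set[str]] = {}
    spelling: Dict[str, str] = {}
    for source, categories in sources.items():
        for name in categories:
            key = name.strip().casefold()
            label = spelling.setdefault(key, name.strip())
            merged.setdefault(label, set()).add(source)
    return merged


sources = {
    "list_a": ["Hmong", "Korean"],
    "list_b": ["Korean", "Somali", "Cape Verdean"],
    "list_c": ["hmong", "Russian"],
}
superset = merge_lists(sources)
# Categories found in only one source show where the individual lists fall short.
only_one = [name for name, found_in in superset.items() if len(found_in) == 1]
print(only_one)
```

Tracking provenance makes it easy to see which categories a single list would have missed, the kind of gap the text attributes to each existing list.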
The Centers for Disease Control and Prevention (CDC)/Health Level 7 (HL7) Race and Ethnicity Code Set
1.0 was developed to clarify the relationship of granular ethnicities to the broad OMB categories and to facilitate
data exchange and analysis. In formulating this set, CDC worked with HL7 and X12, the leading standards-setting
organizations for data interactions and for administrative transactions, respectively. The CDC/HL7 Code Set, which
was introduced in 2000, incorporates ethnicity categories derived from write-in responses to the Census questions
on race and Hispanic ethnicity, not responses to the Census ancestry question. Each ethnicity is assigned a permanent
five-digit unique numerical code as well as a hierarchical code to associate with race or Hispanic ethnicity.
The CDC/HL7 Code Set, which has been under the jurisdiction of the National Center for Public Health Informatics,
will be updated based on Census 2010 write-ins on the race and Hispanic ethnicity questions.15 Capturing
categories beyond those currently specified on the Census form (Figure 3-1), however, requires respondents
to give free-text responses on lines provided under Hispanic or Latino, Asian, American Indian or Alaska Native,
and "Some other race." Thus, for example, the granular ethnicities of African immigrants who simply check "Black
or African American" may not be represented in the CDC/HL7 Code Set. The current ethnicity list, for instance,
notably does not include groups such as Somalis, Russians, Cape Verdeans, or Brazilians.
15 Personal communication, S. Ganesan, Centers for Disease Control and Prevention, June 3, 2009.
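The pairing of a permanent unique code with a separate hierarchical code, described above for the CDC/HL7 Code Set, can be illustrated with a small data structure: the unique code identifies an ethnicity permanently, while the first segment of the hierarchical code places it under a broad category, so rolling up becomes a prefix lookup. All codes below are invented for illustration and are not actual CDC/HL7 values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    unique_code: str        # permanent identifier, never reused (hypothetical values)
    hierarchical_code: str  # dotted path; the first segment names the broad category
    label: str


# Invented top-level segments and concepts, for illustration only.
TOP_LEVEL = {"A": "Asian", "H": "Hispanic or Latino"}
CONCEPTS = [
    Concept("90001", "A.01", "Hmong"),
    Concept("90002", "A.02", "Korean"),
    Concept("90003", "H.01", "Mexican"),
]
BY_CODE = {c.unique_code: c for c in CONCEPTS}


def broad_category(unique_code: str) -> str:
    """Roll a granular concept up to its broad category via the first code segment."""
    concept = BY_CODE[unique_code]
    return TOP_LEVEL[concept.hierarchical_code.split(".")[0]]


print(broad_category("90001"))  # Asian
```

Keeping the two codes separate means the hierarchy can be revised without changing the identifiers already stored in exchanged data.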
The U.S. Census Bureau, in addition to cataloging write-in responses to questions on race and Hispanic ethnicity,
asks a separate ancestry question for which respondents are asked to write in their ancestry or ethnic origin;
thus, a person might identify with an individual country (e.g., French), a region within a country (e.g., Corsican
or Breton), or a broader category (e.g., European).16 The Census maintains lists of write-in responses with
corresponding three-digit numerical codes for its questions on race, Hispanic origin, and ancestry. Although the
lists share many of the same categories, their codes differ. For example, 101 is the code for
White on the Census Race Code List, the code for "Not Spanish/Hispanic" in the Hispanic or Latino Origin Code
List, and the code for Azerbaijani in the Census Ancestry Code List (U.S. Census Bureau, 2002a). Korean is coded
as 620 on the Census Race Code List and 750 on the Census Ancestry Code List.
16 The separate ancestry question was included only on the Census "long form," which was sent to one in six
households. The American Community Survey (ACS), an annual survey sent to a sample of households, has replaced
the Census "long form" and includes a question …
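Because the same three-digit value means different things on different Census lists, a code is only meaningful together with the name of the list it came from. The sketch keys each code by (list, code); the code values are the examples given in the text, while the list names `race`, `hispanic_origin`, and `ancestry` are hypothetical labels chosen for illustration.

```python
from typing import Dict

# Code values from the examples in the text; list names are illustrative labels.
CENSUS_CODES = {
    ("race", "101"): "White",
    ("hispanic_origin", "101"): "Not Spanish/Hispanic",
    ("ancestry", "101"): "Azerbaijani",
    ("race", "620"): "Korean",
    ("ancestry", "750"): "Korean",
}


def decode(code_list: str, code: str) -> str:
    """Look up a code within a named list; a bare code is ambiguous across lists."""
    return CENSUS_CODES[(code_list, code)]


def meanings(code: str) -> Dict[str, str]:
    """Show every list in which a bare code appears and what it means there."""
    return {lst: label for (lst, c), label in CENSUS_CODES.items() if c == code}


print(meanings("101"))
print(decode("race", "620"), decode("ancestry", "750"))
```

Storing the list name alongside every code avoids silently reading an ancestry code of 101 as the race code for White when data from different questions are combined.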
The Massachusetts Division of Health Care Finance and Policy and the Massachusetts Quality and Cost
Council mandated that the state's acute care hospitals and health plans, respectively, report uniform race and
ethnicity data (Weinick et al., 2007). These requirements spurred development of an ethnicity categorization and
coding list by the Brookings Institution. Entities responsible for the list's development considered recommending
the CDC/HL7 Code Set but found it did not accurately capture all relevant population groups.17 The category and
coding list developed by the Brookings Institution includes 31 ethnicity categories and additional "sub-ethnicities"
that are not required for reporting but that an organization can collect, if useful. Acute care hospitals and health
plans are required to report (i.e., have the fields and categories available in their HIT systems) the basic OMB
race categories along with the 31 ethnicity categories (Massachusetts Executive Office of Health and Human
Services, 2009a, 2009b). When an organization collects any of the "sub-ethnicity" categories, it is required to roll
that category up to one of the 31 broader ethnicity categories for reporting. The Massachusetts Superset, which
is intended to serve as a guide for health plans and hospitals when they collect granular ethnicity beyond the 31
required categories, includes most of the CDC/HL7 categories and 87 additional categories representing African
nations (e.g., Sudanese, Somali), synonyms for existing CDC categories (e.g., La Raza, Chicano), Middle Eastern
nations (e.g., Saudi Arabian, Jordanian), and other ethnicities (e.g., Cape Verdean, Brazilian, Guyanese) (Taylor-Clark et al., 2009).
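The Massachusetts reporting rule described above (any collected sub-ethnicity must be rolled up to one of the required reporting categories) amounts to a lookup that must succeed for every collected value. The category names below are placeholders (`category_a`, `sub_a1`, and so on); the real 31 categories and their sub-ethnicities are defined by the Massachusetts specifications, not here.

```python
from collections import Counter
from typing import List

# Hypothetical placeholders standing in for the required reporting categories.
REPORTING = {"category_a", "category_b"}
SUB_TO_REPORTING = {"sub_a1": "category_a", "sub_a2": "category_a", "sub_b1": "category_b"}


def to_reporting(value: str) -> str:
    """Map a collected value to a required reporting category, or fail loudly."""
    if value in REPORTING:
        return value
    try:
        return SUB_TO_REPORTING[value]
    except KeyError:
        raise ValueError(f"no reporting category defined for {value!r}") from None


def report_counts(collected: List[str]) -> Counter:
    """Count collected values after rolling each up to its reporting category."""
    return Counter(to_reporting(v) for v in collected)


print(report_counts(["sub_a1", "sub_a2", "category_a", "sub_b1"]))
```

Raising an error for an unmapped value, rather than dropping it, surfaces gaps in the rollup table before a report is filed.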
Similarly, Contra Costa Health Plan and the Wisconsin Cancer Reporting System (WCRS) developed their
own categorization and coding schemes (Tiutin, 2009; Wisconsin Cancer Reporting System, 2008). Contra Costa's
code set is based on the CDC/HL7 Code Set, but includes nine additional granular ethnicities, including American
and Russian, which are two of Contra Costa's top 15 response categories, but are not included in the CDC/HL7
Code Set (Appendix H).
In 2004, Kaiser Permanente began collecting member race and ethnicity data using the OMB categories and
a limited number of detailed ethnicity groups. After implementation, Kaiser determined a need for more granular
ethnicity categories to allow for better self-identification and analyses of health care data. As a result, Kaiser
developed a list of granular ethnicities that could be used for self-reporting separately from the OMB race and
Hispanic ethnicity categories. The code set includes 268 categories, and continual review is planned to ensure
alignment with immigration trends and relevance to health care (Kaiser Permanente, 2009). Appendix G provides
more detail on Kaiser Permanente's collection of data on race, ethnicity, and language need.
"Unavailable," "declined," and "unknown" codes, variations of which are included in the HRET Toolkit's
suggested format, the Massachusetts Superset, the Contra Costa Health Plan code list, and the Kaiser Permanente
code list, are frequently used in survey analysis. These codes are not presented as response options, but are recorded
by registration/eligibility clerks or surveyors, for example, so that data systems can track the number of persons
for whom the organization has attempted to collect race and ethnicity data. The subcommittee suggests that such
categories be provided for individuals who have not responded (unavailable), refuse to answer (declined), or do
not know (unknown). The "unavailable" category allows data collectors to see that the respondent has not yet
provided the information, so the information should be solicited at a future point of contact with that individual.
In contrast, the "declined" category indicates the individual should not be asked again. In some instances, the
"unknown" category provides a response option if the respondent is adopted, for example, and does not know
his/her race and ethnicity (Taylor-Clark, 2009).
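The three staff-recorded codes above can be modeled as an enumeration that drives whether to ask again at the next contact. The text states that "unavailable" means ask later and "declined" means do not ask again; treating "unknown" as not re-asked is an assumption of this sketch, as are the names `NonResponse` and `should_ask_again`.

```python
from enum import Enum
from typing import Optional


class NonResponse(Enum):
    """Recorded by staff; never presented to respondents as answer choices."""
    UNAVAILABLE = "unavailable"  # not yet provided: solicit at a future contact
    DECLINED = "declined"        # refused: do not ask again
    UNKNOWN = "unknown"          # respondent does not know (e.g., some adoptees)


def should_ask_again(answer: Optional[str], status: Optional[NonResponse]) -> bool:
    """Decide whether to solicit race and ethnicity at the next point of contact."""
    if answer is not None:
        return False  # already self-reported
    if status is None:
        return True   # no attempt recorded yet
    # Assumption: only "unavailable" triggers another request; "unknown" does not.
    return status is NonResponse.UNAVAILABLE


print(should_ask_again(None, NonResponse.UNAVAILABLE))  # True
print(should_ask_again(None, NonResponse.DECLINED))     # False
```

Recording the reason for a missing answer, rather than leaving the field blank, is what lets data systems count how many collection attempts have been made.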
Selection of Local Granular Ethnicity Categories
The list of granular ethnicities in Appendix E provides a baseline template for a national standard set of granular
ethnicity categories. An entity can decide, based on local circumstances, whether to use 10 or 100 categories
from the national standard list for collection and/or analysis. If the entity sees an increase in the use of the "Other,
please specify:___" option, it should consider adding categories to its local list. If an organization chooses not to
have a preset list of categories, it will need to compile responses according to the national standard list to ensure
comparability with data collected by other entities.
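Choosing a local list from the national standard list can be sketched as picking the categories with the largest local estimates and appending the open-ended option. This sketch ranks by estimated subgroup size only; the text also names health care quality issues and evidence of disparities as selection criteria, which a real process would weigh as well. The function name and the sample estimates are hypothetical.

```python
from typing import Dict, List


def select_local_list(national: List[str], local_estimates: Dict[str, int], k: int) -> List[str]:
    """Pick up to k national-list categories with the largest local estimates, plus a write-in option."""
    off_list = set(local_estimates) - set(national)
    if off_list:
        # Local choices must come from the national standard list to stay comparable.
        raise ValueError(f"not on the national standard list: {sorted(off_list)}")
    ranked = sorted(local_estimates, key=lambda c: (-local_estimates[c], c))
    return ranked[:k] + ["Other, please specify:___"]


national = ["Cape Verdean", "Hmong", "Korean", "Russian", "Somali"]
estimates = {"Hmong": 900, "Somali": 400, "Korean": 400, "Russian": 50}  # made-up counts
print(select_local_list(national, estimates, k=3))
```

The same function serves an entity that wants 10 categories or 100; only `k` and the local estimates change.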
Determining which locally relevant categories to include may initially require subjective judgments about
subgroups believed to be present in large numbers. However, some organizations may not realize the diversity of
their service population and thus may not understand the need to collect the OMB categories and granular ethnicity
data (Box 3-4). Therefore, specific, locally relevant categories can be determined using population estimates
from geographic-based Census data, school enrollment data that identify newer and growing populations in service
areas, indirect estimation techniques, or surveying. However, even constructing a survey may require some
knowledge of persons in the service area; Anthem Blue Cross, for example, solicited through a mailed survey the
race and ethnicity of its California members, but focused on the six OMB race and Hispanic ethnicity categories
and 61 additional ethnicity categories considered most pertinent to its enrollees.18 As all granular ethnicity lists
should also include an "Other, please specify:____" option, the write-in responses may help organizations evaluate
and expand as necessary the granular ethnicity response options provided. If an organization is receiving numerous
write-in responses of "Russian," for example, it may consider adding a Russian response option.
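Monitoring "Other, please specify" write-ins, as in the "Russian" example above, can be sketched as counting write-ins that are not already on the local list and flagging any that reach a threshold. The threshold `min_count` and the sample responses are assumptions; an entity would set its own review criteria.

```python
from collections import Counter
from typing import List, Tuple


def categories_to_consider(write_ins: List[str], local_list: List[str],
                           min_count: int) -> List[Tuple[str, int]]:
    """Flag write-in answers frequent enough to justify their own response option."""
    listed = {c.casefold() for c in local_list}
    cleaned = [w.strip() for w in write_ins]
    # Ignore blanks and answers that match an existing option; unify capitalization.
    counts = Counter(w.title() for w in cleaned if w and w.casefold() not in listed)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


write_ins = ["Russian", "russian ", "RUSSIAN", "Brazilian", "Hmong", ""]
print(categories_to_consider(write_ins, ["Hmong", "Korean"], min_count=3))  # [('Russian', 3)]
```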
Box 3-4. Realizing the Necessity of Collecting Data: The University of Mississippi Medical Center
When informed they were to begin collecting race, ethnicity, and language data from patients, employees
at University of Mississippi Medical Center (UMMC) almost uniformly indicated that patients would
believe this information would be used to segregate services and would create racial tensions. In fact,
the director in charge of implementing the data collection was convinced that UMMC and the organizations
funding and administering the data collection initiative (The Robert Wood Johnson Foundation and
The George Washington University through an Expecting Success project) were "taking gasoline and
pouring it on a blazing fire."
The registration department initially thought registration staff were already asking for the patient's race.
The director discussed this with staff and found out they were not asking the patients but were looking at
patients to determine their race. Staff informed management that patients might be offended or become
indignant when asked for the information. Observer reports indicated that approximately 180 Hispanic
patients registered at UMMC per year. So what was the point of collecting additional race and ethnicity
data for a reasonably homogeneous patient population?
With funding and support from Expecting Success, UMMC implemented a staff training program to
ensure patients would be asked directly their race, ethnicity, and language need. Within months of implementation,
UMMC learned it was registering approximately 600 Hispanic individuals per month (approximately
1.5 percent of the 40,000 individuals registered per month) and the patient population was found
to be less homogeneous than initially believed. Approximately 500 patients per month were from subgroups
the medical center did not even realize existed in its service area (e.g., Japanese and Russian). UMMC
found that between 3 and 4 percent of the population preferred to talk to a physician in a language other
than English. UMMC now has three full-time Spanish interpreters (where it previously had none) and
switched vendors to ensure its interpreter phone system could handle the types and numbers of interpreter
services required. In-house physicians and researchers have begun to utilize the race, ethnicity, and
language data to stratify quality measures.
Source: Personal communication with Richard Pride, UMMC, June 3, 2009.
A variety of entities participate in the health care system, and while each has roles to play in capturing race
and ethnicity data, not all currently collect these data, and those that do may not use uniform methods or categories.
Other entities, however, collect and report detailed data in ways that comply with the OMB standards
and produce data useful to local and national quality improvement efforts. The subcommittee's task is to provide
standardized categories "for entities wishing to assess and report on quality of care." The subcommittee aims to
accomplish this while imposing the least possible data collection burden and without hindering the progress and
processes of entities already collecting detailed data.
The subcommittee focuses its recommendations on care delivery sites and public and private insurers, as these
health care entities are involved in measuring and improving quality, as well as on data collection activities that
provide information about equity in care, care outcomes, quality of care, or utilization of care (e.g., health surveys
asking about health care). Some public health activities involve delivery of care, but others do not. Because vital
statistics and other public health surveillance systems are organized and supported for purposes beyond health
care quality improvement, these collection activities may require different considerations. All entities related to
health and health care, though, are encouraged to collect race, Hispanic ethnicity, and granular ethnicity data in
accordance with the subcommittee's recommendations.
The subcommittee considered a stepwise approach to collecting race and ethnicity data, where entities would
first emphasize collecting the data according to the OMB standards and then gradually implement granular ethnicity
data collection over time. However, as discussed in Chapter 2, granular ethnicity data are useful for improving
health care quality in many settings, and thus the collection of these data should not be considered a secondary
aim in those settings. While the subcommittee recognizes that full implementation of its recommendations may
require HIT and process changes for some entities (Chapter 5), race, Hispanic ethnicity, and granular ethnicity
data are all necessary to effectively and efficiently target health care quality improvement to groups that are
at risk of suboptimal care.
Recommendation 3-1: An entity collecting data from individuals for purposes related to health and
health care should:
- Collect data on granular ethnicity using categories that are applicable to the populations it
serves or studies. Categories should be selected from a national standard list (see Recommendation
6-1a) on the basis of health and health care quality issues, evidence or likelihood
of disparities, or size of subgroups within the population. The selection of categories
should also be informed by analysis of relevant data (e.g., Census data) on the service or
study population. In addition, an open-ended response option of "Other, please specify:___" should
be provided for persons whose granular ethnicity is not listed as a response option.
- Elicit categorical responses consistent with the current OMB standard race and Hispanic
ethnicity categories, with the addition of a response option of "Some other race" for persons
who do not identify with the OMB race categories.
Consistent Rollup of Granular Ethnicity to OMB Categories
While systems for rolling granular ethnicity categories up to broader categories have been developed by
CDC/HL7 and the Commonwealth of Massachusetts, among others, an agreed-upon rollup strategy for granular
ethnicities has not been determined or reviewed for its applicability nationwide and across the health care system.
For example, the Massachusetts Superset aggregates its set of granular ethnicities to 31 mid-level aggregations
whereas the CDC/HL7 Code Set aggregates its ethnicity categories to only the OMB race and Hispanic ethnicity
categories. A process for rolling granular ethnicity categories up to the OMB categories is key to achieving two
potentially contradictory objectives: on the one hand, consistency and standardization in analysis and reporting,
and on the other hand, data collection tailored to local circumstances. Rollup procedures will be needed
only when a person does not check off an OMB race or Hispanic ethnicity and provides only a granular ethnicity
response, or when only granular ethnicities are collected; the subcommittee, however, prefers that granular
ethnicity be collected separately from OMB race and Hispanic ethnicity. The subcommittee chose not to define mid-level
aggregations between granular ethnicity and the OMB categories.
The CDC/HL7 Code Set was designed in a hierarchical fashion such that each ethnicity category corresponds
to one of the OMB race or Hispanic ethnicity categories (Figure 3-3). This rollup scheme can be used when
reporting is required to conform to the OMB categories or when an analyst needs a consistent set of minimum
categories to make comparisons across systems reporting race and ethnicity at different levels of detail. For the vast
majority of individuals, mapping from ethnicity to race categories is not problematic. As discussed in Chapter 1,
however, ethnicity and race are two different concepts. Individuals who self-identify as Brazilian may also identify
as White, Black, or some combination of races, or may see themselves as falling into no category beyond Brazilian.
As a result, a rollup scheme that assumes all respondents who self-identify as Brazilian are White could wrongly
assign a race to a number of individuals.
Figure 3-3 highlights some problems with current CDC rollup procedures. For example, Brazilians may not
be considered Hispanic because they speak Portuguese rather than Spanish. Additionally, several national origins
correspond to two or more major racial populations. For instance, the population of Madagascar is of mixed African,
Malayo-Indonesian, and Arab ancestry. This means that rolling up Madagascan to Asian, as recommended by
the CDC rollup scheme, would misclassify Africans of Madagascan descent as Asian. Rollup schemes are further
complicated by misclassifications introduced by the use of geographic boundaries. While the CDC rollup scheme
considers Afghanistan to be Middle Eastern and consequently categorizes Afghans as White, the Census ancestry
list classifies Afghanistan as an Asian country. Additionally, the WCRS coding manual notes that descriptions of
religious affiliation should be "used with caution" when determining corresponding races.19
The above discussion highlights some of the difficulties inherent in rolling up some ethnicities because
(1) ethnicities can include two or more major racial populations, (2) the geographic boundaries used to distinguish
major groups in different classification schemes are arbitrary, and (3) many individuals may not associate with a
specific race for cultural or other reasons. Thus, an individual's race cannot always be presumed based on his or
her ethnicity. For this reason, the rollup assignment of a self-reported ethnicity to an OMB category should not
be placed in an individual's health record or supersede a person's direct self-report. Analysts should understand
that any assignment made using a 90 percent (or any other percentage) threshold, or based solely on geography,
risks misclassifying individuals relative to how they would self-identify their race. The rates of
misclassification, even for granular ethnicities meeting a 90 percent threshold, underscore the fact that rollup
schemes provide only probabilistic assignments useful for analysis at the group or population level.
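The principles above (a direct self-report always wins, and a rollup result is an analytic value that is never written into the individual's record) can be sketched as a function that reads an immutable record and returns a derived value plus its source. The rollup table here is hypothetical and deliberately omits Brazilian, echoing the example in the text of an ethnicity that cannot be assigned a single race.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PatientRecord:
    omb_race: Optional[str]            # direct self-report; None if not answered
    granular_ethnicity: Optional[str]  # self-reported granular ethnicity, if any


# Hypothetical analytic rollup table; groups without a determinate race are absent.
ROLLUP = {"Hmong": "Asian", "Korean": "Asian"}
NO_DETERMINATE = "no determinate OMB race classification"


def analytic_race(rec: PatientRecord) -> Tuple[str, str]:
    """Return (race, source) for group-level analysis; the record itself is never modified."""
    if rec.omb_race is not None:
        return rec.omb_race, "self-report"
    mapped = ROLLUP.get(rec.granular_ethnicity) if rec.granular_ethnicity else None
    if mapped is not None:
        return mapped, "rollup"  # probabilistic assignment, for analysis only
    return NO_DETERMINATE, "none"


print(analytic_race(PatientRecord("Black or African American", "Brazilian")))
print(analytic_race(PatientRecord(None, "Brazilian")))
```

Returning the source alongside the value lets analysts separate self-reported from rolled-up assignments when estimating how much misclassification a result might contain.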
Granular Ethnicities with an Indeterminate Race or Hispanic Ethnicity Classification
Various methods are used to distinguish ethnic groups that cannot be rolled up to a specific race category. For
example, in Census 2010, the Census Bureau will use OMB's geographic definitions when it reclassifies ethnic
responses in the race question to an OMB race category (e.g., all entries reflecting a sub-Saharan African nation will
be counted as "Black"). In Census 2000, the Census Bureau applied a 90 percent rule to reclassify write-in responses
on the race question according to the OMB race categories (del Pinal et al., 2007).20 Single-ancestry responses were
cross-tabulated by race responses, and if 90 percent or more of respondents in a specific ancestry group selected a
particular race, that race was assigned to respondents who gave that ethnic response in the race question.
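The Census 2000 procedure just described (cross-tabulate single-ancestry responses by race and assign a race only when at least 90 percent of a group selected it) can be sketched as follows. For simplicity, the sketch assumes each respondent reported a single race; the group and race labels and the counts are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def ninety_percent_rule(pairs: Iterable[Tuple[str, str]],
                        threshold: float = 0.90) -> Dict[str, str]:
    """pairs: (ancestry, race) for each single-ancestry respondent.

    Returns {ancestry: race} for groups in which one race accounts for at least
    `threshold` of respondents; other groups are left unassigned.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for ancestry, race in pairs:
        table[ancestry][race] += 1  # cross-tabulate ancestry by race
    assigned = {}
    for ancestry, counts in table.items():
        race, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= threshold:
            assigned[ancestry] = race
    return assigned


# Made-up counts: Group X meets the threshold, Group Y does not.
data = ([("Group X", "Race A")] * 95 + [("Group X", "Race B")] * 5
        + [("Group Y", "Race A")] * 60 + [("Group Y", "Race B")] * 40)
print(ninety_percent_rule(data))  # {'Group X': 'Race A'}
```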
To determine whether groups included on the CDC, Census, Massachusetts, and WCRS category lists can
be rolled up to a specific OMB race category with some degree of certainty, the subcommittee evaluated Census 2000
Public Use Microdata Samples (PUMS) data and applied the methodology of the Census Bureau's 90 percent rule.
The subcommittee cross-tabulated write-in responses on ancestry with the "alone or in combination with one or
more other races" variable for each OMB race group. If fewer than 90 percent of respondents of a specific ancestry
group selected an OMB race either alone or in combination with another race, the ancestry group was identified
as being problematic for rolling up. The subcommittee did not have sufficient data on some granular ethnicity
groups to apply the 90 percent rule to each ancestry subgroup (Appendix F). The subcommittee finds some
granular ethnicities could not be rolled up to an OMB race category with greater than 90 percent certainty. The
difficult-to-categorize granular ethnicity groups are included in Appendix F.
The subcommittee suggests that those ethnicities that do not meet the 90 percent threshold be classified
as "no determinate OMB race classification." This classification differs from the "Some other race" category
because "Some other race" is a response option used by individuals who do not identify with a specific OMB
race category. The "no determinate OMB race classification" would be used to identify entire ethnic groups that
cannot be assumed to comprise one specific racial group. None of the granular ethnicities associated with the
Hispanic ethnicity category can be assigned to an OMB race category with greater than 90 percent certainty.
Granular ethnicities that cannot easily be rolled up to the OMB Hispanic ethnicity category include individuals
identifying a granular ethnicity associated with the non-Spanish-speaking countries of South and Central America (Guyana,
Suriname, Brazil, and Belize); additionally, these granular ethnicities should be considered "no determinate OMB
race classification" because they do not meet the 90 percent rule. Appendix F highlights some additional difficult-to-categorize granular ethnicity groups, including persons of Moroccan, Brazilian, Cape Verdean, Dominican,
Guyanese, and South African descent.
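The subcommittee's variant of the 90 percent rule, described in the two preceding paragraphs, differs in that each respondent may report several races and each race is counted "alone or in combination," so the shares for a group can sum to more than 100 percent. The sketch flags a group as "no determinate OMB race classification" when no race reaches the threshold; treating a group in which two races both reach it as indeterminate is an added assumption. Labels and counts are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

NO_DETERMINATE = "no determinate OMB race classification"


def classify_alone_or_in_combination(responses: Iterable[Tuple[str, Set[str]]],
                                     threshold: float = 0.90) -> Dict[str, str]:
    """responses: (ancestry, set of races reported) for each respondent."""
    mentions: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for ancestry, races in responses:
        totals[ancestry] += 1
        for race in set(races):
            mentions[ancestry][race] += 1  # counted alone or in combination
    result = {}
    for ancestry, n in totals.items():
        qualifying = [r for r, k in mentions[ancestry].items() if k / n >= threshold]
        # Assumption: exactly one qualifying race is required for an assignment.
        result[ancestry] = qualifying[0] if len(qualifying) == 1 else NO_DETERMINATE
    return result


data = ([("Group X", {"Race A"})] * 92 + [("Group X", {"Race A", "Race B"})] * 8
        + [("Group Y", {"Race A"})] * 70 + [("Group Y", {"Race B"})] * 20
        + [("Group Y", {"Race A", "Race B"})] * 10)
print(classify_alone_or_in_combination(data))
```

In this made-up data, Group Y has 80 percent reporting Race A and 30 percent reporting Race B (alone or in combination), so it fails the threshold and is flagged rather than assigned.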
For interventions aimed at quality improvement and reduction of disparities at the local level, mapping granular
ethnicities to the OMB race categories may be unnecessary. Locally tailored quality improvement activities may
target subgroups without needing to relate those subgroups to a single OMB race category. Collecting race, Hispanic
ethnicity, and granular ethnicity data separately allows reporting of the OMB categories when necessary without
requiring rollup of the granular ethnicities, provided that individuals respond to all the questions asked.
Nonetheless, the subcommittee recognizes that some circumstances will require the use of a rollup scheme
to link granular ethnicities to broader categories to allow comparison or data aggregation. The Massachusetts
Superset was developed to guide health plans toward a uniform set of ethnicities; this set avoids rolling up granular
ethnicities to races and instead aggregates granular ethnicities into broader groups of ethnicities. Such an ethnicity
rollup scheme is useful when the sample of a granular ethnicity group is too small for analysis and needs to be
aggregated with others.
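An ethnicity-to-ethnicity rollup of the kind described above can be sketched as keeping granular groups that are large enough to analyze and pooling smaller ones into a broader ethnic group. The broader groupings and the minimum cell size `min_cell` are hypothetical, not the Massachusetts Superset's aggregations; groups with no broader mapping are kept as-is.

```python
from collections import Counter

# Hypothetical granular-to-broader ethnicity groupings, for illustration only.
BROADER = {
    "Somali": "East African", "Ethiopian": "East African", "Eritrean": "East African",
    "Hmong": "Southeast Asian", "Cambodian": "Southeast Asian",
}


def aggregate_small_cells(counts: Counter, min_cell: int) -> Counter:
    """Keep groups with at least min_cell members; pool smaller mapped groups into broader ones."""
    out: Counter = Counter()
    for group, n in counts.items():
        if n >= min_cell or group not in BROADER:
            out[group] += n
        else:
            out[BROADER[group]] += n
    return out


counts = Counter({"Somali": 120, "Ethiopian": 12, "Eritrean": 9, "Hmong": 300, "Cambodian": 15})
print(aggregate_small_cells(counts, min_cell=30))
```

Large groups stay separate here, so a pooled cell such as "East African" contains only the small groups; an analysis could instead pool every member of the broader group, depending on the comparison needed.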
The subcommittee merged several ethnicity lists into a template of granular ethnicity categories. These categories
are mapped to the OMB race and Hispanic ethnicity categories (Appendix E). National agreement needs to
be reached on a rollup scheme, recognizing that not all ethnicities map to an OMB race category, so
that some respondents will have "no determinate OMB classification." The locus of responsibility for the development
of a national standard set of ethnicity categories and a national rollup scheme is addressed in Chapter 6.
Recommendation 3-2: Any entity collecting data from individuals for purposes related to health
and health care should collect granular ethnicity data in addition to data in the OMB race and
Hispanic ethnicity categories and should select the granular ethnicity categories to be used from a
national standard set. When respondents do not self-identify as one of the OMB race categories or
do not respond to the Hispanic ethnicity question, a national scheme should be used to roll up the
granular ethnicity categories to the applicable broad OMB race and Hispanic ethnicity categories
to the extent feasible.