Chapter 5: Enhancing Data Resources

Future Directions for the National Healthcare Quality and Disparities Reports

Improving Race, Ethnicity, Language Need, and Insurance Status Data

The NHDR reveals that even as health care quality improves on specific measures, disparities often persist. Addressing such disparities begins with the fundamental step of bringing the nature of the disparities and the groups at risk for those disparities to light by analyzing health care quality information stratified by race, ethnicity, language need, socioeconomic status, and insurance status data (IOM, 2009a; IOM, 2009b; NRC, 2004). This section of the report briefly examines the need for each of these sociodemographic data elements in documenting disparities in health care, and summarizes a recent IOM report on standardizing race, ethnicity, and language need data for quality improvement. Then, it evaluates the variables by which AHRQ stratifies data, the data sources used to create the NHDR, and the ways in which AHRQ analyzes disparities data.

Enhanced Collection, Analysis, and Reporting

In 2008, AHRQ contracted with the IOM to form the Subcommittee on Standardized Collection of Race/Ethnicity Data for Healthcare Quality Improvement in conjunction with the Committee on Future Directions for the National Healthcare Quality and Disparities Reports. As required by the project's statement of task (go to Chapter 1), the subcommittee conducted its own consensus-based, in-depth analysis that was then issued as an independently reviewed, stand-alone report. The subcommittee's report Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement was released on August 31, 2009.7 It identified current methods for categorizing and coding race, ethnicity, and language need data; discussed the challenges involved in obtaining these data in health care settings; and made recommendations for improvement. The subcommittee's findings and recommendations (go to Appendix G) provide background information relevant to the committee's task of recommending ways to improve the data reported in the NHQR and NHDR. The committee draws on the subcommittee's work regarding race, ethnicity, and language need data, but also addresses socioeconomic and insurance status data, which were outside of the scope of work for the subcommittee.

Rationale for Granular Ethnicity Data

Since the 2003 release of the IOM's Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care, evidence of disparities in health care among racial and Hispanic populations, as these populations are categorized by the OMB,8 has continued to accumulate. There is more information on differences in life expectancy (IOM, 2008a) and mortality risks or rates for certain medical conditions (Murthy et al., 2005; Wang et al., 2006), along with knowledge of disparities in general health status, access to health care, and utilization rates (Cohen, 2008; Flores and Tomany-Korman, 2008; Kaiser Family Foundation, 2009a; Ting et al., 2008). Even as quality-of-care indicators show improvement for the overall U.S. population (e.g., screening for colorectal cancer), disparities persist among the OMB race and Hispanic ethnicity categories (Moy, 2009; Trivedi et al., 2005). Therefore, the subcommittee endorsed continued collection of the OMB categories because they are useful for comparative analysis and have been the standard since 1977 (with adjustments in 1997).

There has been relatively less attention paid to the issue of disparities as they relate to more discrete ethnic groups within the OMB categories (e.g., persons of Cuban, Russian, Chinese, or Nigerian ethnicity, whether born in the United States or elsewhere). The OMB categories are not always sufficiently precise to capture population groups of interest to national and local quality improvement efforts. Currently, the NHDR presents the OMB-defined race and Hispanic ethnicity groups as homogenous populations. For example, the section of the NHDR that discusses Hispanics as a priority population makes no mention of the wide range of cultures, languages, and health-related behaviors encompassed by the Hispanic ethnicity category. Because some national surveys collect data on individuals of Mexican, Puerto Rican, and Cuban ethnicities, among others, it would be possible to provide illustrative examples of disparities, when they exist, among these specific ethnic groups.

These more specific data can highlight quality gaps among more precisely defined populations that differ in the extent of risk factors, degree of health problems, quality of care received, and outcomes. Numerous studies have described heterogeneity in health and cultural factors within the OMB's Black or African American population, and the need to examine this population in greater detail (e.g., Black individuals of African heritage versus those of Caribbean heritage) (Kington and Nickens, 2001; Pallotto et al., 2000; Read et al., 2005). Similarly, disparities are apparent within other OMB-defined groups, including in the broad OMB-defined White, Asian, Native Hawaiian or Other Pacific Islander, American Indian or Alaska Native, and Hispanic categories. For example, the need for health care services can depend, in many instances, on ancestry: large differences exist in asthma burden between groups of Hispanic children in the United States. One study indicated that children of Puerto Rican heritage had a higher prevalence of asthma than children of Mexican heritage (26 percent versus 10 percent) and a higher rate of recent asthma attacks (12 percent versus 4 percent) (Lara et al., 2006).

Because disparities can exist within the broad OMB categories, there is value in collecting and utilizing data that have more fine-grained ethnicity categories than those put forth by the OMB (Blendon et al., 2007; Jerant et al., 2008; Read et al., 2005; Shah and Carrasquillo, 2006). The subcommittee recommended, and the committee concurs, that health care-related entities should collect data on granular ethnicity—defined as "a person's ethnic origin or descent, 'roots,' or heritage, or the place of birth of the person or the person's parents or ancestors" (U.S. Census Bureau, 2008a)—in addition to soliciting data in the OMB race and Hispanic ethnicity categories (Figure 5-1). More discrete population data are necessary to identify opportunities for quality improvement and outreach without unnecessarily and inefficiently targeting interventions to an entire broad race or Hispanic population.

The design of the national healthcare reports may make it difficult to display data on a large number of granular ethnicity groups for each measure. For instance, the heart disease measure presented on page 62 of the 2008 NHDR would become overwhelmingly complex if the figure also included data for Americans of Mexican, Japanese, and Jamaican ethnicity. A derivative product of the NHQR and NHDR that focused on subgroups within the broad OMB race or ethnicity groups would be well suited to present more discrete population information. Additionally, online functionalities that allow users to further analyze subgroup data would facilitate more discrete data analyses without adding more data to the print version of the NHDR.

The Rationale for Language Need Data

Robust evidence exists that patients with limited English proficiency encounter significant disparities in access to health care (Hu and Covell, 1986), a decreased likelihood of having a usual source of care (Kirkman-Liff and Mondragon, 1991; Weinick and Krauss, 2000), an increased probability of receiving unnecessary diagnostic tests (Hampers et al., 1999), and more serious adverse outcomes from medical errors (Divi et al., 2007) and drug complications (Gandhi et al., 2000). The most compelling case for collection and use of language need data is that appropriate, understandable communication represents a foundation of quality health care. That is, patient understanding, comprehension, and informed decision-making are necessary for the provision of high-quality care.

Consequently, HHS, in conformance with Department of Justice principles to prevent discrimination and to ensure access to federally funded programs, provides guidance on collecting language need data (HHS, 2003) in its Culturally and Linguistically Appropriate Services (CLAS) standards. However, English language proficiency and preferred language for health care encounters are not often captured in clinical, survey, or administrative datasets. While surveys may capture language need by noting the language in which the survey was administered, surveys are often only administered in Spanish and English, and measures of language need are more detailed than simply listing an individual's language preference.

The subcommittee concluded, and the committee agrees, that language need can best be assessed by asking two questions: one aimed at determining whether an individual speaks English "less than very well" and a second aimed at identifying the individual's preferred spoken language during a health care encounter (Figure 5-1). In evaluating spoken English proficiency, the subcommittee determined that the threshold of speaking English "less than very well" (as opposed to "less than well") is the most sensitive for assessing effective communication. Individuals with limited English proficiency may need to have greater English proficiency for health care encounters than for other daily tasks because of the unfamiliarity of health concepts and the complexity of medical terminology (Karliner et al., 2008; Siegel et al., 2001).
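The two-question assessment described above can be sketched as a small data record. This is an illustrative sketch only: the class name, field names, and the four-level response scale are assumptions made for the example and are not taken from the subcommittee's report.

```python
from dataclasses import dataclass

# Response scale assumed for illustration (Census-style wording).
PROFICIENCY_SCALE = ("very well", "well", "not well", "not at all")


@dataclass
class LanguageNeed:
    """Hypothetical record holding the answers to the two questions."""
    english_proficiency: str        # question 1: how well the person speaks English
    preferred_spoken_language: str  # question 2: preferred language for health care encounters

    def __post_init__(self):
        if self.english_proficiency not in PROFICIENCY_SCALE:
            raise ValueError(f"unknown proficiency: {self.english_proficiency!r}")

    def speaks_less_than_very_well(self) -> bool:
        """Threshold named in the text: any answer below 'very well'."""
        return self.english_proficiency != "very well"

    def speaks_less_than_well(self) -> bool:
        """Narrower alternative threshold, shown only for comparison."""
        return self.english_proficiency in ("not well", "not at all")


patient = LanguageNeed("well", "Vietnamese")
print(patient.speaks_less_than_very_well())
print(patient.speaks_less_than_well())
```

In this sketch a person who reports speaking English "well" is flagged under the "less than very well" threshold but not under the "less than well" threshold, which illustrates why the former is the more sensitive of the two.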

Collecting and storing standardized language need information allows its use in measuring system-level quality (e.g., the availability of interpreters and translated materials, and evaluating whether patients have been matched with language-concordant providers), and for stratifying measures by English language proficiency. Collecting these data for analysis at the national level could inform the need for cultural competency measures or help target areas where culturally and linguistically appropriate policies and interventions are necessary.

While the subcommittee principally focused on the categorization of race, ethnicity, and language need, as it was charged to do, it recognized the role of health literacy, among other variables, in health care quality. The subcommittee adopted the following definition of health literacy:

The degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions. (Ratzan and Parker, 2000, p. vi)

Medical information is complex to understand, even without the added barrier of having a primary language other than English. Comprehending many health-related materials requires education at the high school level, as most materials are written at a 10th-grade reading level or higher (D'Alessandro et al., 2001; Downey and Zun, 2007; IOM, 2004a). To ensure effective communication, patients may need to discuss written materials with an interpreter or bilingual provider even if the materials are translated into the patients' primary language, which is why the subcommittee prioritized the collection of spoken language ability over written language ability when data systems limit the number of data elements that can be collected.

The Rationale for Socioeconomic Data

Examining socioeconomic status (SES) and insurance status was outside the scope of the subcommittee's task, although the subcommittee acknowledged the importance of these factors when assessing health care quality. Therefore, the Future Directions committee looked at other studies to evaluate the usefulness of these data. The multidimensional construct of SES, which can be represented by various measures (e.g., income, education, occupation), can act as both a mediator of racial and ethnic health care disparities, and a further source of disparities.

The terms socioeconomic status, socioeconomic position, and class are often used interchangeably. Isaacs and Schroeder, for instance, determined that class can be measured by income, wealth, and education (2004). These are the same components that a National Research Council committee concluded encompass a broad set of socioeconomic characteristics defined as socioeconomic position (SEP) (NRC, 2004).10 This committee uses the term SES because it is used in the literature more frequently than SEP or class. The committee understands SES to be a broad concept that encompasses income, wealth, and education.

Higher SES is related to better health and health care quality (Fiscella et al., 2009). Studies have found, for example, that higher income and education are associated with lower mortality (Deaton and Paxson, 2004; Egerter et al., 2009; Mechanic, 2007; Sorlie et al., 1995) and that SES is correlated with cancer incidence and mortality (Singh, 2003). While the relationship between SES and health care is complex, there are several established pathways. First, income is related to affordability. Even among the insured, most health care plans include premiums, deductibles, copayments, and non-covered services. Persons with a higher income level are better able to afford these expenses (McWilliams, 2009), as well as to take time off from work to seek care. Second, education is linked with health knowledge, behavior, employment, income, social and psychological factors, and social standing, and is therefore a "crucial path" to health (Egerter et al., 2009). Because education is related to wealth and income, it is therefore related to an individual's ability to both access and afford the health insurance market (NRC, 2004). Third, a low level of health literacy is associated with less use of preventive services and a greater use of emergency departments (Arispe et al., 2005). Conversely, higher health literacy, which is correlated with education, is generally associated with improved ability to navigate a highly complex and disjointed health delivery and health care payment system (NRC, 2004). Additionally, higher education is associated with greater diffusion and uptake of newer technology, presumably due to a combination of health literacy and social networks (Chang and Lauderdale, 2009).

A person's health and health care are "greatly influenced by powerful social factors such as education and income and the quality of neighborhood environments" (RWJF Commission to Build a Healthier America, 2009, p. 10). While the causal relationships between income, class, neighborhood, and health care are complex, it is clear that where people live, learn, and work has implications for the health services they receive (California Newsreel, 2008; Health Policy Institute, 2008; RWJF Commission to Build a Healthier America, 2009). Among other factors, diet, housing conditions, educational quality, and neighborhood environment are a function of class, and neighborhood conditions constrain access to healthful foods, quality medical care, and opportunities for exercise (California Newsreel, 2008).

Although there is some evidence for reverse causality (e.g., poor health results in lower income due to downward occupational drift), the balance of the evidence suggests that the primary pathway is from SES to health and health care (Marmot, 2006). Although measures of SES are correlated, each distinctly influences health and health care outcomes (Mechanic, 2007). For example, although education is associated with income, wealth, and occupation, it has independent effects beyond these joint influences (Mechanic, 2007). SES provides a crude index of health status (and thus health care need) within a population and has implications for both allocation of resources and assessment of health performance (Casalino and Elster, 2007; Fiscella et al., 2009). Without collecting SES data, it is difficult to assess whether policies and interventions are mitigating or exacerbating health and health care disparities.

The Rationale for Insurance Status Data

A 2009 IOM report on the consequences of uninsurance concluded that "health insurance is integral to personal well-being and health" (IOM, 2009a, p. 5) and that high levels of uninsurance undermine the quality of the nation's health care, even for insured populations. The report presented a robust body of evidence that demonstrated the substantial health and health care benefits of insurance and supported a previous IOM report's conclusion that "health insurance contributes essentially to obtaining the kind and quality of health care that can express the equality and dignity of every person" (IOM, 2004b, p. 159). AHRQ reviewed the impact of uninsurance on many of the measures included in the 2006 NHQR and NHDR and found, for instance, that uninsured individuals were much less likely than those with private or public insurance to have a usual primary care provider (AHRQ, 2008).

The Availability of Data for Disparities Analysis and Reporting

The categories for collection and methods of aggregation for reporting race, ethnicity, and language need data vary across the data sources used to create the NHDR. As previously indicated, the 2008 NHQR and NHDR draw on data from a variety of sources; these data sources do not uniformly report on all variables (e.g., poor, White, Black or African American, Hispanic, American Indian or Alaska Native, Asian, and Native Hawaiian or Other Pacific Islander) for each measure. For example, not all core quality measures in the NHDR can be broken down even into each of the OMB race and Hispanic ethnicity categories. This is evident in the 2008 NHDR where 24 of the 46 core measures are missing data from at least one of the OMB categories. For these 24 measures, reliable data were unavailable for specific groups, most commonly the American Indian or Alaska Native population (AHRQ, 2009a). More recently, AHRQ has indicated that it can analyze most of the core measures by insurance status.

The subcommittee report Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement recommended actions to improve data processes across the health care system. These recommendations, with which the Future Directions committee agrees, are as follows:

  • The necessary variables for disparities measurement (i.e., race, Hispanic ethnicity, granular ethnicity, English language proficiency, and preferred spoken language) should be included in clinical records, surveys, and administrative data.
  • HHS, states, and accreditation and standards setting organizations can require or encourage this adoption through a variety of mechanisms (go to Appendix G).

AHRQ's ability to analyze such data for the national healthcare reports is dependent on the uptake of these recommendations; AHRQ should work with its data partners to increase the availability of these descriptors.

Federally Funded National Surveys

National-level surveys, which include the National Health Interview Survey (NHIS), the Health and Retirement Study (HRS), the National Health and Nutrition Examination Survey (NHANES), and the National Immunization Survey (NIS), are designed—among other purposes—to make comparisons across time, providers, and geographic areas (Madans, 2009). Much of what is known about racial and ethnic disparities has been derived from surveys of the national population (Sequist and Schneider, 2006). For example, the available evidence on health and health care disparities among granular ethnicity groups in the U.S. population is limited primarily to those groups for which there is currently discrete categorization on national survey instruments.

The various federally funded health surveys that provide data for the NHQR and NHDR collect race and Hispanic ethnicity data in the six categories specified by the OMB and a largely common set of 9 to 12 more granular ethnicity categories. For example, the NHIS, National Survey on Drug Use and Health (NSDUH), and Medical Expenditure Panel Survey (MEPS) all include the OMB categories plus Mexican, Cuban, Puerto Rican, Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese categories, among others.11

Many studies using data from large national datasets still often need to pool data over multiple years to get sample sizes sufficient to support reliable inferences and conclusions for racial and ethnic groups. As an example, using logistic regression analyses of MEPS data pooled from 2002 through 2005, AHRQ identified the independent effects of socioeconomic factors on obese adults given advice by a doctor about exercise (AHRQ, 2009a). Without pooling the data, the subgroup samples would have been small and less reliable for analysis.
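The effect of pooling on reliability can be illustrated with a short calculation. The sketch below uses hypothetical yearly counts for one small subgroup and assumes simple random sampling; it is not AHRQ's method, and an actual analysis of a complex survey such as MEPS would also need to account for sampling weights and design effects.

```python
import math


def proportion_with_se(events, n):
    """Return a proportion and its standard error under simple random sampling."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


# Hypothetical counts for one subgroup: year -> (respondents given advice, respondents)
yearly = {2002: (18, 40), 2003: (22, 45), 2004: (17, 38), 2005: (21, 42)}

for year, (events, n) in yearly.items():
    p, se = proportion_with_se(events, n)
    print(year, round(p, 3), round(se, 3))

# Pool the four years into a single, larger sample.
pooled_events = sum(events for events, _ in yearly.values())
pooled_n = sum(n for _, n in yearly.values())
pooled_p, pooled_se = proportion_with_se(pooled_events, pooled_n)
print("pooled", round(pooled_p, 3), round(pooled_se, 3))
```

With these illustrative counts, the standard error of the pooled estimate is roughly half that of any single year, which is the gain in reliability that motivates pooling.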

Health Care Facilities and Clinical Data

AHRQ utilizes a variety of clinical data sources in the NHQR and NHDR. The subcommittee found, and the committee concurs, that a lack of standardization of race, ethnicity, and language need variables and categories has been a barrier to the widespread collection, aggregation, and utilization of these data. Hospitals, health plans, and accrediting bodies, for example, have expressed reluctance to implement data collection because they did not have guidance on what exactly to collect (Taylor and Gold, 2009; Weinick et al., 2008). Standardization can promote greater comparability and ability to aggregate data collected by providers or plans, or, for instance, transferred from providers to multiple plans or from multiple plans to a state. The American Recovery and Reinvestment Act of 2009 (ARRA)12 lays out expectations for the collection of race, ethnicity, and language data by specifying the inclusion of these variables in EHRs (CMS, 2010). Clinical data would be valuable for the NHQR and NHDR because provider settings supply data otherwise not collected in surveys or administrative datasets.

Administrative Data

Surveys are useful to capture information for which patients are considered the best reporters (e.g., patient-centeredness), whereas administrative data sources generally provide more reliable and detailed information about aspects of care that are not based on patient recall (e.g., utilization of services, costs, efficiency). Ensuring the collection of race, ethnicity, language need, and SES in Medicare, Medicaid, and Children's Health Insurance Program (CHIP) claims and enrollment data is important to documenting disparities.

As indicated in Table 5-1, the NHQR and NHDR utilize several CMS data sources, including data from the Nursing Home Minimum Dataset and the Home Health Outcomes and Assessment Information Set, but there is potential to use additional CMS data sources, including data from Medicare Part D. As a byproduct of administering the Medicare program, CMS has a wealth of information on enrollment, utilization, and costs, among other variables (McGann, 2009; Reilly, 2009), on the nearly 100 million individuals it insures.13 Thus, Medicaid and Medicare datasets are particularly useful in determining utilization rates for different types of services (IOM, 2002), although they may not contain sufficient clinical information (such as the need for a particular service or its outcome) and they often contain incomplete, inaccurate, or even no data on race, ethnicity, language need, or SES (Bonito et al., 2008).14 These are critical limitations because Medicare and Medicaid claims data are among the few publicly available data sources that would be large enough to provide data on small population subgroups.

Improvement in the collection of race, ethnicity, language need, and SES data in Medicare and Medicaid files is needed. To date, CMS has conducted some preliminary studies using indirect estimation tools to enhance race and ethnicity data obtained through current collection methods. Under the Medicare Improvements for Patients and Providers Act of 2008,15 CMS is required to address quality reporting by race and ethnicity, and a report by CMS detailing its proposed actions is due to be publicly available in 2010.

Using Indirectly Estimated Data

When directly collected race or ethnicity data are incomplete or unavailable in a dataset, estimating the probability of a person's race or ethnicity from other information (e.g., zip code, surname) may be useful. Indirect estimates of race and ethnicity can allow for analyses of associations between race and ethnicity and outcomes of interest. The subcommittee's report concluded that such inferences can be useful when the limits of direct collection of racial and ethnic data have been reached.

One of the simplest indirect approaches is to use area-level population data derived from the Census. Such data include the racial and ethnic composition of an area, as well as socioeconomic measures such as median income, percent in poverty, distribution by years of educational attainment, percent reporting speaking a language other than English at home, and proficiency with English. Substantial literature on the use of "geocoding" in health research compares the effects of using data aggregated to various geographic levels (Fiscella and Fremont, 2006; Fremont et al., 2005; Krieger et al., 2003a,b,c, 2005; Rehkopf et al., 2006; Subramanian et al., 2006); generally, research has concluded that effects are detected more sensitively when data are linked to smaller (more detailed) geographic units.

Additionally, names have been used as indicators of racial and ethnic identity. For some names, there is a corresponding racial and ethnic composition based on self-identification of people with that name in Census data. These data have been summarized in lists of common Spanish and Asian surnames and more specific lists of surnames associated with different Asian-origin ethnicities (Elliott et al., 2008; Fiscella and Fremont, 2006; Wei et al., 2006).

The distributions of race and ethnicity in an area or for a particular name can be interpreted as probabilities that a randomly chosen person from the group (of residents of the area or persons with that name) is a member of each race or ethnicity. Under the assumption that information such as area composition and name are independent given the person's race, the information can be combined using Bayes's theorem to produce a posterior probability for each race and ethnicity (Elliott et al., 2008; Fiscella and Fremont, 2006).
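The combination step can be sketched in a few lines. This is a minimal illustration of the general Bayes' theorem calculation under the conditional independence assumption stated above, not a reproduction of any published estimator; the group labels and all probabilities are hypothetical. If surname s and area g are independent given group r, then P(r | s, g) is proportional to P(r | s) × P(r | g) / P(r), where P(r) is the overall population share of group r.

```python
def combine_posteriors(p_given_surname, p_given_area, prior):
    """Combine two sources of evidence about group membership with Bayes' theorem.

    Assumes surname and area are conditionally independent given the group, so
    P(group | surname, area) is proportional to
    P(group | surname) * P(group | area) / P(group).
    """
    unnormalized = {
        group: p_given_surname[group] * p_given_area[group] / prior[group]
        for group in prior
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


# All values below are hypothetical and chosen only for illustration.
prior = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}         # overall population shares
by_surname = {"group_a": 0.2, "group_b": 0.1, "group_c": 0.7}    # composition for this surname
by_area = {"group_a": 0.5, "group_b": 0.2, "group_c": 0.3}       # composition of this area

posterior = combine_posteriors(by_surname, by_area, prior)
print({group: round(p, 3) for group, p in posterior.items()})
```

In this example the surname and the area each point toward group_c more strongly than the overall population share does, so the combined posterior probability for group_c (roughly 0.90) is higher than either source gives alone. The result is a set of probabilities for use in aggregate analysis, not an assignment of a race or ethnicity to an individual.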

Although the use of indirectly estimated data at the individual level is limited by the probabilistic nature of the data and the consequent possibility of error, the subcommittee concluded, and the committee concurs, that these techniques can be used to bridge gaps for analysis until directly collected data are available. In several illustrative analyses, disparities identified with these methodologies closely matched those identified using self-reported race and ethnicity data (Elliott et al., 2008). However, users of indirectly estimated data should be cautioned against interpreting such data to draw conclusions about individual characteristics (e.g., assigning a race to a person's individual medical chart).

Stratifying Quality Measures

The most analytically simple approach to reporting disparities is to calculate and present the differences between groups being compared. The NQF has noted that addressing issues of quality within "vulnerable patient populations" requires stratifying measures by "gender, race, ethnicity, SES, primary language, and insurance status." This chapter's discussion of the rationale for race, ethnicity, language need, SES, and insurance status data highlights the importance of exploring quality measures by these variables. Analyzing these measures within the context of social determinants of health (e.g., neighborhood environments) could also be an effective strategy to explore complex relationships between race, ethnicity, income, education, class, and health care.
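The simple approach of calculating differences between groups can be sketched as follows. The records, group labels, and field names are hypothetical and serve only to show the calculation: a rate is computed for each level of a stratifying variable and then compared with a reference group.

```python
from collections import defaultdict


def make_records(group, insurance, received, n):
    """Build n hypothetical patient records, the first `received` of which got the care."""
    return [
        {"group": group, "insurance": insurance, "received_care": 1 if i < received else 0}
        for i in range(n)
    ]


def rates_by(records, key):
    """Share of records that received care, for each level of the stratifier `key`."""
    counts = defaultdict(lambda: [0, 0])  # level -> [received, total]
    for record in records:
        counts[record[key]][0] += record["received_care"]
        counts[record[key]][1] += 1
    return {level: received / total for level, (received, total) in counts.items()}


def differences_from_reference(rates, reference):
    """Absolute difference between each group's rate and the reference group's rate."""
    return {g: rate - rates[reference] for g, rate in rates.items() if g != reference}


records = (
    make_records("X", "insured", 60, 70) + make_records("X", "uninsured", 20, 30)
    + make_records("Y", "insured", 35, 40) + make_records("Y", "uninsured", 30, 60)
)

group_rates = rates_by(records, "group")
insurance_rates = rates_by(records, "insurance")
print({g: round(r, 3) for g, r in group_rates.items()})
print({g: round(d, 3) for g, d in differences_from_reference(group_rates, "X").items()})
print({g: round(r, 3) for g, r in insurance_rates.items()})
```

The same function is reused for each stratifier (here, group and insurance status), which mirrors the idea of reporting one measure broken out by several variables in turn.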

Further, the ability to stratify measures by gender and age is important to consider as females, children, and older adults are among AHRQ's priority populations. Studies have shown, for instance, that women with cardiovascular disease are treated less aggressively than men and are less likely to undergo cardiac procedures (Chou et al., 2007). Further stratification may be particularly important, however, in the context of intra- and inter-race variability. Studies that have stratified cardiac care patients by gender and race have found higher rates of clinically appropriate care among men and underuse of clinically appropriate care among Blacks (Epstein et al., 2003), with the lowest rates of clinically appropriate care utilization being for Black women (Steiner and Miller, 2008). In addition, the analysis of disparity measures by age will provide important insight. For example, a measure that depicts receipt of a vaccine by the elderly population could adjust for age to show whether the likelihood of being vaccinated by a given age is the same for all population groups.
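The vaccination example can be illustrated with direct age standardization, one common way to adjust a rate for age. The sketch below is an assumption about how such an adjustment might be done, not a description of AHRQ's method; the counts, age bands, and standard weights are hypothetical.

```python
def crude_rate(strata):
    """Overall rate, ignoring age. `strata` maps age band -> (vaccinated, population)."""
    vaccinated = sum(v for v, _ in strata.values())
    population = sum(n for _, n in strata.values())
    return vaccinated / population


def age_standardized_rate(strata, standard_weights):
    """Weight each age-specific rate by a common standard age distribution."""
    return sum(standard_weights[age] * v / n for age, (v, n) in strata.items())


# Hypothetical counts: the two groups have identical age-specific rates
# (50 percent and 70 percent) but different age distributions.
group_1 = {"65-74": (300, 600), "75+": (280, 400)}
group_2 = {"65-74": (400, 800), "75+": (140, 200)}
standard_weights = {"65-74": 0.6, "75+": 0.4}  # assumed standard population shares

for name, strata in (("group_1", group_1), ("group_2", group_2)):
    print(name,
          round(crude_rate(strata), 3),
          round(age_standardized_rate(strata, standard_weights), 3))
```

Here the crude rates differ (0.58 versus 0.54) only because one group is older; once both are weighted to the same age distribution, the rates coincide. A gap that remained after standardization would point to a difference not explained by age.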

With perfect data, AHRQ might be able to control for a variety of factors (e.g., age, gender, SES, comorbid behavioral and health disorders) to determine whether such factors confound or mediate relationships between high-quality care and race or ethnicity. However, these data are not uniformly available. When possible, AHRQ might discuss in text whether uncontrolled factors would likely mitigate or worsen disparities and could also discuss data limitations. The 2008 NHDR includes a table listing AHRQ's ability to stratify the core measures by the OMB race and Hispanic ethnicity categories, and by whether individuals have household incomes less than 100 percent of federal poverty thresholds (AHRQ, 2009a, p. 287).16 The committee commends AHRQ for indicating where reliable data are and are not available and encourages AHRQ to expand its table of data availability to include not only all of the OMB race and Hispanic ethnicity categories, but also availability of granular ethnicity, language need, SES, and insurance status data.

The IOM's 2002 Guidance for the National Healthcare Disparities Report advised AHRQ to present analyses of racial and ethnic disparities that take into account the effect of SES (IOM, 2002). Similarly, the 2008 IOM report State of the USA Health Indicators recommended that data be first presented by race, ethnicity, and SES, and then by race and ethnicity data stratified by SES (e.g., a bar chart in which each part represents an income group within a specific race) (IOM, 2008b). Stakeholders have suggested that data presentation in the NHDR could be further strengthened by stratifying race and ethnicity by SES or, in some cases, controlling for SES via multivariate regressions (IOM, 2008b). AHRQ has only done this to a limited extent (e.g., see pages 199 and 143 of the 2008 NHDR for examples of how AHRQ presents multiple stratifications). Figure 5-2 shows another way in which AHRQ might present such data. This format would allow readers to examine racial, ethnic, and SES aspects of a specific disparity and would show the independent and combined contributions of each of these factors. In the 2008 NHDR, AHRQ presented multivariate regression analyses for three measures: obese adults who were given advice about exercise, people without insurance, and people who have a usual primary care provider (AHRQ, 2009a).

There are both positive and negative implications of controlling for various factors depending on whether they are viewed primarily as confounders or mediators. The IOM report Unequal Treatment acknowledges that income is one of many intervening variables between race, ethnicity, and disparities (IOM, 2003). However, controlling for SES may possibly "mask" the "main effects" of disparities (IOM, 2008b). Moreover, controlling for SES may obscure important differences among providers that deserve attention, such as poorer performance among providers caring for disadvantaged populations or lack of resources available to provide services in low-income areas (Williams, 2008). For these reasons, it is best to present data both with and without adjustment for income and insurance status. One way of teasing out the potential mediating role of income is by examining the relationships between race, ethnicity, and quality both with and without income included. The committee does not intend that AHRQ report on all measures stratified by all of the above-discussed variables; rather, AHRQ should present data when they reveal disparities or should note that the analyses were performed and did not reveal a disparity.
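Presenting a disparity both with and without adjustment can be sketched with a simple stratified comparison. The example below is illustrative only: the groups, income strata, and counts are hypothetical, and the adjustment shown (a size-weighted average of within-stratum gaps) is one simple choice among several, standing in for the multivariate regressions mentioned in the text.

```python
def overall_rate(strata):
    """Rate across all income strata. `strata` maps stratum -> (received, population)."""
    return sum(r for r, _ in strata.values()) / sum(n for _, n in strata.values())


def crude_gap(data, group, reference):
    """Difference in overall rates, with no adjustment for income."""
    return overall_rate(data[group]) - overall_rate(data[reference])


def stratum_gaps(data, group, reference):
    """Difference in rates within each income stratum."""
    return {
        s: data[group][s][0] / data[group][s][1]
        - data[reference][s][0] / data[reference][s][1]
        for s in data[reference]
    }


def adjusted_gap(data, group, reference):
    """Average of within-stratum gaps, weighted by combined stratum size."""
    gaps = stratum_gaps(data, group, reference)
    sizes = {s: data[group][s][1] + data[reference][s][1] for s in gaps}
    total = sum(sizes.values())
    return sum(gaps[s] * sizes[s] / total for s in gaps)


# Hypothetical counts: group -> income stratum -> (received care, population)
data = {
    "P": {"low": (60, 100), "high": (270, 300)},
    "Q": {"low": (150, 300), "high": (85, 100)},
}

print("unadjusted gap:", round(crude_gap(data, "Q", "P"), 4))
print("within-stratum gaps:", {s: round(g, 4) for s, g in stratum_gaps(data, "Q", "P").items()})
print("income-adjusted gap:", round(adjusted_gap(data, "Q", "P"), 4))
```

In this example the unadjusted gap is about 24 percentage points, while the income-adjusted gap is about 7.5 points. Reporting both shows how much of the difference travels with income and that a gap remains within each income stratum, which would be hidden if only the adjusted figure were shown.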

Recommendation 5: AHRQ should:
  • Continue to stratify all quality measures in the NHDR by at least the OMB race and Hispanic ethnicity categories, by socioeconomic status variables (e.g., income, education), and by insurance status.
  • Strive toward stratifying measures by language need (i.e., English language proficiency and preferred spoken language for health care-related encounters), and extend its analyses in the NHDR and derivative products to include quality measures stratified by more granular ethnicity groups within the OMB categories whenever the data are available.
  • Document shortcomings in the availability of OMB-level race and Hispanic ethnicity data, granular ethnicity data, language need, and socioeconomic and insurance status data to support these analyses; work to enhance the collection of these data in future iterations of the source datasets; and, whenever necessary, utilize alternative valid and reliable data sources to provide needed information even if it is not available nationally.

7 The full text of Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement is available online.
8 The OMB's Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity (1997) include a minimum of two ethnic categories: (1) Hispanic or Latino and (2) Not Hispanic or Latino, and five race categories: (1) American Indian or Alaska Native, (2) Asian, (3) Black or African American, (4) Native Hawaiian or Other Pacific Islander, and (5) White. Federal data collection requires that respondents be allowed to select more than one race.
9 The subcommittee's recommendation to collect English language proficiency and preferred spoken language is closely aligned to how the NQF defines primary language—the self-selected language the patient wishes to use to communicate with his or her health care provider (NQF, 2009).
10 In 2004, the National Research Council of the National Academy of Sciences defined SEP as a "complex concept, encompassing a number of elements of a person's position in society, including economic resources (earnings, income, and wealth), social resources (social networks and connections to community resources), education (formal credentials, communication skills, and health information), and occupation" (NRC, 2004, pp. 33-34).
11 These categories generally correspond to the check-off boxes included in Census 2000, Census 2010, and intercensal American Community Survey (ACS) questions on race and ethnicity.
12 American Recovery and Reinvestment Act of 2009, Public Law 111-5 §3002(b)(2)(B)(vii), 111th Cong., 1st sess. (February 17, 2009).
13 At least 100 million of the 300 million people in the U.S. are served by three programs administered by HHS—Medicare, Medicaid, and community health centers. There were 44.8 million Medicare beneficiaries in 2008, 58.7 million Medicaid and CHIP recipients in 2006, 10 million with dual enrollment, and 8.9 million uninsured or privately insured individuals served by health centers. The U.S. population, as of July 1, 2008, was 304 million (HRSA, 2008; Kaiser Family Foundation, 2009b; U.S. Census Bureau, 2008b).
14 Because Medicare historically relied on the race and ethnicity data individuals provided when they applied for a Social Security number (SSN), racial and ethnic identifiers were limited to "Black," "White," and "Other" responses included on the SSN application form (unless the individual changed enrollment to a specific health plan). Consequently, Medicare data have been of limited use in studying differences in patterns of care for populations identified by the OMB categories (Bilheimer and Sisk, 2008; Bonito et al., 2008; U.S. House Committee on Ways and Means Subcommittee on Health, 2008). The limitations of the Medicare data for race and Hispanic ethnicity have been acknowledged by CMS officials, and CMS is actively working to improve its coding of race and ethnicity within existing datasets (Bonito et al., 2008). As of August 2009, the Social Security Administration (SSA) has updated its SS-5 form to include all of the OMB race and Hispanic ethnicity categories (Social Security Administration, 2009). This is an important update because SSA provides demographic information to Medicare.
15 Medicare Improvements for Patients and Providers Act of 2008, Public Law 110-275 §118, 110th Cong., 2d sess. (July 15, 2008).
16 Twenty-three measures are not assessed by income level.

Current as of December 2010
Internet Citation: Chapter 5: Enhancing Data Resources: Future Directions for the National Healthcare Quality and Disparities Reports. December 2010. Agency for Healthcare Research and Quality, Rockville, MD.