Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement

5. Improving Data Collection Across the Health Care System (continued)

Educating Patients and Communities

Baker and colleagues (2005, 2007) found that while most patients believe health care providers should collect data on race and ethnicity, minority patients may feel uncomfortable with providing this information. Informing patients that the data are being collected to monitor and improve the quality of care for everyone helps improve patients' comfort level. Thus, in health and health care settings, providing a rationale for asking the questions may make patients and enrollees feel better about responding. The HRET Toolkit provides suggested wording for this purpose: "We want to make sure that all our patients get the best care possible. We would like you to tell us your racial/ethnic background so that we can review the treatment that all patients receive and make sure that everyone gets the highest quality of care."

When Contra Costa Health Plan began requesting these data from its members, call center staff read a script developed from the HRET Toolkit before asking about race and ethnicity. Employees found the script time-consuming to read in the call center environment, which led the plan to reevaluate its collection methods. The rationale for the data collection is no longer automatically provided in advance; instead, the data are requested when other information, such as the member's address and phone number, is being verified.12 Contra Costa's experience highlights the need for adapting best practices to what will be most successful in specific circumstances.

Informing and engaging communities may facilitate data collection efforts. For example, community-based organizations can be informed of the purposes of the data collection and be used as avenues for passing this information on to constituencies. Within health care settings, information pamphlets, cafeteria table tent cards, and posters in languages other than English (Hasnain-Wynia et al., 2007) may help patients and their families understand what is being asked and why.

Using Probabilistic Indirect Estimation of Race and Ethnicity Data

When direct collection of race and ethnicity data is incomplete or impossible, it may be useful to infer some information about a person's race or ethnicity from other information that is already available or can readily be obtained for use in analyses of associations between race and ethnicity and outcomes of interest. Such inferences can be useful when the limits of direct collection of racial and ethnic data have been reached for a given data system or as an interim measure while data are being collected from individuals. This use of predictive variables rather than direct collection of information from patients is termed "indirect estimation." A number of indirect estimation approaches can be applied to race and ethnicity data, including linking area-level population data from the Census Bureau to quality data, using names for indirect estimation, and attributing Bayesian probabilities to indirectly estimated data.

Linking Area-Level Data to Quality Data

One of the simplest indirect approaches is to use area-level population data derived from the Census. Such data include the racial and ethnic composition of an area (percent in each race and ethnicity category), as well as socioeconomic measures such as median income, percent in poverty, distribution by years of educational attainment, percent reporting limited English proficiency, or an overall indicator of socioeconomic status combining several such measures. Through 2000, these measures were collected from the long-form sample of the decennial Census and released in tabulations by a range of Census geographical units from the state to the block group. More recently, collection of these data has shifted to the American Community Survey, a continuous data collection process from which tabulations are released for 1-, 3-, or 5-year accumulations depending on the level and population of the geographic unit. The numerous applications of the methodology reflect the ease with which addresses can be linked to area data, either by "geocoding" addresses to small areas or by using tabulations for ZIP code tabulation areas, which approximate postal ZIP codes.

Analyses with area variables may proceed either by categorizing variables into ranges or by regressing on the numerical value of the variable. For example, researchers might sort block groups into categories with zero to 10 percent, 10 to 20 percent, and 20 to 30 percent Hispanic residents. If the researchers then found that the block groups with higher concentrations of Hispanic residents also had higher rates of diabetes, a higher rate of diabetes among Hispanics than non-Hispanics might be inferred. Alternatively, it is possible to regress the diabetes rate on the percent Hispanic, finding that the diabetes rate increases (along the fitted regression line) by a certain amount (e.g., 0.15 percentage points) for each 1 percentage point increase in the percent Hispanic. Thus, it might be possible to conclude that the difference in rates between Hispanics and non-Hispanics is 0.15, or 15 percentage points (the change implied by moving from zero to 100 percent Hispanic). There is a substantial literature on the use of area measures in health research (Krieger et al., 2003a, 2003b, 2003c, 2005), comparing the effects of using data aggregated to various geographic levels; generally, the conclusion has been that effects are detected more sensitively when data are linked to smaller (more detailed) geographic units.
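The regression reading of area data can be illustrated with a minimal sketch. The block-group figures below are invented for illustration (they are not drawn from any study cited here), and the helper name `fit_line` is hypothetical. The sketch fits an ordinary least squares line and extrapolates the slope across the full zero-to-100 percent range.

```python
# Hypothetical block-group data: (percent Hispanic, diabetes rate in percent).
# The values are made up and lie exactly on a line with slope 0.15.
block_groups = [(5.0, 6.75), (15.0, 8.25), (25.0, 9.75), (40.0, 12.0)]

def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(block_groups)

# Extrapolating from 0 to 100 percent Hispanic gives the implied
# difference in rates between the two groups, in percentage points.
implied_difference = slope * 100
```

With these invented inputs the slope is 0.15 percentage points per percentage point, so the implied difference is 15 percentage points. That interpretation holds only under the assumptions about compositional effects discussed in the text.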

When an outcome is regressed on an area variable defined as the percentage in a particular group (such as the percentage African American or the percentage in poverty), the regression coefficient can be interpreted as the effect of being a member of that group. This analysis, sometimes known as "ecological inference," is technically correct only under the assumption that the outcome is related to individual effects (membership in the group), but not to the degree of concentration of the group in the area. For example, diabetes rates are higher for African Americans than for Whites; if rates for each group were uniform across the country (and assuming for presentation that there are only these two groups), the average rate in each area would be directly related to the percent African American. In fact, the rate would be a weighted average of the rates for the two groups, where the weights are the percentages of each group in the area; in other words, the effects would be purely compositional. The assumption of uniformity could be violated, however, if African Americans in highly segregated areas have different socioeconomic and health characteristics (e.g., probability of having diabetes) than their counterparts living in integrated areas.
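The purely compositional case can be written out explicitly. The notation is introduced here for illustration only: $p_a$ is the proportion African American in area $a$, and $r_{AA}$ and $r_W$ are the rates for the two groups, assumed uniform across areas.

```latex
\bar{r}_a \;=\; p_a\, r_{AA} + (1 - p_a)\, r_W \;=\; r_W + (r_{AA} - r_W)\, p_a
```

Under this assumption the area rate is linear in $p_a$, and the slope equals the individual-level difference $r_{AA} - r_W$. If $r_{AA}$ or $r_W$ itself varies with $p_a$, the slope no longer has this interpretation.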

Because of concerns about such possible "noncompositional" effects, the literature on the use of area effects often regards effects of area-level race and ethnicity measures as representing a combination of compositional effects (the average of effects of individual-level characteristics across the population of the area) and contextual effects (the effects of being in an area of a certain kind). By this logic, the area-level variables might be relevant to include in models even when individual-level measures are available and included. When individual-level variables are not available, the area composition variables can allow only approximate estimation of disparities at the individual level. However, results from area-level analyses can still be very useful in revealing disparities. For example, if residents of areas with high proportions of African American residents are shown to have higher rates of a health or health care problem than those in areas with few African American residents, this is good evidence for disparities even if a precise estimate of average African American-White differences cannot be obtained.

The accuracy of this method is directly dependent on the proportion of the targeted group in the particular area. Community rates of racial and ethnic segregation will affect the method's accuracy in catchment areas. This method also generally works better for African Americans than for other racial and ethnic groups because their rates of segregation, particularly in Eastern cities, are much higher than those of other groups. Also, rates may differ considerably depending on the unit of analysis (e.g., ZIP code, Census tract, Census block). Smaller units may be more useful, particularly for groups with lower numbers in the community. ZIP code data are readily available, while analysis using Census blocks or block groups requires the additional step of geocoding addresses to the relevant unit of analysis.

Data collection efforts that include an individual's address can be useful for indirectly estimating race and ethnicity. EHR standards and other administrative databases (e.g., registration and billing) include demographic data elements such as address and date of birth (Certification Commission for Healthcare Information Technology, 2007). Appropriate handling of these data is important because addresses are highly identifiable. HIPAA Privacy Rule requirements for deidentifying data protect individuals but may, in some cases, raise barriers to exchanging address data, as is sometimes necessary for indirect estimation processes.

Using Names for Indirect Estimation

Names have been used as indicators of racial and ethnic identity. For each name there is a corresponding racial and ethnic composition based on self-identification of people with that name in Census data. These data have been summarized in lists of common Spanish and Asian surnames and more specific lists of surnames associated with different Asian-origin ethnicities (Elliott et al., 2008; Fiscella and Fremont, 2006; Sweeney et al., 2007; Wei et al., 2006), but the full racial and ethnic distribution of those with each name is more informative than a simple list. For example, a large proportion of those with the surname "Rodriguez" are Hispanic, while those with the surname "Lee" might include substantial proportions of Asian Americans, African Americans, and Whites. While surnames are not useful for identifying groups without distinctive ethnicity-related surnames, identification of African Americans through distinct given names has shown some success (Ting, 2009).

Attributing Bayesian Probabilities to Indirectly Estimated Data

The distributions of race and ethnicity in an area or for a particular name can be interpreted as probabilities that a randomly chosen person from the class (of residents of the area or persons with that name) is a member of each race or ethnicity. For example, if all one knows about an individual is that he lives in a block group in which 37 percent of the residents are African American, one might say there is a 37 percent probability that he is African American. Similar statements can be made using names. Note that the information about race and ethnicity obtained in this way is probabilistic rather than deterministic: even if someone's block group is 90 percent Hispanic, one can say only that there is a 90 percent chance he is Hispanic, not that he is definitely Hispanic.

An important benefit of this formulation is that probabilities from different pieces of information can be combined formally to generate a summary combined probability. Technically, under the assumption that the two pieces of information—block group composition and name—are independent given the person's race, they can be combined using Bayes's theorem to produce a posterior probability for each race and ethnicity that summarizes the two pieces of information (Elliott et al., 2008; Fiscella and Fremont, 2006; Fremont et al., 2005). In particular, the racial and ethnic proportions in a small area can be regarded as prior probabilities that an individual from that area would be from each race and ethnicity group, while the probabilities that a person from each race and ethnicity would have the individual's name (e.g., the probability that a Hispanic would have the name "Gomez," the probability that a non-Hispanic White would have the name "Gomez") constitute the likelihood for each race and ethnicity. For example, a person named "Gomez" in a block group that is 50 percent Hispanic is more likely to be Hispanic than either a person named "Smith" in a block group that is 50 percent Hispanic or a person named "Gomez" in a block group that is 20 percent Hispanic.
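The combination step can be sketched in a few lines. This is a minimal illustration, not the implementation used in the cited studies: the two-group setup, the function names `posterior` and `prior_for`, and every likelihood value in `NAME_LIKELIHOOD` are assumptions invented for the example.

```python
def posterior(prior, likelihood):
    """Combine area-based priors with name likelihoods via Bayes's theorem.

    prior:      {group: P(group | block group)}
    likelihood: {group: P(name | group)}
    Assumes name and block group are independent given the group.
    """
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}

# Hypothetical name likelihoods, invented for illustration only.
NAME_LIKELIHOOD = {
    "Gomez": {"Hispanic": 0.004, "non-Hispanic": 0.0001},
    "Smith": {"Hispanic": 0.0005, "non-Hispanic": 0.01},
}

def prior_for(percent_hispanic):
    """Turn a block group's percent Hispanic into prior probabilities."""
    p = percent_hispanic / 100
    return {"Hispanic": p, "non-Hispanic": 1 - p}

gomez_50 = posterior(prior_for(50), NAME_LIKELIHOOD["Gomez"])
smith_50 = posterior(prior_for(50), NAME_LIKELIHOOD["Smith"])
gomez_20 = posterior(prior_for(20), NAME_LIKELIHOOD["Gomez"])
```

With these made-up inputs, the posterior probability of being Hispanic is about 0.98 for "Gomez" in the 50 percent block group, about 0.91 for "Gomez" in the 20 percent block group, and about 0.05 for "Smith" in the 50 percent block group, which reproduces the ordering described in the text.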

The assumptions for this application of Bayes's theorem are not likely to hold exactly; for example, a Hispanic in an area of Hispanic concentration (perhaps with many recent immigrants) might be more likely to have the name "Gomez" and less likely to have the name "Smith" than a Hispanic in an integrated area. Nonetheless, this probability calculation provides a principled way of combining multiple indicators of race and ethnicity. This procedure has been implemented with health plan datasets (Elliott et al., 2008). A similar procedure, but using ad hoc rules based on lists and cutoffs rather than formal probability calculations, was used to create a file of imputed racial and ethnic identifications for Medicare beneficiaries (Bonito et al., 2008).

Combining evidence about individuals in this way will tend to improve the accuracy of predictions in the sense that individuals' probabilities of belonging to each race and ethnicity will become more differentiated and therefore more informative. For this reason, a combined approach is preferable when possible. However, the fact that these are still only probabilities and not certainties has several implications for the use of indirectly estimated race and ethnicity. First, collapsing probabilities to a single imputed racial and ethnic classification for each individual loses useful information and can be misleading. For example, suppose each person is assigned the race and ethnicity classification with the highest probability. Then in a population of individuals for whom the probability of being non-Hispanic White is 60 percent and the probability of being African American is 40 percent, all of those individuals would be classified as non-Hispanic White, although the proper inference would be that the split is 60/40 percent. Another classification approach would be to impute randomly from the given probabilities (in the previous example to divide the population randomly in a 60/40 ratio). While this approach would yield a more realistic distribution of race and ethnicity for the group, the random imputations would have no relationship to any actual differences between Whites and African Americans, and therefore an analysis using this approach would, perhaps falsely, lead to the conclusion that there are no health differences between the two groups. For these reasons, it is essential to record probabilities from indirect estimation rather than a single assignment.
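The loss of information from collapsing probabilities can be seen in a small sketch of the 60/40 example. The population size of 1,000 and the function names are assumptions made for illustration.

```python
# Hypothetical population: 1,000 people who each have the same indirectly
# estimated probabilities (60 percent non-Hispanic White, 40 percent
# African American), as in the example in the text.
people = [{"non-Hispanic White": 0.6, "African American": 0.4}
          for _ in range(1000)]

def modal_counts(population):
    """Assign each person the single most probable group, then count."""
    counts = {}
    for probs in population:
        top = max(probs, key=probs.get)
        counts[top] = counts.get(top, 0) + 1
    return counts

def expected_counts(population):
    """Sum the probabilities, keeping each person's full distribution."""
    counts = {}
    for probs in population:
        for group, p in probs.items():
            counts[group] = counts.get(group, 0.0) + p
    return counts

by_mode = modal_counts(people)
by_probability = expected_counts(people)
```

Here `by_mode` places all 1,000 people in the non-Hispanic White category, while `by_probability` recovers roughly 600 and 400, the split that the probabilities actually support.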

On the other hand, probabilities can be used analytically to draw useful conclusions about disparities. As described above, regressing on probabilities can generate estimates of racial and ethnic differences, although these estimates are valid only under the assumption that variations in outcomes of interest within each racial and ethnic group are uncorrelated with the calculated probabilities. In several illustrative analyses, disparities identified with this methodology closely matched those identified using individual race and ethnicity variables (Elliott, 2009). For example, for estimates of disparities for Black versus White, Hispanic versus White, and Asian versus White, the sign of the coefficient based on indirectly estimated data matched that based on self-reported data 38 out of 39 times, with a significance level of 0.05 (Elliott, 2009).

Using Indirectly Collected Data

Indirect race and ethnicity identifications can be used in quality improvement efforts when direct identifications are unavailable (Box 5-6). In addition to aggregate analyses such as those described above, they can be used in examining characteristics of patients who suffered specific health problems or health care deficits. For example, mapping of the residences of such patients together with indirectly derived race and ethnicity could illuminate patterns of problems that could be addressed through targeted interventions. To plan services and conduct community-based targeted interventions, NQF recommends using proxy data from geocoding, surname analysis, and Bayesian estimation. NQF's recommendation also states that indirectly estimated data should not be used to target interventions for individual patients (NQF, 2008).

Indirect methods are best applied to population-based assessments of quality of care and can be used to identify "hotspots" where individuals who are at risk of or are receiving poor care are clustered. Knowing that a provider group's service area overlaps with a hotspot can be instructive, allowing the group to improve service delivery to specific communities. While targeting entire hotspots may be relatively ineffective for plans that do not dominate the market, community interventions in which plans pool efforts may be cost-effective (Fremont, 2009).

Box 5-6. The Use of Indirectly Collected Data by a Health Plan: Wellpoint, Inc.

Wellpoint, the largest member of the BlueCross BlueShield Association, recognizes that while it is preferable to collect race, ethnicity, and language data via self-report, plans often encounter data collection plateaus due to the costs of adding data collection and storage fields to Health IT systems, the costs of multiple attempts at collection, inaccurate data from external entities, and internal legality concerns. Wellpoint partnered with the RAND Corporation to develop a low-cost, easy-to-implement alternative to collecting primary source data. The initiative resulted in an analytic model for indirectly estimating race and ethnicity using a combination of geocoding, surname analysis, a proprietary African American first-name list, and logistic regression.
The indirectly estimated data can be used to examine differences among groups in various health indicators by linking proxy race and ethnicity data with member claims data and quality process measures. The data are also used to develop maps used for business decisions regarding the design of quality improvement programs and community collaboration projects. In 2008, Wellpoint began using the proxy data to channel culturally and linguistically appropriate screening reminder messages to members. The indirect methodology allows analysis of members who do not respond to requests for self-reported data, decreases the selection bias among self-reported respondents, and makes plan, regional, and practice-level analysis more accurate.

Sources: NCQA, 2008; Ting, 2009.

The use of indirectly estimated data at the individual level is limited by the probabilistic nature of the data and the consequent possibility of error. The subcommittee has considered a number of potential uses of indirect estimates, ranging from those that posed very little risk of harm to the patient to those that posed unacceptable risk. At one end of this spectrum, using indirect estimation to target mail distribution of health information tailored to the needs, language, or cultural style of a particular group would at worst lead to some misdirected and wasted mailing. At the other end, erroneous assumptions about race and ethnicity in personal contacts with patients could lead to offense and mistrust. In particular, the subcommittee finds that the clinical and interpersonal risks of including indirectly estimated identifications in individuals' medical records far outweigh any potential benefits given the danger of misreading the identification as certain, the likely interpersonal costs of such misreading, and the possibility of clinical consequences from relying on erroneous identification. Instead, if indirect estimation of race and ethnicity is to be used, the estimated probabilities should be stored in a system that is distinct from medical records but can be merged with medical record data to create analytic files for identification of disparities.
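One way to picture the separation described above is sketched below. It is only an illustration under stated assumptions: the record layout, the field names, the `IndirectEstimate` class, and the `build_analytic_file` helper are all hypothetical and do not describe any particular health plan's system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndirectEstimate:
    """Indirectly estimated probabilities, held outside the medical record."""
    member_id: str
    probabilities: dict          # group name -> probability
    source: str = "indirect"     # flags the value as not self-reported

# Hypothetical stores; in practice these would be separate systems.
medical_records = {
    "m1": {"hba1c_tested": True},
    "m2": {"hba1c_tested": False},
}
estimate_store = {
    "m1": IndirectEstimate("m1", {"Hispanic": 0.9, "non-Hispanic": 0.1}),
    "m2": IndirectEstimate("m2", {"Hispanic": 0.2, "non-Hispanic": 0.8}),
}

def build_analytic_file(records, estimates):
    """Merge the two sources into a separate analytic file.

    The medical records themselves are left unmodified.
    """
    rows = []
    for member_id, record in records.items():
        est = estimates.get(member_id)
        if est is None:
            continue  # no indirect estimate available for this member
        rows.append({
            "member_id": member_id,
            **record,
            "race_ethnicity_source": est.source,
            "probabilities": dict(est.probabilities),
        })
    return rows

analytic_file = build_analytic_file(medical_records, estimate_store)
```

The design choice in this sketch is that probabilities and their source flag appear only in the merged analytic rows, never in the medical record entries.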

Recommendation 5-1: Where directly collected race and ethnicity data are not available, entities should use indirect estimation to aid in the analysis of racial and ethnic disparities and in the development of targeted quality improvement strategies, recognizing the probabilistic and fallible nature of such indirectly estimated identifications.

  • Race and ethnicity identifications based on indirect estimation should be distinguished from self-reports in data systems, and if feasible, should be accompanied by probabilities.
  • Interventions and communications in which race and ethnicity identifications are based on indirect estimation may be better suited to population-level interventions and communications and less well suited to use in individual-level interactions.
  • An indirectly estimated probability of an individual's race and ethnicity should never be placed in a medical record or used in clinical decision making.
  • Analyses using indirectly estimated race and ethnicity should employ statistically valid methods that deal with probabilistic identifications.


There are both opportunities for and challenges to the collection of data on race, ethnicity, and language need at all organizational levels in the U.S. health care system. The infrastructure of the current health care system does not facilitate the data exchanges necessary to capture race, ethnicity, and language data for all populations. No one locus of data collection has a clearly superior balance of opportunities and challenges and the ability to serve as the primary data collection point for a large fraction of the U.S. population. Until such a clearly preferred locus of data collection emerges, it will be necessary for existing entities to collect these data using standardized categories and work to develop methods and policies for sharing the data so as to reduce the duplication of effort that occurs when all entities attempt to collect the data at most or all encounters.

All entities should collect these data, knowing their limitations and constraints, and implement steps to address these limitations and constraints. These steps can improve data collection processes by addressing Health IT constraints and minimizing respondent and organizational resistance. To enhance legacy Health IT systems, standardized communication protocols are needed to permit interoperability, and some systems will require upgrading. Training staff and educating communities about the importance of collecting race, ethnicity, and language data for improving health and the quality of health care are also necessary.

Direct collection of race and ethnicity data is preferable to observation and to indirect methods. When direct collection is impossible or has not been completed, however, indirect approaches can be employed. These approaches include linking area-level population data from the Census to quality data, using data like names to infer race and ethnicity, and attributing Bayesian probabilities to indirectly estimated data. At the same time, indirect estimates are always inferior to data obtained directly from individuals, and data based on indirect estimation should never be included in an individual's medical record.


AHA (American Hospital Association). 2008. Annual survey of hospitals and health systems. Chicago, IL: American Hospital Association.

AHIP (America's Health Insurance Plans). 2009. A legal perspective for health insurance plans: Data collection on race, ethnicity, and primary language. Washington, DC: America's Health Insurance Plans.

Baker, D. W., K. A. Cameron, J. Feinglass, P. Georgas, S. Foster, D. Pierce, J. A. Thompson, and R. Hasnain-Wynia. 2005. Patients' attitudes toward health care providers collecting information about their race and ethnicity. Journal of General Internal Medicine 20(10):895-900.

Baker, D. W., R. Hasnain-Wynia, N. R. Kandula, J. A. Thompson, and E. R. Brown. 2007. Attitudes toward health care providers, collecting information about patients' race, ethnicity, and language. Medical Care 45(11):1034-1042.

Barnes, P. M., P. F. Adams, and E. Powell-Griner. 2008. Health characteristics of the Asian adult population: United States, 2004-2006. Hyattsville, MD: National Center for Health Statistics.

Beal, A. C. 2004. Policies to reduce racial and ethnic disparities in child health and health care. Health Affairs 23(5):171-179.

Beal, A. C., M. M. Doty, S. E. Hernandez, K. K. Shea, and K. Davis. 2007. Closing the divide: How medical homes promote equity in health care: Results from The Commonwealth Fund 2006 Health Quality Survey. New York: The Commonwealth Fund.

Blumenthal, D. 2009. Stimulating the adoption of health information technology. New England Journal of Medicine 360(15):1477-1479.

Bonito, A. J., C. Bann, C. Eicheldinger, and L. Carpenter. 2008. Creation of new race-ethnicity codes and socioeconomic status (SES) indicators for Medicare beneficiaries. Final report, sub-task 2. Rockville, MD: RTI International.

Buescher, P. A., Z. Gizlice, and K. A. Jones-Vessey. 2005. Discrepancies between published data on racial classification and self-reported race: Evidence from the 2002 North Carolina live birth records. Public Health Reports 120(4):393-398.

Certification Commission for Healthcare Information Technology. 2007. Certification of ambulatory EHRs. Chicago, IL: Certification Commission for Healthcare Information Technology.

Chin, M. H., A. C. Kirchhoff, A. E. Schlotthauer, J. E. Graber, S. E. Brown, A. Rimington, M. L. Drum, C. T. Schaefer, L. J. Heuer, E. S. Huang, M. E. Shook, H. Tang, and L. P. Casalino. 2008. Sustaining quality improvement in community health centers: Perceptions of leaders and staff. Journal of Ambulatory Care Management 31(4):319-329.

Coltin, K. 2009. Implementation challenges for health plan collection of race, ethnicity & language data. Harvard Pilgrim Health Care. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint presentation.

Craemer, T. 2009 May 25. Can a survey change one's race? An experiment on context effects and racial self-classification. Paper presented at the Annual Meeting of the American Association for Public Opinion Research, Fontainebleau Resort, Miami Beach, FL.

De Milto, L. 2009. Bolstering electronic health records at four community health centers in Chicago with race, ethnicity and language data. Princeton, NJ: Robert Wood Johnson Foundation.

Elliott, M. 2009. Use of indirect measures of race/ethnicity to target disparities. RAND Corporation. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, March 12, 2009. Newport Beach, CA. PowerPoint presentation.

Elliott, M. N., A. Fremont, P. A. Morrison, P. Pantoja, and N. Lurie. 2008. A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Services Research 43(5):1722-1736.

Ezzati-Rice, T. M., and L. R. Curtin. 2001. Population-based surveys and their role in public health. American Journal of Preventive Medicine 20(4):15-16.

Fiscella, K., and A. M. Fremont. 2006. Use of geocoding and surname analysis to estimate race and ethnicity. Health Services Research 41(4p1):1482-1500.

Foley, K. L., J. Manuel, and M. Vitolins. 2005. The utility of self-report in medical outcomes research. Evidence-Based Healthcare and Public Health 9(3):263-264.

Freeman, G., and M. Lethbridge-Cejku. 2006. Access to health care among Hispanic or Latino women: United States, 2000-2002. Advance Data (368):1-25.

Fremont, A. 2009. Practical applications of indirect estimates of race/ethnicity & lessons learned. RAND Corporation. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, March 12, 2009. Newport Beach, CA. PowerPoint presentation.

Fremont, A. M., A. Bierman, S. L. Wickstrom, C. E. Bird, M. Shah, J. J. Escarce, T. Horstman, and T. Rector. 2005. Use of geocoding in managed care settings to identify quality disparities. Health Affairs 24(2):516-526.

Gallegos, J., E. Mulamula, J. Patnosh, and C. Ulmer. 2008. Serving patients with limited English proficiency: Results of a community health center survey. Bethesda, MD: National Association of Community Health Centers.

Hahn, R. A., B. I. Truman, and N. D. Barker. 1996. Identifying ancestry: The reliability of ancestral identification in the United States by self, proxy, interviewer, and funeral director. Epidemiology 7(1):75-80.

Hasnain-Wynia, R. 2007. Collecting race, ethnicity, and primary language data in small physician practices. Chicago, IL: Health Research and Educational Trust/AHA. PowerPoint presentation.

Hasnain-Wynia, R., D. Pierce, and M. A. Pittman. 2004. Who, when, and how: The current state of race, ethnicity, and primary language data collection in hospitals. New York: The Commonwealth Fund.

Hasnain-Wynia, R., D. Pierce, A. Haque, C. H. Greising, V. Prince, and J. Reiter. 2007. Health Research and Educational Trust Disparities Toolkit. www.hretdisparities.org (accessed December 18, 2008).

Hasnain-Wynia, R., J. Yonek, D. Pierce, R. Kang, and C. H. Greising. 2006. Hospital language services for patients with limited English proficiency: Results from a national survey. Chicago, IL: Health Research and Educational Trust/AHA.

Healthcare Financial Management Association. 2007. The emergency department as admission source. Westchester, IL: Healthcare Financial Management Association.

Higgins, P. C., and E. F. Taylor. 2009. Measuring racial and ethnic disparities in health care: Efforts to improve data collection. Princeton, NJ: Mathematica Policy Research.

HL7 (Health Level 7). 2009. What is HL7? http://www.hl7.org/about/hl7about.htm (accessed May 22, 2009).

HRSA (Health Resources and Services Administration). 2009. The Health Center Program: Program assistance letter 2009-02, Uniform Data System changes for calendar year 2009. Rockville, MD: U.S. Department of Health and Human Services.

Hurley, R., L. Felland, and J. Lauer. 2007. Issue Brief No. 116: Community health centers tackle rising demands and expectations. Washington, DC: Center for Studying Health System Change.

Jha, A. K., C. M. DesRoches, E. G. Campbell, K. Donelan, S. R. Rao, T. G. Ferris, A. Shields, S. Rosenbaum, and D. Blumenthal. 2009. Use of electronic health records in U.S. hospitals. New England Journal of Medicine 360(16):1628-1638.

Kagawa-Singer, M., and N. Pourat. 2000. Asian American and Pacific Islander breast and cervical carcinoma screening rates and Healthy People 2000 objectives. Cancer 89(3):696-705.

Kmetik, K. 2009. American Medical Association. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 10, 2009. Washington, DC. PowerPoint presentation.

Krieger, N., J. T. Chen, P. D. Waterman, D. H. Rehkopf, and S. V. Subramanian. 2003a. Race/ethnicity, gender, and monitoring socioeconomic gradients in health: A comparison of area-based socioeconomic measures—the Public Health Disparities Geocoding Project. American Journal of Public Health 93(10):1655-1671.

Krieger, N., J. T. Chen, P. D. Waterman, D. H. Rehkopf, and S. V. Subramanian. 2005. Painting a truer picture of US socioeconomic and racial/ethnic health inequalities: The Public Health Disparities Geocoding Project. American Journal of Public Health 95(2):312-323.

Krieger, N., J. T. Chen, P. D. Waterman, M. J. Soobader, S. V. Subramanian, and R. Carson. 2003b. Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: The Public Health Disparities Geocoding Project. Journal of Epidemiology and Community Health 57(3):186-199.

Krieger, N., P. D. Waterman, J. T. Chen, M. J. Soobader, and S. V. Subramanian. 2003c. Monitoring socioeconomic inequalities in sexually transmitted infections, tuberculosis, and violence: Geocoding and choice of area-based socioeconomic measures—The Public Health Disparities Geocoding Project (US). Public Health Reports 118(3):240-260.

Lurie, N. 2009. Needed: National standardization of race/ethnicity data to address health disparities. RAND Corporation. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint presentation.

Madans, J. H. 2009. Race/ethnic data collection: Population surveys and administrative records. National Center for Health Statistics. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint presentation.

Maizlish, N., and L. Herrera. 2006. Race/ethnicity in medical charts and administrative databases of patients served by community health centers. Ethnicity and Disease 16:483-487.

Manatt Health Solutions and RSM McGladrey. 2007. Improving commercial reimbursement for community health centers: Case studies and recommendations for New York. New York: RCHN Community Health Foundation.

Massachusetts Executive Office of Health and Human Services. 2009. FY2007 inpatient hospital discharge database documentation manual. Boston, MA: Division of Health Care Finance and Policy.

Mays, V. M., S. D. Cochran, and N. A. Ponce. 2004. Thinking about race and ethnicity in population-based studies of health. In Race & research, perspectives on minority participation in health studies. Washington, DC: American Public Health Association.

Mutha, S., R. Do, and N. Solomon. 2008. Incorporating cultural competence into pay-for-performance. Paper presented at Quality Health Care for Culturally Diverse Populations, September 22, 2008, Minneapolis, MN.

National Association of Community Health Centers. 2006. 2006 Data on Community Health Centers, summary of findings. Bethesda, MD: National Association of Community Health Centers.

NCHS (National Center for Health Statistics). 2009. Health, United States, 2008. Hyattsville, MD: Department of Health and Human Services.

NCQA (National Committee for Quality Assurance). 2006. Innovative practices in multicultural health care. Washington, DC: NCQA.
— 2008. Innovative practices in multicultural health care. Washington, DC: NCQA.
— 2009. Supporting small practices: Lessons for health reform. Washington, DC: NCQA.

Nerenz, D. R., and D. Darling. 2004. Addressing racial and ethnic disparities in the context of Medicaid managed care: A six-state demonstration project. Rockville, MD: HRSA.

Nerenz, D. R., C. Currier, and K. Paez, eds. 2004. Collection of data on race/ethnicity by private sector organizations, results of a medical group survey. In Eliminating disparities: Measurement and data needs, pp. 249-271. Washington, DC: The National Academies Press.

NQF (National Quality Forum). 2008. National voluntary consensus standards for ambulatory care—measuring healthcare disparities. Washington, DC: National Quality Forum.

NRC (National Research Council). 2009. Principles and practices for a federal statistical agency: Fourth edition. Edited by C. F. Citro, M. E. Martin, and M. L. Straf. Washington, DC: The National Academies Press.

OMB (Office of Management and Budget). 1997. Recommendations from the Interagency Committee for the Review of the Racial and Ethnic Standards to the Office of Management and Budget concerning changes to the standards for the classification of federal data on race and ethnicity. Federal Register (3110-01):36873-36946.

Palaniappan, L. P., E. C. Wong, J. J. Shin, M. R. Moreno, and R. Otero-Sabogal. 2009. Collecting patient race/ethnicity and primary language data in ambulatory care settings: A case study in methodology. Health Services Research. http://www3.interscience.wiley.com/cgi-bin/fulltext/122465240/PDFSTART (accessed September 3, 2009).

Ponce, N. A., N. Chawla, S. H. Babey, M. S. Gatchell, D. A. Etzioni, B. A. Spencer, E. R. Brown, and N. Breen. 2006. Is there a language divide in Pap test use? Medical Care 44(11):998-1004.

Rachman, F. 2007. Chicago Alliance of Community Health Centers pioneers EHR implementation with AHRQ support. Rockville, MD: AHRQ.

Regenstein, M., and D. Sickler. 2006. Race, ethnicity, and language of patients: Hospital practices regarding collection of information to address disparities in health care. Princeton, NJ: Robert Wood Johnson Foundation.

Reynolds, P. P. 1997. The federal government's use of Title VI and Medicare to racially integrate hospitals in the United States, 1963 through 1967. American Journal of Public Health 87(11):1850-1858.

Romano, P. S., J. J. Geppert, S. Davies, M. R. Miller, A. Elixhauser, and K. M. McDonald. 2003. A national profile of patient safety in U.S. Hospitals. Health Affairs 22(2):154-166.

Rosenthal, M. B., B. E. Landon, S. L. Normand, T. S. Ahmad, and A. M. Epstein. 2009. Engagement of health plans and employers in addressing racial and ethnic disparities in health care. Medical Care Research and Review 66(2):219-231.

Shields, A. E., P. Shin, M. G. Leu, D. E. Levy, R. M. Betancourt, D. Hawkins, and M. Proser. 2007. Adoption of health information technology in community health centers: Results of a national survey. Health Affairs 26(5):1373-1383.

Siegel, B., J. Bretsch, K. Jones, V. Sears, L. Vaquerano, and M. J. Wilson. 2008. Expecting Success: Excellence in cardiac care. Results from Robert Wood Johnson Foundation Quality Improvement Collaborative. Princeton, NJ: Robert Wood Johnson Foundation.

Siegel, B., M. Regenstein, and K. Jones. 2007. Enhancing public hospitals' reporting of data on racial and ethnic disparities in care. New York: The Commonwealth Fund.

Sweeney, C., S. L. Edwards, K. B. Baumgartner, J. S. Herrick, L. E. Palmer, M. A. Murtaugh, A. Stroup, and M. L. Slattery. 2007. Recruiting Hispanic women for a population-based study: Validity of surname search and characteristics of nonparticipants. American Journal of Epidemiology 166(10):1210-1219.

Taylor, J. 2004. The fundamentals of community health centers. Washington, DC: The George Washington University National Health Policy Forum.

Ting, G. 2009. Applications of indirect estimation of race/ethnicity data in health plan activities. Wellpoint. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, March 12, 2009. Newport Beach, CA. PowerPoint presentation.

U.S. Census Bureau. 2000. Census 2000 summary file 1: 100-percent data. Washington, DC: U.S. Census Bureau.

Wei, I. I., B. A. Virnig, D. A. John, and R. O. Morgan. 2006. Using a Spanish surname match to improve identification of Hispanic women in Medicare administrative data. Health Services Research 41(4):1469-1481.

Weinick, R. M., J. M. Caglia, E. Friedman, and K. Flaherty. 2007. Measuring racial and ethnic health care disparities in Massachusetts. Health Affairs 26(5):1293-1302.

Weinick, R. M., K. Flaherty, and S. J. Bristol. 2008. Creating equity reports: A guide for hospitals. Boston, MA: The Disparities Solution Center at Massachusetts General Hospital.

West, C. N., A. M. Geiger, S. M. Greene, E. L. Harris, I. L. Liu, M. B. Barton, J. G. Elmore, S. Rolnick, L. Nekhlyudov, A. Altschuler, L. J. Herrinton, S. W. Fletcher, and K. M. Emmons. 2005. Race and ethnicity: Comparing medical records to self-reports. Journal of the National Cancer Institute Monographs (35):72-74.

Williams, D. R. 1998. The quality of racial data. Washington, DC: U.S. Department of Health and Human Services.

Zandieh, S. O., K. Yoon-Flannery, G. J. Kuperman, D. J. Langsam, D. Hyman, and R. Kaushal. 2008. Challenges to EHR implementation in electronic- versus paper-based office practices. Journal of General Internal Medicine 23(6):755-761.


1 A PHR is a medical or health record owned and maintained by a patient him- or herself. EHRs are further defined in Chapter 6.
2 American Recovery and Reinvestment Act of 2009, Public Law 111-5 § 3002(b)(2)(B)(vii), 111th Cong., 1st sess. (February 17, 2009).
3 The Civil Rights Act of 1964, Public Law 88-352, 78 Stat. 241, 88th Cong., 2d sess. (July 2, 1964).
4 The Social Security Act of 1965, 89th Cong., 42 U.S.C. § 7, 1st sess. (July 30, 1965).
5 More than 3 million of the 16 million users were recorded in 2007 under "Unreported/Refused to report." With these two categories being combined, it is impossible to tell if there was actually no data collection or if a large portion of people refused to respond.
6 Health Insurance Portability and Accountability Act of 1996, Public Law 104-191, 104th Cong., 2d sess. (August 21, 1996).
7 Version 4010 of the X12 standards defines the 834 enrollment transaction. Version 5010 was adopted in January 2009 and must be implemented by January 1, 2012. Under this version, the transaction will still need to come from a plan sponsor or employer, and as sponsors and employers are not covered entities under HIPAA, they are not required to use the enrollment standard (Personal communication, L. Doo, Office of E-Health Standards and Services, Centers for Medicare & Medicaid Services, July 14, 2009).
8 As of July 2, 2009, the "Affordable Health Choices Act" included provisions that the Secretary of HHS shall streamline and simplify standards for electronic enrollment, including capability for individual enrollees to manage their enrollment online.
9 NHPC was established in 2004 and included 11 national plans with more than 87 million members. As of 2009, its activities are coordinated by America's Health Insurance Plans (AHIP).
10 American Recovery and Reinvestment Act of 2009, Public Law 111-5 § 3002(b)(2)(B)(vii), 111th Cong., 1st sess. (February 17, 2009).
11 Personal communication, O. Carter-Pokras, University of Maryland School of Public Health, April 13, 2009.
12 Personal communication, O. Tiutin, Contra Costa Health Plan, July 10, 2009.

Page last reviewed May 2018
Page originally created September 2012
Internet Citation: 5. Improving Data Collection Across the Health Care System (continued). Content last reviewed May 2018. Agency for Healthcare Research and Quality, Rockville, MD. https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata5a.html