
Chapter 3

Creation of New Race-Ethnicity Codes and SES Indicators for Medicare Beneficiaries - Chapter 3

3. Creating and Validating an Index of Socioeconomic Status

3.1 Introduction

Over the years, considerable empirical evidence has accumulated indicating that, in the US, health status, mortality, and health services use differ by what has variously been called socioeconomic status, social class, social position, or SES (Braveman et al., 2005). More recently, there has been growing unease about the accumulating evidence on how much health status, mortality, and health services use vary with race and ethnicity (Krieger et al., 2005). Although the two are distinct, socioeconomic status and race/ethnicity are unfortunately not independent of one another in their associations with health status, mortality, and health services use. This has at times led to the mistaken use of race/ethnicity as a surrogate measure of socioeconomic status.

Because of this, it is particularly important that our empirical research try to separate the influences of socioeconomic status (SES) and race/ethnicity on health and on the use of health services. Only then will policymakers be able to decide where to place their priorities in developing ameliorative interventions: overcoming the socioeconomic barriers to timely, appropriate, good-quality care; addressing the sub-cultural values and restricted world view that keep some minorities from taking full advantage of the services available to them; or countering the prejudice of providers and the health care system against minorities. As we indicated earlier, the first objective of this project is to create and validate a measure of SES to include in analyses of racial/ethnic health care disparities in the use of covered services by Medicare beneficiaries.

Our interest in this issue arises from the use of Medicare claims in the study of racial/ethnic disparities. Medicare beneficiaries enrolled in the fee-for-service program present an ideal opportunity to study racial/ethnic disparities in health status, mortality, and health services use because they have similar health care coverage. The Medicare enrollment database (EDB) contains person-specific information on the demographic characteristics - age, gender, race/ethnicity - of beneficiaries. It also includes information on whether beneficiaries receive additional Government benefits - ranging from help paying their share of premiums to benefits not included in regular Medicare - due to their low income level. It does not, however, include any person-level measures that are typically considered indicators of socioeconomic status.

The EDB does contain residential address information for beneficiaries. Although this information is not in an immediately usable form, it can with reasonable effort be transformed into a geocode that corresponds to US Census designated areas (e.g., block groups, tracts, municipalities, counties, ZIP code tabulation areas, states, divisions, regions). The Census reports well-accepted indicators of socioeconomic status for these areas at least every 10 years. In fact, a literature has developed in epidemiology, social medicine, and medical sociology that has established the relevance of SES measures at the level of meaningful, homogeneous social aggregates such as neighborhoods and communities. Such aggregates have been shown to reflect a common culture, behavior, norms, and values in responding to selected symptoms of ill health and in seeking health care, as well as likely differences in access to services, quality of available care, and discrimination in the provision of services.


3.2 Prior Work as the Starting Point

We based our SES index development on the work of Dr. Nancy Krieger and colleagues at the Harvard University School of Public Health on the Public Health Disparities Geocoding Project (Krieger et al., 2003a). She and her colleagues have published extensively on developing and using socioeconomic measures to understand disparities in health and health care (Krieger et al., 2003b; Krieger et al., 2005; Krieger et al., 2002a; Krieger et al., 2002b). They have noted the absence of person-level socioeconomic status measures in many research areas that rely on analyses of administrative data, and they have promoted geocoding addresses and using area-based measures of socioeconomic status.

Socioeconomic status is a multidimensional concept. Dimensions typically associated with SES include occupational status, educational achievement, income, poverty, and wealth. Krieger has identified and used a number of Census measures that capture many of these dimensions:

  • occupational status: the percentage of the population in the working class (based on the percentage of persons employed in non-supervisory positions in 8 of 13 occupational groups) and the percentage of the labor force that is unemployed;
  • income: the median household income, the percentage of households with income below half of the national median income, and the percentage with household incomes more than four times the national median income;
  • poverty: the percentage of the population below the Federal poverty level;
  • wealth: the percentage of households with owner-occupied homes valued at four or more times the national median home value;
  • education: the percentage of the adult population with less than a 12th grade education, and the percentage with at least four years of college education;
  • crowding: the percentage of households with one or more persons per room.

Krieger has developed composite socioeconomic status measures based on principal components factor analyses of these and related Census variables for ZIP code areas, census tracts, and census block groups in several states. She and her colleagues have also used them in analyses of birth, death, and other public health statistics that can be associated with geographic areas (addresses and geocodes).

As we indicated earlier in this document, while Medicare claims can be analyzed to investigate racial and ethnic disparities in health care utilization, the lack of a readily available measure of socioeconomic status to separate the impact of SES from that of race and ethnicity has been a real limitation in identifying health care disparities associated exclusively with race/ethnicity. This sub-task was conceived to create such a measure and so make this kind of analysis possible. Its first objective was to establish whether a reasonably good single composite measure of SES could be created and assigned to individual beneficiaries based on a number of residential area characteristics available from the 2000 US Census. Because we did not have person-level measures of the SES dimensions described above, we instead geocoded each Medicare beneficiary's residential address and identified the FIPS code for that address, which links to Census data available at the block group level.

A block group is a cluster of census blocks within a common census tract whose four-digit identifying numbers share the same first digit. Block groups generally contain between 600 and 3,000 people, with an optimum size of 1,500 people. We chose block groups rather than the smaller block because the block group is the lowest-level Census geographic unit for which the economic measures we needed to characterize Medicare beneficiaries' residential areas are available.
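As a minimal sketch of the linkage step, the hypothetical function below derives a block-group identifier from a Census 2000 block code. It assumes the geocoder returns a 15-digit block FIPS code laid out as state (2 digits), county (3), tract (6), and block (4); because the block group is the first digit of the block number, the block-group identifier is simply the first 12 digits. The function name and the example code are illustrative only, not part of the project's software.

```python
def block_group_geoid(block_fips: str) -> str:
    """Return the 12-digit block-group identifier for a 15-digit block code.

    Assumed layout: state (2) + county (3) + tract (6) + block (4).
    The block group is the first digit of the 4-digit block number.
    """
    if len(block_fips) != 15 or not block_fips.isdigit():
        raise ValueError(f"expected 15 digits, got {block_fips!r}")
    state, county = block_fips[0:2], block_fips[2:5]
    tract, block = block_fips[5:11], block_fips[11:15]
    block_group = block[0]  # first digit of the block number
    return state + county + tract + block_group


# Made-up block code (not a real beneficiary address).
print(block_group_geoid("370630001002031"))  # 370630001002
```

The resulting identifier is the key on which block-group Census measures would be joined to each geocoded beneficiary.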


3.3 Development of the SES Index

The first step in creating a composite SES index for Medicare beneficiaries was to perform a principal components type of factor analysis. We chose this type of analysis to combine into a single index value the contributions of several SES-related measures thought to reflect a primary underlying dimension, which we refer to as SES. We intended to base the index on the first principal component emerging from the analysis, because the first principal component accounts for the greatest share of the variation in the analyzed measures across block groups and is uncorrelated with any components extracted after it.
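To make the technique concrete, here is a minimal pure-Python sketch of extracting a first principal component from a correlation matrix by power iteration. It is an illustration under stated assumptions, not the software used for this analysis: the function names are hypothetical, the input is assumed to be one row of measures per block group with no missing values, and loadings are computed as eigenvector entries scaled by the square root of the eigenvalue, which for a correlation-matrix analysis gives the correlation between each measure and the component.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of equal-length numeric rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    # Standardize each column, then average the cross-products.
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(k)]
            for a in range(k)]


def first_principal_component(corr, max_iter=1000, tol=1e-10):
    """Largest eigenvalue and unit eigenvector of a correlation matrix,
    found by power iteration (repeatedly multiply and renormalize)."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(max_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break
    # Rayleigh quotient v'Cv gives the eigenvalue for a unit vector v.
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                 for i in range(k))
    return eigval, v


def oriented_loadings(eigval, vec, anchor):
    """Loadings (eigenvector * sqrt(eigenvalue)), sign-flipped so that the
    measure in column `anchor` (e.g. median income) loads positively."""
    sign = 1.0 if vec[anchor] >= 0 else -1.0
    return [sign * x * math.sqrt(eigval) for x in vec]


# Toy 3 x 3 correlation matrix (made-up numbers, not Census data).
corr = [[1.0, 0.6, -0.6], [0.6, 1.0, -0.6], [-0.6, -0.6, 1.0]]
value, vector = first_principal_component(corr)
print(round(value, 6))  # 2.2
print([round(x, 3) for x in oriented_loadings(value, vector, anchor=0)])
# [0.856, 0.856, -0.856]
```

Because an eigenvector's sign is arbitrary, the `anchor` argument fixes the orientation, in the same spirit as the note to Table 3.1 (loadings multiplied by -1 so that higher scores mean higher SES). Power iteration is used only to keep the sketch dependency-free; a linear algebra library's symmetric eigensolver would be the usual choice for real data.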

We performed a principal components analysis on a set of seven measures identified and used previously by Krieger (Krieger et al., 2003a). On their face, these measures are considered related to SES, and they are at times used as proxies for it. The measures we included in the principal components analysis were: (1) as a measure of occupation, the percentage of persons in the block group aged 16 years or older who are in the labor force but unemployed; (2) as a measure of income, the percentage of persons in the block group living below the federal poverty level; (3) as a related measure of income, a standardized (see footnote 14) measure of the median household income in the block group; (4) as a measure of wealth, a standardized measure of the median value of owner-occupied dwellings in the block group; (5) as a measure of educational attainment, the percentage of persons aged 25 years or older with less than a 12th grade education; (6) as a second measure of educational attainment, the percentage of persons aged 25 years or older who completed at least four years of college; and (7) as a measure of crowding related to wealth (based on the fact that lower-income persons on average have more persons per room than wealthier persons, who typically have larger homes), the percentage of households averaging one or more persons per room.
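Footnote 14 describes the standardization as subtracting the mean of the distribution from each value and dividing by the standard deviation. A minimal sketch of that step follows; the report does not say whether the population or sample standard deviation was used, so the use of `statistics.stdev` (sample) here is an assumption. The note to Table 3.1 describes the standardized variables as ranging from 0 to 100, and the report does not spell out how that range was reached, so the sketch covers only the step the footnote describes.

```python
import statistics


def z_scores(values):
    """Standardize block-group values: (value - mean) / standard deviation.

    Assumption: sample standard deviation (statistics.stdev); substitute
    statistics.pstdev for the population standard deviation.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Made-up median household incomes for three block groups.
print(z_scores([30000, 45000, 60000]))  # [-1.0, 0.0, 1.0]
```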

We analyzed these variables across the entire set of 211,267 U.S. Census block groups. The results of the principal components analysis of the seven SES variables, using data from the block groups, are presented in Table 3.1. To determine whether the first principal component appropriately accounts for most of the variance common to the seven measures, we examined the eigenvalues. A common rule of thumb is that a single principal component (here the first, the only one of interest) adequately represents what the measures have in common when the ratio of the first to the second eigenvalue is at least three. In our analysis the first eigenvalue was 3.85 and the second was 1.29, giving a ratio of 2.98, which rounds to 3.0. We were therefore satisfied with extracting only the first principal component.
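The eigenvalue check can be written out directly. The sketch below is a hypothetical helper, not the authors' code; it applies the ratio-of-three rule of thumb to the two eigenvalues reported above, and rounding the ratio to one decimal place mirrors the text, where 2.98 is rounded to 3.0.

```python
def one_component_suffices(eigenvalues, min_ratio=3.0, ndigits=1):
    """Rule of thumb: keep a single component when the largest eigenvalue is
    at least `min_ratio` times the second largest (ratio rounded to `ndigits`)."""
    first, second = sorted(eigenvalues, reverse=True)[:2]
    ratio = first / second
    return ratio, round(ratio, ndigits) >= min_ratio


# First and second eigenvalues reported for the seven SES measures.
ratio, ok = one_component_suffices([3.85, 1.29])
print(f"{ratio:.2f}", ok)  # 2.98 True
```

With `ndigits=2` the same check returns False, which shows how close to the threshold the result was.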

Table 3.1 Principal Components Analysis of Seven SES Measures: Based on Block-Group Data for 2000 US Census (N = 211,267)

Construct/Measure | Definition | Principal Components Loading
Unemployment | Percentage of persons aged 16 years or older in the labor force who are unemployed (and actively seeking work) | -0.66
Below US poverty line | Percentage of persons below the federally defined poverty line | -0.79
Median income* | Median household income | 0.85
Property values* | Median value of owner-occupied homes | 0.64
Low education | Percentage of persons aged 25 years or older with less than a 12th-grade education | -0.84
High education | Percentage of persons aged 25 years or older with at least 4 years of college | 0.79
Crowded households | Percentage of households containing one or more persons per room | -0.56

*These variables are standardized to have values ranging from 0 to 100.
Note: Values of loadings are multiplied by -1 so that higher values for the composite scores represent higher SES levels.

Table 3.1 also displays the loading of each variable on the first principal component. The loadings can be interpreted as measures of association between the individual measures and the first principal component, which we are calling socioeconomic status, or SES. The associations are reasonably strong, and they all run in the anticipated directions. The positive signs indicate that the following block group characteristics are associated with higher SES: larger percentages of persons with at least four years of college, higher median home values, and higher median household incomes. The negative signs indicate that the following are associated with lower SES: higher percentages of unemployed persons, larger percentages of persons below the federal poverty level, greater percentages of persons with less than a 12th grade education, and higher percentages of households with one or more persons per room.

We attempted to compute SES index scores for all 211,267 block groups in the U.S. using the formula in Figure 3.1, but for 3,462 of them data were missing for some measures and an SES index could not be calculated, leaving 207,805 block groups with scores. Each score was derived by multiplying each measure's value by its weight estimated from the principal components analysis, summing the products, and adding the constant shown in Figure 3.1.

Figure 3.1 Scoring Algorithm for SES Index

SES Index Score = 50 + (-0.07*crowded) + (0.08*prop100) + (-0.10*pct_poverty) + ...

where:
  • crowded = Percentage of households containing one or more persons per room
  • prop100 = Median value of owner-occupied homes, standardized to range from 0 to 100
  • pct_poverty = Percentage of persons below the federally defined poverty line
  • hhinc100 = Median household income, standardized to range from 0 to 100
  • high_educ = Percentage of persons aged 25 years or older with at least 4 years of college
  • low_educ = Percentage of persons aged 25 years or older with less than a 12th-grade education
  • pct_unemp = Percentage of persons aged 16 years or older in the labor force who are unemployed (and actively seeking work)
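The scoring rule in Figure 3.1 is a weighted sum plus a constant. A minimal sketch, assuming each block group is a dictionary keyed by the variable names listed above: only three of the seven weights appear in Figure 3.1 as reproduced here, so the hypothetical `ses_index` function takes the full set of weights as an argument rather than hard-coding values.

```python
# Variable names as listed under Figure 3.1.
SES_VARIABLES = ("crowded", "prop100", "pct_poverty", "hhinc100",
                 "high_educ", "low_educ", "pct_unemp")

# Weights shown in Figure 3.1; the other four must be supplied by the user.
PARTIAL_WEIGHTS = {"crowded": -0.07, "prop100": 0.08, "pct_poverty": -0.10}


def ses_index(block_group, weights, constant=50.0):
    """Return constant + sum(weight * value) over the seven measures, or None
    if any measure is missing (no index can be computed for that block group)."""
    if any(block_group.get(name) is None for name in SES_VARIABLES):
        return None
    missing_weights = [name for name in SES_VARIABLES if name not in weights]
    if missing_weights:
        raise KeyError(f"no weight supplied for: {missing_weights}")
    return constant + sum(weights[name] * block_group[name]
                          for name in SES_VARIABLES)
```

Calling `ses_index(bg, PARTIAL_WEIGHTS)` raises a KeyError by design, because a score computed from three of the seven terms would not be the index described here. Returning None for incomplete data models the block groups for which an index could not be calculated.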

The distribution of the SES scores for the block groups is presented in Table 3.2. Although the scores could in theory range from 0 to 100, the observed scores ranged only from 21 to 78. The scores were grouped as closely as possible into quartiles, and Table 3.3 presents the block group SES index scores grouped in this way.
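Because the scores are whole numbers with many ties, the block groups cannot be split into exactly equal quarters. The report does not state the exact rule used, so the following is one plausible sketch with a hypothetical function name: for each of the 25 percent, 50 percent, and 75 percent targets, it picks as the category's upper bound the score whose cumulative share of block groups is nearest to the target.

```python
from collections import Counter


def integer_quartile_bounds(scores):
    """Upper bounds for SES categories 1-3 from a list of integer scores.

    For each target share (25%, 50%, 75%), choose the distinct score whose
    cumulative share is closest to the target, moving strictly upward.
    """
    counts = Counter(scores)
    total = len(scores)
    cumulative, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        cumulative.append((value, running / total))

    bounds = []
    for target in (0.25, 0.50, 0.75):
        candidates = [(v, share) for v, share in cumulative
                      if not bounds or v > bounds[-1]]
        if not candidates:
            raise ValueError("too few distinct scores to form four groups")
        value, _ = min(candidates, key=lambda vc: abs(vc[1] - target))
        bounds.append(value)
    return bounds


# Made-up scores with ties.
print(integer_quartile_bounds([48, 48, 48, 50, 51, 51, 53, 57]))  # [48, 50, 51]
```

In the toy example the ties at 48 push the lowest group to 37.5 percent of the scores, the same kind of unevenness that grouping "as closely as possible" implies.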

Table 3.2 Distribution of SES Index Scores: Block-Group Data (N= 207,805)

SES Index Score | N | % | Cumulative %

Table 3.3 Quartile Distribution of SES Categories: Block-Group Data (N = 207,805)


Next, we calculated SES index scores for the unweighted sample of 1.96 million Medicare beneficiaries described in an earlier section of this report. This is the sample of Medicare beneficiaries on which extensive tabulations and limited multivariate modeling are to be performed. The unweighted distribution of their SES index scores is presented in Table 3.4. Note that the distribution excludes 390,779 sample members who either had no geocode (FIPS code) or whose block group lacked the needed Census data. In this sample, SES index scores ranged only from 25 to 78.

Table 3.4 Distribution of SES Index Scores: Based on Unweighted RTI Sample of 1.96 Million Medicare Fee-for-Service Beneficiaries (N = 1,569,342)

SES Index Score | N | % | Cumulative %

Note: SES index scores could not be computed for 390,779 sample beneficiaries due to missing Census data or no FIPS code to link to Census data.

The distribution of SES index scores for the sample of Medicare fee-for-service beneficiaries was divided as closely as possible into quartiles. The approximate quartile distribution of the SES index scores (unweighted and weighted) is presented in Table 3.5. The four categories, numbered one to four from the category with the lowest index scores to the one with the highest, are the ones we used in the tabulations and the multivariate regression analyses. SES 1 can be thought of as representing Medicare beneficiaries in the lowest SES group, SES 4 as containing those in the highest SES group, and SES 2 and SES 3 as falling in between.
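Assigning a beneficiary to one of the four categories is then a lookup against the score ranges listed in Table 3.5 (1 = 0-49, 2 = 50-52, 3 = 53-56, 4 = 57-100). A minimal sketch, assuming integer scores and using a hypothetical function name:

```python
from bisect import bisect_left

# Upper bounds of SES categories 1-3, from the ranges in Table 3.5.
UPPER_BOUNDS = (49, 52, 56)


def ses_category(score):
    """Map an integer SES index score (0-100) to category 1 (lowest SES)
    through 4 (highest SES); None means no score could be computed."""
    if score is None:
        return None  # no geocode, or incomplete block-group Census data
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # bisect_left places a score equal to a bound in the lower category.
    return bisect_left(UPPER_BOUNDS, score) + 1


print([ses_category(s) for s in (25, 49, 50, 56, 57, 78)])  # [1, 1, 2, 3, 4, 4]
```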

We are confident that the SES index we created for this project captures the concept of SES better than any of its individual component measures, because it combines several different aspects of SES, and the validation that follows will demonstrate this. However, the fact that nearly 20 percent of the sample of Medicare beneficiaries (390,779 of 1,960,121) could not be geocoded and linked to the Census block group data from which the SES index was created is a definite limitation. Future research on the SES index will need to determine whether the missing beneficiaries are a serious source of bias. For now, having the SES index for more than 80 percent of Medicare beneficiaries gives health services researchers opportunities for research not previously available.
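The coverage figures in this paragraph can be checked with a few lines of arithmetic, using the counts reported in Tables 3.4 and 3.5 and their notes:

```python
scored = 1_569_342    # beneficiaries with an SES index (Tables 3.4 and 3.5)
unscored = 390_779    # no FIPS code, or missing block-group Census data
total = scored + unscored

print(total)                       # 1960121
print(f"{unscored / total:.1%}")   # 19.9%
print(f"{scored / total:.1%}")     # 80.1%
```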

Table 3.5 Distribution of SES Categories: Based on RTI Sample of 1.96 Million Medicare Fee-for-Service Beneficiaries (N = 1,569,342)

SES Category | Unweighted N | Unweighted % | Weighted N | Weighted %
1 (0-49) | 642,181 | 40.9 | 7,967,125 | 29.0
2 (50-52) | 299,927 | 19.1 | 6,614,863 | 24.1
3 (53-56) | 324,650 | 20.7 | 7,214,721 | 26.3
4 (57-100) | 302,584 | 19.2 | 5,650,911 | 20.6

Note: SES index scores could not be computed for 390,779 beneficiaries due to missing Census data or no FIPS code to link to Census data.
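The unweighted and weighted halves of Table 3.5 differ because the weighted columns sum each beneficiary's sampling weight rather than counting beneficiaries. A minimal sketch of the tabulation, assuming each scored beneficiary is represented as a (category, weight) pair; the function name and the toy records are hypothetical:

```python
from collections import defaultdict


def ses_distribution(records):
    """Unweighted and weighted counts and percentages by SES category.

    `records` is an iterable of (category, weight) pairs, one per scored
    beneficiary; `weight` is that person's sampling weight (assumed input).
    Returns {category: (n, pct_n, weighted_n, pct_weighted)}.
    """
    n = defaultdict(int)
    w = defaultdict(float)
    for category, weight in records:
        n[category] += 1
        w[category] += weight
    total_n, total_w = sum(n.values()), sum(w.values())
    return {
        cat: (n[cat], 100 * n[cat] / total_n, w[cat], 100 * w[cat] / total_w)
        for cat in sorted(n)
    }


# Toy records (made up): category 1 has half the people but 40% of the weight.
print(ses_distribution([(1, 10.0), (1, 30.0), (2, 20.0), (4, 40.0)]))
# {1: (2, 50.0, 40.0, 40.0), 2: (1, 25.0, 20.0, 20.0), 4: (1, 25.0, 40.0, 40.0)}
```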

14 The standardization was accomplished by subtracting the mean of the distribution from each value and dividing by the standard deviation of the distribution.


Current as of January 2008
Internet Citation: Chapter 3: Creation of New Race-Ethnicity Codes and SES Indicators for Medicare Beneficiaries - Chapter 3. January 2008. Agency for Healthcare Research and Quality, Rockville, MD.