4. Defining Language Need and Categories for Collection (continued)
Language Categories to be Used by Health Care Entities
The subcommittee considered whether a single limited list of languages (e.g., the top 10 or top 40 nationwide) should be used by all health care entities for quality improvement purposes. A precedent exists for recommending use of such a list—the HRET Toolkit, endorsed by the National Quality Forum (NQF) for achieving more culturally competent organizations. The subcommittee reviewed Census data to determine the usefulness of such lists. However, the subcommittee concludes that the language of each individual must be captured, regardless of whether that language is present on any list developed to facilitate data collection and analysis locally or nationally.
Top Languages Nationally
The subcommittee first reviewed Census data on the top 10 languages reported to be spoken most frequently at home besides English:
- Spanish (28.1 million)
- Chinese (2.0 million)
- French (1.6 million)
- German (1.4 million)
- Tagalog (1.2 million)
- Vietnamese (1.0 million)
- Italian (1.0 million)
- Korean (0.9 million)
- Russian (0.7 million)
- Polish (0.7 million) (Shin and Bruno, 2003; U.S. Census Bureau, 2003j)11
A list of these 10 languages would cover 38.6 of the 46.9 million U.S. residents who speak a language other than English at home, a figure that might argue for requiring all entities to use this list for collecting data on language needs. However, analysis reveals that this list fails to capture the top 10 languages in each state, as shown in a sample of four states (Figures 4-3a-d). Numerous additional languages important for state-level planning (Navajo, Bengali, Afrikaans, Hindi, Dakota, Norwegian, Laotian, Amharic, Cushite, Hmong, Arabic, Urdu, Tagalog, Persian, Portuguese, Mon-Khmer) are among the top languages spoken in just these four states. Likewise, while Spanish is among the top 10 languages in 3,122 of 3,141 counties in the United States, numerous other languages also rank among the top languages in many counties: for example, Turkish in 12 counties, Laotian in 125, Navaho in 74, Serbo-Croatian in 58, and Portuguese in 229 (U.S. English Foundation, 2009a, 2009b). Thus, limiting the collection of language data to a national top 10 list would not always be useful even for system-level planning by states and counties, and it certainly would not capture the diversity among states or smaller jurisdictions or the specific needs faced by hospitals, health plans, or individual provider practices. Similar approaches have nonetheless been used for some national purposes; for example, section 118 of the Medicare Improvements for Patients and Providers Act of 2008 (MIPPA) requires translation of the Medicare Savings Program application form, at a minimum, into the 10 languages most used by persons applying for the program.12
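The coverage figure can be checked with simple arithmetic. The sketch below (Python is used here only for illustration) sums the rounded Census counts listed above and compares them with the 46.9 million total; because the inputs are rounded, the result is a check on the text rather than a new estimate.

```python
# Speakers of the national top 10 non-English languages (Census 2000), stored in
# units of 100,000 (tenths of a million) so the arithmetic stays exact.
TOP10 = {
    "Spanish": 281, "Chinese": 20, "French": 16, "German": 14, "Tagalog": 12,
    "Vietnamese": 10, "Italian": 10, "Korean": 9, "Russian": 7, "Polish": 7,
}
ALL_NON_ENGLISH = 469  # 46.9 million U.S. residents speak a language other than English at home


def coverage(counts: dict, total: int) -> tuple:
    """Return (covered, not covered, share covered) for a candidate language list."""
    covered = sum(counts.values())
    return covered, total - covered, covered / total


covered, uncovered, share = coverage(TOP10, ALL_NON_ENGLISH)
print(f"{covered / 10} million covered, {uncovered / 10} million not covered ({share:.1%})")
# 38.6 million covered, 8.3 million not covered (82.3%)
```

The remaining 8.3 million speakers, spread across the many other languages, are the people a national top 10 list would leave uncategorized.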
Additionally, some of the top 10 languages nationally are declining in use, while others are increasing because of changing immigration patterns. The numbers of Italian, German, and Polish speakers have declined, while the numbers of Spanish, Vietnamese, Chinese, Russian, Tagalog, Korean, Arabic, and French Creole speakers have increased substantially since 1990 (Shin and Bruno, 2003). The number of Spanish speakers has increased by 62 percent since 1990, while the number speaking other Indo-European languages has increased by just 14 percent, Asian and Pacific Islander languages by 55.6 percent, and all other languages by 51.2 percent (Shin and Bruno, 2003).
The subcommittee then reviewed a longer list based on the 39 languages on which the Census routinely reports, consisting of 30 individual languages and 9 language groups (Table 4-4). The HRET Toolkit guidance for hospital collection of demographic data includes 35 language or language-group choices; it also provides additional options for inclusion in the data system, such as "patient declined to answer." The HRET Toolkit list closely mirrors the commonly reported Census categories but improves upon them by adding American Sign Language. The State of California requires under SB 853 that each health plan survey its enrollees to understand their language needs (CPEHN, 2008). Table 4-4 includes the language categories of one such survey, fielded by Anthem Blue Cross in spring 2009. That list includes 37 individual languages or dialects, distinguishes between American and other sign languages, and recognizes other communication difficulties, including hearing and speech loss (Ting, 2009). The list has many of the elements of the Census and HRET lists but incorporates several additional languages specific to its service population.
In reviewing the applicability of the 39 Census-reported languages for national use, the subcommittee found that in all but six states (Hawaii, Maine, New Hampshire, North Dakota, South Dakota, Vermont), people who speak Spanish at home are the largest group of non-English speakers. Those who speak Chinese are the next-largest group nationwide, with large concentrations in California, New York, and Washington, but they are found in every state. Although prevalence varies, each of the 39 languages included in Census 2000 is reported as spoken in some homes in every state, with the following few exceptions: Gujarathi in Alaska; Navaho in Delaware and Vermont; Hmong in Delaware, District of Columbia, Idaho, Kentucky, Louisiana, New Hampshire, New Mexico, North Dakota, Vermont, West Virginia, and Wyoming; Mon-Khmer, Cambodian in Wisconsin and Wyoming; and Persian in Wyoming (U.S. Census Bureau, 2003j). Depending on an entity's collection approach, a list of nearly 40 languages may prove unwieldy (see the section below on collection considerations).
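The exceptions listed above can be held in a simple lookup table. The hypothetical sketch below records, for each of the five languages named, the states in which Census 2000 reported no home speakers, so that an entity can see how many of the 39 reporting categories would be empty for its state. The function names are illustrative only.

```python
# Census 2000 reporting languages with no reported home speakers, by state
# (taken from the exceptions listed in the text; all other pairs are assumed nonzero).
NOT_REPORTED = {
    "Gujarathi": {"Alaska"},
    "Navaho": {"Delaware", "Vermont"},
    "Hmong": {"Delaware", "District of Columbia", "Idaho", "Kentucky", "Louisiana",
              "New Hampshire", "New Mexico", "North Dakota", "Vermont",
              "West Virginia", "Wyoming"},
    "Mon-Khmer, Cambodian": {"Wisconsin", "Wyoming"},
    "Persian": {"Wyoming"},
}


def absent_in(state: str) -> list:
    """Reporting languages with no reported home speakers in `state`, alphabetically."""
    return sorted(lang for lang, states in NOT_REPORTED.items() if state in states)


def reported_count(state: str, total_categories: int = 39) -> int:
    """Number of the 39 reporting languages with at least some speakers in `state`."""
    return total_categories - len(absent_in(state))


print(absent_in("Wyoming"), reported_count("Wyoming"))
# ['Hmong', 'Mon-Khmer, Cambodian', 'Persian'] 36
```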
Neither the Census reporting list nor the HRET list captures all the top 10 languages in each state. For example, numerous individual languages are consolidated under such categories as "Other Native North American languages" or "African languages." Approximately 2.2 million people who speak a language other than English fall into these general categories. These categories fail to capture, for example, Yupik, an Alaska Native language that is among Alaska's top 10 languages; Dakota, an American Indian language among the top languages encountered in North Dakota (Figure 4-3a); and Amharic, an African language encountered in Minnesota (Figure 4-3b). In addition, within individually reported languages such as Chinese there are various languages or dialects, some of which are sufficiently different that the Census Bureau has classified them as separate languages (e.g., Mandarin and Cantonese).
The number of languages spoken varies considerably from state to state. As seen in Figure 4-4, which is based on Census 2000 data, the number of languages reported as spoken at home ranges from 56 in Wyoming to 207 in California (U.S. English Foundation, 2009c). Data collection instruments must therefore take into account both the diversity of the service area's population and the feasibility of collecting data with lengthy lists of categories. This administrative issue is discussed later in this chapter in the section on collection considerations.
The subcommittee concludes that mandating data collection using a single national list of a limited number of languages might be useful for national population-level tracking and planning. For most entities, however, it would be less useful than locally relevant lists for assessment and planning to meet the diverse language needs of individuals, health care entities, and jurisdictions across the United States.
Selection of a List Relevant to the Service Population
A variety of sources can be helpful for determining languages of interest in a service population. One approach is to survey the service recipients. For example, to assess which languages are most needed by their enrollees, managed care plans in California must survey their enrollees.13 Mailed survey responses alone, however, can skew results if the responses are not representative. An entity's previous experience with language services or the most common languages in Census data on the service area can provide guidance on which languages may be most commonly spoken at home and which language groups represent the greatest proportion of people with LEP. Census tract data provide one indirect check on the proportions of different language groups; they can also reveal the languages of potential patients an entity might wish to serve but for whom lack of language outreach has presented a barrier.
The Census publishes detailed tables on English-language proficiency by language category for 39 individual languages or groupings, nationally and by state (U.S. Census Bureau, 2003a). For example, more than a million people in the United States speak French at home, but 75 percent of them speak English very well, leaving about 300,000 persons in this language category who are LEP by the subcommittee's definition. Other language groups have a smaller proportion who speak English very well (e.g., 34 percent of those speaking Vietnamese at home and 43 percent of Russian speakers) (U.S. Census Bureau, 2003e). Moreover, the proportion who speak English very well can differ from state to state for the same language: for example, in Alabama the proportions are 43 percent for Vietnamese speakers and 56 percent for Russian speakers, while in Iowa they are 26 and 53 percent, respectively, and in Washington State 30 and 38 percent, respectively (U.S. Census Bureau, 2003f, 2003g, 2003h). These data are readily available for all geographic areas; Census 2000 Summary File 3 and American Community Survey data, available through American FactFinder, allow one to investigate ability to speak English by Census block group and higher geographic summary levels, including zip code, Census tract, and county.
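Under the subcommittee's definition, the LEP count for a language group is the number of speakers minus those who report speaking English very well. The sketch below applies that arithmetic. Pairing the rounded national speaker counts from the top 10 list with the national "very well" percentages is an illustrative assumption, and the state comparison uses a hypothetical group of 1,000 Vietnamese speakers.

```python
def estimated_lep(speakers: int, pct_very_well: int) -> int:
    """Persons who speak English less than "very well" (the subcommittee's LEP definition).

    speakers: people who speak the language at home
    pct_very_well: whole-number percent of them who report speaking English "very well"
    """
    if not 0 <= pct_very_well <= 100:
        raise ValueError("pct_very_well must be between 0 and 100")
    return speakers * (100 - pct_very_well) // 100


# Rounded national counts (top 10 list) with national "very well" percentages.
print(estimated_lep(1_000_000, 34))  # Vietnamese: 660000
print(estimated_lep(700_000, 43))    # Russian: 399000

# The same hypothetical 1,000 Vietnamese speakers, using each state's percentage.
for state, pct in {"Alabama": 43, "Iowa": 26, "Washington": 30}.items():
    print(state, estimated_lep(1_000, pct))  # Alabama 570, Iowa 740, Washington 700
```

Floor division keeps whole persons. With actual Census tables, one would use the published "less than very well" counts directly rather than recomputing them from percentages.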
The Modern Language Association, using data from the American Community Survey of 2005, has an easy-to-use mapping function that shows state-, county-, and zip code-level data for 30 of the most common languages in the United States based on responses to the question of what language is spoken at home (Modern Language Association, 2009b). These data can be sorted by age group, change from 2000 to 2005, and ability to speak English. Additionally, an interactive list of the languages that appeared in the Census reports can help locate states in which any of the 377 languages are spoken at home and identify the level of English proficiency in those states (Modern Language Association, 2009a). The U.S. English Foundation has similarly sorted Census data on 322 languages by state, county, and selected cities (U.S. English Foundation, 2009a).
School-based data help identify emerging language populations in communities. Among LEP school-aged children, Spanish is the most common language in all states except Alaska (where the most common language is Yup'ik), Hawaii (Ilocano), Maine (French), Montana (Blackfoot), North Dakota (Native American, unspecified), South Dakota (Lakota), and Vermont (Serbo-Croatian) (Kindler, 2002). Perhaps surprisingly, most children needing language services in school are native born rather than foreign born, especially in prekindergarten through fifth grade (77 percent) compared with sixth through twelfth grade (56 percent) (Fix and Capps, 2005). The 2006 American Community Survey showed that 3 million children spoke English less than very well (Kominski et al., 2008). The subcommittee concludes that there should be local flexibility in determining the language categories that are used for analysis, as long as the collection process captures language need for each individual so that entities can use the information for quality improvement purposes such as being able to provide language assistance services.
Recommendation 4-2: The choice of response categories for spoken and written language questions should be informed by analysis of relevant data on the service area (e.g., Census data) or service population, and any response list should include an option of "Other, please specify: ___" for persons whose language is not listed.
Thresholds for Collection of Spoken or Written Languages
The subcommittee considered whether there should be a percentage or numerical threshold requirement for establishing the minimum number of languages on which health care entities or states should collect data, given the flexibility recommended for use of locally relevant categories. Such thresholds have been set both for language assistance generally and for translation of documents into specific languages. NQF has endorsed as a preferred practice to "translate all vital documents, at a minimum, into the identified threshold languages for the community that is eligible to be served," with the threshold set according to existing legislative requirements (NQF, 2009). Making recommendations about specific interventions that may or may not follow from the collection of language data is outside the subcommittee's charge, and so is recommending any thresholds linked to those interventions (e.g., providing written materials in every language spoken by a specified proportion of the population). Nonetheless, it is useful to review existing approaches to setting thresholds to determine whether any would serve as the basis for a recommendation on thresholds for specifying which language categories should be collected for health care quality improvement in general.
Thresholds for establishing the languages in which services and written materials must be made available often combine a 5 percent threshold with a variable numerical cutoff. For example, the California Health and Safety Code requires that general acute care hospitals in the state provide language assistance services 24 hours a day for language groups that make up 5 percent or more of the facility's geographic service area or actual patient population.14 The California Department of Mental Health defines a threshold language for written materials as "a language identified on the Medi-Cal Eligibility Data System (MEDS) as the primary language of 3,000 beneficiaries or five percent of the beneficiary population, whichever is lower, in an identified geographic area."15 Similarly, OCR's settlement of a Title VI case with the Hawaii Department of Human Services set a threshold for translated documents of 5 percent or 1,000 persons (whichever is less) who are "eligible to be served or likely to be directly affected or encountered by the department" (HHS, 2008). More recent legislation in California (SB 853) requires the Department of Managed Health Care to ensure that health plans assess the number of persons needing language services and the languages that should be offered, and to set standards for staff training, compliance monitoring, and translation of vital documents (CPEHN, 2008).16 For the translation of documents, it establishes tiered thresholds that combine numerical and percentage cutoffs according to plan enrollment:
- "For health plans with a million or more enrollees: they must translate vital documents into the top two non-English languages, plus any language whose number of speakers in the plan is either 15,000 enrollees or greater, or totals 0.75% of the enrollee population.
- For plans with 300,000 to one million enrollees: vital documents must be translated into the top non-English language plus languages whose speakers are 6,000 enrollees or 1% of the enrollee population.
- For plans with less than 300,000 enrollees: vital documents must be translated into any language whose speakers total 3,000 enrollees or 5% of the enrollee population." (CPEHN, 2008)
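The tiered rules quoted above can be expressed as a small function. Each tier's "either/or" test is equivalent to meeting the lower of the numerical and percentage cutoffs, the same logic as the "whichever is lower" rules of the California Department of Mental Health and the Hawaii settlement. This is a hypothetical sketch of the quoted text, not the regulation itself: it assumes that a language qualifies when it reaches (not merely exceeds) either cutoff, and the enrollment figures in the example are invented.

```python
from fractions import Fraction


def required_translation_languages(enrollees: int, speakers: dict) -> list:
    """Languages into which vital documents must be translated under the quoted SB 853 tiers.

    enrollees: total plan enrollment
    speakers: enrollees per non-English language (English excluded)
    Returns the qualifying languages, most spoken first.
    """
    if enrollees >= 1_000_000:
        top_n, count_cutoff, pct_cutoff = 2, 15_000, Fraction(3, 4)  # 0.75 percent
    elif enrollees >= 300_000:
        top_n, count_cutoff, pct_cutoff = 1, 6_000, Fraction(1)      # 1 percent
    else:
        top_n, count_cutoff, pct_cutoff = 0, 3_000, Fraction(5)      # 5 percent

    ranked = sorted(speakers, key=speakers.get, reverse=True)
    required = set(ranked[:top_n])  # the plan's top non-English language(s)
    for language, n in speakers.items():
        # "either N enrollees or P percent": exact test of n / enrollees >= P / 100
        if n >= count_cutoff or n * 100 >= pct_cutoff * enrollees:
            required.add(language)
    return [language for language in ranked if language in required]


# Hypothetical mid-size plan: cutoffs are 6,000 enrollees or 1 percent (5,000 enrollees).
plan = {"Spanish": 90_000, "Chinese": 7_000, "Korean": 5_200, "Vietnamese": 4_500}
print(required_translation_languages(500_000, plan))  # ['Spanish', 'Chinese', 'Korean']
```

Using `Fraction` for the percentage keeps boundary cases exact, so a group sitting precisely at 0.75 or 1 percent is not misclassified by floating-point rounding.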
The Voting Rights Act establishes specific population thresholds to determine what constitutes a language-minority group for which election materials must be translated (U.S. Census Bureau, 2002). A jurisdiction is covered when the voting-age citizens of a single language minority who are LEP number more than 10,000, make up more than 5 percent of all voting-age citizens in the jurisdiction, or make up more than 5 percent of the American Indian or Alaska Native voting-age citizens on an Indian reservation, and when the illiteracy rate of that group is higher than the national rate (U.S. Department of Justice, 2008).
Examining the effect of using a percentage threshold to identify which languages should be included as data collection categories at the state level reveals that significant subgroups would be omitted. For example, 815,386 people aged 5 and over speak Chinese at home in California; at 2.6 percent of the state population, this share is far higher than the national figure of 0.7 percent, yet it falls well below a 5 percent threshold (U.S. Census Bureau, 2003j). Applying a 5 percent threshold statewide in California would identify only Spanish, even though California is one of the most linguistically diverse states in the nation, with 39 percent of those aged 5 and over speaking a language other than English at home (U.S. Census Bureau, 2003j), and has a large LEP population, estimated at 6.3 million (U.S. Census Bureau, 2003i). Even a 1 percent threshold in that state would make only Spanish, Chinese, Vietnamese, and Tagalog threshold languages for data collection. A 1 percent threshold applied to other states would for the most part yield only Spanish as a language to monitor (U.S. Census Bureau, 2003j). When applied to smaller geographic areas with more concentrated LEP populations, however, such percentages would yield additional language groups, and states or health plans might find thresholds useful in establishing the number of languages required for reporting and/or translation of materials.
The size of the population served should influence any numerical threshold; the service populations for all of the different entities potentially affected by a recommendation of this subcommittee are too variable for a single threshold number of 1,000 or some other value to be applied. Therefore, and because available information on thresholds is set in the context of a specific intervention (provision of language assistance services or translation of documents), the subcommittee decided not to specify a threshold (e.g., number of persons or percent of population speaking a language) for determining which spoken or written languages should be used as response options or as categories in analysis by states or other entities for the purposes of health care quality improvement. The subcommittee believes that any numerical or percentage thresholds for purposes of requiring the delivery of services or the translation of documents would best be determined by appropriate regulatory, licensing, or accrediting bodies.
Considerations for Modes of Data Collection
While the goal is to identify the specific language needs of each individual to enable effective health care communications, having lists of 400 to 500 language categories is impractical for most data collection instruments, whether in paper or electronic form, unless electronic systems have more sophisticated software to reduce staff or patient time required to search for the correct category. Accordingly, many entities will have to construct lists of perhaps 10 to 20 language categories that will be manageable within the space constraints of their paper or electronic data collection formats. These lists should always have an option to collect languages not listed by including an "Other, please specify: ___" choice so that data on any language needed by an individual can be collected. Such an approach was employed in one study to identify the languages used among school-age children. A state survey of LEP students included 13 prespecified languages on the collection form, with the opportunity to list other languages; the responses ultimately yielded 460 languages (Kindler, 2002). For intake systems that do not allow for writing in an "other" response, more detailed lists will be required, as simply reporting a large "other" category with no specific language identifiers is not useful for understanding the language needs of individual patients.
An alternative to a prespecified, locally relevant list would be to include an open-response section on paper forms or computer input screens. Some find this approach desirable because a single free-response box takes up minimal space. For example, the California Healthy Families program uses an open-ended format that captures about 30 languages, including American Sign Language.17 The main drawback is that it is generally more time-consuming to enter each response manually into a database, to decipher handwriting on paper forms, and to reconcile spelling variations on both paper forms and computer input screens. The Census Bureau can optically scan or individually key in free-response answers on language use (Shin and Bruno, 2003), but this approach is likely too costly for many entities. Kaiser Permanente's computerized registration pages incorporate keystroke recognition: as a clerk types the first couple of letters, the computer responds with a short list of alternatives drawn from the 131 options in the full set of language options (Appendix G) (Tang, 2009). Contra Costa Health Plan uses a system in which typing the initial letter of a language brings up one of the most commonly encountered languages (the top 15 languages); typing an "s," for example, brings up Spanish, and if the desired response is not in the first grouping, a second keystroke on "s" brings up Samoan and other selections (Appendix H).
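The keystroke-recognition approaches described above amount to prefix search over a ranked pick list. The sketch below is hypothetical and does not reproduce Kaiser Permanente's or Contra Costa Health Plan's software: the language lists are invented, locally common languages are assumed to be ranked first, and repeated presses of the same initial letter step through the matches, as in the Contra Costa example.

```python
# Hypothetical local lists; a real system would draw on its full set of language options.
COMMON_FIRST = ["Spanish", "Chinese", "Vietnamese", "Tagalog", "Russian"]
FULL_LIST = sorted({
    "Cantonese", "Chinese", "Persian", "Portuguese", "Punjabi", "Russian",
    "Samoan", "Serbian", "Somali", "Spanish", "Tagalog", "Vietnamese",
})


def suggest(prefix: str, limit: int = 5) -> list:
    """Languages whose names start with `prefix`; locally common ones are listed first."""
    p = prefix.casefold()
    common = [lang for lang in COMMON_FIRST if lang.casefold().startswith(p)]
    others = [lang for lang in FULL_LIST
              if lang.casefold().startswith(p) and lang not in common]
    return (common + others)[:limit]


def cycle(letter: str, presses: int) -> str:
    """Each further press of the same initial letter moves to the next match, wrapping around."""
    matches = suggest(letter, limit=len(FULL_LIST))
    if not matches:
        raise ValueError(f"no language starts with {letter!r}")
    return matches[(presses - 1) % len(matches)]


print(suggest("s"))   # ['Spanish', 'Samoan', 'Serbian', 'Somali']
print(cycle("s", 2))  # Samoan
```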
In sum, as a practical matter, most individual providers, plans, or states may want a limited list of language categories for collection based on the languages most common among their populations with LEP, taking into account as well the space limitations of their paper forms or the capacity of their computer systems. Any prespecified list of response categories should also include the option of "Other, please specify: ____" to capture an individual's language need when it does not appear on the list. Entities using open-format questions must make sure that responses are specific enough to be useful in planning services and conducting analyses; for example, a response of "Asian language" is not specific enough to identify a language.
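The two safeguards described above, an "Other, please specify" option and a check that write-in answers actually name a language, can be enforced at data entry. In this hypothetical sketch the local list and the set of overly general labels are assumptions chosen for illustration; a production system would more likely check write-ins against the national standard list than against a short exclusion set.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_LIST = {"English", "Spanish", "Chinese", "Vietnamese", "Tagalog"}  # hypothetical
TOO_GENERAL = {"asian", "asian language", "african language", "european language"}  # illustrative


@dataclass
class LanguageResponse:
    choice: str                       # a listed language, or "Other"
    other_text: Optional[str] = None  # the "please specify" write-in


def validation_problems(resp: LanguageResponse) -> list:
    """Return reasons the response cannot be used; an empty list means it is usable."""
    if resp.choice == "Other":
        text = (resp.other_text or "").strip()
        if not text:
            return ["'Other' selected but no language specified"]
        if text.casefold() in TOO_GENERAL:
            return [f"'{text}' is too general to identify a language"]
        return []
    if resp.choice not in LOCAL_LIST:
        return [f"'{resp.choice}' is not on the local list; use 'Other, please specify'"]
    return []


print(validation_problems(LanguageResponse("Other", "Asian language")))
# ["'Asian language' is too general to identify a language"]
print(validation_problems(LanguageResponse("Other", "Amharic")))  # []
```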
Development of a Nationally Standardized List of Language Categories
Because effective patient-provider communication is central to patient-centered care and the overall quality of health care, knowing the language each individual needs to communicate effectively and to understand the care process is fundamental. The subcommittee sought to determine how many languages are in use in the United States to understand the scope of what might be encountered during a patient contact or visit. The subcommittee notes that any national list of languages ideally should have a common vocabulary of language names and unique codes for languages to facilitate data sharing. Not every organization needs to report language data to others, so some may not need to participate in a uniform coding scheme, or they will be able to create a crosswalk from their own coding practices to a national standard set. Overall, however, comparability and interoperability will be enhanced by a coding system. The subcommittee has identified two major code sets for consideration: the Census Bureau and the International Organization for Standardization (ISO) 639 language code sets.
National Standard List of Spoken Language
As noted, the Census reports about 380 single languages, as well as several language groups (e.g., Scandinavian, American Indian, and African languages, for general responses not captured by specific language names such as Norwegian or Navajo), with unique codes (Modern Language Association, 2009a; U.S. Census Bureau, 2007). The subcommittee prepared a draft template of spoken languages in use in the United States, based on Census categories and on formal and informal reports from hospitals, community health centers, language assistance services, and health plans. This compilation resulted in more than 650 languages or composite groupings; however, a smaller number may be needed for effective communication in a health care context (the subcommittee identified 300 from its limited survey of health care entities). The resulting list of spoken languages (Appendix I) can serve as a basis for finalizing a national standard list of languages.
What defines a unique language versus a dialect? Linguistic scholars and those who speak a language do not always agree on what makes a language distinct. For ISO 639, classification takes into account "linguistic similarity, intelligibility, a common literature," and whether speakers of one language can understand the other. Even with this understanding, however, there may be other "well-established distinct ethnolinguistic identities [that] can be a strong indicator that they should nevertheless be considered to be different languages." Thus, the ISO language lists, and particularly their coding, assign distinct codes to distinct languages, whereas the Census Bureau is more likely to give related languages the same code. The ISO codes represent both spoken and written language names; separate script codes describe the writing systems used for written languages (SIL International, 2009c).
The names of numerous languages have multiple possible spellings, even between the Census Bureau and ISO 639 language lists, and patients may provide an alternative spelling as well. Languages might even be called slightly different names, such as Amish, Pennsylvania Dutch, or Pennsylvania German. This need not be a barrier to the list of choices developed locally as long as it is clear on a national standard list how to categorize the alternative spellings or names.
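A national standard list could resolve alternative names and spellings through a crosswalk. The hypothetical sketch below normalizes case and spacing and maps several variants mentioned in this chapter to a single name; which form is treated as canonical is an assumption, and unmatched entries are returned as None for manual review rather than guessed.

```python
from typing import Optional

# Hypothetical crosswalk: variant name or spelling -> canonical name.
ALIASES = {
    "amish": "Pennsylvania German",
    "pennsylvania dutch": "Pennsylvania German",
    "pennsylvania german": "Pennsylvania German",
    "navaho": "Navajo",
    "navajo": "Navajo",
    "yupik": "Yup'ik",
    "yup'ik": "Yup'ik",
    "gujarathi": "Gujarati",
    "gujarati": "Gujarati",
}


def canonical_name(entry: str) -> Optional[str]:
    """Map a typed or written language name to its canonical form, or None if unknown."""
    key = " ".join(entry.casefold().split())  # ignore case and extra whitespace
    return ALIASES.get(key)


print(canonical_name("Pennsylvania  Dutch"))  # Pennsylvania German
print(canonical_name("NAVAHO"))               # Navajo
print(canonical_name("Klingon"))              # None
```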
The subcommittee did not generate a list of written languages but illustrates these needs with the experiences of Kaiser Permanente (Appendix G) and Contra Costa Health Plan (Appendix H). ISO 15924 provides four-letter script codes that can be appended to language names to distinguish how a language is written (e.g., Cyrillic [Cyrl] or Arabic [Arab]) (Unicode ISO, 2009). Braille has the script code Brai.
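One common way to append a script code to a language code is the hyphenated form used in IETF BCP 47 language tags (for example, sr-Cyrl and sr-Latn for Serbian written in Cyrillic or Latin script). The report does not prescribe this format; the sketch below assumes it and checks only the shape of each part, not whether the codes are actually registered.

```python
import re


def language_script_tag(language: str, script: str) -> str:
    """Join a language code and an ISO 15924 script code, e.g. ("sr", "cyrl") -> "sr-Cyrl"."""
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError(f"expected a 2- or 3-letter language code, got {language!r}")
    if not re.fullmatch(r"[A-Za-z]{4}", script):
        raise ValueError(f"expected a 4-letter ISO 15924 script code, got {script!r}")
    # By convention the language subtag is lowercase and the script subtag is title case.
    return f"{language.lower()}-{script.title()}"


print(language_script_tag("sr", "cyrl"))  # sr-Cyrl
print(language_script_tag("sr", "Latn"))  # sr-Latn
print(language_script_tag("en", "BRAI"))  # en-Brai (English Braille)
```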
Coding of Responses
This section reviews approaches to coding the languages included on the Census and ISO/Ethnologue lists. Ethnologue studies the world's living and ancient languages (living languages now number more than 6,000) and updates its language lists every four years. The Census set includes about 380 three-digit numeric codes (e.g., Spanish 625, Russian 639, Thai 720) for the languages it tracks (U.S. Census Bureau, 2007). Because, as noted, the same code is used for multiple related languages, this set actually covers a larger number of languages, about 530, each of which has its own unique code under the ISO 639-3 classification system. The Census codes underlie the extensive data available on language spoken at home and level of English proficiency among subgroups.
The ISO codes have evolved from a first-generation two-letter coding system (ISO 639-1), to a three-letter system that accommodates additional languages, primarily for bibliographic uses (ISO 639-2), to a set of three-letter codes that now covers more than 6,000 languages (ISO 639-3). The ISO 639-3 codes are intended "to provide a comprehensive set of identifiers for all languages for use in a wide range of applications, including linguistics, lexicography and internationalization of information systems" (Library of Congress, 2007; SIL International, 2009b).
In some instances, the distinctions among languages in the ISO coding system may be of little practical concern, but in other cases distinct coding may be necessary. For instance, the differences among German, Swiss German, and Austrian German will not matter for most analyses and quality improvement initiatives; these three have an identical code under the Census Bureau system (607) but are coded deu, gsw, and bar (Bavarian, which includes Austrian varieties), respectively, under ISO 639-3. On the other hand, quite different languages sometimes share the same name; for example, the Census codes Mende as 793,18 but one cannot know whether this is the Mende language of Sierra Leone (men) or that of Papua New Guinea (sim), which ISO 639-3 distinguishes. At the local level, practitioners are likely to figure out the difference, but if it is desirable to aggregate such detail across multiple sites for various analytic purposes or to plan interventions, the more discrete codes may be better. Sorting the Chinese languages is particularly challenging for the layperson.
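The practical difference between the two code sets shows up when ISO 639-3 codes are rolled up to Census codes: the mapping is many-to-one, so aggregation works, but the reverse lookup can be ambiguous. The sketch below uses only the code pairs cited in this section; the counts in the example are hypothetical.

```python
from collections import Counter

# ISO 639-3 code -> Census three-digit code, limited to the pairs cited in the text.
ISO_TO_CENSUS = {
    "spa": 625,                           # Spanish
    "rus": 639,                           # Russian
    "tha": 720,                           # Thai
    "deu": 607, "gsw": 607, "bar": 607,   # German, Swiss German, Bavarian (incl. Austrian)
    "men": 793, "sim": 793,               # Mende (Sierra Leone), Mende (Papua New Guinea)
}


def census_to_iso(census_code: int) -> list:
    """Reverse lookup; more than one result means the Census code cannot be disambiguated."""
    return sorted(iso for iso, code in ISO_TO_CENSUS.items() if code == census_code)


def roll_up(iso_counts: dict) -> Counter:
    """Aggregate counts kept under ISO 639-3 codes to Census codes (merges finer detail)."""
    totals = Counter()
    for iso, n in iso_counts.items():
        totals[ISO_TO_CENSUS[iso]] += n
    return totals


print(census_to_iso(793))                          # ['men', 'sim']
print(roll_up({"deu": 40, "gsw": 3, "spa": 120}))  # Counter({625: 120, 607: 43})
```

Keeping data at the finer ISO level and rolling up on demand preserves both options; going the other way is impossible once detail has been merged under a shared Census code.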
Health Level 7 (HL7), a standards-setting organization for electronic health records, worked with the Centers for Disease Control and Prevention (CDC) to develop the unique codes used in the CDC/HL7 Race and Ethnicity Code Set 1.0 (CDC, 2000). HL7 has not yet adopted any codes for languages. In its incidental collection of information on languages, the subcommittee encountered the ISO coding scheme more often. For example, the Illinois Department of Human Services and Contra Costa Health Plan use the ISO 639-1 two-letter alphabetic codes. Others use the three-letter codes for tracking language needs and determining the resources required to address them (e.g., the courts of New Jersey, to identify persons who need interpreters and to plan service enhancements; the Anthem Blue Cross survey of language needs).19
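An entity that already stores two-letter ISO 639-1 codes can move to three-letter ISO 639-3 codes through a simple crosswalk. The partial table below covers English and the national top 10 languages and is illustrative only. Note that "zh" maps to the Chinese macrolanguage code zho, so individual Chinese languages such as Mandarin and Cantonese cannot be recovered from a two-letter code.

```python
# Partial ISO 639-1 -> ISO 639-3 crosswalk (English plus the national top 10 languages).
ISO1_TO_ISO3 = {
    "en": "eng", "es": "spa", "zh": "zho", "fr": "fra", "de": "deu", "tl": "tgl",
    "vi": "vie", "it": "ita", "ko": "kor", "ru": "rus", "pl": "pol",
}


def to_iso639_3(code: str) -> str:
    """Return a three-letter code, translating two-letter codes via the crosswalk."""
    code = code.strip().lower()
    if len(code) == 3:
        return code  # assumed to be an ISO 639-3 code already (e.g., "ase")
    if code not in ISO1_TO_ISO3:
        raise KeyError(f"no ISO 639-3 mapping on file for {code!r}")
    return ISO1_TO_ISO3[code]


print([to_iso639_3(c) for c in ["ES", "zh", "ase"]])  # ['spa', 'zho', 'ase']
```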
In conclusion, the subcommittee believes that there are advantages to both the Census Bureau and ISO coding schemes for languages. In the next chapter, the subcommittee indicates the need for HHS to consult with the Census Bureau, the registration authorities for the ISO codes, and others that establish unique coding for interoperability, such as HL7; the subcommittee itself does not endorse one coding scheme over another.
If the Census coding approach were to be adopted, the subcommittee notes that the Census list of languages and codes would likely need some additional changes to be useful. Because of how the language question is asked on the Census (Does this person speak a language other than English? [Figure 4-1]), the responses yes (language other than English) and no (English only) are coded simply 1 and 2, respectively; there is no unique three-digit code for English. Sign language, an important communication tool, is not a language response on the Census. By contrast, ISO 639 has unique codes for 130 sign languages (SIL International, 2009a), such as aed for Argentine Sign Language and ase for American Sign Language. Because the Census Bureau has no specific code for sign language, it would code a response of American Sign Language as English for its purposes,20 an approach that is less helpful in responding to a person's language needs in the health care environment. A separate category noting which persons have speech loss has been useful for some entities in understanding the communication needs of all patients. Further options for "declined," "unavailable," or "unknown" are also useful when data are being recorded, to determine the portion of the service population from whom language data have been collected; the Census Bureau does not generally code for these options.
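The coding choices discussed above can be illustrated with a small tally. In this hypothetical sketch, each record carries either an ISO 639-3 code (so American Sign Language, ase, stays distinct from English rather than being folded into it) or one of the administrative statuses suggested above; the status names and sample records are assumptions. The share of records with an actual language value indicates how completely language data have been collected.

```python
from collections import Counter
from enum import Enum


class NoLanguage(Enum):
    """Hypothetical status codes for records that carry no language value."""
    DECLINED = "declined"
    UNAVAILABLE = "unavailable"
    UNKNOWN = "unknown"


# ISO 639-3 codes, including the two sign languages cited in the text.
NAMES = {"eng": "English", "spa": "Spanish",
         "ase": "American Sign Language", "aed": "Argentine Sign Language"}


def completeness(records: list) -> tuple:
    """Tally records by language or status and return the share that has a language code."""
    if not records:
        raise ValueError("no records")
    tally = Counter(r.value if isinstance(r, NoLanguage) else NAMES.get(r, r)
                    for r in records)
    with_language = sum(1 for r in records if not isinstance(r, NoLanguage))
    return tally, with_language / len(records)


records = ["spa", "ase", "eng", NoLanguage.DECLINED, "spa", NoLanguage.UNKNOWN, "aed", "eng"]
tally, share = completeness(records)
print(tally["American Sign Language"], tally["English"], f"{share:.0%}")  # 1 2 75%
```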
Recommendation 4-3: When any health care entity collects language data, the languages used as response options or categories for analysis should be selected from a national standard set of languages in use in the United States. The national standard set should include sign language(s) for spoken language and Braille for written language.
The subcommittee has reviewed the frequency of health provider interactions with people needing language assistance and the impact of limited English proficiency on access to care, health outcomes, and patient safety. An estimated 21.3 to 23.0 million people in the United States would meet the subcommittee's definition of LEP for health care purposes—self-assessment as speaking English less than very well. The subcommittee has established a hierarchy of questions to ask about the language variable, with the highest emphasis on establishing language need based on two questions—a person's rating of their English language proficiency and the preferred language needed for health care interactions.
The subcommittee's task extended to exploring what a national standard list of language categories might look like. A number of approaches to designating languages for collection were considered, including whether there should be uniform collection nationwide of a limited number of categories or locally relevant lists chosen by the individual data collection entity from a larger national list. A limited national list, whether of 10 languages or 40, would not be useful for every health care provider, state, or health plan. The subcommittee therefore favors the approach of allowing selection of locally relevant language categories from a national standard list, with a common category and coding framework. Local lists should provide an "Other, please specify: ___" option in case an individual does not find a needed language on a collection instrument with check-off boxes, or if that language is not yet on the national standard list of names. Such a language list might need to be updated from time to time to accommodate new immigrant groups, and health care providers might encounter new language names before a formal Census or ISO review takes place. The subcommittee provides a draft template of spoken language names, with Census and ISO identifiers, as a list of languages that might be encountered in health care settings (Appendix I). In Chapter 6, the subcommittee discusses a process for adoption of the language list and an associated code set for data aggregation and exchange.