Comparative Analysis of HCUP and NHDS Inpatient Discharge Data
This report assesses potential biases of statistics calculated from Release 4 of the Nationwide Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP-3). Release 4 of the NIS includes hospital discharge data from a sample of community hospitals for calendar year 1995. Statistics for discharge- and hospital-level characteristics of the NIS data are compared with the National Hospital Discharge Survey (NHDS) data.
This report is part of the documentation for the Nationwide Inpatient Sample, Release 5, 1996, and can be found on compact disk number 6 of the data product.
Most statistics calculated from the Nationwide Inpatient Sample (NIS) are consistent with those from the National Hospital Discharge Survey (NHDS), particularly those for region and patient characteristics. Several differences exist between the NIS and NHDS discharge estimates when discharges are stratified by hospital size. The sample of hospitals in the NIS was stratified on hospital size and weighted to the AHA universe to better represent the universe of hospitals. The NIS estimates of average length of stay appear consistent with the NHDS. NIS estimates of in-hospital mortality rates are higher than the NHDS estimates in all the regions except the Northeast.
Inconsistencies between the NIS estimates and estimates from the NHDS data may be caused by a number of factors. Sample design may cause some differences. Some may be due to differences in coding schemes. In other cases, differences may be attributed to slightly dissimilar populations.
This report assesses potential biases of statistics calculated from the Nationwide Inpatient Sample (NIS), Release 4 of the Healthcare Cost and Utilization Project (HCUP-3). The NIS, Release 4 includes hospital discharge data from a sample of community hospitals for the calendar year 1995. Statistics for discharge- and hospital-level characteristics of the NIS data are compared with the National Hospital Discharge Survey (NHDS) and the American Hospital Association Annual Survey data.
The NIS, Release 4 was established to provide analyses of hospital utilization across the United States. For each calendar year, the NIS universe of hospitals was established as all community hospitals located in the U.S. However, the NIS sampling frame was constructed from the subset of universe hospitals that released their discharge data for research use. Currently, the Agency for Health Care Policy and Research (AHCPR) has agreements with 22 data sources that maintain statewide, all-payer discharge data files to include their data in the HCUP-3 database. However, only 19 of these States could be included for this fourth release. These 19 States are two more than in the second and third releases and eight more than in the first release, as shown in Table 1. The NIS, Release 4 is composed of all discharges from a sample of hospitals from these frame States.
|Calendar Years||States in the Frame|
|1988 (Release 1)||California, Colorado, Florida, Illinois, Iowa, Massachusetts, New Jersey, and Washington|
|1989-92 (Release 1)||Add Arizona, Pennsylvania, and Wisconsin|
|1993 (Release 2), 1994 (Release 3)||Add Connecticut, Kansas, Maryland, New York, Oregon, and South Carolina|
|1995 (Release 4)||Add Missouri and Tennessee|
Creation of the NIS was subject to certain restrictions:
- The Illinois Health Care Cost Containment Council stipulated that no more than 40 percent of Illinois discharge data could be included in the database for any calendar quarter. Consequently, approximately 50 percent of the Illinois community hospital universe was randomly selected for the frame each year.
- Hospitals in Missouri were allowed to withhold their data from the NIS. Thirty-five Missouri hospitals, from a State total of 119, chose not to participate in the NIS.
- South Carolina and Tennessee both imposed "small strata/cell restrictions," requiring the NIS to exclude hospitals when only one hospital from the State appears in a sampling stratum. As a result, the NIS is not representative of South Carolina or Tennessee hospitals.
To improve the generalizability of the NIS estimates, five hospital sampling strata were used:
1. Geographic Region: Midwest, Northeast, West, and South.
2. Ownership: government, investor-owned, and nonprofit nongovernment.
3. Location: urban and rural.
4. Teaching Status: teaching and non-teaching.
5. Bedsize: small, medium, and large, specific to the hospital's location and teaching status as shown in Table 2.
|Location and teaching status||Bedsize|
To further ensure geographic representativeness, hospitals were sorted by State and the first three digits of their ZIP code prior to systematic sampling.
The NIS is a stratified probability sample of hospitals in the frame, with sampling probabilities calculated to select 20 percent of the universe contained in each stratum. The overall objective was to select a sample of hospitals "generalizable" to the target universe, including hospitals outside the frame (which had a zero probability of selection).
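The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the NIS: the function name `sample_stratum`, the record fields `state` and `zip`, and the rule of capping the target at the number of frame hospitals are all hypothetical.

```python
import random

def sample_stratum(frame_hospitals, universe_count, rate=0.20, seed=0):
    """Systematically sample frame hospitals in one stratum.

    The target sample size is `rate` times the number of universe
    hospitals in the stratum, capped at the number of frame hospitals
    available (assumed rule, for illustration only).
    """
    target = min(len(frame_hospitals), round(rate * universe_count))
    if target == 0:
        return []
    # Sort by State and three-digit ZIP so picks are spread geographically.
    ordered = sorted(frame_hospitals, key=lambda h: (h["state"], h["zip"][:3]))
    step = len(ordered) / target
    start = random.Random(seed).random() * step  # random start in [0, step)
    last = len(ordered) - 1
    # Take every `step`-th hospital from the random start.
    return [ordered[min(last, int(start + i * step))] for i in range(target)]
```

Hospitals outside the frame never enter `frame_hospitals`, which is the zero probability of selection noted above.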
Sample weights were developed for the NIS to obtain national estimates of hospital and inpatient parameters. For example, with these weights it should be possible to estimate DRG-specific average lengths of stay over all U.S. hospitals, using weighted average lengths of stay based on averages or regression estimates from the NIS. Ideally, relationships among outcomes and their correlates estimated from the NIS should generally hold across all U.S. hospitals. However, since only 19 States contributed data to this fourth release, some estimates may be biased. In this report, we compare estimates based solely on the NIS against estimated quantities from other sources of data.
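As a rough illustration of how such weights enter an estimate, the sketch below computes a weighted average length of stay from discharge records. The function name and the `(length_of_stay, weight)` record layout are assumptions made for this example; they are not the NIS file layout.

```python
def weighted_alos(discharges):
    """Weighted average length of stay.

    `discharges` is an iterable of (length_of_stay_days, discharge_weight)
    pairs; the layout is hypothetical.
    """
    records = list(discharges)
    total_weight = sum(weight for _, weight in records)
    if total_weight == 0:
        raise ValueError("no weighted discharges")
    weighted_days = sum(los * weight for los, weight in records)
    return weighted_days / total_weight
```

Restricting `discharges` to a single DRG before calling the function would give a DRG-specific estimate of the kind mentioned above.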
This report compares both discharge- and hospital-level statistics. Discharge statistics include discharge counts, inpatient charges, in-hospital mortality, and average lengths of stay. Hospital statistics include items such as number of beds, occupancy rates, and staffing levels.
This report is divided into four sections. The first section includes a discussion of the data sources used in the analysis. The second section explains the methodology used to compare the NIS and NHDS. The third section includes a presentation of the results: tables for this section are included at the end of the report. The final section offers some conclusions and recommendations for analyses of the NIS, Release 4.
Benchmark statistics for 1995 from several data sources were compared. The NIS, Release 4, 1995 data were drawn from a frame of 19 States and include approximately 6.7 million discharges from 938 hospitals. NIS statistics were mainly compared with those calculated from these two data sources:
- National Hospital Discharge Survey (NHDS), 1995. Conducted by the National Center for Health Statistics, the NHDS includes about 260,000 discharges sampled from 400 hospitals. To be part of the NHDS, hospitals must have six or more beds staffed for patient use. The NHDS covers discharges from short-stay U.S. hospitals (hospitals with an average length of stay under 30 days), general-specialty (medical or surgical) hospitals, and children's hospitals. Federal, military, and Veterans Administration hospitals are excluded from the survey. The NHDS sampling frame includes very few specialty hospitals such as psychiatric, maternity, alcohol/chemical dependency, orthopedic, and head-injury hospitals.
Statistics calculated from the NHDS do have sampling error. However, the statistics are assumed to be unbiased because the sampling frame is relatively unrestricted, encompassing all nonfederal, acute-care, general U.S. hospitals with six or more beds.
- AHA Annual Survey of Hospitals, 1995. This hospital-level file contains one record for every hospital in the NIS universe, making it a convenient source for calculating various statistics based on both the population of hospitals and the NIS sample of hospitals. The file contains hospital-level statistics for hospital reporting periods, which do not necessarily correspond to the calendar year.
Table 3 summarizes some of the key differences in hospitals and discharges represented by the NIS and NHDS data files.
Comparisons with NHDS
The following measures were chosen to compare the NIS and NHDS databases:
- Total number of discharges.
- Average length of stay (ALOS).
- In-hospital mortality rate.
These measures of utilization and outcomes were selected because they are typically used in health services research.
For each statistic, a test was performed to determine whether a difference was statistically significant between the NIS and NHDS estimates. Since the NHDS estimate was based on a sample, two-sample t-tests were used, as described in the Appendix. Differences were reported at the one and five percent significance levels.
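The exact test is defined in the Appendix, which is not reproduced in this section. As a minimal sketch, assuming the two estimates are independent and a large-sample normal approximation is acceptable, the comparison could look like this (the function name and the `*`/`**` flags are hypothetical):

```python
import math

def compare_estimates(nis_est, nis_se, nhds_est, nhds_se):
    """Test the difference between two independent sample estimates.

    Returns the test statistic, a two-sided p-value from the normal
    approximation, and a flag: "**" for p < 0.01, "*" for p < 0.05.
    """
    se_diff = math.sqrt(nis_se ** 2 + nhds_se ** 2)
    statistic = (nis_est - nhds_est) / se_diff
    # Two-sided tail probability of a standard normal variate.
    p_value = math.erfc(abs(statistic) / math.sqrt(2))
    flag = "**" if p_value < 0.01 else "*" if p_value < 0.05 else ""
    return statistic, p_value, flag
```

The standard errors passed in must already reflect each survey's sample design; this sketch does not compute them.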
To assess their reliability, the statistics listed above were compared within the following types of strata:
- Geographic regions (Midwest, Northeast, West, and South).
- Hospital characteristics (ownership, rural location, teaching status, and bedsize).
- Patient characteristics (age, race, gender, and payer).
- Diagnosis groups (The principal diagnosis code for each discharge was assigned to a diagnosis group defined by the Clinical Classifications for Health Policy Research (CCHPR) Version 2 algorithm [Elixhauser and McCarthy, 1996]).
- Procedure groups (The principal procedure code for each discharge was assigned to a procedure group defined by the CCHPR, Version 2 algorithm [Elixhauser and McCarthy, 1996]).
Further, special analyses were conducted for hospitals in the South region, an area in which the NIS coverage is limited. In the NIS, Release 1, the South region was represented by only Florida. Release 2 of the NIS added Maryland and South Carolina. For Release 4 of the NIS, the South is represented by Florida, Maryland, South Carolina, and Tennessee.
All NIS statistics used sample weights and accounted for the sample design using the SUDAAN microcomputer statistical software to calculate finite sample statistics and their variances. All NHDS statistics were calculated with Statistical Analysis System (SAS) microcomputer software. For NHDS statistics, standard errors were calculated as described in the Appendix.
Comparisons Between the NIS and the NHDS
Since the NIS and the NHDS represent different samples of the same universe of hospitals, some differences are expected and can be attributed to statistical "noise." Moreover, because of the large number of comparisons, some differences that test as significant at the 0.05 level will reflect chance rather than real differences. While bias could be present in either sample, the NHDS estimates are less likely to be biased because the hospital sampling frame is far less restricted than that for the NIS. The following sections describe results of statistical comparisons by region, hospital characteristics, patient characteristics, diagnosis, and procedure.
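To give a sense of scale for the multiple-comparison caveat, the sketch below counts the false positives expected when every null hypothesis is true. It assumes the comparisons are independent, which is a simplification because the strata in this report overlap; the function names are illustrative.

```python
def expected_false_positives(comparisons, alpha=0.05):
    """Expected number of chance 'significant' results if no real differences exist."""
    return comparisons * alpha

def chance_of_any_false_positive(comparisons, alpha=0.05):
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons
```

For example, `expected_false_positives(25)` is 1.25, so a table with 25 comparisons would be expected to show about one spurious difference at the 0.05 level under these assumptions.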
Table 4 compares estimates of discharges, average lengths of stay, and in-hospital mortality generated from NIS and NHDS data. Comparisons are presented by total and by region for 1995. The NIS and NHDS estimates of national and regional discharges do not significantly differ. Overall, the NIS and NHDS produce similar estimates of average length of stay, although the NIS estimate is significantly higher than the NHDS estimate for the Midwest (by 30 percent). NIS in-hospital mortality rate estimates are also significantly higher in total (by 8 percent) and for the Midwest and South (by 24 and 12 percent, respectively).
Table 5 compares estimates of discharges, average lengths of stay, and in-hospital mortality between the NIS and NHDS for 1995, by hospital ownership categories (private/investor-owned, private/nonprofit, and government/nonfederal) and bedsize categories (6-99, 100-199, 200-299, 300-499, and 500+).
Several of the estimates for hospital discharges differ significantly between the two sources. For government hospitals, the NIS estimates 15 percent more discharges than the NHDS. For private hospitals, which represent the majority of the discharges, there is no significant difference in total discharges for either nonprofit or investor-owned hospitals. Within the ownership groups, significant differences are found for most bedsize categories except for 200-299 bed hospitals. The NIS estimates more discharges than the NHDS for five of the 10 significant differences, and fewer for the remaining five.
It should be noted that the total number of 1995 universe discharges in hospitals with over 500 beds is 6.6 million according to the AHA file. Consequently, the NIS (with 7.0 million) may provide a better estimate of discharge counts for large hospitals than the NHDS (with 3.9 million). These differences in estimated discharge counts may contribute to differences in outcome statistics, reported in Table 5, between the two sources because the discharge counts are essentially sums of discharge weights, which are used to calculate outcome statistics.
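The size of each gap relative to the AHA count can be worked out directly from the three figures quoted above. The helper name below is illustrative; the inputs are the discharge counts, in millions, as stated in the text.

```python
def percent_difference(estimate, benchmark):
    """Signed difference of an estimate from a benchmark, as a percentage of the benchmark."""
    return 100.0 * (estimate - benchmark) / benchmark

AHA_DISCHARGES = 6.6   # millions, hospitals with over 500 beds (AHA file)
NIS_DISCHARGES = 7.0   # millions, NIS estimate
NHDS_DISCHARGES = 3.9  # millions, NHDS estimate

nis_gap = percent_difference(NIS_DISCHARGES, AHA_DISCHARGES)    # about +6 percent
nhds_gap = percent_difference(NHDS_DISCHARGES, AHA_DISCHARGES)  # about -41 percent
```

On these figures the NIS estimate is roughly 6 percent above the AHA count, while the NHDS estimate is roughly 41 percent below it.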
Totals for each ownership group show no significant differences in average length of stay (ALOS) or in-hospital mortality estimates. In addition, there are few differences within the ownership groups between the two sources. Only one of the 15 ALOS comparisons is significant: the difference between the NIS and NHDS for government hospitals with 100-199 beds (19 percent higher).
Estimates for in-hospital mortality tend to be higher for the NIS than for the NHDS, although not significantly in most cases. There are only four significant differences between the NIS and NHDS estimates, although the NIS estimate is higher than the NHDS estimate for 12 of the 15 strata. The NIS estimate is significantly higher than the NHDS estimate for investor-owned hospitals with 100-199 beds (by 15 percent), and for nonprofit hospitals with 6-99 beds (by 31 percent) and 100-199 beds (by 16 percent).
Table 6 compares estimates of discharges, average lengths of stay, and in-hospital mortality between the NIS and NHDS for 1995, by primary payer, age group, gender, and race. The NIS contains uniform values for race; however, there is variation in source data from the participating States. Specifically, in some States hospitals report "other" race for all non-white patients, resulting in overreporting for this race category. Any analysis of NIS data by race categories is affected by this variation. Except for mortality, the majority of estimates are not significantly different between the two data sources for these strata.
Discharge estimates for Medicare, Medicaid, private insurance, all age groups, males, females, and three categories of race (White, Black, and missing) show no significant differences between the NIS and NHDS. Significant differences, however, are found for the payer categories of self-pay, no charge, other, and missing. The NIS discharge estimate for self-pay patients is 40 percent higher than the NHDS estimate. For no charge, other, and missing payer, the NIS estimates are lower than the NHDS estimates. The NIS estimate for other race is higher than the NHDS estimate by 158 percent.
Average length of stay estimates from the two sources are not statistically different. Estimates of in-hospital mortality rates from the NIS tend to be higher than the NHDS estimates. Of the 17 strata, the NIS estimates are larger than the NHDS estimates for 11 strata, although not all differences are statistically significant. The NIS estimates are significantly larger than NHDS estimates for the payer category of other (36 percent); age groups 15-44 years and 65+ years (17 and 4 percent); males and females (6 and 9 percent); and the White and missing race categories (12 and 18 percent). The NIS estimate is significantly smaller than the NHDS estimate for the age group 0-15 years and the other race stratum (by 16 and 24 percent, respectively).
Table 7 gives a detailed comparison for the South Region by hospital and patient characteristics. Of the 21 strata in Table 7, significant differences are found between the NIS and NHDS estimates for discharges (8 out of 21) and in-hospital mortality rates (6 out of 21). None of the comparisons for average lengths of stay are statistically different.
No significant differences in discharge estimates are found for any ownership, age group, or gender category. Four of the five bedsize categories, however, show significant differences between the NIS and NHDS estimates of discharges. The NIS estimates are lower than the NHDS estimates for small and medium hospitals (6-99, 100-199, and 200-299 beds) by 9 to 28 percent. The NIS estimates for very large hospitals (500+ beds) are larger than the NHDS estimates by 53 percent. No significant differences are found for the primary payer categories of Medicare, Medicaid, and private insurance, while the categories of self-pay, no charge, other, and missing do show significant differences. NIS discharge estimates are higher for the self-pay category and lower for the no charge, other, and missing categories. These results are similar to the discharge estimates over all regions by payer in Table 6.
The average length of stay estimates from the NIS generally agree with the NHDS estimates for the South. The NIS in-hospital mortality estimates are higher than the NHDS estimates for nearly every hospital and patient category, including by age group (17 of the 23 strata), although only six of the differences are significant. The higher NIS estimates may stem from the large impact of Florida hospitals on the estimate for the South. Florida accounts for 52 percent of Southern discharges and 51 percent of Southern hospitals within the 1995 NIS data. Because many of the Southern States are not represented in the NIS, discharges from Florida hospitals, and the characteristics of Florida's hospital and patient populations, may be amplified in NIS estimates. Specifically, Florida has a large immigrant population with serious health problems and this may explain some of the differences in mortality estimates.
Table 8 compares the NIS and NHDS by the 25 most frequent primary diagnosis categories, ranked according to the NIS estimates of number of discharges for each category. CCHPR code categories (version 2) are assigned based on the primary (vs. principal or admitting) diagnosis. The NIS discharge estimates differ significantly from the NHDS estimates for 12 of the 25 CCHPR categories; NIS estimates are significantly higher for eight diagnosis categories and significantly lower for four categories.
Some of the discrepancies found in the estimated number of discharges may be explained by differences in the assignment of primary diagnosis for the NIS and NHDS databases. In building the NIS, there is no reordering of diagnoses. The first diagnosis listed for each discharge was assigned as the primary diagnosis (although the State organizations that supply NIS data may have assigned the principal diagnoses to the primary diagnosis position prior to supplying data for the NIS). The NHDS reordered diagnoses under certain conditions.
For example, differences in the number of delivery-related discharges could be explained by the reordering of diagnosis codes in the NHDS. For women discharged after a delivery, a code of V27 (Outcome of Delivery) from the supplemental classification is entered as the second-listed code. A code designating normal or abnormal delivery is then listed in the first position. This could explain differences in the number of discharges counted in the diagnosis group for normal pregnancy and/or delivery (rank 8), trauma to the perineum and vulva (rank 6), fetal distress and abnormal forces of labor (rank 18), other complications of birth affecting mother (rank 23), and other complications of pregnancy (rank 24).
As another example of diagnosis reordering in the NHDS, if the first-listed diagnosis was a symptom, it was reassigned as a secondary diagnosis. This may have affected estimates for the 13th ranked diagnosis category, nonspecific chest pain. Taking into account the differences in ordering of diagnoses reduces the number of significant differences in estimated discharges between the two data sources from 12 to six of the 25 categories.
Comparisons of ALOS and in-hospital mortality rates by diagnosis category (also shown in Table 8) indicate few significant differences between NIS and NHDS estimates. Significant differences are found for only one ALOS estimate (Normal Pregnancy) and for no in-hospital mortality estimates. The in-hospital mortality rates yielded valid significance tests for only 19 categories because valid NHDS standard errors for in-hospital mortality could not be calculated for six categories (see the Appendix for validity criteria).
Table 9 lists the top 25 procedure categories, ranked according to the NIS estimates of number of discharges for each category. Similar to the diagnosis groups, CCHPR codes are assigned based on the primary, or first-listed, procedure for each discharge. The NIS discharge estimates differ significantly from the NHDS estimates for 9 of the 25 CCHPR categories; NIS estimates are significantly higher for 7 procedure categories, and significantly lower for only 2 categories.
Procedures for which the NIS discharge estimates were significantly higher than the NHDS estimates include the following: episiotomy, diagnostic cardiac catheterization, upper GI, percutaneous coronary angioplasty, respiratory intubation, CT head scans, and cancer chemotherapy. These differences may be explained by the estimated high number of discharges from large hospitals in the NIS, which are more likely to perform high technology procedures (see Table 5), compared to the number from large hospitals in the NHDS.
Comparisons of average length of stay and in-hospital mortality rate estimates by procedure category show few significant differences between NIS and NHDS estimates. Three significant differences are found for ALOS, and three differences are also found for in-hospital mortality. Significance tests were not performed for five in-hospital mortality rate estimates due to the unavailability of valid standard errors for NHDS estimates (see the Appendix).
Comparison with AHA Data
Table 10 demonstrates that hospital weights associated with the NIS yield hospital counts consistent with AHA universe counts for various categories of hospital types. This is expected because the sample of NIS hospitals was stratified on most of these variables, and sample hospital weights were calculated within strata based on AHA data.
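The agreement is close to automatic when weights are built within strata. The sketch below assumes each sampled hospital's weight is the universe hospital count in its stratum divided by the sample hospital count in that stratum. That formula is an assumption for illustration; this section does not state the exact calculation, and the function name is hypothetical.

```python
from collections import Counter

def hospital_weights(universe_strata, sample_strata):
    """Map each stratum to a hospital weight.

    `universe_strata` and `sample_strata` are lists holding one stratum
    label per hospital. Weight = universe count / sample count (assumed).
    """
    universe_counts = Counter(universe_strata)
    sample_counts = Counter(sample_strata)
    return {
        stratum: universe_counts[stratum] / count
        for stratum, count in sample_counts.items()
    }
```

Summing these weights over the sampled hospitals in a stratum returns that stratum's universe count, which is why weighted NIS hospital counts line up with AHA counts for the stratification variables.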
Table 11 compares the universe (AHA) and weighted frame (NIS) means and medians for selected hospital-level measures defined in the 1995 AHA Annual Survey. In general, the frame hospital weighted averages and medians tend to be slightly higher than the universe averages.
In general, for many types of estimates, the NIS performs very well. Some differences emerge when the NIS is compared to specific data sets. Sometimes, these variations are caused by differences in definitions (e.g., NIS and NHDS coding schemes). In some cases, differences are due to certain shortcomings in the NIS.
Comparisons of Total Population Estimates
Based on comparisons between statistics calculated from the NIS and the NHDS, it appears that most statistics calculated from the two data sources are similar. Overall, when compared with the NHDS, the NIS seems to estimate higher discharges for certain types of hospitals (government hospitals and large hospitals) and higher in-hospital mortality rates. The higher mortality estimates may be in part because the NIS tends to have higher estimates of discharges for "large" hospitals, and these patients may represent a somewhat different severity of illness than those in other hospitals.
Estimates of LOS and mortality by diagnosis and procedure groups show few significant differences. However, several estimates of discharges by diagnosis and procedure groups are significantly different. These differences in discharge estimates could be attributable to differences in data handling: the NIS takes all diagnosis and procedure codes as they are recorded, while the NHDS has specific rules for what is considered a valid first-listed diagnosis.
In summary, the NIS estimates of ALOS appear to be unbiased in most contexts. The NIS estimates of discharge counts differ under some conditions from the NHDS estimates but not in any consistent direction. The NIS estimates for in-hospital mortality are higher than estimates from the NHDS for the Midwest and South. Based on comparisons with AHA data, NIS hospitals tend, on average, to be larger than the universe of community hospitals. This higher percentage of weighted NIS discharges coming from "large" hospitals—and the more complex case mix of those hospitals—may contribute to the higher in-hospital mortality estimates when compared to the NHDS.
2. Elixhauser, A. and McCarthy, E. Clinical Classifications for Health Policy Research, Version 2: Hospital Inpatient Statistics. (AHCPR Publication No. 96-0017) Agency for Health Care Policy and Research, Healthcare Cost and Utilization Project (HCUP-3) Research Note 1. February, 1996.