Technical Supplement 13
This report assesses potential biases of statistics calculated from Release 4 of the Nationwide Inpatient Sample (NIS) of the
Healthcare Cost and Utilization Project (HCUP-3). Release 4 of the NIS includes hospital discharge data from a sample of
community hospitals for calendar year 1995. Statistics for discharge- and hospital-level characteristics of the NIS data are
compared with the National Hospital Discharge Survey (NHDS) data.
This report is part of the documentation for the Nationwide Inpatient Sample, Release 5, 1996, and can
be found on compact disk number 6 of the data product.
Summary
Most statistics calculated from the Nationwide Inpatient Sample (NIS) are consistent with those from the National Hospital Discharge Survey (NHDS), particularly those for region and patient
characteristics. Several differences exist between the NIS and NHDS discharge estimates when discharges are stratified by
hospital size. The sample of hospitals in the NIS was stratified on hospital size and weighted to the AHA universe to better
represent the universe of hospitals. The NIS estimates of average length of stay appear consistent with the NHDS. NIS
estimates of in-hospital mortality rates are higher than the NHDS estimates in all the regions except the Northeast.
Inconsistencies between the NIS estimates and estimates from the NHDS data may be caused by a number of factors.
Sample design may cause some differences. Some may be due to differences in coding schemes. In other cases,
differences may be attributed to slightly dissimilar populations.
Introduction
This report assesses potential biases of statistics calculated from the Nationwide Inpatient Sample (NIS), Release 4 of the
Healthcare Cost and Utilization Project (HCUP-3). The NIS, Release 4 includes hospital discharge data from a sample of
community hospitals for the calendar year 1995. Statistics for discharge- and hospital-level characteristics of the NIS data
are compared with the National Hospital Discharge Survey (NHDS) and the American Hospital Association Annual Survey
data.
The NIS, Release 4 was established to provide analyses of hospital utilization across the United States. For each calendar
year, the NIS universe of hospitals was established as all community hospitals located in the U.S. However, the NIS
sampling frame was constructed from the subset of universe hospitals that released their discharge data for research use.
Currently, the Agency for Health Care Policy and Research (AHCPR) has agreements with 22 data sources that maintain
statewide, all-payer discharge data files to include their data in the HCUP-3 database. However, only 19 of these States
could be included for this fourth release. These 19 States represent the addition of two States more than the second and third
releases, and eight States more than the first release, as shown by Table 1. The NIS, Release 4 is composed of all
discharges from a sample of hospitals from these frame States.
Table 1. States in the Frame for the NIS, Release 4
| Calendar Years | States in the Frame |
| 1988 (Release 1) | California, Colorado, Florida, Illinois, Iowa, Massachusetts, New Jersey, and Washington |
| 1989-92 (Release 1) | Add Arizona, Pennsylvania, and Wisconsin |
| 1993 (Release 2), 1994 (Release 3) | Add Connecticut, Kansas, Maryland, New York, Oregon, and South Carolina |
| 1995 (Release 4) | Add Missouri and Tennessee |
Creation of the NIS was subject to certain restrictions:
- The Illinois Health Care Cost Containment Council stipulated that no more than 40 percent of Illinois discharge data could
be included in the database for any calendar quarter. Consequently, approximately 50 percent of the Illinois community
hospital universe was randomly selected for the frame each year.
- Hospitals in Missouri were allowed to withhold their data from the NIS. Thirty-five Missouri hospitals, from a State total of
119, chose not to participate in the NIS.
- South Carolina and Tennessee both imposed "small strata/cell restrictions," requiring the NIS to exclude hospitals when
only one hospital from the State appears in a sampling stratum. As a result, the NIS is not representative of South Carolina or
Tennessee hospitals.
To improve the generalizability of the NIS estimates, hospitals were stratified on five characteristics:
1. Geographic Region—Midwest, Northeast, West, and South.
2. Ownership—government, investor-owned, and nonprofit nongovernment.
3. Location—urban and rural.
4. Teaching Status—teaching and non-teaching.
5. Bedsize—small, medium, and large, specific to the hospital's location and teaching status as shown in Table 2.
Table 2. Bedsize Categories
| Location and teaching status | Bedsize: Small | Bedsize: Medium | Bedsize: Large |
| Rural | 1-49 | 50-99 | 100+ |
| Urban, nonteaching | 1-99 | 100-199 | 200+ |
| Urban, teaching | 1-299 | 300-499 | 500+ |
To further ensure geographic representativeness, hospitals were sorted by State and the first three digits of their ZIP code
prior to systematic sampling.
The NIS is a stratified probability sample of hospitals in the frame, with
sampling probabilities calculated to select 20 percent of the universe contained
in each stratum. The overall objective was to select a sample of hospitals "generalizable"
to the target universe, including hospitals outside the frame (which had a zero
probability of selection).
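As a rough illustration of the design described above, the following Python sketch sorts frame hospitals by State and three-digit ZIP code within each stratum and then draws a systematic sample sized at 20 percent of the stratum's universe count. The field names (`stratum`, `state`, `zip3`), the function names, and the rule of taking every frame hospital when the frame is too small are assumptions made for illustration; they are not taken from the NIS sampling programs.

```python
import random
from collections import defaultdict


def systematic_sample(units, k, rng):
    """Pick k units from an ordered list using a random start and a fixed interval."""
    n = len(units)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    step = n / k                     # sampling interval (>= 1)
    start = rng.random() * step      # random start within the first interval
    # min() guards against a floating-point index equal to n.
    return [units[min(n - 1, int(start + i * step))] for i in range(k)]


def draw_sample(frame, universe_counts, fraction=0.20, seed=0):
    """Stratified systematic sample of frame hospitals.

    frame: list of dicts with hypothetical keys 'stratum', 'state', 'zip3'.
    universe_counts: number of universe hospitals in each stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for hospital in frame:
        by_stratum[hospital["stratum"]].append(hospital)

    sample = []
    for stratum in sorted(by_stratum):
        hospitals = sorted(by_stratum[stratum],
                           key=lambda h: (h["state"], h["zip3"]))
        # Target 20 percent of the universe; assumed fallback: take the
        # whole frame stratum when it holds fewer hospitals than the target.
        target = max(1, round(fraction * universe_counts[stratum]))
        sample.extend(systematic_sample(hospitals, target, rng))
    return sample
```

Sorting before systematic selection spreads the selected hospitals across States and ZIP code areas within each stratum, which is the purpose of the sort described above.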
Sample weights were developed for the NIS to obtain national estimates of hospital and inpatient parameters. For example,
with these weights it should be possible to estimate DRG-specific average lengths of stay over all U.S. hospitals, using
weighted average lengths of stay based on averages or regression estimates from the NIS. Ideally, relationships among
outcomes and their correlates estimated from the NIS should generally hold across all U.S. hospitals. However, since only 19
States contributed data to this fourth release, some estimates may be biased. In this report, we compare estimates based
solely on the NIS against estimated quantities from other sources of data.
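The role of the sample weights can be shown with a small Python sketch. It assumes each discharge record carries a length of stay and a discharge weight; the keys `los` and `weight` and the three records are hypothetical, not actual NIS data elements or values.

```python
def weighted_discharges(records):
    """Estimated number of discharges: the sum of the discharge weights."""
    return sum(r["weight"] for r in records)


def weighted_alos(records):
    """Weighted average length of stay across the records."""
    total_weight = sum(r["weight"] for r in records)
    return sum(r["weight"] * r["los"] for r in records) / total_weight


# Made-up records for illustration only.
records = [
    {"los": 2, "weight": 5.0},
    {"los": 4, "weight": 5.0},
    {"los": 10, "weight": 10.0},
]

print(weighted_discharges(records))  # 20.0
print(weighted_alos(records))        # 6.5
```

In this toy example the unweighted mean length of stay is about 5.3 days, while the weighted mean is 6.5 days, because the long-stay record represents more discharges in the universe.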
This report compares both discharge- and hospital-level statistics. Discharge statistics include discharge counts, inpatient
charges, in-hospital mortality, and average lengths of stay. Hospital statistics include items such as number of beds,
occupancy rates, and staffing levels.
This report is divided into four sections. The first section includes a discussion of the data sources used in the analysis. The
second section explains the methodology used to compare the NIS and NHDS. The third section includes a presentation of
the results: tables for this section are included at the end of the report. The final section offers some conclusions and
recommendations for analyses of the NIS, Release 4.
Data Sources
Benchmark statistics for 1995 from several data sources were compared. The NIS, Release 4, 1995 data were drawn from a
frame of 19 States and include approximately 6.7 million discharges from 938 hospitals. NIS statistics were mainly
compared with those calculated from these two data sources:
- National Hospital Discharge Survey (NHDS), 1995. Conducted by the National Center for Health Statistics, the NHDS
includes about 260,000 discharges sampled from 400 hospitals. To be part of the NHDS, hospitals must have six or more
beds staffed for patient use. The NHDS covers discharges from short-stay U.S. hospitals (hospitals with an average length of
stay under 30 days), general-specialty (medical or surgical) hospitals, and children's hospitals. Federal, military, and
Veterans Administration hospitals are excluded from the survey. The NHDS sampling frame includes very few specialty
hospitals such as psychiatric, maternity, alcohol/chemical dependency, orthopedic, and head-injury hospitals.
Statistics calculated from the NHDS do have sampling error. However, the statistics are assumed to be unbiased because
the sampling frame is relatively unrestricted, encompassing all nonfederal, acute-care, general U.S. hospitals with six or more
beds.
- AHA Annual Survey of Hospitals, 1995. This hospital-level file contains one record for every hospital in the NIS universe,
making it a convenient source for calculating various statistics based on both the population of hospitals and the NIS sample
of hospitals. The file contains hospital-level statistics for hospital reporting periods, which do not necessarily correspond to
the calendar year.
Table 3 summarizes some of the key differences in hospitals and discharges represented by the NIS and NHDS data files.
Methods
Comparisons with NHDS
The following measures were chosen to compare the NIS and NHDS databases:
- Total number of discharges.
- Average length of stay (ALOS).
- In-hospital mortality rate.
These measures of utilization and outcomes were selected because they are typically used in health services research.
For each statistic, a test was performed to determine whether a difference was statistically significant between the NIS and
NHDS estimates. Since the NHDS estimate was based on a sample, two-sample t-tests were used, as described in the
Appendix. Differences were reported at the one and five percent significance levels.
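The exact test is defined in the Appendix. As a minimal sketch only, the comparison of two independent survey estimates can be written as follows, assuming that each estimate comes with a standard error and that the samples are large enough for a normal approximation to the t distribution. The function names and the numbers in the example are illustrative and are not taken from the report's tables.

```python
import math


def two_sample_test(est_a, se_a, est_b, se_b):
    """Test statistic and two-sided p-value for the difference of two
    independent estimates (normal approximation, an assumption here)."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def significance_label(p_value):
    """Report the difference at the one or five percent level."""
    if p_value < 0.01:
        return "significant at 1 percent"
    if p_value < 0.05:
        return "significant at 5 percent"
    return "not significant"


# Illustrative values: ALOS of 5.0 days (SE 0.1) versus 4.7 days (SE 0.1).
z, p = two_sample_test(5.0, 0.1, 4.7, 0.1)
print(round(z, 2), significance_label(p))
```

The standard error of the difference combines both sources of sampling error, which is why a two-sample test is needed when the benchmark is itself a sample estimate.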
To assess their reliability, the statistics listed above were compared within the following types of strata:
- Geographic regions (Midwest, Northeast, West, and South).
- Hospital characteristics (ownership, rural location, teaching status, and bedsize).
- Patient characteristics (age, race, gender, and payer).
- Diagnosis groups (The principal diagnosis code for each discharge was assigned to a diagnosis group defined by the
Clinical Classifications for Health Policy Research (CCHPR) Version 2 algorithm [Elixhauser and McCarthy, 1996]).
- Procedure groups (The principal procedure code for each discharge was assigned to a procedure group defined by the
CCHPR, Version 2 algorithm [Elixhauser and McCarthy, 1996]).
Further, special analyses were conducted for hospitals in the South region, an area in which the NIS coverage is limited. In
the NIS, Release 1, the South region was represented by only Florida. Release 2 of the NIS added Maryland and
South Carolina. For Release 4 of the NIS, the South is represented by Florida, Maryland, South Carolina, and Tennessee.
All NIS statistics used sample weights and accounted for the sample design using the SUDAAN microcomputer statistical
software to calculate finite sample statistics and their variances. All NHDS statistics were calculated with Statistical Analysis
System (SAS) microcomputer software. For NHDS statistics, standard errors were calculated as described in the Appendix.
Results
Comparisons Between the NIS and the NHDS
Since the NIS and the NHDS represent different samples of the same universe of hospitals, some differences are expected,
and can be attributed to statistical "noise." Moreover, because of the large number of comparisons, some differences that are
statistically significant at the 0.05 level will reflect chance rather than real differences. While bias could be present in either
sample, the NHDS estimates are less likely to be biased because the hospital sampling frame is far less restricted than that
for the NIS. The following sections describe results of statistical comparisons by region, hospital characteristics, patient
characteristics, diagnosis, and procedure.
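The effect of making many comparisons can be quantified with a short Python sketch. It assumes the tests are independent, which is a simplification because the comparisons in this report share the same underlying samples; the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of chance findings when no real differences exist."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Example: 25 comparisons, as in a table of 25 diagnosis categories.
n = 25
print(expected_false_positives(n))     # 1.25
print(round(prob_at_least_one(n), 2))  # 0.72
```

Under these assumptions, a table of 25 comparisons would be expected to show about one significant difference at the 0.05 level by chance alone.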
Comparisons by Region
Table 4 compares estimates of discharges, average lengths of stay, and in-hospital mortality generated from NIS and NHDS
data. Comparisons are presented by total and by region for 1995. The NIS and NHDS estimates of national and regional
discharges do not significantly differ. Overall, the NIS and NHDS produce similar estimates of average length of stay,
although the NIS estimate is significantly higher than the NHDS estimate for the Midwest (by 30 percent). NIS in-hospital
mortality rate estimates are also significantly higher in total (by 8 percent) and for the Midwest and South (by 24 and 12 percent,
respectively).
Comparisons by Hospital Characteristics
Table 5 compares estimates of discharges, average lengths of stay, and in-hospital mortality between the NIS and NHDS for
1995, by hospital ownership categories (private/investor-owned, private/nonprofit, and government/nonfederal) and bedsize
categories (6-99, 100-199, 200-299, 300-499, and 500+).
Several of the estimates for hospital discharges differ significantly between the two sources. For government hospitals, the
NIS estimates 15 percent more discharges than the NHDS. For private hospitals, which represent the majority of the
discharges, there is no significant difference in total discharges for either nonprofit or investor-owned hospitals. Within the
ownership groups, significant differences are found for most bedsize categories except for 200-299 bed hospitals. The NIS
estimates more discharges than the NHDS for five of the 10 significant differences, and fewer for the remaining five.
It should be noted that the total number of 1995 universe discharges in hospitals with over 500 beds is 6.6 million according
to the AHA file. Consequently, the NIS (with 7.0 million) may provide a better estimate of discharge counts for large hospitals
than the NHDS (with 3.9 million). These differences in estimated discharge counts may contribute to differences in outcome
statistics, reported in Table 5, between the two sources because the discharge counts are essentially sums of discharge
weights, which are used to calculate outcome statistics.
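The comparison against the AHA universe count for hospitals with over 500 beds can be checked with simple arithmetic. The sketch below uses the three discharge totals quoted above; the function name is illustrative.

```python
def percent_difference(estimate, benchmark):
    """Signed difference from the benchmark, as a percentage of the benchmark."""
    return 100.0 * (estimate - benchmark) / benchmark


AHA_UNIVERSE = 6.6   # million discharges, hospitals with over 500 beds
NIS_ESTIMATE = 7.0   # million discharges
NHDS_ESTIMATE = 3.9  # million discharges

nis_gap = percent_difference(NIS_ESTIMATE, AHA_UNIVERSE)
nhds_gap = percent_difference(NHDS_ESTIMATE, AHA_UNIVERSE)
print(f"NIS:  {nis_gap:+.1f} percent")   # NIS:  +6.1 percent
print(f"NHDS: {nhds_gap:+.1f} percent")  # NHDS: -40.9 percent
```

The NIS estimate is about 6 percent above the AHA count, while the NHDS estimate is about 41 percent below it.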
Totals for each ownership group show no significant differences in average length of stay (ALOS) or in-hospital mortality
estimates. In addition, there are few differences within the ownership groups between the two sources: only one of the 15
ALOS comparisons is significant. That difference between the NIS and NHDS is found for
government hospitals with 100-199 beds (19 percent higher).
Estimates for in-hospital mortality tend to be higher for the NIS than for NHDS, although not significantly in most cases.
There are only four significant differences between the NIS and NHDS estimates although the NIS estimate is higher than the
NHDS estimate for 12 of the 15 strata. The NIS estimate is significantly higher than the NHDS estimate for investor-owned
hospitals with 100-199 beds (by 15 percent), and for nonprofit hospitals with 6-99 beds (by 31 percent) and with
100-199 beds (by 16 percent).
Comparisons by Patient Characteristics
Table 6 compares estimates of discharges, average lengths of stay, and in-hospital mortality between the NIS and NHDS for
1995, by primary payer, age group, gender, and race. The NIS contains uniform values for race; however, there is variation
in source data from the participating States. Specifically, in some States hospitals report "other" race for all non-white
patients, resulting in overreporting for this race category. Any analysis of NIS data by race categories is affected by this
variation. Except for mortality, the majority of estimates are not significantly different between the two data sources for these
strata.
Discharge estimates for Medicare, Medicaid, private insurance, all age groups, males, females, and three categories of race
(White, Black, and missing) show no significant differences between the NIS and NHDS. Significant differences, however,
are found for the payer categories of self-pay, no charge, other, and missing. The NIS discharge estimate for self-pay
patients is 40 percent higher than the NHDS estimate. For no charge, other, and missing payer, the NIS estimates are lower
than the NHDS estimates. The NIS estimate for other race is higher than the NHDS estimate by 158 percent.
Average length of stay estimates from the two sources are not statistically different. Estimates of in-hospital mortality rates
from the NIS, however, tend to be higher than the NHDS estimates. Of the 17 strata, the NIS estimates are larger than the NHDS
estimates for 11 strata, although not all differences are statistically significant. The NIS estimates are significantly larger than
NHDS estimates for the payer category of other (36 percent); age groups 15-44 years and 65+ years (17 and 4 percent);
males and females (6 and 9 percent); and the White and missing race categories (12 and 18 percent). The NIS estimate is
significantly smaller, by 16 and 24 percent respectively, than the NHDS estimate for the age group 0-15 years and other race
strata.
Comparisons for the South Region
Table 7 gives a detailed comparison for the South Region by hospital and patient characteristics. Of the 21 strata in Table 7,
significant differences are found between the NIS and NHDS estimates for discharges (8 out of 21) and in-hospital mortality
rates (6 out of 21). None of the comparisons for average lengths of stay are statistically different.
No significant differences in discharge estimates are found for any ownership, age group, or gender category. Four of the
five bedsize categories, however, show significant differences between the NIS and NHDS estimates of discharges. The NIS
estimates are lower than the NHDS estimates for small and medium hospitals (6-99, 100-199, and 200-299 beds) by 9 to 28
percent. The NIS estimates for very large hospitals (500+ beds) are larger than the NHDS estimates by 53 percent. No
significant differences are found for the primary payer categories of Medicare, Medicaid, and private insurance, while the
categories of self-pay, no charge, other and missing do show significant differences. NIS discharge estimates are higher for
the self-pay category and lower for the no charge, other, and missing categories. These are similar to the discharge
estimates over all regions by payer as found in Table 6.
The average length of stay estimates from the NIS generally agree with the NHDS estimates for the South. The NIS in-hospital mortality estimates are higher than the NHDS estimates for nearly every hospital and patient category, including by
age group (17 of the 23 strata), although only six of the differences are significant. The higher NIS estimates may stem from
the large impact of Florida hospitals on the estimate for the South. Florida accounts for 52 percent of Southern discharges and
51 percent of Southern hospitals within the 1995 NIS data. Because many of the Southern States are not represented in the NIS,
discharges from Florida hospitals, and the characteristics of Florida's hospital and patient populations, may be amplified in
NIS estimates. Specifically, Florida has a large immigrant population with serious health problems and this may explain some
of the differences in mortality estimates.
Comparisons by Diagnosis Category
Table 8 compares the NIS and NHDS by the 25 most frequent primary diagnosis categories, ranked according to the NIS
estimates of number of discharges for each category. CCHPR code categories (version 2) are assigned based on the
primary (vs. principal or admitting) diagnosis. The NIS discharge estimates differ significantly from the NHDS estimates for
12 of the 25 CCHPR categories; NIS estimates are significantly higher for eight diagnosis categories and significantly lower
for four categories.
Some of the discrepancies found in the estimated number of discharges may be explained by differences in the assignment
of primary diagnosis for the NIS and NHDS databases. In building the NIS, there is no reordering of diagnoses. The first
diagnosis listed for each discharge was assigned as the primary diagnosis (although the State organizations that supply NIS
data may have assigned the principal diagnoses to the primary diagnosis position prior to supplying data for the NIS). The
NHDS reordered diagnoses under certain conditions.
For example, differences in the number of delivery-related discharges could be explained by the reordering of diagnosis
codes in the NHDS. For women discharged after a delivery, a code of V27 (Outcome of Delivery) from the supplemental
classification is entered as the second-listed code. A code designating normal or abnormal delivery is then listed in the first
position. This could explain differences in the number of discharges counted in the diagnosis group for normal pregnancy
and/or delivery (rank 8), trauma to the perineum and vulva (rank 6), fetal distress and abnormal forces of labor (rank 18),
other complications of birth affecting mother (rank 23), and other complications of pregnancy (rank 24).
As another example of diagnosis reordering in the NHDS, if the first-listed diagnosis was a symptom, it was reassigned as a
secondary diagnosis. This may have affected estimates for the 13th ranked diagnosis category, nonspecific chest pain.
Taking into account the differences in ordering of diagnoses reduces the number of significant differences in estimated
discharges between the two data sources from 12 to six of the 25 categories.
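The effect of diagnosis ordering on category counts can be sketched in Python. The sketch assumes that a symptom is any ICD-9-CM code in categories 780-799 and that, when the first-listed code is a symptom, the next non-symptom code takes the first position. Both are simplifying assumptions made for illustration; they are not the actual NHDS editing rules, and the function names are hypothetical.

```python
def is_symptom(code):
    """Assumption: ICD-9-CM categories 780-799 are treated as symptoms."""
    head = code[:3]
    return head.isdigit() and 780 <= int(head) <= 799


def nis_first_listed(diagnoses):
    """NIS approach: keep the codes in the order supplied."""
    return diagnoses[0]


def nhds_like_first_listed(diagnoses):
    """Illustrative reordering: a first-listed symptom gives way to the
    first non-symptom diagnosis on the record, if there is one."""
    if is_symptom(diagnoses[0]):
        for code in diagnoses[1:]:
            if not is_symptom(code):
                return code
    return diagnoses[0]


# A record listing unspecified chest pain (786.50) ahead of
# coronary atherosclerosis (414.01).
record = ["786.50", "414.01"]
print(nis_first_listed(record))        # 786.50
print(nhds_like_first_listed(record))  # 414.01
```

Under these assumptions the same discharge is counted in the chest pain category by one source and in a coronary disease category by the other, which is the kind of shift that could produce the differences described above.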
Comparisons of ALOS and in-hospital mortality rates by diagnosis category (also shown in Table 8) indicate few significant
differences between NIS and NHDS estimates. Significant differences are found for only one ALOS estimate (Normal
Pregnancy) and for no in-hospital mortality estimates. Valid significance tests for in-hospital mortality rates were possible for only
19 categories because valid NHDS standard errors for in-hospital mortality could not be calculated for the other six
categories (see the Appendix for validity criteria).
Comparisons by Procedure Category
Table 9 lists the top 25 procedure categories, ranked according to the NIS estimates of number of discharges for each
category. Similar to the diagnosis groups, CCHPR codes are assigned based on the primary, or first-listed, procedure for
each discharge. The NIS discharge estimates differ significantly from the NHDS estimates for 9 of the 25 CCHPR
categories; NIS estimates are significantly higher for 7 procedure categories, and significantly lower for only 2 categories.
Procedures for which the NIS discharges were significantly higher than the NHDS estimates include the following: episiotomy,
diagnostic cardiac catheterization, upper GI, percutaneous coronary angioplasty, respiratory intubation, CT head scans, and
cancer chemotherapy. These differences may be explained by the higher estimated number of discharges from large hospitals,
which are more likely to perform high-technology procedures, in the NIS than in the NHDS (see Table 5).
Comparisons of average length of stay and in-hospital mortality rate estimates by procedure category show few significant
differences between NIS and NHDS estimates. Three significant differences are found for ALOS, and three differences are
also found for in-hospital mortality. Significance tests were not performed for five in-hospital mortality rate estimates due to
the unavailability of valid standard errors for NHDS estimates (see the Appendix).
Comparison with AHA Data
Table 10 demonstrates that hospital weights associated with the NIS yield hospital counts consistent with AHA universe
counts for various categories of hospital types. This is expected because the sample of NIS hospitals was stratified on most
of these variables, and sample hospital weights were calculated within strata based on AHA data.
Table 11 compares the universe (AHA) and weighted frame (NIS) means and medians for selected hospital-level measures
defined in the 1995 AHA Annual Survey. In general, the frame hospital weighted averages and medians tend to be slightly
higher than the universe averages.
Discussion
In general, for many types of estimates, the NIS performs very well. Some differences emerge when the NIS is compared to
specific data sets. Sometimes, these variations are caused by differences in definitions (e.g., NIS and NHDS coding
schemes). In some cases, differences are due to certain shortcomings in the NIS.
Comparisons of Total Population Estimates
Based on comparisons between statistics calculated from the NIS and the NHDS, it appears that most statistics calculated
from the two data sources are similar. Overall, when compared with the NHDS, the NIS seems to estimate higher discharges
for certain types of hospitals (government hospitals and large hospitals) and higher in-hospital mortality rates. The higher
mortality estimates may be in part because the NIS tends to have higher estimates of discharges for "large" hospitals, and
these patients may represent a somewhat different severity of illness than those in other hospitals.
Estimates of LOS and mortality by diagnosis and procedure groups show few significant differences. However, several
estimates of discharges by diagnosis and procedure groups are significantly different. These differences in discharge
estimates could be attributable to differences in data handling: the NIS takes all diagnosis and procedure codes as they are
recorded, while the NHDS has specific rules for what is considered a valid first-listed diagnosis.
Conclusion
In summary, the NIS estimates of ALOS appear to be unbiased in most contexts. The NIS estimates of discharge counts
differ under some conditions from the NHDS estimates but not in any consistent direction. The NIS estimates for in-hospital
mortality are higher than estimates from the NHDS for the Midwest and South. Based on comparisons with AHA data, NIS
hospitals tend, on average, to be larger than the universe of community hospitals. This higher percentage of weighted NIS
discharges coming from "large" hospitals—and the more complex case mix of those hospitals—may contribute to the higher
in-hospital mortality estimates when compared to the NHDS.
References
1. Gesler, Wilbert M. and Thomas C. Ricketts. Health in Rural North America. New Brunswick: Rutgers University Press,
1992.
2. Elixhauser, A. and McCarthy, E. Clinical Classifications for Health Policy Research, Version 2: Hospital Inpatient Statistics.
(AHCPR Publication No. 96-0017) Agency for Health Care Policy and Research, Healthcare Cost and Utilization Project
(HCUP-3) Research Note 1. February, 1996.
Internet Citation:
Comparative Analysis of HCUP and NHDS Inpatient Discharge Data. Technical Supplement 13, NIS Release 5. Agency for Health Care Policy and Research, Rockville, MD. http://www.ahrq.gov/data/hcup/niscomp.htm