Appendix H: Information on Statistical Significance

Asthma Care Quality Improvement: A Resource Guide for State Action

This section is provided for data analysts who wish to generate other statistics or perform statistical tests for comparisons other than those provided in the National Healthcare Quality Report (NHQR) and National Healthcare Disparities Report (NHDR).

Comparing State and Average Estimates Using P-Values

Every measure has error associated with it, and this matters when comparing an individual State estimate to another estimate, such as the all-State average or the average for the top tier of States. The error arises from sampling (the size of the sample or the sampling method), the accuracy of respondent recall and responses, data entry processes, and many other factors. When comparing estimates, it is important to take this error (which can be estimated under statistical assumptions) into account.

A common statistic for comparing two rates to determine whether they differ is the t-test. With large samples, the t statistic can be compared to a normal distribution at a prespecified level of significance, that is, the acceptable error in conclusions about whether or not two statistics come from the same distribution or population. The p-value, computed here from the normal distribution, indicates whether two measures are likely to come from the same or from different distributions.

Statistical significance and the magnitude of the difference should be considered together when comparing two estimates. The first check should be: Is the difference statistically significant? The second check should be: Is the difference large enough to be meaningful for policy purposes? These questions are addressed below:

  • Is the difference statistically significant? Is the p-value less than 0.05? If so, you can assume that the two estimates come from different populations or experiences. But there are other considerations. The statistical test of differences is affected by the number of observations from which the measures were generated. For example, if the measures were generated from hundreds of thousands of records, then summary measures (such as averages) have less variance and lower p-values, which imply "statistical significance" even when the magnitude of the difference might be tiny. Conversely, when a difference is large and the observations are few, the absence of statistical significance might simply mean that the data set does not have enough observations for a powerful test. This happens frequently with the Behavioral Risk Factor Surveillance System (BRFSS) measures because the annual sample sizes of the State surveys are small, from about 2,000 to 8,500 observations.
  • Is the difference large enough to be meaningful for policy purposes? Because of the relationship between the statistical test and the number of observations, some judgment must be used to assess the meaning of the differences between State estimates. Thus, in addition to statistical significance, it is important to ask the second question: Is the State-to-benchmark difference large enough to warrant efforts to rectify it? A one or two percentage point difference in a measure may not be worth the effort to improve it. A 5 or 10 percentage point difference may mean that a substantial number of State residents are affected by poor health care quality in the State. These are judgments that local experts and stakeholders who understand the environment of a State can help make.
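The interaction between sample size and statistical significance described above can be illustrated with a small Python sketch. It is only an illustration: the function name `p_value_for_difference` and the example rates are hypothetical, and the standard errors assume simple random sampling of a proportion, which is simpler than the complex survey designs behind measures such as BRFSS.

```python
import math

def p_value_for_difference(p1, p2, n):
    """Two-sided p-value for the difference between two proportions,
    each estimated from n observations.

    ASSUMPTION: simple random sampling, so SE = sqrt(p * (1 - p) / n).
    """
    se1 = math.sqrt(p1 * (1 - p1) / n)
    se2 = math.sqrt(p2 * (1 - p2) / n)
    z = (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical one-percentage-point difference at three sample sizes
for n in (2_000, 8_500, 500_000):
    print(n, p_value_for_difference(0.30, 0.31, n))

# Hypothetical ten-point difference with very few observations
print(50, p_value_for_difference(0.30, 0.40, 50))
```

Under these assumptions, the one-point difference falls below p = 0.05 only at the largest sample size, while the ten-point difference with 50 observations does not, which is why magnitude and significance need to be read together.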

Calculating P-Values

Calculating the p-value is straightforward when the standard errors (SEs) of the estimates are provided. For example, standard errors are provided for the national average and for individual States. Thus, the test for statistical significance between those two estimates is straightforward (and is presented first). However, calculating the p-value for a comparison with another average (say, the top decile average) for which the standard error has not been provided is more complicated. In fact, the top decile comparisons in this work are not evaluated for statistical significance because the population denominators were not readily available in time for publication of this Resource Guide. Nevertheless, the method for that calculation is presented below.

Calculating the P-Value When the Relevant Standard Errors Are Provided

For an individual State estimate compared to the all-State average, the appropriate standard errors have been provided in the NHQR tables. To assess whether or not a State rate is statistically different from the average, calculate the p-value, as follows.

Two-sided t-test (x**2 = x squared and sqrt(x) = square root of x):

t = (R1 - R2) / sqrt(SE1**2 + SE2**2)
p = 2 * Prob(Z > |t|)

Where:
R1 = a State rate
R2 = national rate
SE1**2 = square of the standard error of the State rate (or its variance)
SE2**2 = square of the standard error of the national rate (or its variance)
Z = a standard normal random variable

If the p-value is smaller than 0.05, then a State can conclude, with 95 percent confidence, that the State rate is statistically different from the all-State average rate.

The p-value can be calculated using SAS or EXCEL with the following data elements and formula functions:
SAS: p = 2 * (1 - PROBNORM(ABS(t)));
EXCEL: p= 2*(1- NORMDIST(ABS(t),0,1,TRUE))
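
For analysts working outside SAS or EXCEL, the same calculation can be sketched in Python. This is a minimal illustration, not part of the NHQR tools: the function name `two_sided_p_value` and the example rates and standard errors are hypothetical, and the sketch assumes the two estimates are independent and that the normal approximation applies.

```python
import math

def two_sided_p_value(rate1, se1, rate2, se2):
    """Return (t, p) for the difference between two estimates.

    ASSUMPTION: the estimates are independent, so their variances add.
    """
    t = (rate1 - rate2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # 2 * (1 - Phi(|t|)), where Phi is the standard normal CDF,
    # equals erfc(|t| / sqrt(2))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Hypothetical values for illustration only (not taken from the NHQR tables)
t, p = two_sided_p_value(rate1=30.0, se1=1.5, rate2=26.0, se2=0.5)
print(f"t = {t:.3f}, p = {p:.4f}")
```

The use of `math.erfc` avoids any dependency beyond the standard library; it should give the same result as the SAS and EXCEL expressions above.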

Calculating the P-Value When the Relevant Standard Errors Are Not Provided

The fundamental equation of analysis of variance can be used to calculate p-values for other comparisons. For example, comparing a State rate to the average of the top three States would involve the following. The total sum of squares about the overall three-State mean is the sum of the within-State sums of squared deviations from each State mean and the between-State sum of squared deviations from the three-State pooled mean. Each within-State sum of squares is obtained by squaring the State's standard error and multiplying by n*(n-1), where n is the State's sample size. The between-State sum of squares is obtained by summing the sample-weighted squared differences between each State average and the overall three-State average. The formulas are below (note: x**2 = x squared and sqrt(x) = square root of x):

Let n1, n2, and n3 be the sample sizes for each State.
Let m1, m2, and m3 be the means for each State.
Let s1, s2, and s3 be the standard errors for each State.
N = n1 + n2 + n3, is the overall three-State sample size.
M = (n1*m1 + n2*m2 + n3*m3) / N, is the overall three-State mean.
SS = n1*(n1-1)*s1**2 + n2*(n2-1)*s2**2 + n3*(n3-1)*s3**2 + n1*(m1-M)**2 + n2*(m2-M)**2 + n3*(m3-M)**2
VAR = SS / (N-1)
SE = sqrt(VAR / N), which is the estimated standard error for the three-State mean (VAR itself is the variance of the individual observations about M).

Now suppose you have a mean m0 and standard error s0 from a State and you want to test whether m0 is significantly different from M. The test statistic is:

Z = (m0 - M) / sqrt(SE**2 + s0**2),

which can be compared to 1.96 to test the difference at the 5-percent significance level (the difference is significant if the absolute value of Z exceeds 1.96). Alternatively, the p-value can be calculated from Z as in the previous section.
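
The pooled calculation can be sketched in Python as follows. The function names and all input values are hypothetical and chosen only for illustration. One point is an assumption about the intended calculation and is flagged in the code: SS / (N-1) is the variance of individual observations, so the sketch divides it by N before taking the square root to obtain a standard error for the pooled mean.

```python
import math

def pooled_mean_and_se(sizes, means, std_errors):
    """Pool several State estimates into one mean and its standard error."""
    N = sum(sizes)
    M = sum(n * m for n, m in zip(sizes, means)) / N
    # Within-State sums of squares recovered from each standard error
    within = sum(n * (n - 1) * s ** 2 for n, s in zip(sizes, std_errors))
    # Between-State sum of squares about the pooled mean
    between = sum(n * (m - M) ** 2 for n, m in zip(sizes, means))
    var = (within + between) / (N - 1)
    # ASSUMPTION: var is the variance of individual observations, so the
    # standard error of the pooled mean is sqrt(var / N)
    se = math.sqrt(var / N)
    return M, se

def z_test(m0, s0, M, se):
    """Return (Z, two-sided p-value) for a State mean m0 versus pooled mean M."""
    z = (m0 - M) / math.sqrt(se ** 2 + s0 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical sample sizes, rates (percent), and standard errors
M, se = pooled_mean_and_se(
    sizes=(4000, 5000, 6000),
    means=(30.0, 28.0, 27.0),
    std_errors=(0.8, 0.7, 0.6),
)
z, p = z_test(m0=33.0, s0=0.9, M=M, se=se)
print(f"M = {M:.3f}, SE = {se:.3f}, Z = {z:.3f}, p = {p:.4f}")
```

The helper accepts any number of States, so the same sketch would apply to a top-decile group once sample sizes are available.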

Revised as of September 2009
AHRQ Publication No. 06(09)-0012


Internet Citation:

Asthma Care Quality Improvement: A Resource Guide for State Action. AHRQ Publication No. 06(09)-0012, September 2009. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/qual/asthmacare/


Current as of September 2009
Internet Citation: Appendix H: Information on Statistical Significance: Asthma Care Quality Improvement: A Resource Guide for State Action. September 2009. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/professionals/quality-patient-safety/quality-resources/tools/asthmaqual/asthmacare/appendix-h.html