# Chapter 12. Description of Ideal Evaluation Methods: Quantitative Approaches to Context Heterogeneity

## Assessing the Evidence for Context-Sensitive Effectiveness and Safety

**Special contribution from Naihua Duan, Ph.D., Columbia University, New York, NY**

### Introduction

Context often moderates intervention effectiveness; that is, the effectiveness of an intervention might vary from site to site, depending on the contextual factors present at each site.^{1,2} We term this phenomenon context heterogeneity. The moderation effect is usually formulated statistically through the "intervention × context" interaction:

(1a) Y_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{12} T_{i} C_{i} + *epsilon*_{i},

where i denotes the unit of analysis (usually the individual sites in the study, but it can also be dyads of sites in matched comparisons), Y_{i} denotes the outcome measure, T_{i} denotes the intervention status (T_{i}=1 for intervention, T_{i}=0 for control), C_{i} denotes the contextual factor, T_{i} C_{i} denotes the "intervention × context" interaction, *epsilon*_{i} denotes random error, b_{0} denotes the intercept for the model, b_{1} denotes the main effect for the intervention, b_{2} denotes the main effect for the contextual factor, and b_{12} denotes the moderation effect for the contextual factor, i.e., the influence of the contextual factor on intervention effectiveness.

As an example, consider a dichotomous contextual factor, say, C=1 denotes a teaching hospital and C=0 denotes a non-teaching hospital. According to model (1a), the intervention effect for a non-teaching hospital is given by b_{1}, while the intervention effect for a teaching hospital is given by b_{1} + b_{12}. If the moderation effect is absent (b_{12}=0), the intervention effect does not vary between teaching and non-teaching hospitals. If the moderation effect is present (b_{12}≠0), the intervention effect does vary between teaching and non-teaching hospitals, the difference being the moderation effect b_{12}.
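To make the arithmetic concrete, the following sketch uses invented outcome values for eight hypothetical sites. With a dichotomous contextual factor, model (1a) is saturated, so its least-squares coefficients coincide with contrasts of the four cell means. The helper names `cell_mean` and `interaction_coefficients` and all data values are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# Hypothetical site-level data: (T, C, Y), where T is the intervention status,
# C = 1 marks a teaching hospital, and Y is the outcome. All values are invented.
sites = [
    (0, 0, 10.0), (0, 0, 12.0),
    (1, 0, 14.0), (1, 0, 16.0),
    (0, 1, 11.0), (0, 1, 13.0),
    (1, 1, 19.0), (1, 1, 21.0),
]


def cell_mean(t, c):
    """Average outcome among sites with intervention status t and context c."""
    return mean(y for (ti, ci, y) in sites if ti == t and ci == c)


def interaction_coefficients():
    """Coefficients of model (1a) expressed as contrasts of cell means."""
    b0 = cell_mean(0, 0)                               # non-teaching, control
    b1 = cell_mean(1, 0) - b0                          # effect in non-teaching hospitals
    b2 = cell_mean(0, 1) - b0                          # context main effect
    b12 = (cell_mean(1, 1) - cell_mean(0, 1)) - b1     # moderation effect
    return b0, b1, b2, b12


b0, b1, b2, b12 = interaction_coefficients()
effect_non_teaching = b1        # intervention effect when C = 0
effect_teaching = b1 + b12      # intervention effect when C = 1
```

In this toy data set the intervention effect is 4 in non-teaching hospitals and 8 in teaching hospitals, so the moderation effect b_{12} is 4.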

Model (1a) presents the "intervention × context" interaction for a single contextual factor. The model can be generalized in a straightforward manner to accommodate multiple contextual factors:

(1b) Y_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{2i} + b_{3} C_{3i} + ... + b_{k} C_{ki} + b_{12} T_{i} C_{2i} + b_{13} T_{i} C_{3i} + ... + b_{1k} T_{i} C_{ki} + *epsilon*_{i},

where C_{2i}, C_{3i},..., C_{ki} denote multiple contextual factors, and b_{12}, b_{13},..., b_{1k} denote the respective moderation effects.

The assessment of the moderation effect depends on the methodology used to assess the intervention effect. We discuss here eight scenarios.

### Pre-Post Comparisons

One option that could be used to assess the intervention effect is to compare the outcome measures pre- and post-intervention, without concurrent control sites. Under the assumption that the outcome measure is stable over time if no intervention were provided, any change observed would be attributed to the effect of the intervention. In particular, Model (1a) takes the following form:

(2a) Y_{0i} = b_{0} + b_{2} C_{i} + *epsilon*_{0i},

(2b) Y_{1i} = b_{0} + b_{1} + b_{2} C_{i} + b_{12} C_{i} + *epsilon*_{1i},

(2c) Y_{1i} - Y_{0i} = b_{1} + b_{12} C_{i} + (*epsilon*_{1i} - *epsilon*_{0i}),

where the subscript _{i} denotes the i-th site in the study, Y_{0i} denotes the pre-intervention outcome measure at the i-th site, Y_{1i} denotes the post-intervention outcome measure at the i-th site; C_{i} denotes the contextual factor at the i-th site; *epsilon*_{0i} and *epsilon*_{1i} denote the respective error terms for the pre- and post-intervention outcome measures. Compared to Model (1a), the terms b_{1} T and b_{12} T C are absent in submodel (2a) because the intervention status T assumes the value T=0 under the control condition. Similarly, in submodel (2b) compared to Model (1a), the intervention status T assumes the value T=1 under the intervention condition, therefore the terms b_{1} T and b_{12} T C are given as b_{1} and b_{12} C_{i}. Submodel (2c) compares submodels (2a) and (2b): the term Y_{1i} - Y_{0i} denotes the pre-post change, which measures the intervention effect at the i-th site.

The moderation effect (b_{12} in Model (2c)) can be assessed by regressing the intervention effect at the i-th site, Y_{1i} - Y_{0i}, on the contextual factor C_{i} in model (2c). For continuous contextual factors, this regression analysis estimates the rate of change for the intervention effect at the i-th site, Y_{1i} - Y_{0i}, when the contextual factor C_{i} changes by one unit. For dichotomous contextual factors, this regression analysis simplifies to a two-sample comparison, comparing the average of the intervention effect among sites with the contextual factor C_{i}=1 (such as teaching hospitals) versus the average of the intervention effect among sites with the contextual factor C_{i}=0 (such as non-teaching hospitals).
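The regression in model (2c) can be sketched for a continuous contextual factor as follows. The data are invented and noise-free so that the fitted values are easy to check by hand; `simple_ols` is a hypothetical helper written for this illustration, and a real analysis would normally rely on a statistical package and report standard errors as well.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for four sites (all values invented for illustration).
context = [1.0, 2.0, 3.0, 4.0]     # C_i, a continuous contextual factor
pre = [50.0, 52.0, 49.0, 51.0]     # Y_0i, pre-intervention outcome
post = [53.0, 57.0, 56.0, 60.0]    # Y_1i, post-intervention outcome

# Site-level intervention effect under the stability assumption: Y_1i - Y_0i.
change = [y1 - y0 for y0, y1 in zip(pre, post)]

# Regress the pre-post change on the contextual factor, as in model (2c).
b1_hat, b12_hat = simple_ols(context, change)
```

Here the pre-post change rises by 2 for each unit increase in the contextual factor, so the estimated moderation effect is 2 and the estimated intercept b_{1} is 1.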

The validity of pre-post comparisons depends on the validity of the assumption that the outcome measure is stable over time. If this assumption is questionable, e.g., if there is a possibility of a secular trend in the outcome measures, the validity of pre-post comparisons is questionable both for the assessment of intervention effect per se, and for the assessment of the effect of moderation for the contextual factors.

### Longitudinal Comparisons

An important extension of pre-post comparisons is longitudinal comparisons of repeated measurements of the outcome measures over time, without concurrent control sites. Under the assumption that the outcome measure is stable over time if no intervention were provided, any change over time that is observed would be attributed to the effect of the intervention. In particular, Model (1a) takes the following form:

(3a) Y_{0i} = b_{0} + b_{2} C_{i} + *epsilon*_{0i},

(3b) Y_{ti} = b_{0} + b_{1} t + b_{2} C_{i} + b_{12} C_{i} t + *epsilon*_{ti},

(3c) R_{i} = b_{1} + b_{12} C_{i} + *delta*_{i},

where the subscript _{i} denotes the i-th site in the study, Y_{0i} denotes the pre-intervention outcome measure at the i-th site, Y_{ti} denotes the outcome measure at time t for the i-th site; R_{i} denotes the rate of change for the outcome measure for the i-th site; C_{i} denotes the contextual factor at the i-th site; *epsilon*_{0i} and *epsilon*_{ti} denote the respective error terms for the outcome measures; *delta*_{i} denotes the error term for the rate of change. We assume here that the trajectory of the outcome measure is linear over time, therefore the influence of time on the outcome measure in Model (3b) can be expressed as linear functions in time, t. Furthermore, the linearity assumption allows us to summarize the trajectory using the rate of change, R, in Model (3c): the rate of change for the i-th site, R_{i}, can be estimated by regressing the outcome measures, Y_{ti}'s, on time, t, within the i-th site. It is of course possible to extend the model beyond linear trajectories and allow non-linear trajectories.

In Model (3c), b_{1} measures the intervention effect for sites where the contextual factor takes the value zero (C=0), such as non-teaching hospitals; for these sites, the outcome measures improve at the rate of b_{1} per unit time. For sites with C=1, say, teaching hospitals, the intervention effect is b_{1} + b_{12}; the term b_{12} measures the moderation effect for the contextual factor C.

The moderation effect (b_{12} in Model (3c)) can be assessed by regressing the rate of change, R_{i}, on the contextual factor C_{i} in model (3c).
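The two-stage procedure described for the longitudinal case can be sketched as follows: first estimate each site's rate of change R_{i} by regressing its outcomes on time, then regress those rates on the contextual factor. The site trajectories below are invented and exactly linear, and the helper `slope` is an assumption made for this illustration.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


times = [0, 1, 2, 3]   # measurement occasions t

# Hypothetical sites: (C_i, outcomes Y_ti at each time). Values are invented.
sites = [
    (0, [10.0, 11.0, 12.0, 13.0]),
    (0, [8.0, 10.0, 12.0, 14.0]),
    (1, [9.0, 12.0, 15.0, 18.0]),
    (1, [11.0, 15.0, 19.0, 23.0]),
]

# Stage 1: within-site regression of outcome on time gives R_i.
rates = [slope(times, outcomes) for _, outcomes in sites]

# Stage 2: regress R_i on C_i, as in model (3c).
context = [c for c, _ in sites]
b12_hat = slope(context, rates)
b1_hat = mean(rates) - b12_hat * mean(context)
```

With these values the average rate of change is 1.5 per unit time when C=0 and 3.5 when C=1, giving an estimated moderation effect of 2.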

The validity of longitudinal comparisons depends on the validity of the assumption that the outcome measure is stable over time. If this assumption is questionable, e.g., if there is a possibility of a secular trend in the outcome measures, the validity of longitudinal comparisons is questionable both for the assessment of intervention effect per se, and for the assessment of the effect of moderation for the contextual factors.

### Matched Comparisons for Post-intervention Outcome Measures

Another strategy that could be used to assess the intervention effect is to include concurrent control sites matched individually to the intervention sites and compare the post-intervention outcome measures across sites.^{a} We assume that the sites are matched on the contextual factor. Under the assumption that the matched sites differ only in the intervention status, the intervention effect can be assessed by the difference in the post-intervention outcome measures for each dyad of matched sites. In particular, Model (1a) takes the following form under this approach:

(4a) Y_{0i} = b_{0} + b_{2} C_{i} + *epsilon*_{0i},

(4b) Y_{1i} = b_{0} + b_{1} + b_{2} C_{i} + b_{12} C_{i} + *epsilon*_{1i},

(4c) Y_{1i} - Y_{0i} = b_{1} + b_{12} C_{i} + (*epsilon*_{1i} - *epsilon*_{0i}),

where the subscript _{i} denotes the i-th dyad of matched sites in the study, Y_{0i} denotes the post-intervention outcome measure at the control site in the i-th dyad, Y_{1i} denotes the post-intervention outcome measure at the intervention site in the i-th dyad; C_{i} denotes the contextual factor for both sites in the i-th dyad; *epsilon*_{0i} and *epsilon*_{1i} denote the respective error terms for the post-intervention outcome measures. Compared to Model (1a), the terms b_{1} T and b_{12} T C are absent in submodel (4a) because the intervention status T assumes the value T=0 for the control site. Similarly, in submodel (4b) compared to Model (1a), the intervention status T assumes the value T=1 for the intervention site, therefore the terms b_{1} T and b_{12} T C are given as b_{1} and b_{12} C_{i}. Submodel (4c) compares submodels (4a) and (4b): the term Y_{1i} - Y_{0i} denotes the difference between the intervention and control sites in the i-th dyad, which measures the intervention effect in the i-th dyad.

The moderation effect (b_{12} in Model (4c)) can be assessed by regressing the intervention effect in the i-th dyad, Y_{1i} - Y_{0i}, on the contextual factor C_{i} in model (4c). For continuous contextual factors, this regression analysis estimates the rate of change for the intervention effect in the i-th dyad, Y_{1i} - Y_{0i}, when the contextual factor C_{i} changes by one unit. For dichotomous contextual factors, this regression analysis simplifies to a two-sample comparison, comparing the average of the intervention effect among dyads with the contextual factor C_{i}=1 (such as teaching hospitals) versus the average of the intervention effect among dyads with the contextual factor C_{i}=0 (such as non-teaching hospitals).
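The statement that the regression reduces to a two-sample comparison for a dichotomous contextual factor can be checked numerically. The sketch below uses five invented dyads and computes the moderation effect both ways; all values and variable names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical matched dyads: (C_i, Y_0i at the control site, Y_1i at the
# intervention site). C_i = 1 marks a dyad of teaching hospitals.
dyads = [
    (0, 20.0, 23.0),
    (0, 18.0, 23.0),
    (0, 22.0, 26.0),
    (1, 21.0, 28.0),
    (1, 19.0, 28.0),
]

context = [c for c, _, _ in dyads]
effect = [y1 - y0 for _, y0, y1 in dyads]   # Y_1i - Y_0i for each dyad

# Two-sample comparison: mean dyad effect when C = 1 minus mean when C = 0.
mean_c0 = mean(e for c, e in zip(context, effect) if c == 0)
mean_c1 = mean(e for c, e in zip(context, effect) if c == 1)
difference_in_means = mean_c1 - mean_c0

# Least-squares regression of the dyad effect on C_i, as in model (4c).
c_bar, e_bar = mean(context), mean(effect)
sxy = sum((c - c_bar) * (e - e_bar) for c, e in zip(context, effect))
sxx = sum((c - c_bar) ** 2 for c in context)
ols_slope = sxy / sxx                       # estimate of b_12
ols_intercept = e_bar - ols_slope * c_bar   # estimate of b_1
```

Both routes give a moderation effect of 4 for these data, and the regression intercept equals the mean dyad effect among non-teaching hospitals.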

The validity of matched comparisons depends on the validity of the assumption that the matched sites differ only in the intervention status. If this assumption is questionable, e.g., if there are important prognostic factors that differ between matched sites in the same dyad, the validity of matched comparisons is questionable, both for the assessment of intervention effect per se, and for the assessment of the effect of moderation for the contextual factors. If the unmatched prognostic factors are observed, it is possible to adjust for them using analysis of covariance (ANCOVA) models for post-intervention outcomes, or propensity score analyses, discussed below under Adjusted Comparisons for Post-intervention Outcome Measures.

### Matched Comparisons for Pre-Post-Intervention Changes in Outcome Measures

Another strategy that can be used to assess the intervention effect is to combine pre-post comparisons with matched comparisons and obtain both pre- and post-intervention outcome measures for both intervention and control sites. The pre-post changes in outcome measures are compared across intervention and control sites to assess intervention effects. By combining pre-post and matched site comparisons, this approach can be applied under weaker assumptions than those required for either strategy alone. In particular, this combined approach no longer requires the rather strong assumption of pre-post comparisons that there is no secular trend. Instead, it requires only that any secular trend that might be present be the same for the intervention and control sites in the same dyad. This assumption is also weaker than the rather strong assumption of matched comparisons for post-intervention outcomes that the matched sites in the same dyad differ only in the intervention status. Instead, this approach allows the sites to differ in their pre-intervention status as long as these differences do not affect the pre-post change. In particular, Model (1a) takes the following form under this approach:

(5a) D_{0i} = b_{0} + b_{2} C_{i} + *epsilon*_{0i},

(5b) D_{1i} = b_{0} + b_{1} + b_{2} C_{i} + b_{12} C_{i} + *epsilon*_{1i},

(5c) D_{1i} - D_{0i} = b_{1} + b_{12} C_{i} + (*epsilon*_{1i} - *epsilon*_{0i}),

where the subscript _{i} denotes the i-th dyad of matched sites in the study, C_{i} denotes the contextual factor for both sites in the i-th dyad. In submodel (5a), D_{0i} denotes the pre-post change in the outcome measure at the control site in the i-th dyad, which measures the secular trend in the i-th dyad. Here we allow the secular trend to depend on the contextual factor. In submodel (5b), D_{1i} denotes the pre-post change in the outcome measure at the intervention site in the i-th dyad. Submodel (5c) compares submodels (5a) and (5b): the term D_{1i} - D_{0i} denotes the difference in the pre-post change between the intervention and control sites in the i-th dyad, which measures the intervention effect in the i-th dyad.

The moderation effect (b_{12} in Model (5c)) can be assessed by regressing the intervention effect in the i-th dyad, D_{1i} - D_{0i}, on the contextual factor C_{i} in model (5c). For continuous contextual factors, this regression analysis estimates the rate of change for the intervention effect in the i-th dyad, D_{1i} - D_{0i}, when the contextual factor C_{i} changes by one unit. For dichotomous contextual factors, this regression analysis simplifies to a two-sample comparison, comparing the average of the intervention effect among dyads with the contextual factor C_{i}=1 (such as teaching hospitals) versus the average of the intervention effect among dyads with the contextual factor C_{i}=0 (such as non-teaching hospitals).
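A small numerical sketch may help show why comparing pre-post changes within a dyad removes a shared secular trend. The data are invented: every control site drifts upward by 2 units, and the paired sites are allowed to start from different pre-intervention levels. The function name `dyad_effect` and all values are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical dyads:
# (C_i, control pre, control post, intervention pre, intervention post)
dyads = [
    (0, 10.0, 12.0, 14.0, 19.0),
    (0, 11.0, 13.0, 9.0, 14.0),
    (1, 12.0, 14.0, 12.0, 19.0),
    (1, 10.0, 12.0, 15.0, 22.0),
]


def dyad_effect(dyad):
    """Return D_1i - D_0i, the difference in pre-post change within a dyad."""
    _, c_pre, c_post, t_pre, t_post = dyad
    d0 = c_post - c_pre   # D_0i: secular trend seen at the control site
    d1 = t_post - t_pre   # D_1i: trend plus intervention effect
    return d1 - d0


effects = [dyad_effect(d) for d in dyads]

# A naive pre-post estimate at intervention sites with C = 0 absorbs the trend.
naive_c0 = mean(d[4] - d[3] for d in dyads if d[0] == 0)

# Model (5c) with a dichotomous C_i: compare mean dyad effects by context.
b1_hat = mean(e for d, e in zip(dyads, effects) if d[0] == 0)
b12_hat = mean(e for d, e in zip(dyads, effects) if d[0] == 1) - b1_hat
```

In this example the naive pre-post change for dyads with C=0 is 5, of which 2 is secular trend; the within-dyad comparison recovers an intervention effect of 3 and a moderation effect of 2.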

The validity of matched comparisons depends on the validity of the assumption that any secular trend that might be present be the same between intervention and control sites in the same dyad. If this assumption is questionable, the validity of matched comparisons is questionable both for the assessment of intervention effect per se, and for the assessment of the effect of moderation for the contextual factors. If the unmatched prognostic factors associated with the secular trend are observed, it is possible to adjust for them using analysis of covariance (ANCOVA) models for pre-post changes, or propensity scores analyses, to be discussed below.

### Matched Comparisons for Longitudinal Rates of Change in Outcome Measures

A combined strategy, similar to the matched comparison of pre-post changes discussed in the preceding section, is to combine longitudinal comparisons with matched comparisons and assess longitudinally the rate of change in outcome measures for both intervention and control sites. The rate of change is compared across intervention and control sites to assess intervention effects. By combining longitudinal and matched site comparisons, this approach can be applied under weaker assumptions than those required for either strategy alone. In particular, Model (1a) takes the following form under this approach:

(6a) R_{0i} = b_{0} + b_{2} C_{i} + *delta*_{0i},

(6b) R_{1i} = b_{0} + b_{1} + b_{2} C_{i} + b_{12} C_{i} + *delta*_{1i},

(6c) R_{1i} - R_{0i} = b_{1} + b_{12} C_{i} + (*delta*_{1i} - *delta*_{0i}),

where the subscript _{i} denotes the i-th dyad of matched sites in the study, C_{i} denotes the contextual factor for both sites in the i-th dyad. In submodel (6a), R_{0i} denotes the longitudinal rate of change in the outcome measure at the control site in the i-th dyad, which measures the secular trend in the i-th dyad. Here we allow the secular trend to depend on the contextual factor. In submodel (6b), R_{1i} denotes the longitudinal rate of change in the outcome measure at the intervention site in the i-th dyad. Submodel (6c) compares submodels (6a) and (6b): the term R_{1i} - R_{0i} denotes the difference in the longitudinal rate of change between the intervention and control sites in the i-th dyad, which measures the intervention effect in the i-th dyad.

The moderation effect (b_{12} in Model (6c)) can be assessed by regressing the intervention effect in the i-th dyad, R_{1i} - R_{0i}, on the contextual factor C_{i} in model (6c).

The validity of matched comparisons depends on the validity of the assumption that any secular trend that might be present be the same between intervention and control sites in the same dyad. If this assumption is questionable, the validity of matched comparisons is questionable both for the assessment of intervention effect per se, and for the assessment of the effect of moderation for the contextual factors. If the unmatched prognostic factors associated with secular trend are observed, it is possible to adjust for them using analysis of covariance (ANCOVA) models for longitudinal rate of change, or propensity scores analyses, discussed below.

### Adjusted Comparisons for Post-intervention Outcome Measures

Matched comparisons discussed in the three preceding sections assume that the intervention and control sites can be matched to the degree required under each strategy. In practical applications, this usually is not a realistic assumption. Therefore, adjustment for covariates is usually important, both for studies in which matching is attempted and for studies in which matching is not attempted. The adjustments can be made using either the analysis of covariance (ANCOVA) model or propensity score analysis.^{3-6}

With ANCOVA, Model (1a) takes the following form:

(7a) Y_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} W_{i} + b_{12} T_{i} C_{i} + *epsilon*_{i},

where the subscript _{i} denotes the i-th site in the study, Y_{i} denotes the post-intervention outcome measure for the i-th site, T_{i} denotes the intervention condition for the i-th site (T=1 if intervention, T=0 if control), C_{i} denotes the contextual factor for the i-th site, W_{i} denotes the covariates for the i-th site, and *epsilon*_{i} denotes the error term. The inclusion of the term b_{3} W_{i} adjusts for the imbalance in the covariates, W, that might be present between the intervention vs. control sites.

The coefficient b_{1} denotes the intervention effect for sites where the contextual factor takes the value zero (C=0), such as non-teaching hospitals; the intervention effect for sites where the contextual factor takes the value C=1, such as teaching hospitals, is given by b_{1} + b_{12}. The coefficient b_{12} denotes the moderation effect, e.g., how the intervention effect differs between teaching hospitals and non-teaching hospitals.

The moderation effect (b_{12} in Model (7a)) can be assessed (along with the other coefficients in the model) by regressing the post-intervention outcome measure, Y_{i}, on the intervention status T_{i}, the contextual factor C_{i}, the covariates, W_{i}, and the interaction term, T_{i} C_{i}, in Model (7a).
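The ANCOVA regression can be sketched with a small hand-rolled least-squares solver. The example is an illustration under strong assumptions: the outcomes are generated without noise from invented coefficients (b_{0}=5, b_{1}=2, b_{2}=1, b_{3}=3, b_{12}=4), the single covariate W is deliberately higher at intervention sites, and the helpers `solve` and `ols` are hypothetical stand-ins for a statistical package.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients obtained from the normal equations."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


# Hypothetical sites (T_i, C_i, W_i); W tends to be larger where T = 1.
sites = [
    (0, 0, 1.0), (0, 0, 2.0), (0, 1, 1.0), (0, 1, 3.0),
    (1, 0, 2.0), (1, 0, 4.0), (1, 1, 3.0), (1, 1, 5.0),
]
# Noise-free outcomes generated from the invented coefficients.
y = [5 + 2 * t + 1 * c + 3 * w + 4 * t * c for t, c, w in sites]

# Model (7a): regress Y on 1, T, C, W, and the interaction T*C.
X_adj = [[1.0, t, c, w, t * c] for t, c, w in sites]
adjusted = ols(X_adj, y)        # [b0, b1, b2, b3, b12]

# Same model without W, to show what the imbalance does to the estimates.
X_unadj = [[1.0, t, c, t * c] for t, c, _ in sites]
unadjusted = ols(X_unadj, y)    # [b0, b1, b2, b12]
```

With the covariate included, the fit returns the generating coefficients, including b_{12}=4. Dropping W inflates the estimates to roughly b_{1}=6.5 and b_{12}=5.5, because the imbalance in W is then attributed to the intervention.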

With propensity score analysis, we first model the propensity for the i-th site to be an intervention site:

(7b) *Pi* _{i} = logit(P(T_{i}=1)) = g_{0} + g_{1} C_{i} + g_{2} W_{i}.

The propensity model (7b) is usually specified and fitted as a logistic regression of intervention status (T) on the contextual factor (C) and covariates (W). The fitted model is then applied to all sites in the study to derive the propensity score, *Pi*, for each site to be an intervention site. The propensity scores can then be used in several ways to adjust for the imbalance between the intervention and control sites in the sample. One option that is particularly suitable for the assessment of the moderation effect for the contextual factor, C, is the following ANCOVA model that uses the propensity score *Pi* instead of the covariates W in model (7a):

(7c) Y_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} *Pi* _{i} + b_{12} T_{i} C_{i} + *epsilon*_{i}.

Alternative ways that can be used to implement the propensity score analysis include matching, stratification, and weighting.
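The two steps of the covariate-adjustment option, fitting the propensity model (7b) and then using the score in model (7c), can be sketched as follows. This is a dependency-free illustration, not a recommended implementation: the logistic regression is fitted by a plain Newton-Raphson loop, the helpers `solve`, `ols`, and `fit_logistic` are hypothetical, and the twelve sites and the outcome coefficients (b_{1}=2, b_{12}=4) are invented. The score is kept on the logit scale, as in (7b).

```python
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients obtained from the normal equations."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def fit_logistic(X, t, iterations=25):
    """Fit a logistic regression by Newton-Raphson, starting from zero."""
    p = len(X[0])
    g = [0.0] * p
    for _ in range(iterations):
        mu = [1.0 / (1.0 + math.exp(-sum(gj * xj for gj, xj in zip(g, row))))
              for row in X]
        grad = [sum(row[a] * (ti - mi) for row, ti, mi in zip(X, t, mu))
                for a in range(p)]
        hess = [[sum(row[a] * row[b] * mi * (1.0 - mi) for row, mi in zip(X, mu))
                 for b in range(p)] for a in range(p)]
        g = [gj + sj for gj, sj in zip(g, solve(hess, grad))]
    return g


# Hypothetical sites (T_i, C_i, W_i); higher W makes intervention more likely.
sites = [
    (0, 0, 1.0), (0, 0, 2.0), (1, 0, 2.0), (0, 0, 3.0), (1, 0, 3.0), (1, 0, 4.0),
    (0, 1, 1.0), (1, 1, 2.0), (0, 1, 3.0), (1, 1, 3.0), (0, 1, 4.0), (1, 1, 5.0),
]
y = [5 + 2 * t + 1 * c + 3 * w + 4 * t * c for t, c, w in sites]  # noise-free

# Step 1: propensity model (7b), logistic regression of T on C and W.
Z = [[1.0, c, w] for _, c, w in sites]
treated = [t for t, _, _ in sites]
g = fit_logistic(Z, treated)
score = [sum(gj * zj for gj, zj in zip(g, row)) for row in Z]  # logit-scale Pi_i
prob = [1.0 / (1.0 + math.exp(-s)) for s in score]

# Step 2: model (7c), with the score replacing the covariate W.
X = [[1.0, t, c, s, t * c] for (t, c, _), s in zip(sites, score)]
b0_hat, b1_hat, b2_hat, b3_hat, b12_hat = ols(X, y)
```

Because this toy example has a single covariate and no noise, the logit-scale score is just a linear re-expression of C and W, so model (7c) recovers b_{1} and b_{12} exactly as ANCOVA would. With several covariates the score compresses them into one index, and the two approaches will generally give different estimates.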

The validity of adjusted comparisons depends on the success of the adjustment to remove all imbalances between intervention and control sites in the sample. With either ANCOVA or propensity scores analysis, it is necessary to assume that all relevant covariates are observed, i.e., there are no hidden confounders.

### Adjusted Comparisons for Pre-post-intervention Changes in Outcome Measures

Another strategy is to combine pre-post comparisons with the adjusted comparisons described in the preceding section, applying ANCOVA or propensity score analysis to the pre-post changes in outcome measures.

With ANCOVA, Model (1a) takes the following form:

(8a) D_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} W_{i} + b_{12} T_{i} C_{i} + *epsilon*_{i},

where D_{i} denotes the pre-post change for the i-th site. The rest of the model is identical to Model (7a) discussed in the preceding section.

With propensity score analysis, the same propensity model (7b) is used to assess the propensity scores *Pi* _{i}. The fitted propensity scores are then used in the following model:

(8c) D_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} *Pi* _{i} + b_{12} T_{i} C_{i} + *epsilon*_{i}.

### Adjusted Comparisons for Longitudinal Rates of Change in Outcome Measures

A combined strategy, similar to the adjusted comparison of pre-post changes discussed above, is to combine longitudinal comparisons with adjusted comparisons and compare the longitudinal rate of change in outcome measures between intervention and control sites, adjusted for covariates that might be imbalanced, using either ANCOVA or propensity score analysis.

With ANCOVA, model (1a) takes the following form:

(9a) R_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} W_{i} + b_{12} T_{i} C_{i} + *epsilon*_{i},

where R_{i} denotes the longitudinal rate of change for the i-th site. The rest of the model is identical to Model (7a) discussed above.

With propensity score analysis, the same propensity model (7b) is used to assess the propensity scores *Pi* _{i}. The fitted propensity scores are then used in the following model:

(9c) R_{i} = b_{0} + b_{1} T_{i} + b_{2} C_{i} + b_{3} *Pi* _{i} + b_{12} T_{i} C_{i} + *epsilon*_{i}.

The choice of analytic strategies for the assessment of intervention effect, and of the corresponding strategies for the assessment of the moderation effect for contextual factors, depends on the design of the study. Strategies that are based on pre-post changes or longitudinal rates of change can be applied only to studies that obtain pre-post measures or repeated measures of outcomes. Strategies that are based on matched comparisons can be applied only to studies designed with matched sites. To allow more flexibility in the analytic strategies, it would be advantageous to design studies to include these features (pre-post measures of outcome or, preferably, repeated measures; and matched sites).

The strategies discussed above are not exhaustive. Some of the strategies can be expanded, e.g., the linearity assumption in the longitudinal models can be relaxed to allow for non-linear trajectories over time. In addition, strategies such as instrumental variables analysis^{7} and causal sensitivity analysis^{6,8} can be used to address hidden bias, i.e., unobserved factors that are imbalanced between intervention and control sites. However, the eight strategies discussed above are probably the most practical methods and most commonly applied.

### References for Chapter 12

1. Kravitz R, Duan N, Braslow JT. Evidence based medicine: Heterogeneity of treatment effects, and the trouble with averages. *Milbank Q* 2004; 82(4):661-87. Erratum in: *Milbank Q* 2006; 84(4):759-60.
2. Greenfield S, Kravitz R, Duan N, Kaplan SH. Heterogeneity of treatment effects: Implications for guidelines, payment, and quality assessment. *Am J Med* 2007; 120(4 Suppl 1):S3-9.
3. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. *Biometrika* 1983; 70:41-55.
4. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. *J Am Stat Assoc* 1984; 79:516-24.
5. Rubin DB. Estimating causal effects from large data sets using propensity scores. *Ann Intern Med* 1997; 127(8 Pt 2):757-63.
6. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. *Journal of the Royal Statistical Society, Series B* 1983b; 45:212-218.
7. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables (with Discussions). *J Am Stat Assoc* 1996; 91(434):444-72.
8. Rosenbaum PR. Observational studies (2^{nd} Edition). New York: Springer Verlag; 2002.

^{a} This strategy can be combined with pre-post comparisons, to be discussed in the following section. For now we assume that the pre-intervention outcome measures are not available.