Chapter 4. Evaluation Design—Outcome Evaluation

Design and Evaluation of Three Administration on Aging (AoA) Programs: Chronic Disease Self-Management Program Evaluation Design (continued)

The key objective of this project is to design the most rigorous yet feasible evaluation of the impact of the CDSMP on the health status, health behavior, self-efficacy, quality of life, cognitive symptom management, and health care utilization and expenditures of individuals with chronic conditions. To address key research questions about the effectiveness of the CDSMP, the IMPAQ/Abt team proposes a random assignment (i.e., experimental, randomized controlled trial, or RCT) design. This design, widely regarded as the gold standard for evaluating program impacts, will provide a comprehensive and rigorous assessment of the efficacy of the CDSMP in improving the outcomes of program participants. Even though the experimental design is the gold standard, we cannot say with certainty that it will be feasible to implement at the time of evaluation. Due to uncertainties in the future availability of funds for CDSMP3 and in applicants' willingness to accept being placed on a waiting list for CDSMP workshops, we are also proposing a quasi-experimental evaluation design as an alternative. The alternative design uses propensity score matching (PSM), one of the most rigorous methodologies available when an experimental design is not feasible. Furthermore, sample size requirements are much more easily met than under an experimental design: PSM does not involve randomization of applicants (i.e., no denial of service), and its matched administrative control group provides additional sample. An optimal strategy would be to begin implementing the experimental design and to switch to the PSM design if it becomes apparent halfway through the intake process that the target sample size will not be reached.

In this chapter, we provide a detailed overview of our proposed evaluation design for conducting the evaluation of the effectiveness of the CDSMP in AoA-funded settings. Section A lists the key research questions to be addressed by the evaluation. Section B describes the two approaches to evaluation: an RCT and an alternative quasi-experimental PSM design. Section C provides a power analysis to guide the determination of the minimum sample size required to detect statistically significant and qualitatively meaningful results. Site selection and recruitment considerations are presented in Section D. Section E provides details on baseline data collection and the participant tracking system that is necessary to ensure uniform data collection and critical monitoring of evaluation site data. Random assignment procedures are described in Section F. Section G provides details on follow-up data collection, and Section H concludes the chapter with the details of the data analysis plan. Where appropriate, we note how the implementation of the quasi-experimental design differs from the experimental design.

A. Research Questions

The objective of the impact evaluation is to produce rigorous analyses of the CDSMP impacts on participant outcomes. Specifically, the evaluation will address a number of key research questions, including:

  • What is the impact of CDSMP on the physical and mental health status of program participants?
  • Is CDSMP effective in assisting program participants to better manage their conditions and achieve self-efficacy?
  • What is the impact of CDSMP on health care services utilization?
  • Does CDSMP lead to a reduction in health care expenditures?
  • What is the impact of CDSMP on Medicare costs for participants who are Medicare beneficiaries?
  • What is the impact of CDSMP on Medicaid costs for dual eligibles?
  • How do the impacts of the program differ across key groups (age group, Medicare status, chronic disease, etc.)?
  • How do the implementation factors affect impacts? Do factors such as the organizational structure, financial structure, experience of facilitator or leader, and site-level course completion rates affect the success of CDSMP?
  • What are the characteristics of CDSMP participants? How do they differ from AoA's usual customers? What is their use of long-term care services and supports? What percentage are Medicaid beneficiaries (dual eligibles)?

B. Evaluation Methodologies

The impact of the CDSMP on participant outcomes is essentially the difference between participant outcomes after program participation (treated outcomes) and participant outcomes had they not participated in the program (counterfactual outcomes). The typical evaluation problem is that, while the treated outcomes are observed, the counterfactual outcomes are not observed. The solution to this problem is straightforward when random assignment (RA) is used to determine program participation. Under RA, eligible program applicants are randomly assigned to the treatment group (receive program services) or to the control group (do not receive program services). RA ensures that individuals in the treatment and the control groups are equivalent in their observed and unobserved characteristics at the time they apply to participate in the program. Thus, any subsequent differences between the treatment outcomes (treated outcomes) and the control outcomes (counterfactual outcomes) are attributed to the program. This explains why experimental evaluation designs are preferable for rigorously assessing program efficacy relative to non-experimental designs that rely on statistical methods to identify appropriate comparison groups.

As explained before, we are also presenting an alternative rigorous design as a backup plan in case the evaluation sites are unable to reach the recruitment targets necessary for a meaningful evaluation. The alternative is a quasi-experimental design in which the control group members are selected from the extensive pool of Medicare administrative records. Separately in each host site service area, Medicare beneficiaries with similar characteristics will be identified and matched to the beneficiaries taking the CDSMP workshops. In this design, there is no randomization, and hence everyone taking the workshops can be in the evaluation sample. We propose Propensity Score Matching (PSM) to ensure that controls and experimental cases are matched as closely as possible on an extensive set of variables covering individual demographic characteristics and Medicare utilization and expenditures over the 12 months preceding the study. Propensity Score Matching offers an ideal solution to the problem of trying to simultaneously match on a multitude of variables. In this method, the extensive set of variables is used to predict the probability that an individual participates in the CDSMP. Individuals are then matched using these estimated probabilities.

PSM is a relatively costly technique—if implemented rigorously—due to intense data and analytic requirements. However, it has proven very effective and is often regarded as the second-best option after RCTs when a large pool of individuals exists from which to obtain matches (here, Medicare administrative records). There will be separate matching exercises for each host site area, ensuring that the comparison group matches reside in the same area as the treatment group individuals ("hard matches" by geographic area). Furthermore, within these geographic areas we further hard-match certain key subgroups, such as dual eligibles. The nature of the technique requires multiple iterations of matching, varying certain parameters, followed by extensive "balance tests" for each iteration. Balance tests assess how well a particular matching works by measuring how close the statistical distributions of the matched characteristics of the comparison group are to those of the program group. The drawback of PSM is that comparison cases may not be matched with program cases on unobservable characteristics such as motivation to improve health. The validity of the comparison group depends on how important these unobservable factors are in explaining the outcomes of interest and how well they can be proxied by the extensive set of Medicare administrative records. Further advantages and disadvantages of PSM vis-à-vis the RCT are described in the following sections.
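The matching and balance-testing logic described above can be sketched in code. The following is a minimal illustration, not the evaluation contractor's implementation: it assumes scikit-learn is available, uses greedy 1:1 nearest-neighbor matching on the propensity score without replacement, and checks balance with the standardized mean difference; all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_score_match(X_treat, X_pool):
    """1:1 nearest-neighbor match on estimated propensity scores.

    X_treat: covariate matrix for CDSMP participants (n_t x k)
    X_pool:  covariate matrix for candidate Medicare comparison cases (n_p x k)
    Returns an array of indices into X_pool, one match per treated case.
    """
    X = np.vstack([X_treat, X_pool])
    y = np.concatenate([np.ones(len(X_treat)), np.zeros(len(X_pool))])
    model = LogisticRegression(max_iter=1000).fit(X, y)
    ps = model.predict_proba(X)[:, 1]              # estimated P(participation)
    ps_t, ps_p = ps[:len(X_treat)], ps[len(X_treat):]

    matches, used = [], set()
    for score in ps_t:                             # greedy nearest neighbor, no replacement
        order = np.argsort(np.abs(ps_p - score))
        j = next(i for i in order if i not in used)
        used.add(j)
        matches.append(j)
    return np.array(matches)

def standardized_mean_difference(x_treat, x_comp):
    """Balance test for one covariate: |SMD| < 0.1 is a common rule of thumb."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_comp.var(ddof=1)) / 2)
    return (x_treat.mean() - x_comp.mean()) / pooled_sd
```

In practice each host site area would be matched separately (the "hard match" by geography), and the balance test would be repeated for every covariate and every matching iteration.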

C. Required Sample Sizes

An important consideration in implementing a design is to include a sufficiently high number of participants to ensure that the evaluation detects statistically significant estimates of the CDSMP's impacts. Depending on the number of program applicants from the selected evaluation sites and a set of assumptions about the means and standard deviations of the outcomes of interest, one can calculate the minimum detectable effect (MDE). The MDE is the smallest program impact that the evaluation will likely be able to detect for a given outcome of interest. If the actual impact of the CDSMP on a given outcome is higher than the MDE, one can be reasonably confident that the impact will be detected by the evaluation. Since the actual program impact is not known a priori, smaller MDEs are desirable; the smaller the MDE, the greater the chance that the actual impact will be detected.

We are considering the 6-month impact of the CDSMP on a number of key variables derived from survey data or Medicare administrative records. The key variables derived from the surveys are (1) likelihood of self-efficacy and (2) likelihood of good self-rated health; the key variables derived from Medicare administrative records are (3) likelihood of hospitalization (inpatient stay), (4) number of hospital stays, (5) days hospitalized, (6) hospital expenditures, and (7) total Medicare expenditures. Exhibit 1 presents the MDEs for these outcomes using different baseline sample sizes (i.e., the number of treatment and control group participants).4-8

As shown in Exhibit 1, if the evaluation includes 1,500 participants (750 treatment and 750 control), the MDE for the likelihood of self-efficacy will be 7.5 percentage points. This means that if the impact of the CDSMP program is at least 7.5 percentage points, an evaluation sample of 1,500 program participants will enable the detection of the program's impact. By the same token, an evaluation sample of 1,500 participants would enable the detection of a 7.1 percentage-point positive impact on the likelihood of good health, a 5.1 percentage-point reduction in the likelihood of hospitalization, a 0.12 reduction in the number of hospital stays, and a 0.8-day reduction in the number of days hospitalized. Furthermore, a sample of 1,500 study participants would enable the detection of reductions of $756 and $1,719 in Medicare hospital expenditures and total Medicare Part A and Part B expenditures, respectively.

If the evaluation sample includes a larger number of participants, the MDE would be lower. For example, a sample of 2,000 participants (1,000 treatment and 1,000 control) would enable the detection of at least a 6.5 percentage-point impact on the likelihood of self-efficacy. If the evaluation sample consists of 3,000 applicants, the MDE would be even lower (5.3 percentage points). As shown in Exhibit 1, the MDEs decline with larger sample sizes, indicating that large evaluation samples would enable the detection of small program impacts.

Exhibit 1: 6-Month Minimum Detectable Effects by Target Sample Size

                                           Number of Participants in the Evaluation Sample
Outcome                                  N = 1,500   N = 2,000   N = 3,000   N = 4,000   N = 5,000
1) Likelihood of Self-Efficacy (pp)         +7.5        +6.5        +5.3        +4.6        +4.1
2) Likelihood of Good Health (pp)           +7.1        +6.2        +5.1        +4.4        +3.9
3) Likelihood of Hospitalization (pp)       -5.1        -4.4        -3.6        -3.1        -2.8
4) Number of Hospital Stays                 -0.12       -0.10       -0.08       -0.07       -0.06
5) Days Hospitalized                        -0.80       -0.70       -0.57       -0.49       -0.44
6) Hospital Expenditures                    -$756       -$655       -$535       -$463       -$414
7) Total Medicare Expenditures            -$1,719     -$1,488     -$1,215     -$1,052       -$941
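MDEs of the kind shown in Exhibit 1 follow from a standard two-sample power formula. The sketch below is illustrative only: the 80% power, two-sided 5% test, and worst-case binary variance (p = 0.5) are our assumptions, not necessarily those used to produce the exhibit, so the result is close to but not identical to the tabulated 7.5 percentage points; scipy is assumed available.

```python
from math import sqrt
from scipy.stats import norm

def mde(sd, n_treat, n_control, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-group comparison of means.

    sd: assumed standard deviation of the outcome in each group.
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for a two-sided 5% test
    z_power = norm.ppf(power)           # 0.84 for 80% power
    return (z_alpha + z_power) * sd * sqrt(1 / n_treat + 1 / n_control)

# Binary outcome (e.g., likelihood of self-efficacy): sd = sqrt(p * (1 - p)).
p = 0.5                                  # worst-case variance assumption
print(round(100 * mde(sqrt(p * (1 - p)), 750, 750), 1))
# ~7.2 percentage points; Exhibit 1 reports 7.5 under its own assumptions
```

Halving the MDE requires roughly quadrupling the sample, which is why the exhibit's MDEs shrink slowly as N grows.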

There are two important considerations in determining the number of participants that will be included in the CDSMP evaluation. First, a sufficiently high number of participants must be included to ensure that the evaluation will detect the CDSMP impacts. Exhibit 1 shows the MDEs for various sample sizes. Second, the number of participants in the evaluation cannot exceed the projected number of CDSMP applicants in the selected evaluation sites who will be willing to participate in the evaluation during the intake period. As explained in Section 3.C, it does not seem realistic to expect more than 3,000 study participants even with supplementary grants from AoA to host sites that are aimed at doubling the number of CDSMP participants.

Based on the power calculations presented in Exhibit 1 and the feasibility considerations, an evaluation sample of 3,000 individuals (1,500 treatment and 1,500 comparison) would enable the detection of sufficiently low MDEs for the key outcomes of interest, except Medicare expenditures. It does not seem feasible to detect reasonable differences in Medicare expenditures due to their very high variability. A meaningful effect size to detect would be $300 per participant, as this has recently been estimated as the per-participant cost of holding the workshops.9 $300 also corresponds to 7.5% of average Medicare expenditures over a period of six months. Detecting an effect size of $300 requires a sample size of 50,000. The evaluation contractor should propose methods to increase the precision of the Medicare expenditure estimates, which may include modeling the logarithm of Medicare expenditures rather than actual expenditures.
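The 50,000 figure can be checked by inverting the standard minimum-detectable-effect formula. The sketch below is illustrative: it back-calculates the implied standard deviation of 6-month total Medicare expenditures from Exhibit 1's $1,719 MDE at N = 1,500 (an inferred quantity, not a documented parameter) and assumes an 80% power, two-sided 5% test with scipy available.

```python
from math import sqrt
from scipy.stats import norm

def required_total_n(sd, mde, alpha=0.05, power=0.80):
    """Total sample size (treatment + control, equal arms) needed to detect `mde`."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = 2 * (z * sd / mde) ** 2
    return 2 * n_per_arm

# Standard deviation implied by the $1,719 MDE at N = 1,500 (750 per arm).
z = norm.ppf(0.975) + norm.ppf(0.80)
sd = 1719 / (z * sqrt(2 / 750))
print(round(required_total_n(sd, 300)))   # on the order of 50,000
```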

We estimate that the available sample size for the PSM option will be triple the available sample size for the RCT option, and that the MDEs for the PSM option will be at most half of those for the RCT. Thus, the PSM option will have significantly higher statistical power than the RCT option.

D. Site Selection and Recruitment

An important aspect of this evaluation is to identify CDSMP program sites that are strong candidates for inclusion in the evaluation. The evaluation contractor must work closely with AoA to identify CDSMP evaluation sites that satisfy certain criteria, including the following:

  • Size—As discussed earlier, large evaluation sample sizes will ensure that the evaluation will be able to detect small program impacts. To achieve high statistical power in assessing CDSMP impacts, the evaluation contractor and AoA should select large CDSMP evaluation sites to be included in the evaluation. The selection of large sites—sites that receive a high number of applications for program participation—will ensure that large sample sizes will be achieved and that the evaluation will be able to detect program impacts with high accuracy.
  • Representativeness—If feasible, selected evaluation sites will represent a wide range of program sites, based on a number of community-level characteristics, including but not limited to (1) rural/urban location, (2) average household income, (3) socioeconomic characteristics, and (4) average health care expenditures. If evaluation sites are representative of a wide range of program sites, AoA will be able to use the results of the evaluation to draw inferences on the effectiveness of the CDSMP program on a large scale.
  • Wait Lists—If feasible, the evaluation will include sites that keep wait lists due to a large demand for CDSMP services. Such sites would presumably be more willing to participate in the evaluation, since randomly assigning a portion of their applicants to the control group will not affect their usual operations.
  • Tomando de Su Salud—The evaluation will include sites implementing the Spanish-speaking version of the CDSMP.

The evaluation contractor should use criteria, such as those outlined above, to identify a preliminary list of sites for inclusion in the evaluation. This list may include 70–80 CDSMP sites. Once this list is approved by AoA, the contractor must then make all possible efforts to ensure that a sufficient number of selected sites agree to participate in the evaluation. We propose that the evaluation include 50 sites; this number will ensure a sufficiently high number of participants, even though coordinating and managing a 50-site RCT evaluation will pose challenges. If the PSM option is selected, the evaluator should consider a smaller number of sites. The evaluation contractor must make all possible efforts to secure the participation of the target number of sites. Specifically, we recommend that the evaluation contractor implement and support the following activities to ensure site participation:

  • Step 1—Inform sites of the study and generate support for participation. The evaluation contractor should begin the site recruitment process by contacting the sites on the preliminary list. The contractor should draft and transmit letters of introduction to the sites; these letters should be sent to a high-level site administrator who is likely to have sufficient authority for committing the site to participate in the study. We recommend that the evaluation contractor enlist AoA's and Medicare's support in publicizing the study and encouraging site participation. Thus, AoA should provide a letter supporting the study and asking for the site's cooperation. The letter of introduction is intended to ensure that contacts understand the following: the objective of the study, the random assignment design of the study, and the need for participating sites to cooperate with the evaluation contractor to facilitate random assignment of program applicants and the collection of baseline data. The letter will also ask sites to consider participating in the study to assist AoA in gaining a better understanding of the effectiveness of the CDSMP. Finally, the letter should provide contact information for the evaluation contractor and the AoA project officer.
  • Step 2—Conduct teleconferences with interested sites. The contractor should schedule teleconferences with sites that expressed interest in participating in the study. During these conference calls, the contractor will provide detailed information to the sites about the project's objectives and the requirements for the evaluation sites. Specifically, the following topics should be discussed during the conference calls: (1) overview of the project, (2) key research questions, (3) study design, (4) criteria for selecting evaluation sites, and (5) requirements for evaluation sites (random assignment of applicants, collecting contact information on applicants, etc.). It is recommended that AoA participate in these calls when possible, to indicate its support of the request, to provide further encouragement for the sites' participation, and to explain any plans for offering grant funding to help implement the study.
  • Step 3—Secure site participation and select evaluation sites. Once sites commit to participating in the program, the evaluation contractor and AoA will select the final list of sites to be included in the evaluation. The contractor will then contact the selected evaluation sites to inform them of their inclusion in the evaluation and to request letters of commitment.

E. Baseline Data Collection and Participant Tracking System

The evaluation sites will collect information on all individuals aged 60 or older who apply for CDSMP participation during the evaluation period. To facilitate the evaluation, sites will be asked to uniformly collect the following baseline information:

  • Applicant contact information (name, address, phone number)
  • Socioeconomic characteristics (gender, race, ethnicity, age, education, income, etc.)
  • Health characteristics (health status, chronic conditions, illnesses, disabilities, etc.)
  • Living arrangements (living alone, living with family, managed care facility, etc.)
  • Health care expenditures prior to CDSMP application
  • Medicare/Medicaid status (Medicare, Medicaid, and Dual eligibility)
  • Health insurance status and Participation in a Medicare Advantage Plan
  • Prior participation in the CDSMP or similar programs
  • Type of Program Participant (individual or proxy/caregiver)
  • Use of long-term care services or supports
  • Language spoken

Note that while this survey will be administered to both treatment and control group members in the RCT design, it will only be administered to the treatment group members in the PSM design. The evaluator will not have contact information for the comparison group members in the PSM design. Furthermore, even with the contact information, response rates would be very low due to the "cold call" nature of the inquiry.

The above information will be used to produce detailed descriptive analyses of the characteristics of CDSMP applicants. In addition, this information will be used to produce control variables for the impact analyses (Section H).

The evaluation contractor will be responsible for facilitating the process of collecting baseline information on program applicants by the evaluation sites. Specifically, the contractor will develop a baseline data collection instrument that will be used by the evaluation sites to collect information on applicant characteristics. The contractor will train all evaluation sites on how to use this instrument before the start of the implementation period.

Furthermore, the evaluation contractor will create a Participant Tracking System (PTS), which will be used to track all interactions of program applicants with the CDSMP, from program application through service delivery. The PTS will be designed to include the following features:

  • Store baseline characteristics of all applicants—The PTS will be designed to allow evaluation sites to store the baseline applicant characteristics once they are collected. The PTS will also allow the evaluation sites and the evaluation contractor to access information on program applicants and produce custom reports using up-to-date data. Furthermore, the PTS will be designed to allow the evaluation contractor to obtain easy access to baseline characteristics for conducting the impact analyses.
  • Facilitate random assignment (for the RCT study)—The PTS will be designed to allow the evaluation contractor to access up-to-date information on new program applicants for conducting the random assignment of applicants to the treatment group or the control group and to notify sites of the assignment.
  • Track participants' program use—The PTS will be designed to allow sites to document program use by participants. Specifically, sites will be able to collect information on whether treatment group members showed up to receive program services, the types of services received, and whether they completed the program. In addition, sites can use the PTS to report whether a proxy participant attended the program instead of the actual applicant. This information can be used to assess the use of the program by treatment group members.
  • Track contamination—For the RCT, the PTS will track whether control group members take a CDSMP workshop within the first 6 months. For the PSM, the PTS will track whether anyone in the matched Medicare comparison group takes a CDSMP workshop. This will require considerable effort, as it involves identifying and matching Medicare beneficiaries with CDSMP participants based on limited information such as name, gender, age, and geographic area.

The PTS will provide the sites and the contractor with information necessary to effectively monitor program activities. For example, the PTS will provide real-time data on program enrollment and program attrition, enabling the monitoring of how far in the sequence of CDSMP services the participant progresses. Furthermore, the evaluation contractor will be able to identify early if sites are experiencing a shortfall in the expected number of applications.
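The contamination-tracking linkage described above can be sketched as a deterministic match on the limited fields available. All field names, normalization rules, and tolerances below are illustrative assumptions, not a prescribed design; a production linkage would need to handle misspellings, nicknames, and missing values in roster data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    last_name: str
    first_name: str
    gender: str
    birth_year: int
    zip3: str          # first three ZIP digits as a coarse geographic area

def normalize(name: str) -> str:
    """Lowercase and strip non-letters so 'SMITH, Jr.' matches 'Smith'."""
    return "".join(ch for ch in name.lower() if ch.isalpha())

def is_probable_match(a: Person, b: Person, year_tolerance: int = 1) -> bool:
    """Conservative deterministic linkage on the limited fields available."""
    return (normalize(a.last_name) == normalize(b.last_name)
            and normalize(a.first_name)[:3] == normalize(b.first_name)[:3]
            and a.gender == b.gender
            and abs(a.birth_year - b.birth_year) <= year_tolerance
            and a.zip3 == b.zip3)

def flag_contamination(comparison_group, workshop_roster):
    """Return comparison-group members who appear on a CDSMP workshop roster."""
    return [c for c in comparison_group
            if any(is_probable_match(c, w) for w in workshop_roster)]
```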

F. Random Assignment

Random assignment is applicable to the RCT design only. A random assignment (RA) process will be used to assign program applicants to the treatment group or the control group. Those assigned to the treatment group will receive program services, whereas those in the control group will have to wait 6 months before receiving services. To achieve the required sample sizes for the evaluation, the intake period will be 12 months; if the required sample sizes are not reached, the evaluation may switch to the PSM option.10 Below, we discuss the recommended steps for successfully implementing RA of program applicants during the implementation period.

  • Step 1—Informed consent form. During the implementation period, sites will inform all program applicants aged 60 years or older that the site is participating in an RA study of the CDSMP program. Sites will explain to these applicants that each applicant will be randomly assigned to the treatment group or the control group and that assignment to the control group means that the individual will need to wait 6 months before receiving program services. Sites will also explain that applicants will be asked to respond to a follow-up survey, approximately 6 months after program application. Individuals who are still interested in program participation must sign an informed consent form which states that the applicant is aware of the evaluation, the random assignment process, and the follow-up data collection.
  • Step 2—Random assignment of program applicants. During the implementation period, evaluation sites will feed information about new program applicants (aged 60 or older) into the PTS. On a weekly basis, the evaluation contractor will access the PTS to identify new applicants and will use statistical software with random number generator capabilities to assign each new applicant a random number. Half of the numbers will then be randomly drawn and assigned to the treatment group, and the remaining half will be assigned to the control group. The evaluation contractor will then update the PTS with information about whether the individual was assigned to the treatment group or the control group.
  • Step 3—Notify applicants and provide services. Once the evaluation sites receive the treatment and control group information, they will notify applicants of their assignment. The sites will then provide services to all program applicants as follows: treatment group members will receive services once workshops become available; control group members will be provided services after the 6-month wait period and after the follow-up data collection is completed.
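The weekly batch assignment in Step 2 can be sketched as follows. This is an illustrative outline only: the recorded seed per weekly batch (for auditability) and the handling of odd-sized batches are our assumptions, not prescribed procedures.

```python
import random

def weekly_random_assignment(new_applicant_ids, seed=None):
    """Assign half of this week's new applicants to treatment, half to control.

    With an odd count, the extra applicant is assigned at random.
    Returns a dict mapping applicant ID to 'treatment' or 'control'.
    """
    rng = random.Random(seed)            # seed retained for audit/reproducibility
    ids = list(new_applicant_ids)
    rng.shuffle(ids)
    n_treat = len(ids) // 2 + (len(ids) % 2) * rng.randint(0, 1)
    return {app_id: ('treatment' if i < n_treat else 'control')
            for i, app_id in enumerate(ids)}
```

After each weekly run, the resulting assignments would be written back to the PTS so that sites can notify applicants, per Step 3.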

This process will ensure that the random assignment of program applicants to the treatment or the control group is successfully implemented. Random assignment will enable the evaluation contractor to compare the outcomes of treatment and control group members 6 months after program application (using the follow-up survey and Medicare/Medicaid data) to rigorously estimate CDSMP impacts. Furthermore, this design ensures that all program applicants receive CDSMP services, with the caveat that control group members will need to wait 6 months before receiving services. This design alleviates ethical concerns since none of the evaluation site program applicants will be denied services.

Evaluation sites will be asked to screen applicants based on their age; only applicants aged 60 or older will be randomly assigned to the treatment or the control group, while those under age 60 will follow the normal process of receiving CDSMP services.11

G. Follow-Up Data Collection

Given the relative scarcity of CDSMP participants (Section 3.C) and the need to have as many study participants as possible to increase the likelihood of detecting statistically significant effects (Section 4.C), it is critical to retain as many baseline participants as possible for the follow-up surveys. To that end, we recommend administering the follow-up surveys as Computer-Assisted Telephone Interviews (CATI), while offering participants a paper-and-pencil option, to maximize response rates. Follow-up survey response rates are assumed to be at least 70%. The primary goal of the planned outcome evaluation is to assess health and other outcomes for CDSMP participants compared to non-participants. Data collection procedures and instruments designed to assess outcomes will be identical for the treatment and control groups. Follow-up survey data for the RCT study will be collected from both treatment and control groups at months 6 and 12 after random assignment. For the PSM study, survey data will only be collected from the treatment group at baseline, 6 months, and 12 months.

Survey type and method: development and testing. As stated earlier, there are several well-developed and tested instruments for evaluating health outcomes, quality of life outcomes, and self-efficacy. We recommend that the evaluation contractor select an existing, psychometrically sound tool to measure self-reported health, quality of life (QOL), and self-efficacy outcomes based upon the environmental scan described previously. We recommend that the baseline survey be administered by mail, as has been done in past CDSMP evaluations.

Use of a validated instrument will eliminate the need to cognitively test the survey tools; however, pilot-testing will still be required to ensure smooth survey implementation across sites and valid survey results. As stated previously, for purposes of generating survey cost estimates, we have assumed that the instrument selected will be the Stanford Chronic Disease Questionnaire. The rationale for recommending this tool is that all domains of interest are included in one instrument, the scales included in the tool have documented psychometric properties,12 and the degree of respondent burden is minimized with one tool containing approximately 48 items. 

Sampling Frame. The sampling frame for the RCT design will be obtained from CDSMP sites participating in the national evaluation. Participant contact information for the treatment and control groups will be obtained by the site through the application process and forwarded to the survey contractor for input into the contact database. Copies of informed consent for survey participation will have been obtained by each participating site, with copies forwarded to the evaluation contractor for record-keeping. The treatment arm of the PSM study will have the same sampling frame as the RCT; the comparison arm, however, will have a different sampling frame. For a given host site, the sampling frame for the comparison group will be the Medicare beneficiaries aged 65 or older residing in the area served by the host site. The list of these beneficiaries will be obtained from Medicare administrative records through the Centers for Medicare and Medicaid Services. PSM will then be used to select from among these beneficiaries those that match the treatment group well on an extensive set of individual characteristics, as described in Section 5.B.

G.2 Survey Instrument Development

Pilot testing. Pilot testing informs the evaluation contractor of difficulties in administering the survey (e.g., wording is difficult to read as written) as well as difficulties that respondents may have in interpreting and/or understanding specific questions. In addition, pilot testing can help the researcher test the length of the survey and adjust it as necessary to maintain the preferred survey length. Most importantly, pilot testing will ensure that the evaluation contractor is obtaining the desired data from the survey questions, for both the treatment and the control groups.

Given the design team's recommendations that the already-tested Stanford Chronic Disease Questionnaire be utilized, we do not believe that it will be necessary for the evaluation contractor to conduct a lengthy pilot test; however, we would recommend that the survey instrument and survey procedures be tested for control and treatment group participants from two CDSMP sites to assure AoA and the evaluation contractor that the instructions are clear and the process will work as intended prior to full-scale baseline data collection.

Survey administration. The baseline data collection will be conducted through self-administered paper-and-pencil surveys. Two waves of follow-up measurements will be administered through a computer-assisted telephone interviewing (CATI) system. An item capturing whether control group participants have received any education about their disease since the baseline survey should be added to these follow-up surveys to enable assessment of control group contamination. Spanish-language surveys should be sent to those treatment and control group participants who are identified by the CDSMP site as reading only Spanish.

G.3 Administrative Data Review

As previously stated, the design team recommends that the effects of CDSMP programs on health care utilization and expenditures be assessed using Medicare claims data. These data may be obtained from CMS, using the following procedures:

  • Step 1—Confirm with AoA whether any interagency agreements regarding data use/data acquisition apply.
  • Step 2—Complete ResDAC data request package, including study protocol, request letter, and Data Use Agreement (DUA).13
  • Step 3—Request access and specify the media (e.g., DVD, external hard drive) on which to receive the following Medicare data: EDB, Inpatient SAF, Outpatient SAF. It would be desirable for the national evaluation contractor to be able to access the data via the CMS mainframe due to the potentially large size of the files; however, mainframe access is unlikely given other CMS data processing priorities.

It is likely that, during the development of the study protocol and data request package, the evaluation contractor will need to consider development of a "finder file" for ResDAC's use. This file will provide ResDAC (or other CMS data contractor) with the appropriate identifiers for the CDSMP treatment and control samples. The finder file will allow ResDAC to pull claims for only those Medicare beneficiaries for whom survey data are available and thus streamline the resulting size and content of the administrative files. The evaluation contractor should anticipate that awaiting receipt of data from ResDAC will likely take the better part of Year 1 of the contract, and thus the data request package should be submitted as soon as possible after contract award.

Claims data for the year just prior to the evaluation, for the 12 months of recruiting and for the 12-month follow-up period, should be obtained for development of the longitudinal file. Our estimate at this point of the data required would be for calendar years 2012 through 2015.

G.4 IRB and OMB Clearance

To ensure the protection of human subjects used in the proposed research, the survey and evaluation plan must be approved by a certified Institutional Review Board (IRB) prior to commencement. In addition, it is anticipated that the data collection effort will require preparation and submission of a Paperwork Reduction Act (PRA) package for Office of Management and Budget (OMB) approval. Based upon previous experience, a minimum of 9 months should be allowed in the proposed project schedule for OMB clearance; thus, the design and data collection protocols must be finalized within the first several months of the national evaluation in order to obtain clearance to proceed with the study by Year 2. Mechanisms for achieving the highest possible response rate should be articulated in the PRA package, as should the evaluation contractor's plans for assessing non-response bias. We suggest that data on the study population be collected from the sites to help assess non-response bias and make any needed adjustments to the survey data. We also suggest that respondents to follow-up surveys be compensated for their time with $15 checks. We anticipate the need for OMB clearance both for the CDSMP participant survey and the CDSMP host site Web survey.


H. Data Analyses

The objective of this evaluation is to estimate the CDSMP impact on participant outcomes 6 months and 12 months following program participation. For the RCT design, since program applicants will be randomly assigned to the treatment group (receive services) or the control group (required to wait 6 months before receiving services), there will be no systematic differences, observed or unobserved, in characteristics between the treatment and the control groups. Thus, treatment-control differences in outcomes 6 months after random assignment (RA) can be attributed to CDSMP participation and represent accurate estimates of program impacts.14 By the 12th month, some control group participants may have taken the workshop after completing the 6-month waiting period, so the 12-month impact estimates will provide a lower bound on the effectiveness of CDSMP. While the PSM design ensures that there are no observed pre-existing differences between the treatment group and the matched comparison group, it does not guarantee this for unobserved differences. Below, we provide an overview of the recommended data analyses for estimating the impact of CDSMP on participant outcomes. Where the data analyses differ between the RCT and PSM designs (after randomization for RCT and after matching for PSM), we indicate this below.

H.1 Descriptive Analyses of Evaluation Sample

Prior to conducting any impact analyses, descriptive analyses of all CDSMP applicants in the sample should be produced based on information provided at the time of application. These analyses will provide an overview of the baseline characteristics of treatment and control group individuals, including:

  • Socioeconomic characteristics (gender, race, ethnicity, age, education, income, etc.).
  • Health characteristics (health status, chronic conditions, illnesses, disabilities, etc.).
  • Living arrangements (living alone, living with family, managed care facility, etc.).
  • Health care expenditures prior to CDSMP application.
  • Medicare/Medicaid status (Medicare, Medicaid, and Dual eligibility).
  • Health insurance status and Participation in a Medicare Advantage Plan.
  • Prior participation in the CDSMP or similar programs.
  • Use of long-term care services and supports.
  • Type of program participant (individual or proxy/caregiver).
  • Language spoken.

These descriptive analyses are necessary to provide an overall characterization of CDSMP applicants in each evaluation site. This information will be used to compare the characteristics of CDSMP applicants with those of all individuals aged 60 or older in the state, and with those of all Medicare beneficiaries in the state. The analysis will include completers and non-completers, as well as an analysis of how many and which classes they attended. Additionally, these analyses will provide preliminary evidence on whether random assignment of CDSMP applicants to the treatment or the control group was performed effectively. The PSM design ensures the absence of pre-existing differences in observed characteristics through the balance tests conducted as part of matching; the following checks of randomization therefore apply only to the RCT design.

  • Treatment-control comparisons in observable characteristics—The evaluation contractor should produce a standard treatment-control comparison of group means for each available characteristic. Using t-tests, the contractor should examine whether there are any statistically significant differences in characteristics between treatment and control group members in each evaluation site. If randomization was done effectively, there should be no statistically significant differences in characteristics between the treatment and the control group in any evaluation site.
  • Estimate likelihood of treatment group assignment—The evaluation contractor should estimate a linear probability model for each evaluation site, where the dependent variable is an indicator for assignment to the treatment group and the controls include all characteristics available at the time of application. If randomization was done effectively, the estimated coefficient on each control variable should be statistically insignificant.
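
The two randomization checks above can be sketched as follows. The data are simulated, and the characteristics used (age, a female indicator, number of chronic conditions) are placeholders for whatever is collected at application.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic illustration: baseline characteristics for one evaluation site
# under effective randomization (hypothetical data).
n = 400
X = np.column_stack([
    rng.normal(74, 6, n),          # age
    rng.integers(0, 2, n),         # female indicator
    rng.poisson(2.2, n),           # number of chronic conditions
])
T = rng.integers(0, 2, n)          # random assignment indicator

# Check 1: t-test of each characteristic, treatment vs. control.
for j, name in enumerate(["age", "female", "n_conditions"]):
    t, p = stats.ttest_ind(X[T == 1, j], X[T == 0, j])
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")

# Check 2: linear probability model of treatment assignment on all
# characteristics; under effective randomization, every slope should be
# statistically insignificant.
Z = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Z, T, rcond=None)
resid = T - Z @ beta
sigma2 = resid @ resid / (n - Z.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
t_stats = beta / se
print("slope t-statistics:", np.round(t_stats[1:], 2))
```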

H.2 Impact Analyses Using Means Differences

Once it is verified that randomization of program applicants was successful, means comparisons in outcomes 6 months after RA between treatment and control group applicants will produce unbiased estimates of CDSMP program impacts. The first step in estimating program impacts is to produce comparisons between the mean outcomes of treatment group members and the mean outcomes of control group members for all outcomes of interest. T-tests should be used to assess if the treatment-control differences in post-RA outcomes are statistically significant, that is, if those in the treatment group had different outcomes than those in the control group. 

These analyses should be produced for all available outcomes of interest, including health status, self-efficacy, total health care expenditures, Medicare/Medicaid expenditures, etc. Based on the results of these analyses, the evaluation contractor will be able to draw preliminary conclusions about the efficacy of CDSMP in improving participant outcomes.
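
A minimal sketch of this means comparison, using simulated outcome data with a built-in treatment effect of 0.3 (all variable names and magnitudes are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical 6-month outcome (e.g., a self-efficacy score) with a true
# treatment effect of +0.3 built into the simulated data.
n = 500
T = rng.integers(0, 2, n)                  # random assignment indicator
y = 5.0 + 0.3 * T + rng.normal(0, 1, n)    # post-RA outcome

# Treatment-control difference in mean outcomes, with a t-test of
# statistical significance.
diff = y[T == 1].mean() - y[T == 0].mean()
t, p = stats.ttest_ind(y[T == 1], y[T == 0])
print(f"mean difference = {diff:.2f} (t = {t:.2f}, p = {p:.4f})")
```

Because assignment is random, this simple difference in means is an unbiased estimate of the program impact; the regression models in Section H.3 refine it rather than correct it.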

H.3 Impact Analyses Using Regression Models

To estimate the impact of CDSMP on the post-RA outcomes of program participants with increased statistical efficiency, multivariate regression models should be used. Regression models are used to refine the impact estimates by eliminating differences in outcomes between the treatment and the control groups that may have occurred by chance as a result of differences in observed characteristics. These models will also permit subgroup analyses that can identify impact differences across key participant groups (e.g., by age, chronic condition, Medicare status) and by program characteristics (e.g., program size, location, services).

The impact analysis regression model can be expressed by the following equation: 

Y = α · T + X · β  + u                      (1)

The dependent variable in this model (Y) is the participant post-RA outcome of interest (self-efficacy, health care expenditures, etc.). The right-hand-side terms are:

T, which equals 1 if the individual was in the treatment group and 0 otherwise.

X, which includes all available individual characteristics at RA (e.g., socioeconomic characteristics, living arrangements, chronic condition, participation in prior workshop, and prior health care expenditures).

u, which is a zero mean disturbance term.

The parameter of interest in this model is α, the regression-adjusted treatment effect of the CDSMP program on the outcome. This parameter represents the impact of being assigned to the CDSMP treatment group (intent-to-treat effect) and not the impact of actually receiving services. The model will be estimated separately for each available post-RA outcome. Once the evaluation contractor estimates these models, t-tests should be used to assess whether estimated impacts are statistically significant.
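
Equation (1) can be estimated by ordinary least squares. The sketch below uses simulated data with a true α of 0.3 and two standardized covariates; the variable names and magnitudes are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: the outcome depends on treatment assignment
# (true alpha = 0.3) and on two baseline covariates.
n = 600
T = rng.integers(0, 2, n)                   # treatment indicator
X = rng.normal(size=(n, 2))                 # e.g., age, prior cost (standardized)
y = 0.3 * T + X @ np.array([0.5, -0.4]) + rng.normal(0, 1, n)

# Equation (1): Y = alpha*T + X*beta + u, estimated by OLS with a constant.
Z = np.column_stack([np.ones(n), T, X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Conventional standard errors for a t-test on alpha.
resid = y - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
alpha, alpha_se = coef[1], se[1]
print(f"alpha = {alpha:.3f} (t = {alpha / alpha_se:.2f})")
```

The estimate of α here is the regression-adjusted intent-to-treat effect; including the covariates X does not change what α measures, it only tightens its standard error.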

The results of these analyses will address many of this evaluation's key research questions. For instance, based on these results, AoA will be able to assess if CDSMP led to significant improvements in the health status and self-efficacy of participants. AoA will also be able to assess whether CDSMP participation led to significant savings in Medicare and/or Medicaid expenditures.

H.4 Sensitivity Analyses

In addition to the above analyses, we recommend that the evaluation contractor perform sensitivity analyses that will provide better insight into program effectiveness and help ensure that program impacts are estimated accurately. Specifically, we suggest the following types of analyses.

Subgroup analyses. The above regression model can be modified to assess whether the CDSMP program had differential impacts across key applicant groups. For example, it would be interesting to assess if the program has differential impacts by chronic condition, living arrangements, and health status at the time of application. To conduct these analyses, the evaluation contractor can modify the regression model above to include interactions between the treatment indicator (T) and the characteristic of interest (age, chronic condition, etc.). T-tests should then be used to examine if the interaction effects are statistically significant, that is, if there are important differences in program impacts across key groups. These analyses will provide important insight to AoA about program effectiveness.

Program differences analyses. There may be specific implementation factors that lead to higher CDSMP impacts. For example, it would be interesting to assess if the program has differential impacts across program characteristics, such as duration, class size, or language of instruction. To conduct these analyses, the evaluation contractor can modify the regression model above to include interactions between the treatment indicator (T) and the implementation factors of interest. Using t-tests, the contractor will be able to assess if there are important differences in program impacts across key implementation factors.
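
Both subgroup and program-difference analyses reduce to adding an interaction term to equation (1). The sketch below uses a hypothetical chronic-condition subgroup, with a true interaction effect of +0.3 built into the simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: the treatment effect is 0.5 for the subgroup
# (e.g., applicants with diabetes) and 0.2 for everyone else, i.e.,
# an interaction effect of +0.3.
n = 800
T = rng.integers(0, 2, n)                   # treatment indicator
sub = rng.integers(0, 2, n)                 # subgroup indicator
y = 0.2 * T + 0.3 * T * sub + 0.4 * sub + rng.normal(0, 1, n)

# Equation (1) augmented with a T x subgroup interaction term.
Z = np.column_stack([np.ones(n), T, sub, T * sub])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
print(f"interaction = {coef[3]:.3f} (t = {coef[3] / se[3]:.2f})")
```

A statistically significant interaction coefficient indicates that the program impact differs across the subgroup; the same construction applies with implementation factors (class size, duration, language) in place of the subgroup indicator.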

Weighting to account for sample attrition. We also recommend that the evaluation contractor produce weights that account for sample attrition. These weights would be used in the analyses to capture treatment-control differences in survey response and ensure that the impact estimates are not tainted by sample attrition. Using attrition weights, therefore, is critical to ensure that the evaluation produces consistent estimates of program impacts. The evaluation contractor could also use other methods to account for sample attrition, such as the PROC MIXED/PMM approach.
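
One common way to construct such weights is inverse probability weighting: model each participant's probability of responding to the follow-up survey from baseline characteristics, then weight respondents by the inverse of that probability. The sketch below is illustrative, using simulated data in which older participants are made less likely to respond; it is not necessarily the specific method the contractor would adopt.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Hypothetical data: older participants are less likely to answer the
# 6-month follow-up survey, so unweighted means over-represent the young.
n = 1000
age = rng.normal(72, 6, n)
responded = rng.random(n) < 1 / (1 + np.exp(0.15 * (age - 72)))

# Step 1: model the probability of response from baseline characteristics.
p_hat = (LogisticRegression()
         .fit(age.reshape(-1, 1), responded)
         .predict_proba(age.reshape(-1, 1))[:, 1])

# Step 2: weight each respondent by the inverse of that probability.
w = 1.0 / p_hat[responded]

unweighted = age[responded].mean()
weighted = np.average(age[responded], weights=w)
print(f"full-sample mean age: {age.mean():.1f}")
print(f"respondents, unweighted: {unweighted:.1f}, weighted: {weighted:.1f}")
```

In the evaluation itself, the same weights would be applied to the outcome analyses so that the respondent sample represents the full randomized sample.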

Sample contamination. In some states, CDSMP program services are available to individuals through an online tool. Thus, it is possible that in such states, control group members would receive the services online before the expiration of the 6-month waiting period. This would "contaminate" the control group and might jeopardize the evaluation. One strategy for avoiding this issue is to ensure that the evaluation excludes sites in states where an online CDSMP tool is available. Another strategy is to use responses to the follow-up surveys to control for whether individuals in the treatment or the control group received the online CDSMP services or some other type of services.


3. ARRA funding for CDSMP is set to expire in March 2012.

4. The typical standard used in program evaluation is that the estimated impact is statistically significant at the 5 percent level. The typical standard for statistical power (i.e., the probability that a statistically significant impact is detected) is 80 percent. Based on our review of several reports and data sources, we make the following assumptions about the outcomes of interest: (1) 50 percent of program applicants achieve self-efficacy, (2) 65 percent of applicants report that they are in good health, (3) 12 percent of Medicare beneficiaries have an inpatient stay, (4) the standard deviation of the number of hospital stays is 0.783, (5) the standard deviation of days hospitalized is 5.4, (6) the standard deviation of hospital expenditures is $5,073, and (7) the standard deviation of total Medicare expenditures is $11,530.

5. Our team did not have access to any Medicare data specific to this design project. We utilized a 5% Medicare annual sample and only included beneficiaries who were 65 or older, had 12 months of Part A and Part B coverage, and did not have Medicare Advantage enrollment in any month. The data did not allow for calculation of standard deviation of total Medicare expenditures for a 6-month period. We approximated the 6-month standard deviation as 75% of the 12-month standard deviation. Further details can be provided separately if requested. The true MDEs can be larger or smaller.

6. We assume that 70 percent of baseline study participants will respond to the 6-month follow up survey. As described in Section 3.C, 60% of baseline participants are assumed to have Medicare claims records as some of the participants will be younger than 65, be Medicare Advantage enrollees, or will not be able to be matched to their Medicare records for various reasons. Since there is no non-response in administrative records, we also assume that everyone for whom Medicare data is available at baseline will have Medicare data available at month 6 as well.

7. The estimates are based on simple random drawing. We did not consider between-site variation and the implied "design effect". Thus, it is possible that the presented MDEs are biased downwards.

8. We assumed that 5% of the variation in the survey-based outcomes of interest will be explained by covariates. For Medicare claims-based outcomes of interest, this was set at 10%.

9. One of the CDSMP grantee states we interviewed estimated the per participant cost of providing the workshop as $300. Some of the Lorig articles provide estimates in the range of $70-$200; however, these are likely to be outdated.

10. The exact implementation period will be estimated by the evaluation contractor once the recruitment of sites is complete.

11. As of March 2011, 27% of CDSMP program applicants in the last 12 months were younger than 60 years of age.

12. Go to http://patienteducation.stanford.edu/research/cdCodeBook.pdf.

13. Go to http://www.resdac.org/Medicare/requesting_data_NewUse.asp for details on the data request procedures and package.

14. The focus is to estimate the program's intent-to-treat effect on participant outcomes, that is, the effect of being offered CDSMP services. From this point on, impact or effect refers to the intent-to-treat effect.



Page last reviewed December 2014
Page originally created May 2011
Internet Citation: Chapter 4. Evaluation Design—Outcome Evaluation. Content last reviewed December 2014. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/research/findings/final-reports/aoa/aoachronic-4.html