Like many complex evaluations of grant programs, the 5-year national evaluation of the CHIPRA quality demonstration grant program faced key challenges as the NET carried out a wide variety of tasks and produced a diverse set of products. Our five interim reports for AHRQ (submitted in final form in August 2011, August 2012, February 2013, October 2014, and May 2015) detail the tasks we undertook during the evaluation, the challenges we faced, and the solutions we devised to address them. Here we discuss four overall conclusions regarding the evaluation itself:
- The national evaluation accomplished many of its goals, but it did not include impact analyses for demonstration projects because of challenges related to program design, program implementation, and data availability.
- We developed diverse methods for collaborating with grantees, such as providing evaluation-focused TA.
- Our technical expert panel was helpful in the early stages of the evaluation, but the need for its input diminished once the evaluation design solidified.
- We developed and disseminated evaluation findings throughout the evaluation period, emphasizing emerging lessons learned about program implementation at first and synthesizing findings about program outcomes and effects in the last months of the project.
In addition to listing more than 100 potential questions that the evaluation could address, AHRQ’s request for task order (RFTO) noted that the evaluation’s purpose was to provide CMS and States with: (1) “insight into how best to implement quality improvement programs” for children and (2) “information on how successful programs can be replicated.” As noted above, the NET has generated a large number of products that provide insights into strategies for improving quality of care, suggesting that the national evaluation accomplished the first goal. These products also address a large proportion of AHRQ’s original research questions (www.ahrq.gov/policymakers/chipra/demoeval/index.html).
We did, however, face significant challenges in reaching the second goal—determining the success of the demonstration projects based on quantitative measures of care derived from claims or other types of quantifiable data. Typically, evaluations use rigorous research designs to estimate the impact of programs on designated outcomes. Strong research designs include randomized controlled trials or comparison group designs that draw on data collected before and after program initiation, from both the group receiving the intervention and a comparison group that is similar in characteristics but is not involved in the intervention. These designs are considered strong because they provide evidence about what would have happened in the absence of the intervention. Comparing outcomes for the group affected by the intervention with outcomes for the comparison group allows one to estimate the impact of the intervention beyond what would have happened anyway.
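The logic of such a comparison group design can be illustrated with a simple difference-in-differences calculation. The sketch below is purely hypothetical: the outcome measure and all rates are invented for illustration and do not come from the demonstration data.

```python
# Hypothetical difference-in-differences illustration of the comparison group
# design described above. All numbers are invented.
pre_intervention, post_intervention = 0.62, 0.71   # e.g., well-child visit rate
pre_comparison, post_comparison = 0.60, 0.63

# Change in each group over the demonstration period
change_intervention = post_intervention - pre_intervention
change_comparison = post_comparison - pre_comparison

# The comparison group's change approximates what would have happened anyway,
# so the impact estimate is the difference between the two changes.
impact = change_intervention - change_comparison
print(round(impact, 2))  # 0.06, a 6-percentage-point estimated impact
```

In practice such estimates would be produced with regression models that adjust for patient and practice characteristics, but the underlying comparison is the one shown here.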
The fact that States were not required to have comparison groups as a condition of their grant impeded the use of these rigorous methods. Furthermore, many projects were underway, and intervention sites were often selected and enrolled, before the evaluation contract was awarded. Therefore, we worked actively throughout the evaluation period to identify opportunities to work with States and implement comparison group designs for at least one project in each demonstration State. These efforts included:
- Asking States to use their grant funds to identify and obtain data from comparison group practices as part of their Category C projects.
- Asking Pennsylvania to use a lagged implementation approach for its Category B work, so practices that implemented electronic screeners in later years of the grant could be used as comparison practices in earlier years.
- Requesting that States with projects designed to enhance medical home features use a standard measure of medical homeness so that we could combine data or compare outcomes across States.
- Working with States to ensure that we had the quantitative information necessary to develop claims-based measures of service use and to attribute children to specific intervention and comparison group practices.
- Providing evaluation-focused TA to States to ensure that they gathered the data needed for quantitative analyses.
Although we examined each project to determine whether impact analyses would be feasible, we focused on Category C projects because they appeared particularly conducive to a rigorous impact evaluation. In particular, 12 States planned to implement a PCMH model to improve quality of care for children in selected practices. As described in our first evaluation design report, we planned to collect Medicaid administrative data and practice-reported PCMH surveys from the CHIPRA intervention practices and a set of comparison practices to assess whether outcomes (such as receipt of well child care and avoidable emergency department visits) improved more among children in the intervention versus comparison practices. Moreover, by combining data across States with similar interventions, we expected to have enough statistical power to detect project impacts on children’s health care.
Unfortunately, for numerous reasons, we could not conduct these analyses as planned. (Appendix C provides a detailed account of the problems we encountered.) In fact, as we worked with each State during the evaluation period, we encountered obstacles beyond our control that made it impossible to implement our plans for quantitative analyses:
- The number of intervention practices in some States’ Category C projects was so small that the chance of detecting differences in service use for children in these practices and children in comparison practices did not warrant the substantial investment of resources required to conduct impact estimates. For example, Alaska and Idaho each worked with only three practice sites in their Category C projects.
- The quality and comprehensiveness of the Medicaid administrative data were compromised by a lack of encounter data from managed care organizations. Many of the Category C States have high use of Medicaid managed care among child beneficiaries, with 45 to 90 percent of children in managed care in Florida, Massachusetts, Oregon, South Carolina, Utah, and West Virginia. Most States could not provide managed care encounter data. Without data from managed care organizations, our evaluations would have represented a small proportion of intervention children in those States.
- States altered original plans for their interventions so substantially that the project’s actual implementation was far less likely to achieve the effects originally intended. For example, several States focused on a narrow range of PCMH transformation activities, rather than implementation of the full model as they had originally planned.
- States initially agreed to identify and collect data from comparison groups but then did not do so because they did not want to impose data collection burdens on practices without offering some benefit in return, and the grant’s budget did not include funds for providing such benefits.
- Because of the selection process for identifying intervention sites, developing an equivalent group of comparison sites was not feasible, especially in the less populous States (for example, the intervention sites were the largest and most sophisticated in the State).
- Post-intervention data were unavailable from CMS’ data files because of major lags in the data submitted by States and because of major delays as CMS transformed its data file structure from one system (MSIS) to another (T-MSIS).
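The small-sample obstacle noted above can be quantified with a standard minimum-detectable-effect (MDE) calculation. The sketch below is hypothetical (the baseline rate and group sizes are invented), but it shows why a project with only three intervention practices could not justify the investment required for impact estimates: the smallest detectable difference would exceed any plausible program effect.

```python
import math

# Approximate minimum detectable effect for a difference in proportions,
# two-sided test at alpha = 0.05 (z = 1.96) with 80 percent power (z = 0.84).
# Illustrative only; invented numbers, and clustering within practices ignored.
def mde(p, n_per_group, z_alpha=1.96, z_beta=0.84):
    """Smallest detectable difference from baseline rate p with n per group."""
    return (z_alpha + z_beta) * math.sqrt(2 * p * (1 - p) / n_per_group)

baseline_rate = 0.60  # hypothetical well-child visit rate
for n in (3, 30, 300):  # units per group (e.g., practice-level clusters)
    print(n, round(mde(baseline_rate, n), 2))
    # n=3 yields an MDE of about 1.12, i.e., no realistic effect is detectable
```

A proper power analysis for the demonstration would also account for the number of children per practice and for within-practice correlation, but the basic point stands: with a handful of sites per group, only implausibly large effects could be detected.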
In addition to these obstacles, many of the demonstration projects were designed to enhance the State’s infrastructure for QI activities—as requested in the original solicitation. Infrastructure programs are typically designed to affect children statewide; as a result, there are no “intervention” or “comparison” groups. For example, Massachusetts sought to assemble a group of stakeholders, The Massachusetts Child Health Care Quality Coalition, to develop a shared understanding of child health care quality priorities, create a platform for formulating systemwide goals and objectives, and implement activities to support those goals.
Moreover, evaluating infrastructure programs requires a substantial period of time. Following implementation (which may require several years of planning and activity), the effects of such programs on beneficiaries’ service use are likely to be measurable only after a substantial amount of time has passed. For example, Wyoming spent more than 3 years of the grant designing and developing the administrative infrastructure for the State’s first care management entity to improve care for children with severe behavioral health care needs; then, the State piloted the program with around 150 youth in the final years of the demonstration.
Our inability to complete impact analyses for at least one demonstration project in at least one State was a major disappointment. Nonetheless, our efforts to do so led to three positive developments.
First, because of our early work with Massachusetts to support enrollment of comparison practices into their Category C program activities, the State was in a position to conduct its own impact analyses using its Medicaid claims and managed care encounter data (whereas we were able to use only its fee-for-service claims).22 At the State’s request, we provided TA for these analyses. Consequently, we anticipate that the Massachusetts team will complete a manuscript describing the impact of their Category C program and will likely submit the manuscript to a peer-reviewed journal in late 2015. Their unpublished findings show that children with chronic conditions attributed to CHIPRA practices for the full 3-year demonstration had a significant reduction in potentially avoidable emergency department use, whereas comparison children with chronic conditions had no such reduction over the same time period.
Second, we reassigned resources originally allocated for impact analyses to a quantitative survey of physicians in three States to ensure that we could address questions related to provider perceptions of QI efforts based on practice-level quality reports—an issue that is directly related to many of the demonstration States’ Category A and Category C projects. One of our journal manuscripts (submitted for publication) presents quantitative findings based on analysis of data from this survey.
Third, we used the baseline claims data received from three States to conduct an analysis of the association between a practice’s degree of medical homeness and health care utilization of child Medicaid beneficiaries in those practices. Although this work is not technically an evaluation of the CHIPRA demonstration activities, the publication of our analysis23 in a peer-reviewed journal contributed to the field’s limited knowledge of the effect of PCMH for children, using data that were already collected and cleaned in preparation for the planned impact evaluation.
Although many of the challenges we encountered could not have been foreseen, we believe that future grant programs could avoid some of these problems by adhering to recommendations made in Section 3 of this report.
From the beginning of the evaluation, the NET worked carefully to develop productive working relationships with the demonstration States and engage them in our work. We remained mindful of the need to avoid imposing unnecessary burdens on the States and the importance of acknowledging the value of their experiences and perspectives. We also provided States with an opportunity to comment on our products and to make factual corrections as needed. We held Webinars and conference calls to discuss overarching issues and specific content. As noted above, the presence of full-time project directors seemed to be more conducive to external evaluation activities compared with our experiences on other similar large demonstration projects.
We also offered States evaluation-focused TA. The need for this kind of TA emerged in the first 6 months of the evaluation in response to our recognition that the States had not proposed any mechanism for gathering counterfactual information to support rigorous evaluation for the majority of the 52 projects. In the first 12 months of the project, we strongly urged States to identify comparison practices and to administer measures of “medical homeness” to both comparison and intervention groups. (See the “challenge” section above for further discussion of this issue).
Our TA took different forms at different stages of the evaluation. In the first year of the evaluation, we helped selected States consider comparison groups for their Category C interventions, to make their projects more conducive to a rigorous evaluation. In the second year of the evaluation, we established periodic calls with all demonstration States to address issues in measuring “medical homeness.” During these calls, we provided overviews of different measurement frameworks, discussed the strengths and weaknesses of each for application in the CHIPRA demonstration projects, and answered questions from the States. In addition, we participated in several calls with the Children’s Hospital of Philadelphia (CHOP) Policy Lab to collaborate on designing an evaluation they planned to conduct of the effect of the CHIPRA developmental screening intervention on children’s receipt of early intervention services. In years three and four of the evaluation, we held a series of calls related to measuring outcomes using claims data. In the last year of the evaluation, we offered to provide assistance to States in developing technical reports related to their own State-based evaluations. For example, as described above, we worked closely with staff in Massachusetts to help them develop an impact analysis of their PCMH intervention using Medicaid claims and managed care encounter data and to present their findings in a manuscript for submission to a peer-reviewed journal.
We met with our 14-member Technical Expert Panel (TEP) in person in the fourth month of the evaluation and presented our overall plan for conducting the evaluation. The TEP concurred with our approach and also offered helpful suggestions for refining our methodology.
We used subsequent meetings (held by telephone beginning at the midpoint of our second year) to help prioritize the long list of research questions that AHRQ had originally posed for the evaluation. Through these deliberations, we recognized that we would not be able to address all the questions in a comprehensive manner. With the TEP’s assistance, we were able to prioritize the most important questions, which allowed us to focus our resources productively. Subgroups of the TEP also provided input on specific topics, such as the content of the physician survey.
As we moved past the design and prioritization phases of the evaluation, we realized that TEP meetings would be less useful over time, because we would be asking the TEP members to read and comment on only the products we had committed ourselves to developing. In conjunction with our AHRQ project officers, we decided to use the remaining national evaluation funds that had been allocated to run the TEP to support the writing of Evaluation Highlights and other evaluation products.
AHRQ provided consistent encouragement to the NET to develop—as soon as possible and throughout the evaluation period—products with findings that would be of use to States, in particular, and also to CMS and the field of child health care in general. In line with this emphasis, we focused on several methods for disseminating our products.24 Specifically, we took the following steps:
- In August 2012, we launched the national evaluation Web page, hosted on AHRQ’s Web site. Initially, we used the Web page as a venue for educating stakeholders about the demonstration and our evaluation. As the evaluation progressed, we posted our products on this page and posted links to State-generated reports as they became available. By the end of the national evaluation in September 2015, more than 8,500 individuals had become subscribers to the CHIPRA national evaluation updates. AHRQ used the GovDelivery platform, along with its Electronic Newsletter, Child and Adolescent Health Periodic Digest, and Twitter feed to inform subscribers and others when new information was posted to the national evaluation Web site.
- We developed dissemination partners to help broaden the reach of our findings. In addition to the States themselves, we worked closely with the Maternal and Child Health Bureau (MCHB), the Association of Maternal and Child Health Programs (AMCHP), the Catalyst Center, the National Association of Medicaid Directors (NAMD), the National Academy for State Health Policy (NASHP), the Children’s Hospital Association (CHA), the American Academy of Pediatrics (AAP), the American Academy of Family Physicians (AAFP), Voices for America’s Children, and the National Initiative for Children’s Healthcare Quality (NICHQ). Some of these organizations (AMCHP, the Catalyst Center, and NASHP, for example) helped us in our dissemination efforts by including announcements of our products in their newsletters and through other means. Other organizations (the AAP and the AAFP, for example) were less interested in helping with dissemination.
- We presented findings to the demonstration States during several CMS-hosted conference calls, at CMS-sponsored quality conferences, and at various professional conferences (including several of AcademyHealth’s annual conferences and at its first National Child Health Policy Conference).
- During the last 3 months of the projects, we helped organize several Webinars in conjunction with key dissemination partners. At the time of writing this report, we had Webinars scheduled with (1) NASHP to present to their CHIP directors and Children in the Vanguard learning networks, (2) the State-University Partnership Learning Network hosted by AcademyHealth, (3) the National Improvement Partnership Network led by the University of Vermont, and (4) the Association of Medicaid Medical Directors.
Overall, the Web page on AHRQ’s Web site provided a sturdy platform for making available to interested individuals both the products developed by the NET and links to State reports. The number of subscribers increased steadily during the evaluation period. Visits to and downloads of our products typically peaked in the month of publication and then waned. The introduction of a new product often drove some traffic to earlier publications.
The major challenge we faced with our dissemination work was the short period of time between completing our final analyses (July 2015) and the end of the contract (September 8, 2015). We were unable to develop Webinars with key dissemination partners until we had a reasonably clear idea of our results. As our findings emerged from our analyses during the spring of 2015, we began reaching out to our partners. In most cases, they indicated that they would be willing to collaborate on Webinars during the fall, rather than during the summer. Hence, we worked to plan the Webinars and develop the necessary slides and materials during the contract period.