Summary of the Recommendations

Expanding Research and Evaluation Designs to Improve the Science Base for Health Care and Public Health Quality Improvement Symposium

On September 13-15, 2005, AHRQ convened a meeting to examine health care and public health quality improvement interventions.

Overview of the Reports Back Sessions

The September 13-15, 2005, symposium was designed to elicit a collaborative exchange of ideas to improve the science base for health care and public health quality improvement research and evaluation. During the symposium, three breakout sessions were held to give participants the opportunity to generate recommendations. For the first and second breakout sessions, participants were asked to attend the group related to their area of expertise. The five homogeneous groups were: research design experts, peer reviewers and journal editors, research training directors, experts in disparities in health care quality, and users of quality improvement research (that is, stakeholders and evidence synthesizers). The third breakout session was intended to generate overarching recommendations; for it, participants were sorted into heterogeneous groups by the first letter of their last name. Each breakout session was led by a facilitator who was assisted by a recorder taking notes at a flipchart. Facilitators and recorders are listed at the end of this document. The facilitator and recorder conferred to produce a five-minute oral report for their session. The reports for the homogeneous breakout sessions are combined in the summaries below.

The charge to the breakout groups for the first breakout session was to recommend solutions to the challenges described in the presentations at the clinical microsystems level and, for the second breakout session, to recommend solutions to the challenges at the clinical microsystems and health systems levels. In addition, the second session's purpose was to note similarities in the first and second sessions' recommendations, and to discuss the roles for various players (journal editors, funders, researchers, end-users, and others).

In introducing this charge to the breakout sessions, Dr. Dougherty noted that this symposium might transform into a conference on developing better quality improvement (QI) strategies, but the field will not get there unless we do better QI evaluations. She noted that attendees might feel constrained during the first breakout session by wanting to address the issue of resources and by the discussions having focused on the clinical microsystem, and that higher levels of the health care system and bigger challenges for design would come later in the symposium. Dr. Dougherty asked participants to address these other levels and to generate ambitious ideas, but to keep the focus on how to evaluate QI projects; she asked for solutions to the challenges participants had heard and for big ideas.

Dr. Arnold Milstein moderated the first reports back session, and Dr. Shawna Mercer moderated the second session.


Reports from the Homogeneous Breakout Groups 

Disparities Recommendations

Dr. Chin and Dr. Dougherty reported that this was a passionate group that generated eleven major points:

  1. What is the appropriate definition of disparities, what constitutes a significant disparity, and which disparities should we act upon? It would be valuable to set clear expectations for this kind of work. The group did not reach a conclusion, but suggested that narrowing the gap between blacks and whites, for example, or raising the floor for the lowest measures of care might be ways to address disparities.
  2. The group noted that it is hard to find control groups for disparities research.
  3. We need to develop a conceptual model or logic model for disparities. We need to know what causes disparities, in terms of individual factors and larger organizational and societal factors, so that mechanisms can be identified and then tested in interventions. It is important to look at where disparities issues fit within QI. The Institute of Medicine (IOM) includes equity in its definition of quality, but equity has not been highlighted in other outcomes work. If the Joint Commission on Accreditation of Healthcare Organizations developed a definition of equity, it could be applied to hospitals. QI projects could be required to examine the reduction of disparities.
  4. The U01 was suggested as a funding mechanism to establish a network of researchers to develop a logic model for disparities reduction within QI and outside of QI. (The National Institutes of Health [NIH] U01 mechanism is a cooperative agreement supporting a specified project. This is a type of grant in which NIH has substantial scientific and programmatic involvement with the institution receiving the award.)
  5. Context and structure: there can be disparities within organizations or across organizations. Where the disparity exists and the investigators' model of disparities will affect what the intervention is, such as a provider-level intervention or an organization-level intervention.
  6. Data requirements and research question requirements: should racial and ethnic identifiers be included in data collection? Journals and funders should examine whether disparities questions can be required aspects of projects. Another possibility would be requiring a community-level needs assessment or a community-based participatory research (CBPR)-type process for funded and published projects. We also do not yet know what the effects of pay for performance on disparities will be.
  7. "Bang for the buck" — one needs to look at difficult settings where there are fewer resources and be willing to take risks. There might be a good payoff from working in such settings.
  8. We think that disparities happen in underfunded settings, so we need to pay attention to these settings to reduce disparities.
  9. The group would like disparities reductions reports from NIH's disparities activities.
  10. Other suggestions for increasing a disparities focus within QI were to identify good examples of such work and to recognize the people doing it with awards.
  11. Finally, a new challenge was identified: how does public health disparities reduction interact with QI?


Research Training Directors' Recommendations

Dr. Francis reported that this was a small but diverse group that included individuals involved in training predoctoral students, postdoctoral fellows, physicians, and non-physicians, along with a funder of training programs and one research fellow. This group made eight points:

  1. We want to encourage diversity in the training environment and to engage a variety of practitioners, locations, and disciplines. Quality improvement involves messy, "real world" settings, and trainees need to gain experience in such settings, particularly those that may not have a robust infrastructure for data collection.
  2. Quality improvement is a topic that crosses disciplinary lines, not a discipline per se. The output of academic training is an independent investigator versed in the methods of their particular discipline. This creates an immediate tension between the goals of the academy and the real-world needs of a health system or community for improvement.
  3. There was considerable discussion of whom to train and when in their careers (e.g., pre- or post-doctorate) to train them. QI training may be better suited to continuing education after one has gained experience in a discipline. This training should include clinicians and practitioners, not just individuals who will engage in QI research.
  4. Related to the issue of the proper time to learn QI research skills is how to fit this into one's career trajectory. Finding job security in a traditional discipline is challenge enough. How willingly will individuals seek out non-traditional trajectories, for which the return on investment is less clear? We mused that persons working in this area would need a "portfolio" as opposed to a "curriculum vitae" to be marketable—e.g., short resumes with attached case studies and testimonials about the impact of their QI projects. Some recommended, tongue-in-cheek, that this sort of research be conducted only after one gets tenure.
  5. Pursuing this type of research requires a nurturing team and an environment of respect. Part of this is learning to be a good partner with the system you are studying and not overwhelming it with data requests and publications. Building trust means not publicly exposing the system's limitations without some mutually negotiated agreement. Sometimes the organizations with which you are working need to maintain control over the data and dissemination.
  6. One wants to make sure that trainees do not become overburdened with service-related demands ("scut" in clinical vernacular). The needs of trainees should be balanced with the needs of the organization.
  7. Organizational skills are every bit as important as technical and methodological ones in undertaking such applied work.
  8. Just as this conference presented QI implementation case studies from both health systems and community-based perspectives, we need to recognize the overlap of public health and health systems research.


Design Experts' Recommendations

Dr. Leviton and, on Dr. Francis Chesley's behalf, Dr. David Introcaso reported that there is no way to do justice to the richness of the discussion among the design experts. The challenges that were identified by the group included the following points.

  1. We have difficulty with nested questions and theories that apply to the different levels of explanation. These range from the local market for insurers and payers, to institutions, to units and microsystems, to patient-level characteristics. Currently, theories are more fully elaborated at the level of individual behavior than at the level of organizations and larger systems.
  2. There are multiple decision makers with vastly different needs for information about QI studies, ranging from policy makers to specific practitioners. The gold standard of the randomized controlled trial (RCT) needs to be matched by something that is legitimized as an alternative and that captures the dimensions of interventions that we feel to be most important.
  3. Even for systematic reviews, people find that they do not know how to classify various types of interventions. For example, in diabetes interventions, one can look at case management. On its own, case management does not make a huge difference, but when one looks at specifics such as a nurse providing different medications, this part can make a big difference in the whole intervention.
  4. Another challenge is that there is always an interaction between the specific study and the context in which it was conducted, and we need to understand the interaction and do further testing to verify impressions.
  5. There were many other issues the group considered to be important, such as the unit of analysis.

The key recommendations from the group were:

  1. There was general agreement that we need to change the research platform for this type of study to have a better marriage of practice (including decision makers for different settings) with the research question.
  2. We need a more agile set of funders that can take advantage of natural experiments or fund major platforms of research so that many types of questions can be asked of one data set.
  3. We need to legitimize the discovery process and work on lines of inquiry that can generate hypotheses and then test them.
  4. We need to brainstorm about the mechanisms by which QI occurs, and we need to specify mediators and moderators of changes in quality.
  5. We need to brainstorm about new methods for the study of mediators and moderators.
  6. We have some ideas about important elements in QI studies, such as the idea that high staff turnover will impair quality improvement efforts.
  7. There was much discussion of a compendium or toolkit or toolbox. There is great cumulative knowledge, but we are not fully exploiting what has already been done.
  8. There was discussion of the issues concerning systems and process and leadership. Study design needs to include data collection on not only the diagnosis and treatment of a disease but also on social interactions, such as between patient and provider.
  9. We need to be more interdisciplinary. We need to be appreciative of Greenhalgh et al.'s work on innovation processes in this reference: Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: systematic review. The Milbank Quarterly 2004;82(4):581-629.
  10. The group noted the need for a shared taxonomy/glossary and databases.


Peer Reviewers' and Journal Editors' Recommendations

Dr. Wagner and Dr. Molla Donaldson reported that this group made a series of observations and recommendations.

  1. There was strong consensus on the need for a taxonomy/glossary of QI terms, which would allow, for example, for labeling types of QI projects and would include descriptions of the phases of translational research. Such a taxonomy will be important for reviewers and users of quality improvement interventions (QII).
  2. The group discussed the challenges for researchers in getting their grants funded and their papers published. Specifically, one needs to convey in grant applications enough detail about the system changes to be implemented that reviewers can assess their value, while allowing for the adjustments in design that will inevitably occur.
  3. There is an unavoidable tension between presenting both quantitative and qualitative research methods in a way that is understandable for reviewers who are not familiar with them and fitting this description within the 25-page limit of grant applications and the 2,700-word limit of some print publications.
  4. Researchers need to be more aware in writing proposals and manuscripts of the multidisciplinary perspectives of grant and manuscript reviewers. Indeed, papers in this field require transdisciplinary review, perhaps by editorial committees. Dr. Wagner joked that the group briefly considered genetic engineering of reviewers.
  5. Two challenges are of particular importance in developing this field as a science. First, most QIIs take place in a limited geographic or organizational context. In both proposals and manuscripts, researchers must address the generalizability of the study design and findings. If an intervention is successful, is it sustainable? Whether successful or not, which interventions, and in what combination, were important, and which were not? To which populations, conditions, or organizations might these interventions apply?
  6. Second, a science of QII, like other sciences, needs to build a body of evidence about effective and ineffective interventions and the conditions in which they have been used. The beauty of randomized controlled trials is, of course, that potentially confounding variables are either eliminated by exclusion and inclusion criteria or differ insignificantly in control and experimental groups. In addition, a very restricted set of interventions is held constant. These two conditions, however, are not always (or even usually) possible in QII. This means it is critical to address variations in populations, organizations, and evolving implementation strategies in proposals and manuscripts.
  7. In order to build a coherent evidence base, researchers also need to put new projects and new findings in the context of previous findings. One suggestion was that funders and peer reviewers could do more to promote grant proposals and papers that contribute to a cumulative body of knowledge.
  8. There is a need for broader inclusion of users in the planning of improvement projects, in study sections, during studies, and in the editorial review process. Requests for applications (RFAs) and requests for proposals should reflect the broader view of what the end users seek to achieve by these projects.
  9. Researchers also need to be more aware of the final end-users of the research; that is, that particularly in this field, publication is not an end in itself or even just a step in further research. The purpose of QII is to discover and deploy useful tools for improving health care.
  10. The usual brief summary of methods in publications may not meet users' needs. Users who want to implement change may want and need more detail, but space limitations exist, particularly in peer-reviewed print journals. Possible solutions include online appendices and splitting write-ups across journals on different topics, though one wonders whether it is better to lump findings into one article or split them into related publications. Other suggestions were that individual articles include the role of theory and how to operationalize theory in an intervention, and that articles include more on an intervention project's history and on the theoretical constructs and assumptions underlying the development of the specific interventions.
  11. It also would be valuable to have a publication vehicle for describing early results, perhaps in a special journal section dealing with works in progress. It would be valuable for authors to explain how early studies led to the design of the intervention that was used, to include information on process evaluation, and to provide guidelines for others in testing or replicating an intervention.
  12. Two different kinds of databases could be very useful. The first is a registry of ongoing studies. Perhaps it would make sense to make registration of studies a condition of publication. Second, it would be valuable to have a data repository of published studies. With appropriate agreements, such data might be used for secondary analysis.
  13. A journal's impact factor is important to publishers. The impact factor is based on the number of citations a paper receives, which is a measure of the paper's usefulness to researchers. Might there be a revised (or separate) impact factor that would reflect a study's impact on practice or policy? Similarly, might there be an impact factor based on outcomes and on decreases in disparities?
  14. Finally, Dr. Donaldson noted that there is a lot of focus on genomics now; in fact, there are 83 "-omics" terms indexed in PubMed. Dr. Donaldson joked that if you want to capture the attention of funders and policy makers, it might be good to change the name of the field to "improve-omics."


Users' Recommendations

The users group included practitioners, managers, people at the front lines of QI, evidence synthesizers, and representatives of patient advocacy groups. Dr. Atkins and Dr. Briss reported on these sessions. The overriding theme of the first breakout session was the need to pay attention to the issue of spread; in other words, it is important not only to look at whether an intervention works, but also to examine what it takes for something to spread and take hold in a broader environment.

  1. The first recommendation was to recognize the importance of engaging clinical users as well as patients and communities in framing questions and interpreting results. Practice-based research networks (PBRNs) have an important role in generating clinically important questions from the practitioner level.
  2. It is important to realize that the question is not what works, but what works for whom. Understanding effect modifiers is important. This challenge of assessing adaptability and generalizability is more complex in QI research than in many types of clinical studies and creates challenges in assessing the extent to which work will translate across specific populations and settings.
  3. The group recommended creating a toolbox to provide not one fixed intervention package but a more flexible set of options that users can adapt to their skills, needs, setting, resources, etc. One challenge is that we need a better shared taxonomy so that we can determine common components among interventions that would allow users to more consistently assess lessons learned. Toolboxes would make it easier to replicate and implement best practices.
  4. Several recommendations were related to spread. We need to understand and support research networks so we can understand how intervention spread works. We need to understand how to support a network in terms of technological needs, but especially in terms of the social aspects of what makes a network or learning community work. We need to understand the incentives that affect willingness to participate in research and to undertake change; these incentives relate to patients and the community as well as to practitioners. The current model for dissemination and translation, based on journal articles and online material, is not adequate for promoting spread, and we need additional options, such as better case studies (something like what one sees in the Harvard Business Review) or face-to-face interaction in the collaborative model. This needs to be built into education and training; in other words, "willingness to change" needs to be built into clinical education. We also need to train researchers on how to work with practices and communities. This will require an understanding of how to build this capacity and consideration of who is going to be responsible for building it: government, funders, or the health systems themselves?
  5. We need partnership building leading to partners/networks that survive across projects. We need to engage users during projects and understand the perspectives of various actors.
  6. We need to develop conceptual models (and small "t" theories of change, which address how these interventions work and what one expects to change). Theories need to be grounded in the reality of implementation and to reflect that these efforts are not a grand, academic exercise.
  7. We need to address data collection and synthesis issues (i.e., what works and for whom). We need better measures of outcomes and better data on intermediate variables, economic costs, and measures of potential harm.
  8. We need to capture and share harder-to-get-at information such as tacit knowledge and experiential learning (i.e., the "grey" information that does not usually make it into publications).
  9. Communications need to be segmented according to intended audience.
  10. There need to be incentive structures that encourage everyone to participate and that encourage researchers to disseminate.

Discussion Period After the First Set of Reports Back

Moderator: Dr. Arnold Milstein

Comments during this session focused on the causes of observed disparities in care, legitimacy of available designs, and the role of journals in QII evaluation dissemination.

A participant asked whether researchers had measured patient preferences to distinguish between differences due to disparities and differences due to preferences, particularly in regard to surgery and antibiotics. To address disparities in a meaningful way, there is a need to identify effective and efficient ways to measure preferences. Dr. Dougherty responded that a literature review by the IOM found that patient preference is not one of the strongest predictors of disparities.

One participant noted that within the design experts' group, there was discussion of increasing the legitimacy of a variety of methods. Often, there is skepticism about non-RCT methods. He asked whether members of the peer review group had considered what would make research proposals or journal submissions using non-RCT methods more acceptable, and what it would take to convince people that there is value in these methods and that the findings are generalizable and real. Dr. Wagner indicated there has been a certain degree of progress in this area, in that nonrandomized studies and studies with qualitative components are no longer automatically rejected, but the medical community still needs to pay attention to how submissions are written, including detailed descriptions of the methods. He recommended being as quantitative as possible when writing up qualitative results.

Another participant encouraged others to write articles that include both qualitative and quantitative components. Annals of Family Medicine may start a format in which the details of a project appear in an online appendix and the printed article focuses on the lessons learned. The journal may also publish sets of three companion papers: a quantitative article, a qualitative article, and a synthetic article on lessons learned across the qualitative and quantitative methods. The journal is exploring these options. However, if one is going to pursue mixed methods, one needs to do a good job on both the quantitative and qualitative pieces and bring them together well. Others noted that there are good exemplars in other fields, such as criminal justice and education.

A participant asked if the field is "letting the journals off too easy and the users off too hard." To do both qualitative and quantitative research well within the same project, one either has to have a great deal of money or ask small questions which can be addressed rigorously, but which may not be as important. She asked how we can benefit from smaller-scale projects that address important questions but do not have a large enough budget to have the highest level of rigor. This is what important user communities are asking: do these projects have to stay outside journals and the learning sphere?


Current as of March 2009
Internet Citation: Summary of the Recommendations: Expanding Research and Evaluation Designs to Improve the Science Base for Health Care and Public Health Quality Improvement Symposium. March 2009. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/events/other/phqisymp/phqi6.html