A Report from the Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP Programs (SNAC)
By Rita Mangione-Smith, MD, MPH, Associate Professor of Pediatrics, University of Washington, SNAC Co-Chair
This article provides a brief overview and evaluation of the process used by AHRQ's Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP Programs to identify the recommended core set of children's health care quality measures. It also suggests ways this process might be improved for similar efforts in the future.
Contents
Introduction
Process Used to Identify the Initial Core Set
Lessons Learned During this Process
References
Introduction
Title IV of the
Children's Health Insurance Program Reauthorization Act (CHIPRA; Public Law 111-3)
required the Secretary of the U.S. Department of Health and Human Services
(HHS) to identify and post for public comment by January 1, 2010, an initial,
recommended core set of children's health care quality measures for voluntary
use by Medicaid and Children's Health Insurance Programs (CHIP), health
insurance issuers and managed care entities that enter into contracts with such
programs, and providers of items and services under such programs.
In response to
this legislative directive, the Agency for Healthcare Research and Quality
(AHRQ) and the Centers for Medicare & Medicaid Services (CMS) signed a memorandum
of understanding giving AHRQ leadership responsibilities for identifying the
initial core set, working in very close partnership with CMS. CMS has the
authority for implementation of all CHIPRA provisions.
As
one of the first steps in the process of identifying the recommended core set
of measures, the AHRQ Director approved a charter creating the AHRQ National
Advisory Council on Healthcare Research and Quality (NAC) Subcommittee on
Children's Healthcare Quality Measures for Medicaid and CHIP (SNAC). The AHRQ NAC had agreed to provide advice to AHRQ and CMS to facilitate their work to recommend an
initial core set of measures of children's health care quality for Medicaid and
CHIP programs. To provide the requisite expertise and input from the range of
stakeholders identified in the CHIPRA legislation, the NAC established the SNAC.
The
SNAC included four State Medicaid program officials (from Alabama, Minnesota,
Missouri, and the District of Columbia) and one State CHIP official (from
Alabama). Other members represented Medicaid, CHIP, and other State programs
more generally (i.e., representatives of the National Academy on State Health
Policy, National Association of State Medicaid Directors, and the Association
of Maternal and Child Health Programs).
Representatives
of health care provider groups came from the American Academy of Family
Physicians, American Academy of Pediatrics, American Board of Pediatrics, the
National Association of Children's Hospitals and Related Institutions, and the
National Association of Pediatric Nurse Practitioners, and there was a Medicaid
health plan representative. The interests of families and children were represented
by the March of Dimes. Individual SNAC members provided expertise in children's
health care quality measurement, children's health care disparities, tribal
health care, pediatric dental care, substance abuse and mental health care,
adolescent health, and children's health care delivery systems in general. Two
members of the NAC also participated in the SNAC.
The
SNAC was charged with providing guidance on measure evaluation criteria to be
used in identifying an initial core measurement set, providing guidance on a
strategy for gathering additional measures and measure information from State
programs and others, and reviewing and applying criteria to a compilation of
measures currently in use by Medicaid and CHIP programs to begin selection of
the initial core measurement set. SNAC recommendations were to be provided to
CMS and the NAC, which in turn would advise the Director of AHRQ. The Directors
of AHRQ and CMS would then review and decide on the final recommended core set
to be presented to the HHS Secretary for consideration.
This
paper provides a brief overview and evaluation of the process the SNAC used to
identify the initial recommended core set of children's health care quality
measures and outlines how this process might be improved for similar efforts in
the future.
Process Used to Identify the Initial Core Set
With assistance from
CMS, AHRQ staff identified a set of 77 measures that were currently in use by
Medicaid and/or CHIP programs. The next step was to decide on an evaluation
process the SNAC could use to assess these 77 measures. The SNAC co-chairs,
AHRQ staff, CMS staff and other representatives from the CHIPRA Federal Quality
Workgroup (members available at http://www.ahrq.gov/chipra/corebackgrnd.htm)
agreed that the SNAC should use the RAND/UCLA modified Delphi process to evaluate
the identified measures.1
When applied to
quality of care measures, the RAND/UCLA modified Delphi process involves a
series of assessments by a panel of experts, in this case the SNAC. The experts
are usually provided with standard definitions for measure validity and
feasibility and then asked to apply these criteria to each measure under
consideration. The measures are scored on a 1 to 9 scale for each criterion. Scores
of 7-9 mean the measure is considered highly valid and/or feasible, scores of
4-6 are assigned to measures with equivocal validity and/or feasibility, and
scores of 1-3 indicate the measure is not considered valid and/or feasible. These
measure assessments are first done individually at the panelists' home
institutions. This is followed by a group discussion of the measures in a
face-to-face meeting, after which panelists individually score the measures again.
The summation of this final set of individual assessments is used to determine
whether particular measures in the set under consideration are retained or
deleted. Explicit ratings are used to determine which measures are included in
the final quality measurement set because in small group discussions some members
tend to dominate the conversation, and this can lead to a decision that does
not reflect the sense of the group.2
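The 1 to 9 rating scale described above can be illustrated with a minimal sketch. It assumes the panel's ratings are summarized by their median (the summary statistic this article mentions for the first round) and that a half-point median such as 6.5 falls into the lower adjacent band; the function names are hypothetical and are not part of the RAND/UCLA method itself.

```python
from statistics import median


def rating_band(score: float) -> str:
    """Map a 1-9 Delphi rating (or a panel median) to the band described in the text."""
    if not 1 <= score <= 9:
        raise ValueError("Delphi ratings run from 1 to 9")
    if score >= 7:
        return "high"        # 7-9: considered highly valid and/or feasible
    if score >= 4:
        return "equivocal"   # 4-6: equivocal validity and/or feasibility
    return "low"             # 1-3: not considered valid and/or feasible


def panel_summary(ratings: list[int]) -> tuple[float, str]:
    """Summarize one criterion for one measure: the panel median and its band.

    Assumption: the median is the summary statistic; half-point medians
    (e.g. 6.5) are placed in the lower adjacent band.
    """
    m = median(ratings)
    return m, rating_band(m)


# Made-up ratings from a nine-member panel, for illustration only.
validity_ratings = [8, 7, 9, 6, 7, 8, 5, 7, 7]
print(panel_summary(validity_ratings))
```

Because each panelist's rating enters the median with equal weight, a member who dominates the discussion has no more influence on the result than anyone else, which is the rationale the text gives for using explicit ratings.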
To facilitate the
SNAC members' individual assessments of the 77 measures under consideration,
they were provided with measure evaluation criteria definitions for validity
and feasibility before their first face-to-face meeting. Because one
of the main charges to the SNAC included providing guidance on the evaluation
criteria to be used in evaluating measures for the core set, it was clear that
the criteria definitions provided for the first round of the Delphi process
would need to be reviewed and would potentially change at the first
face-to-face meeting. Although this process was not ideal, it did facilitate a
round of quality measure assessment prior to the first SNAC meeting, and it was
felt to be necessary given the constricted timeframe in which the Subcommittee
had to complete this work. Doing this pre-meeting scoring also oriented the
SNAC to the Delphi method early in this process of measure selection, which
facilitated subsequent rounds of measure scoring and assessment.
When scoring the
measures for validity, the SNAC members were asked to assess the degree to
which the measures were supported by scientific evidence and/or expert
professional consensus, whether the measures supported a link between the
structure, processes, and outcomes of care, and whether the majority of factors
that determine adherence to a measure were under the control of the health care
organizations subject to measurement. For feasibility, the SNAC was asked to
evaluate whether:
- The data needed to assess the measures were readily available to health care organizations.
- The measures were currently in use (thus supporting their feasibility of implementation).
- Estimates of adherence to the measure based on available data sources were likely to be reliable and unbiased.
The
median scores for validity and feasibility were used to determine whether
candidate measures would be discussed at the face-to-face meeting. Traditionally,
when using the RAND/UCLA modified Delphi process, all measures are discussed at
the face-to-face meeting regardless of their first round median scores. However,
this was not feasible given the time constraints under which the SNAC was working.
As such, measures with a median validity score of 6 or 7, a median feasibility
score ≥4, and a relatively wide distribution of scores across members (suggesting
little consensus among the group) were discussed by the SNAC. Forty-five of the
originally identified 77 measures in use by Medicaid or CHIP programs met these
scoring criteria and were discussed.
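The screening rule for deciding which measures to discuss might be sketched as follows. The article does not define "relatively wide distribution," so the spread test here (range of validity ratings at or above a threshold) and the threshold value are hypothetical stand-ins; the sketch also assumes "6 or 7" covers a half-point median of 6.5.

```python
from statistics import median

# Hypothetical threshold: the article says only "relatively wide distribution".
MIN_SPREAD = 4


def flag_for_discussion(validity: list[int], feasibility: list[int],
                        min_spread: int = MIN_SPREAD) -> bool:
    """Return True if a measure meets the first-round criteria for discussion."""
    v_med = median(validity)
    f_med = median(feasibility)
    # Assumption: spread is measured as the range of the validity ratings.
    spread = max(validity) - min(validity)
    return 6 <= v_med <= 7 and f_med >= 4 and spread >= min_spread


# Made-up ratings: borderline validity with wide disagreement among panelists.
print(flag_for_discussion([3, 6, 6, 7, 9], [5, 5, 6, 7, 8]))
```

A different dispersion statistic (for example, mean absolute deviation from the median) could be substituted without changing the structure of the rule.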
Refinement of the Measure Evaluation Criteria
Refinement of the
measure evaluation criteria involved reviewing, discussing, and reaching
consensus on the definitions the SNAC would use for validity and feasibility
(including reliability) when evaluating candidate measures in future rounds of
the Delphi process (evaluation criteria definitions for Delphi Round II
available at http://www.ahrq.gov/chipra/corebackground/corebackapa8.htm).
In addition, importance was added as a third criterion, along with validity and
feasibility, for the SNAC to consider when evaluating potential measures. This
refinement process, although important and necessary, led to some inefficiency
and re-work related to identifying the recommended initial core set of measures.
Ideally, the SNAC would have had the opportunity to meet, discuss, and reach
consensus on the measure evaluation criteria definitions prior to doing any
individual measure scoring.
Other Steps and Decisions at the First Face-to-Face Meeting
The SNAC's
discussion at this first meeting resulted in the recommendation that more
information related to measure validity, feasibility, and importance (VFI) would
be needed before any further consideration and evaluation of the measures could
take place. The SNAC also determined that a call for nominations of additional
pediatric quality measures in use (either within or outside of Medicaid and
CHIP programs) should be used to identify a larger set of measures to consider
for the final core set. AHRQ staff was also asked to identify VFI-relevant
information on the measures scored in Delphi Round I. SNAC members felt it was
important to open the nomination process as broadly as possible to other
stakeholder groups.
Ideally, the
decision to conduct a broad nomination process of quality measures in use both
within and outside of Medicaid and CHIP programs would have been made much
earlier in the process before any measure evaluation and scoring had occurred. AHRQ
had initially felt it was important to limit measure consideration to those
already in use in at least one Medicaid or CHIP program for feasibility issues
related to implementation. Nevertheless, the SNAC felt that it was essential to
broaden the measures considered to those in use by entities outside of Medicaid
and CHIP; otherwise, many valid, feasible, and important measures would not
have been considered for inclusion in the initial recommended core set. Thus,
after the first face-to-face meeting, the final decision was made to conduct a
broad measure nomination process.
Developing an Online Measure Nomination Template
During the 2 months
between the first and second SNAC meetings, AHRQ staff worked to develop an
online quality measure nomination template. The measure nomination template
asked for key pieces of information that SNAC members would need to evaluate
the VFI of nominated measures. An ideal nomination would include the following
information on the measure: the numerator and denominator, scientific evidence
supporting the measure, evidence that the measure truly assesses what it
purports to measure, detailed measure specifications, evidence of the measure's
reliability, whether the measure addresses an area of care mandated for
inclusion in the CHIPRA legislation, and evidence of variation in performance
on the measure in different populations or organizations. Unfortunately, many
of the nominated measures as submitted lacked much of this information. AHRQ
staff and the SNAC co-chairs worked to fill in information gaps for several of
the nominated measures and for all of the measures that required reassessment
after Delphi Round I.
The AHRQ staff
worked to find measure specifications and information related to importance
criteria, e.g., evidence of variation in performance across insurance types or
racial/ethnic groups. The SNAC co-chairs performed focused literature reviews
to identify scientific evidence supporting links between structure, processes,
or outcomes of care for the nominated measures. They also assigned grades to
the level of evidence supporting the measures (Table 1) using the Oxford Centre
for Evidence Based Medicine grading criteria (available at http://www.cebm.net/index.aspx?o=1023).
Table 1. Oxford Centre for Evidence Based Medicine (CEBM) Evidence Grades
Evidence Grade | Definition of Grade | Definition of Study Types
A | Consistent level 1 studies | Level 1: Randomized controlled trials
B | Consistent level 2 or 3 studies or extrapolations* from level 1 studies | Level 2: Cohort studies; outcomes research. Level 3: Case control studies
C | Level 4 studies or extrapolations from level 2 or 3 studies | Level 4: Case series
D | Level 5 evidence or troublingly inconsistent or inconclusive studies of any level | Level 5: Expert consensus opinion
* "Extrapolations" are where data are used in a situation that has potentially clinically important differences than the original study situation.
All of the
information for the measures supplied by the nominators, the AHRQ staff, and
the SNAC co-chairs was abstracted into one-page summaries for each measure (an
example of the one-page summary sheet is available at http://www.ahrq.gov/chipra/corebackground/corebackapa9.htm).
These summaries were made available to all SNAC members to review during their
next round of Delphi scoring.
The second round of
Delphi scoring included a total of 119 quality measures: the 70 measures that
either passed Delphi Round I (25 measures) or were discussed at the first
face-to-face SNAC meeting (45 measures) and 49 new measures nominated after the
first meeting. While SNAC members had more information during their individual
scoring for Delphi Round II, much of the needed information was still missing
(Table 2).
Table 2. Missing Information Needed to Assess Validity, Feasibility, and Importance of the Nominated Measures (N = 119)
Criteria | Number of Measures | Percent
No specifications | 26 | 22
No reliability data | 59 | 50
Not in use | 29 | 24
No measure validation | 42 | 35
Evidence grade: A | 18 | 15
Evidence grade: B | 58 | 49
Evidence grade: C | 3 | 3
Evidence grade: D | 26 | 22
No evidence/Unable to grade | 14 | 12
No information on variation/disparities | 76 | 64
Additionally, the SNAC members had 1 week to assess 119 measures,
which limited the amount of time that could be spent evaluating the merits of
any one measure in the set. Given these limitations, the SNAC adopted a
philosophy of "leaving an empty chair" rather than recommending quality
measures that were too weak or for which too little information was available (Table 3).
Table 3. Areas of Care with Few or No Valid and/or Feasible Measures
Measurement Category
- Most integrated health care systems:
- Medical home.
- Integration with entities outside the traditional health care system.
- Duration of coverage ("churning").
- Availability of care.
- Specialty care.
- Inpatient care.
- Care for substance abuse.
- Mental health treatment.
- Health outcomes.
- Uniform, reliable methods to measure and identify disparities.
Of
the 119 measures evaluated in Delphi Round II, 65 were scored as being valid,
feasible, and important by the SNAC members. Due to the abbreviated timeline
and the need to identify a reasonable core set of measures (the SNAC's target
number was 25 measures for the core set), the initial plan was to discuss and
consider only these 65 measures at the second face-to-face meeting. However,
initial discussions at the meeting resulted in adding back five measures that
did not strictly pass the second Delphi round (i.e., those with high median
feasibility and importance scores [≥7] and median validity scores of
6 or 6.5 rather than the cutoff of 7). Thus, 70 of the 119 measures scored in Delphi Round II were discussed and considered for the core set at the meeting.
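The Round II screening described above can be expressed as a short sketch. It assumes that "passing" means a median of at least 7 on all three criteria, which the text implies but does not state outright. The add-back branch only identifies measures matching the description given; the article indicates that the five measures were added back through discussion, not by an automatic rule. All names are hypothetical.

```python
from dataclasses import dataclass

CUTOFF = 7  # median cutoff named in the text


@dataclass
class MedianScores:
    """Panel median scores for one measure on the three VFI criteria."""
    validity: float
    feasibility: float
    importance: float


def round_two_status(m: MedianScores) -> str:
    """Classify a measure after Delphi Round II scoring."""
    # Assumption: passing requires all three medians at or above the cutoff.
    if min(m.validity, m.feasibility, m.importance) >= CUTOFF:
        return "passed"
    # Near misses as described in the text: high feasibility and importance,
    # with a median validity of 6 or 6.5.
    if (m.feasibility >= CUTOFF and m.importance >= CUTOFF
            and m.validity in (6, 6.5)):
        return "add-back candidate"
    return "not discussed"


# Made-up medians for illustration only.
print(round_two_status(MedianScores(validity=6.5, feasibility=7, importance=8)))
```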
The RAND/UCLA
Delphi process usually involves the experts re-rating the measures individually
after all discussions are completed. The scores of the panel are then
summarized, and measures with passing median VFI scores go on to be
included in the recommended core set. However, given the large number of
passing measures in the initial phase of this round of Delphi scoring (65 of
119 measures assessed, or about 55%), it was unlikely that re-rating the measures after
discussing them would result in 25 or fewer measures being in the final
recommended core set. It was also important that the SNAC be able to recommend
a balanced core set in terms of the requirements of the legislation, with at
least some measures representing several different areas of care (e.g.,
prevention and health promotion, provision of acute care, and provision of
chronic care). Thus, the SNAC agreed to use an alternative approach to further
assess the remaining 70 measures under consideration.
This alternative
approach involved a series of private votes using electronic voting devices to
further reduce the number of measures under consideration. The process involved
discussing and prioritizing measures according to legislative criteria and eliminating
overlapping or redundant measures (e.g., there were multiple dental measures and
measures pertaining to healthy birth, including the prevention of premature
birth) that passed the criteria for VFI. This process resulted in 31 measures
for final consideration.
Getting to a Parsimonious and Grounded Core Set of Measures
Three rounds of
voting were conducted in succession on the 31 remaining measures. SNAC members
could vote for their top 20 measures out of the 31 that remained. In round one,
members individually voted for their top 10 measures; in round two their next 5
measures; and in round three their final 5 measure choices. In the first round
of voting, measures received 3 points per vote, then 2 points per vote in the
second round, and finally, 1 point per vote in the third round. A priority
score was then calculated for each measure that represented the total points
assigned to that measure by SNAC members after the three rounds of voting. The top
25 measures according to final priority scores were retained for the final recommended
core set.
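The weighted voting scheme above amounts to a simple point tally, sketched here under stated assumptions. The ballot structure and function names are hypothetical. The sketch also assumes a member may not pick the same measure in more than one round, and it breaks ties by first appearance, a point the article does not address.

```python
from collections import Counter

ROUND_POINTS = (3, 2, 1)  # points per vote in rounds one, two, and three
ROUND_PICKS = (10, 5, 5)  # maximum picks per member in each round


def priority_scores(ballots):
    """Total the points each measure receives across all members and rounds.

    ballots: one entry per member, each a 3-tuple of lists of measure IDs
    (round-one picks, round-two picks, round-three picks).
    """
    scores = Counter()
    for member in ballots:
        seen = set()
        for picks, points, limit in zip(member, ROUND_POINTS, ROUND_PICKS):
            if len(picks) > limit:
                raise ValueError("too many picks in one round")
            for measure in picks:
                # Assumption: each member votes for a measure at most once.
                if measure in seen:
                    raise ValueError(f"{measure!r} picked twice by one member")
                seen.add(measure)
                scores[measure] += points
    return scores


def top_measures(scores, k=25):
    """Return the k measures with the highest priority scores."""
    return [measure for measure, _ in scores.most_common(k)]


# Two made-up ballots with far fewer picks than the real limits.
ballots = [
    (["a", "b"], ["c"], ["d"]),
    (["b", "c"], ["a"], ["e"]),
]
scores = priority_scores(ballots)
print(scores["a"], scores["b"], scores["c"], top_measures(scores, 1))
```

Weighting earlier rounds more heavily means a measure chosen in many members' first 10 picks outranks one that appears only among their final choices, even when both receive the same number of votes.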
Lessons Learned During this Process
If AHRQ and CMS were
to embark on a similar process in the future, it would ideally be organized
differently from the one described here. With a short timeline, the order
in which the steps of the process are pursued is critical for efficiency. Because
the first charge to the SNAC was to identify measure evaluation criteria, this
is likely where we should have started. But even prior to this, an open
nomination process over 1 to 2 months (rather than 3 weeks) where various
stakeholder groups could recommend measures for consideration may have resulted
in a much richer set of measures for consideration during the process. That
said, the Federal Quality Workgroup, including AHRQ and CMS, had to balance the
need to consider a comprehensive set of measures in use with the need to
ultimately recommend a feasible set of measures (in terms of number of
measures) for implementation by Medicaid and CHIP programs.
For similar efforts
in the future, more time and resources should be allocated to both the
evaluation of nominations and gathering of missing information based on those
evaluations. As much as we tried to "level the playing field" for the measures
under consideration, some of them had far more complete VFI information than
others. In some cases this occurred simply because not enough time or
resources were allocated to gathering all of the missing information for the
nominated measures.
One advantage to the
short timeframe to complete this work is that it did result in the timely
recommendation of a relatively good set of quality measures for the initial
core set. The recommended core set is not perfect and neither was this process.
That said, the SNAC felt that it was critical to not let the perfect become the
enemy of the good. If we had set our standards at a level that was too
aspirational, we would have had very few measures to recommend. By design, we
took into consideration the staffing, funding, and infrastructure that would be
needed to implement the recommended measures. In the end, if we wanted these
measures to have a chance of being implemented by Medicaid and CHIP programs,
we determined that the recommended core set had to be a grounded, parsimonious
set of measures that were in use and thus demonstrated to be feasible to
implement. This may be a lower bar than we should have established for the core
set. Fortunately, CHIPRA provided support for advancing and improving pediatric
quality measures and called for priorities to be set to guide a new pediatric
quality measures program. This provides the opportunity to improve the core set
moving forward.
By critically
analyzing the process used to identify the initial core set of quality measures
for voluntary use by Medicaid and CHIP programs, we learned which parts of the
process worked and which parts need improvement. We hope similar processes of
evaluation and improvement of child health care will be stimulated by
implementation of the recommended core set of quality measures.
References
1. Brook RH. The
RAND/UCLA appropriateness method. In: McCormick KA, Moore SR, Siegel RA, eds.,
Clinical practice guidelines development: methodology perspectives. Rockville,
MD: Agency for Health Care Policy and Research; 1994.
2. McGlynn EA, Kosecoff J, Brook RH. Format and conduct of
consensus development conferences: a multi-nation comparison. In: Goodman C,
Baratz S, eds., Improving consensus development for health technology
assessment. Washington, DC: National Academy Press; 1990.
Disclaimer: The views expressed in this paper do not
necessarily reflect those of the Agency for Healthcare Research and Quality (AHRQ)
National Advisory Council Subcommittee on Children's Healthcare Quality
Measures for Medicaid and CHIP Programs (SNAC), the Agency for Healthcare
Research and Quality, the Centers for Medicare & Medicaid Services, or
other components of the U.S. Department of Health and Human Services. The work
was supported by AHRQ Contract HHSN263200500063293 to teamPSA, with funding
from the Centers for Medicare & Medicaid Services.
Current as of May 2010
Internet Citation:
Lessons Learned from the Process Used to Identify an Initial Core Quality Measure Set for Children's Health Care in Medicaid and CHIP. May 2010. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/chipra/lessons.htm