Background Report for the Request for Public Comment on Initial, Recommended Core Set of Children's Healthcare Quality Measures for Voluntary Use by Medicaid and CHIP Programs


Appendix A-8. Description of the Second Modified Delphi Process to Rate and Select Valid, Feasible, and Important Quality Measures

Delphi Round 2 for AHRQ SNAC Members

Members of the AHRQ SNAC are being asked to rate the validity, feasibility, and importance of a set of quality measures that includes the following:

  1. Measures that had passing scores for validity, feasibility, and importance in Delphi Round 1. Modified criteria for validity, feasibility, and importance were established at the first SNAC meeting on July 22 and 23. In addition, substantially more information has been collected on several of these measures since the first round of scoring. As such, these measures require re-assessment by the SNAC.
  2. Measures that were judged to be "controversial" during scoring for validity and feasibility in Delphi Round 1.
  3. Measures identified through environmental scans that were not included on the original list of measures scored during Delphi Round 1.
  4. Measures nominated by SNAC members, Federal partner agencies, and the public between July 24 and August 24.

To accomplish this task, AHRQ staff members have provided the following materials as zip files enclosed with this mailing:

  1. A Delphi Round 2 scoring sheet; and
  2. A one-page summary of key information on the validity, feasibility, and importance of each measure you need to score.

In addition, they have provided access to the AHRQ SNAC Extranet, where one folder of information exists for each measure, along with a summary "guide to the webex" for your convenience. Everyone should have received a username and password to access the extranet; if you have not, please let Denise Dougherty know as soon as possible. The extranet folders provide substantially more detailed information on each measure. Some folders contain more information than others, depending on how much relevant information was available for each measure or was provided by those who nominated it. These folders are available for you to review if you feel you need more information than the one-page summary provides before you make your scoring decisions. Reviewing this material is optional, not required.

Please note that each measure has been assigned a control number, e.g., PHP-1. This number will appear on the scoring sheet next to the measure name, at the top of the one-page summary for the measure, and on the folder for the measure on the AHRQ SNAC Extranet.

Please score each measure according to the criteria for validity, feasibility, and importance that were established at our meeting in July and are outlined below. SNAC members are asked to return their scoring sheets by Sunday, September 13, to Alison Long at the Seattle Children's Research Institute via email: If we do not receive your scores by midnight (PDT) on 9/13/09, they will not be factored into the analysis we will use to determine which measures pass Delphi Round 2. We realize this is an extremely tight time frame in which to complete this task; however, this was the only way we could allow adequate time for additional measure nominations after the July meeting.

Description of Assessment Criteria Agreed upon July 22-23, 2009

Validity

Validity is the degree to which a quality measure is associated with what it purports to measure (e.g., a clinical decision support system is a measure of structure or capacity; prescribing is a measure of a clinical process; asthma exacerbations are a measure of health outcomes).

A quality measure should be considered valid if:

  • It meets criteria for scientific soundness:
    1. There is adequate scientific evidence or, where evidence is insufficient, expert professional consensus to support the stated relationship between:
      • Structure and process¹ (e.g., that there is a demonstrated likelihood that a clinical decision support system (a structural or capacity measure) in a hospital or ambulatory office leads to increased rates of appropriate flu vaccination in the hospital or practice);
      • Structure and outcome (e.g., that higher continuity of care in the outpatient setting, which is influenced by how appointments are organized, is associated with fewer ambulatory care-sensitive hospitalizations, such as hospitalizations for dehydration); or
      • Process and outcome (e.g., that there is a demonstrated likelihood that prescribing inhaled corticosteroids (a clinical process) to specified patients with asthma will improve the patients' outcomes), and vice versa (e.g., that if quality is measured with a health outcome measure, there is a sufficiently demonstrated likelihood that the outcome can be attributed to health care delivery structures, clinical processes of care, or an explicit combination of both).
  • The measure itself is valid; that is, it truly assesses what it purports to measure.

¹ Structure of care is a feature of a healthcare organization or clinician relevant to its capacity to provide health care. A process of care is a health care service provided to, on behalf of, or by a patient appropriately based on scientific evidence of efficacy or effectiveness. An outcome of care is a health state of a person resulting from health care.

Measures should be scored on a 9-point scale:

7-9 → Measure concept is scientifically sound and the measure itself is definitely valid (i.e., sufficient evidence of scientific soundness and measure validity)
4-6 → Measure concept has uncertain scientific soundness (i.e., insufficient evidence) and the measure itself has uncertain validity (may not measure what it purports to measure).
1-3 → Measure concept is not scientifically sound and the measure itself is not valid (sufficient evidence of lack of scientific soundness and invalidity of the measure itself).

Measures with a median validity rating (taking all submitted ratings into account) of 7-9 will pass and be considered in the final round of assessment at the September 17-18 meeting in Washington, DC.

Feasibility

A measure will be considered feasible if:

  1. The data necessary to score the measure are available to state Medicaid and CHIP programs.
  2. Detailed specifications are available for the measure.*
  3. Estimates of adherence to the measure based on available data sources are likely to be reliable and unbiased. This allows for meaningful comparisons across states, programs, individual providers or institutional providers.
    1. Reliability is the degree to which the measure is free from random error.

Measures should be scored on a 9-point scale:

7-9 → Measure is definitely feasible
4-6 → Measure has uncertain feasibility
1-3 → Measure is not feasible

Measures with a median feasibility rating (taking all submitted ratings into account) of 4-9 will pass and be considered in the final round of assessment at the September 17-18 meeting in Washington, DC.

Importance

During the SNAC meeting on 7/23, we worked to establish consensus on the criteria we would use to rank the importance of the measures under consideration. To be considered important, a measure should meet at least some of the following criteria. The criteria are listed in order of decreasing weight, as determined through a vote of SNAC members on 7/23:

  1. The measure should be actionable. States, CHIP managed care plans, and relevant healthcare organizations should have the ability to improve their performance on the measure with implementation of quality improvement efforts.
  2. The cost to the nation for the area of care addressed by the measure should be substantial.
  3. Health care systems should clearly be accountable for the quality problem assessed by the measure.
  4. The extent of the quality problem addressed by the measure should be substantial.
  5. There should be documented variation in performance on the measure.
  6. The measure should be representative of a class of quality problems, i.e., it should be a "sentinel measure" of the quality of care (QOC) provided for preventive care, mental health care, dental care, etc.
  7. The measure should assess an aspect of health care where there are known disparities.
  8. The measure should contribute to a final core set that represents a balanced portfolio of measures and is consistent with the intent of the legislation.
  9. Improving performance on measures included in the core set should have the potential to transform care for our nation's children.

Measures should be scored on a 9-point scale:

7-9 → Measure is definitely important and meets several of the above criteria.
4-6 → Measure has an uncertain level of importance and meets some of the criteria above but fails to meet some of the criteria given higher weight by the committee (1-4 above).
1-3 → Measure fails to meet most of the criteria for importance outlined above.

Measures with a median importance rating (taking all submitted ratings into account) of 4-9 will pass and be considered in the final round of assessment at the September 17-18 meeting in Washington, DC.

The Nine-Point Scale

The nine-point scale has been used for more than two decades at RAND in developing explicit measures for evaluating appropriateness and quality.i Essentially, these methods require individuals who rate quality measures to place them into one of three categories (e.g., valid criterion for quality, equivocal criterion for quality, invalid criterion for quality), and each category can be rated on a three-point scale to allow for some variation within the category. The scale is ordinal, so a 9 is better than an 8, and so on. Because quantities (e.g., risk-benefit ratios) are not assigned to each number on the scale, the difference between an 8 and a 9 is not necessarily the same as the difference between a 5 and a 6. Explicit ratings are used because in small groups some members tend to dominate the discussion, which can lead to a decision that does not reflect the sense of the group.ii
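As a rough illustration of the three-by-three structure described above, the hypothetical Python helper below maps a 1-9 rating to one of three categories and to its position within that category. It is a sketch, not part of the SNAC process; the category labels are assumed from the validity wording in this appendix, and the positions are ordinal ranks rather than equal-interval quantities.

```python
from __future__ import annotations

# Hypothetical labels, borrowed from the validity wording in this appendix.
CATEGORIES = ("not valid", "uncertain", "valid")


def classify(rating: int) -> tuple[str, int]:
    """Return (category, position within category, 1-3) for a 1-9 rating.

    Ratings 1-3, 4-6, and 7-9 fall into the three categories; the position
    is only an ordinal rank, so the gap between positions is not a quantity.
    """
    if not isinstance(rating, int) or not 1 <= rating <= 9:
        raise ValueError("ratings must be integers from 1 to 9")
    # divmod splits the 0-based rating into (category index, position index).
    index, position = divmod(rating - 1, 3)
    return CATEGORIES[index], position + 1
```

For example, a rating of 5 falls in the middle of the uncertain category, and a rating of 7 is the lowest rating in the valid category.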

For validity ratings, we use a more stringent passing median score (7-9) than for feasibility or importance ratings, which require a median score of 4-9 to pass. The rationale for this difference is that feasibility and importance are more subjective assessments than validity. For validity, either the evidence exists to support the measure or it does not, so relatively objective information is available for the assessment. For feasibility, some states or CHIP programs may find a measure quite feasible to implement (due to their infrastructure, amount of available funding, etc.) while others will not. Feasibility of measure implementation can also be field tested; if a measure proves less feasible to implement than initially assumed, it could be deleted from the core set. The importance rating is the most subjective of the three criteria, so again we chose to set the bar lower for the passing median score.
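The median-based passing rule can be sketched in Python as follows. This is a hypothetical illustration, not the analysis AHRQ used. It assumes that a measure must pass all three criteria to advance and that, with an even number of ratings, the median is the average of the two middle ratings (the behavior of Python's `statistics.median`); this appendix specifies neither point.

```python
from __future__ import annotations

from statistics import median

# Minimum passing median for each criterion, as stated in this appendix:
# validity needs a median of 7-9; feasibility and importance need 4-9.
PASS_MINIMUM = {"validity": 7, "feasibility": 4, "importance": 4}


def round_2_result(ratings: dict[str, list[int]]) -> tuple[dict[str, float], bool]:
    """Return the median rating per criterion and whether the measure passes.

    `ratings` maps each criterion name to all submitted 1-9 ratings.
    Assumption: the measure must meet the minimum on every criterion.
    statistics.median raises StatisticsError if a criterion has no ratings.
    """
    medians = {criterion: median(ratings[criterion]) for criterion in PASS_MINIMUM}
    passed = all(medians[c] >= PASS_MINIMUM[c] for c in PASS_MINIMUM)
    return medians, passed
```

Under the even-count assumption, validity ratings of [6, 7] give a median of 6.5, which would not meet the 7-9 threshold.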

The Meeting on September 17 and 18

At the meeting we will discuss and consider only measures that pass Delphi Round 2. We will work to fill in a balancing grid that will help us track how well the measures we select respond to the intent of the legislation. This will require much discussion and many rounds of voting. The panel co-chairs, Drs. Mangione-Smith and Schiff, will lead the discussion of the measures. To facilitate voting, AHRQ has arranged for electronic voting to be available at the meeting. For those joining by phone, AHRQ staff will speak with you off speakerphone so that you can register your votes privately, and staff will enter them electronically for you. We hope that at the end of this process we will have a parsimonious, balanced set of 10-25 measures that we can recommend for inclusion in the Core Set.

We want to thank you for your commitment to this important process and for taking the time to lend your expertise.

i Brook RH. The RAND/UCLA appropriateness method. In: McCormack KA, Moore SR, Siegel RA (eds), Clinical Practice Guidelines Development: Methodology Perspectives. Rockville, MD: Agency for Health Care Policy and Research; 1994.
ii McGlynn EA, Kosecoff J, Brook RH. Format and conduct of consensus development conferences: a multi-nation comparison. In: Goodman C, Baratz S (eds), Improving Consensus Development for Health Technology Assessment. Washington, DC: National Academy Press; 1990.


Page last reviewed December 2009
Page originally created September 2012
Internet Citation: Background Report for the Request for Public Comment on Initial, Recommended Core Set of Children's Healthcare Quality Measures for Voluntary Use by Medicaid and CHIP Programs. Content last reviewed December 2009. Agency for Healthcare Research and Quality, Rockville, MD.