September 17, 2009: Morning Session (continued)
Rita Mangione-Smith: Right, so my job is to remind us of all the work that we did back in July.
What I'm going to start out with is reminding you just—and not that you all are not completely aware of this, but this is also for our more public audience, to review what we agreed to as the scope for the core measurement set. This brings us back to what we really worked hard to come to consensus on back in July. We really need to have a realistic view of what's possible in terms of staffing and funding needs for collecting, analyzing, and reporting the available data. That's part of the grounded piece. We've all worked so hard on this that it would be a shame if we end up with a set of measures that nobody can use, and I think it's very, very important for us to keep that in the forefront as we move forward over the next day and a half.
We said we would make a comprehensive effort to find good measures for all service categories, duration of enrollment, and other aspects of care required by the legislation. However, if no good measures currently exist for a given aspect of care, we were pretty firm that we would leave the chair empty, right? If there's a bad measure, it's better to leave it empty than to fill it with a measure that we're not convinced meets our criteria.
We would include measures not currently used by Medicaid and CHIP. That was a big change in direction last time and resulted in a lot of work, but a lot of good work, I think, and we've gotten a much stronger set of measures to look at as a result of that. And we would choose measures that are actionable. It's very important that if you're going to be measured on something, that there are clear steps to take to improve on your performance for those measures.
So just to remind you, these were our validity criteria that we set out. The measures needed to be supported by scientific evidence or, where evidence was insufficient, by expert consensus opinion. We really wanted to try very hard to take every measure we were considering and seek out whether there was any evidence for links between structure and outcomes of care, structure and processes of care, or processes and outcomes of care. And just to remind you, a couple of examples of those links. Structure and outcomes: if you have good adherence to well-child care according to the AAP (American Academy of Pediatrics) schedule, does that prevent ambulatory care-sensitive hospitalizations or ER (emergency room) visits? Structure and process of care: if I've got a decision support system within my office, does that help me to give influenza vaccine to more of my patients? Process and outcomes: if I keep my persistent asthmatics on controller medications, we know that improves their outcomes. So as we prepared those one-page summaries that I'm sure you're all tired of looking at, at this point, we tried very, very hard to find any evidence of those kinds of links for the measures we were looking at.
We wanted the measures to represent aspects of care that are under the control of health care providers and systems. And very importantly, this is really the true core of validity: the measure should truly assess what it purports to measure. And we wanted to also consider measures supported by evidence from unpublished data, and where we were able to get access to that, we did try to include it for you to take into consideration when you did your recent Delphi process.
Feasibility, this is probably to me one of the most important criteria. The data necessary to score the measure must be available to State Medicaid and CHIP programs. It's not reasonable for us to ask States to make this part of their core measurement set and to voluntarily report on them if they can't get access to the data that they need to do the measures appropriately. And along those same lines, we wanted detailed specifications available for the measures that would allow for reliable and unbiased scoring of the measures across different States and institutions. So if I measure the same thing in Florida and California, I'm truly measuring the same thing.
Importance, we spent a lot of time on these criteria. And if you remember, we had a voting process last time to think through what our importance criteria were. These are listed in order of descending importance.
The measure should be actionable. The cost of the condition to the Nation should be high. The health care system has to be clearly accountable for the quality problem that we're assessing with the measure. The extent of the quality problem should be substantial. There should be documented variation in performance.
I just want to take a minute to give credit to Denise and her staff. They worked long and hard trying to find performance variation information so everybody would have a sense of that.
The measure should be representative of what we call the class of quality problems, a sentinel measure of quality of care. For instance, for preventive care, mental health care, you noticed as you went through to do your Delphi process we have a little bit of repetitiveness in certain categories of measures, and I think one of our tasks over the next day and a half is to really say when we've got that repetition, which one do we really want? Which one do we really think is the sentinel best measure that we can use for this given area of care?
The measure needs to assess an aspect of health care where there are known disparities, and again, Denise and her staff looked long and hard to try and find information about disparities on the measures we were considering, any data that were available. And the core set should represent a balanced portfolio of measures and be consistent with the intent of the legislation. This piece I really have to give Jeff credit. He spent an incredible amount of time putting together what you're going to see later, which we're going call our balancing grid. I'm going to call it the Jeff Schiff grid because, frankly, he put hours into thinking about what that should look like. And it's a grid to really help us as we go through this process to make sure that we are seeing balance across what the intent of the legislation was.
And then improving on performance for this core set of measures should have the potential, we hope, to transform our Nation's—the care for our Nation's children. That's a very lofty goal. I think that's why it's down at the bottom. I think we do all think it's important, but I think we're all grounded and realize that this is a starting point, and that goal is probably going to take more than just us making a recommendation about a core measurement set.
And we also said we're going to be very transparent about the level of evidence supporting the measures that we end up with in the core set and about the burden for collecting the data. Would it be low, moderate, or high? And we tried, again, to really give you a good objective view of what the level of evidence was supporting the various measures as you went through to access them.
So what have you all been doing, right, or what have we been doing? We've been doing this Delphi process, right? And as you know, 7 to 9 on that 1 to 9 scale was an indication that we thought measures were definitely valid or feasible or important, okay? And 4 through 6 meant we were uncertain about validity, feasibility, or importance. And 1 to 3 represented the decision that no, based on the information I've been given, this is not valid, not feasible, or not important.
Today, we are going to be looking at measures from our second Delphi process that had a median validity score of greater than or equal to 7 and a median feasibility and importance score of greater than or equal to 4.
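As a minimal sketch of the pass rule just described (the panelist scores below are invented for illustration; nothing here is from the subcommittee's actual data), the round-two filter amounts to a set of median thresholds:

```python
from statistics import median

def passes_delphi(validity, feasibility, importance):
    """Round-two rule: median validity >= 7, and median
    feasibility and importance each >= 4 (all on the 1-9 scale)."""
    return (median(validity) >= 7
            and median(feasibility) >= 4
            and median(importance) >= 4)

# Illustrative panelist scores for one hypothetical measure.
validity = [7, 8, 7, 6, 9]      # median 7: definitely valid
feasibility = [5, 4, 6, 7, 5]   # median 5: uncertain, but passes the >= 4 bar
importance = [8, 8, 9, 7, 8]    # median 8

print(passes_delphi(validity, feasibility, importance))  # True
```

Note how a measure whose median validity lands at 6 or 6.5 fails this rule even with perfect feasibility and importance scores, which is exactly the situation raised later in the discussion.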
So in round one of Delphi, we used some established criteria that have been used before at the RAND Corporation in California, but as we just discussed, we expanded those criteria at the last meeting, so that's why we re-scored some measures we had already looked at before.
The four groups that we just assessed—I'll have you all know you assessed 120 measures, in case anybody was wondering what the count was. We did measures that had a passing score for validity, feasibility, and importance in our first round of Delphi scoring back in July. And, again, because we had expanded the criteria, we wanted to redo those measures, and we really made a commitment to try to get much more rigorous information about those measures before we re-scored them. We also re-scored measures that we judged as controversial during the last scoring session in July. Then there were additional measures that were identified through environmental scans by Denise and her staff; those had not been identified before our first meeting, so those were included. And then there were measures nominated by members of this subcommittee, Federal partner agencies, and the public; that all happened within the 1-month period between July 24th and August 24th.
So here's how it turned out. We passed 65 of the 121 measures through the Delphi II process—second round—and this is how it shook out: for prevention and health promotion, we passed slightly over half, 27 out of 50; for management of acute conditions, 16 out of 25; chronic conditions, 18 out of 31; and down the line you can see how things went. Pretty uniformly as a group, if you look at the scores, nobody felt that the most integrated health care system measure that we had to consider or the availability of services measures that we had to consider were passing in terms of our criteria. There was one health status measure that was put up, which also did not pass, and there were two duration-of-enrollment measures that were put in, and one of those two did pass.
These are the main things that we're going to be looking at on that balancing grid, okay? So the following constructs are the ones that we really want to make sure we're being true to as we whittle down that 65 to what we hope will be around 25 measures. But then, again, that number is debatable and something that we'll all be discussing, I'm sure.
So for the ages, we want the full age range of children to be represented in these measures. We would like measures for which there are known disparities in care, ones that address the different sites and types of care, and the care continuum in terms of outpatient measures and inpatient measures. On the system continuum, we'd like some structural measures, some process measures, and some outcome measures. We probably won't get any true efficiency measures. There were a couple—I just want to put this out right now—a couple of measures that were called efficiency measures which in reality are really outcome measures. They were the FEC measures, and I won't worry you with why we checked off efficiency, which was probably not appropriate.
And then entities using the measure, we really want to track that to make sure that the measure is in use and has some experience being used, as well as information on the data source. So we would like some variety in terms of measures that require administrative medical records or survey data.
Okay, so I'm going to pass this back over to Jeff. So that's what we've done so far, and now we're going to talk about what we need to do still.
Jeffrey Schiff: We're almost done talking. I always think the beginning of these meetings is hard because everybody's quiet. What I want to do right now is just spend a few minutes talking about what we've envisioned as the process for this meeting. And I think what we've tried to do is create an equitable, engaged process to get to where we need to go.
So, in preparation for this, we have had this nomination and review process that we've sort of called the VFI (validity, feasibility, importance) process. I was the co-chair that was less involved in this, and I really want to again thank Rita and Denise for shepherding all of these through. And I think that has, hopefully, succeeded, but it really was to get to the point where we had the same information for every measure so that we could actually know what we were voting on. Denise, if I can call on you very quickly, I just want you at this point to acknowledge the staff people who worked with you, if you can.
Denise Dougherty: I think it's—sure, yes. [Inaudible]
Jeffrey Schiff: So I think these guys were called to jump on a moving train and they did. But anyhow, then we voted. So the first thing I want to do is—so we've reviewed this process, and now we have this set up, and I want to walk through this, and we'll take some comments about this.
The first thing we envision doing is discussing and getting some affirmation of the domains, or constructs, on this balancing grid that Rita just talked about. We'll also spend some time discussing those to make sure that we agree that these should all be filled in.
The next thing we want to do is spend a little bit of time reaffirming what we talked about last time, this core, grounded, parsimonious set of measures, to make sure we're all talking about the same thing. We'd like to then spend some time, hopefully later this morning or this afternoon, to discuss the measures by legislative category: the category of prevention and health promotion, the category of acute care, the category of care of chronic conditions. Within each category, we'd like to rank the measures in each subcategory for inclusion in the set. Our proposal is to retain, at least initially, the top third of each category by vote, and we'll round up, so if there are five in a subcategory, we'll take the top two, but just know that we'll get to that.
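The "top third, rounded up" retention rule described above can be sketched in one line (a hypothetical helper for illustration, not part of the subcommittee's materials):

```python
import math

def retain_count(n_measures: int) -> int:
    """Number of measures to keep from a subcategory:
    the top third of the ranked list, rounded up."""
    return math.ceil(n_measures / 3)

# The example from the discussion: five measures in a subcategory means
# we keep the top two (5/3 = 1.67, rounded up to 2).
print(retain_count(5))  # 2
```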
We will hopefully by the end of the day go from our 65 measures down to a smaller set of measures. With that smaller set we will actually try to create a new balancing grid before or at dinner, so that folks can actually look at it and see what's happening. So we'll hopefully confirm by the end of today that we have adolescent measures and measures for infancy. We'll confirm that we have measures that are collected from the inpatient setting as well as outpatient and dental settings. The idea of the grid is that you'll look down each column and say, "Look, we have five infant measures here, and we only have one adolescent measure. Do we really want that to be the case?" Or that we have all inpatient measures, and that's way too much weight on inpatient, and we want more ambulatory measures, whatever. So the idea of the grid is to give us some sense of where we're going, going forward.
Hopefully, if we get to that point by the end of today, I think we'll all feel good. This is, I think, pruning. I keep on thinking of pruning the bushes at my house: you have to start cutting sometime, and then you have to step back and see whether or not you've created a big hole or gap somewhere, and hopefully in the end it all looks even. So tomorrow, I think, is really about trying to create a unified whole. We'll have some time tonight to sleep on it. In our dreams, we will hopefully do the starry night thing and realize where we're at with this. We'll review the balancing grid tomorrow. We'll identify and acknowledge holes, so we may look at it and say, we really never got to the mental health measures we want; are we okay with that? And we'll acknowledge that this is going to be a gap, or whatever gaps we have.
What we'd like to do at that point is vote on or rank the remaining measures as a whole set, and the way we envisioned doing this is we will give everyone probably 10 first-place votes, five second-place votes, and five third-place votes, and we'll see what we end up with. And then, after a break at that point, we should have a balancing grid with a final set of numbers, and what we have envisioned doing is saying: if we include only the 10 top vote-getters in this core set, how does that fit on the balancing grid? Are we happy with that?
What if we include the next five? We'll probably try to stop at 25, okay? So we would eventually get to the point where we'd actually have ranked them and figured out what our cut point is, what we then say is the number for the core set. Then we'll vote on that set size; hopefully, we'll take a big break there.
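The ranked-ballot step described above could be tallied as follows. This is only an illustrative sketch: the measure IDs and ballots are invented, and the point weights (3 for a first-place vote, 2 for second, 1 for third) are an assumption, since the transcript does not specify how the three vote tiers are combined.

```python
from collections import Counter

# Assumed point weights per vote tier; the transcript only says each
# panelist gets 10 first-place, 5 second-place, and 5 third-place votes.
WEIGHTS = {"first": 3, "second": 2, "third": 1}

def tally(ballots):
    """Sum weighted votes per measure and return them ranked by total."""
    totals = Counter()
    for ballot in ballots:  # each ballot maps a tier to a list of measure IDs
        for rank, measures in ballot.items():
            for measure in measures:
                totals[measure] += WEIGHTS[rank]
    return totals.most_common()

# Two toy ballots over hypothetical measure IDs.
ballots = [
    {"first": ["M26", "M27"], "second": ["M11"], "third": ["M35"]},
    {"first": ["M27"], "second": ["M26", "M35"], "third": ["M11"]},
]
print(tally(ballots))
```

The ranked totals are then what gets laid against the balancing grid to try candidate cut points (top 10, top 15, and so on).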
And then I think at that point we really need to process this as a group and see what comments we want to make sure get included: comments about our process, what we did, any recommendations for AHRQ and CMS (Centers for Medicare & Medicaid Services), and what needs to get worked on in the future. So that's sort of how we've envisioned this day and a half going. Jon.
Jonathan Klein: Just a sort of procedural question here—as we go forward, and you may want to make a decision about this or put this off to later, as we look at measures, say for example well-visits at early childhood, middle childhood, adolescence, are we going to consider that a single measure with a different way of assessing it at different ages, or are we going to have to think of that as three measures?
Rita Mangione-Smith: We need to think of that as three measures. We're really pretty committed to every line on that grid of what you voted for being considered as a singular measure.
Jonathan Klein: Because I think that—and this may be more of a question when it comes to the agency and CMS's implementation strategy—conceptually, to say that children having the appropriate set of well-visits is a single way of thinking about a quality metric for the utilization piece, and so if we have to spend 3 of our 10 or 25 measures to get that at different ages, I think that has some implications for how we look at the breadth of the measures that we come up with.
Rita Mangione-Smith: I think the rationale for separating them when we went into the Delphi II process—they were actually separated in the original process also—was that the levels of evidence available for the effectiveness of those visits are variable by age range, so we did think we needed to look at them separately from that standpoint. In terms of do we want to spend three measures to get all three, I think we have to take the whole VFI criteria into account for all three to say are they of equal weight, and should they all be included, and if that's the decision, then I do think we spend three measures on them.
That's our way of saying these three are all important enough to be on the core set, and maybe that means we end with 30 core measures rather than 25. I think that last-day process, where everybody puts up their first 10 and then their next 5 and then their next 5, will show us what those different cut points get us. I think I'm a little bit concerned about the fidelity of the process that we've been through so far if we now start saying we're going to lump stuff together, and that they don't really stand individually, and we don't need to think about them as individual measures.
Female Voice: I think one of the issues for me would be what do the State programs feel? Are these three—are these measures three different approaches to collecting the data? If so, it's really three measures, and we're not going to get past that. We may have to do it differently.
Cathy Caldwell: I just had a question about the ones that passed. It looks like caesarian section is on what I printed off but not on this one, and there are a couple of ED measures on this, but I didn't think any had passed.
I'm looking at passed, and there are 12 under prenatal and perinatal, but only 11 here; and it looks like C-sections are gone—
Rita Mangione-Smith: You're right, it's missing.
Cathy Caldwell:—you've got ED in here, but this one had no ED measures—
Rita Mangione-Smith: The balancing grid was the very last thing that you were working on, and as you can imagine, given 1 week, it was quite a scramble to get it together.
Cathy Caldwell: So this one is the definitive one.
Rita Mangione-Smith: This is the definitive one as far as passed and failed—
Jeffrey Schiff: So actually, we really have first the passing measures from the Delphi that was completed on Monday, and the balancing grid should reflect that; if it doesn't yet, that's okay.
So, let me finish up. I want to take any other questions people have just about the process. Does this process seem logical and okay to folks? It makes sense?
Rita Mangione-Smith: If we can do the same thing we did before, if you have things you want us to talk about, turn your card up for us.
Jeffrey Schiff: Right, that would be great.
Cathy Caldwell: It was only to raise the question that I put in my E-mail last night, which is: are we at all open, or shall we just say no and let it go at that, to reconsidering the six or so measures that would have made it except that they were scored as 6 or 6.5 on validity? In a few cases, if they had been scored as 7, they would definitely outrank measures that passed; the ones that I identified were scored as 7 for feasibility and importance. And we have some measures that got a 7 on validity and were a 6 or even 5 on either feasibility or importance.
And I think those six or so that I came up with, I would like to put on the table that we revisit whether to consider them because it would have taken only maybe one vote to have shoved them up into a 7 for validity. They looked to me to be important measures at least to consider the first time around.
Jeffrey Schiff: For people's information, I'm going to just tell you the numbers of the measures that Cathy was talking about; I have sort of the general category as well, but I'm not going to have the specifics. They are: in prevention and health promotion, 26, which is percent low birth weight; 27 is a postpartum care measure; 11 is an adolescent well-child measure; 22-A is on oral health, I think one dental visit measure; 35 is an oral access measure; and there's an acute care measure, number 10, which is ER utilization.
So let's talk about this a little bit and see where people are at as far as whether we want to add these back into consideration. As we've talked about, we want to honor the wishes of the group, but we also want to make sure we don't get onto a slippery slope of going—
Rita Mangione-Smith: Yeah. I feel like we worked really hard to hammer out what we thought was good process to go through, and I want to stay true to that process, but if it is the overwhelming consensus of the group that those should be put back on the table, we're certainly open to that. In fact, when I saw 6.5, I thought, should I round up or should I just leave them at 6.5?
Jeffrey Schiff: Jim.
James Crall: Well, I'll just reiterate what I said when I jumped in on the E-mail last night in terms of Cathy's point: I would like to see them reconsidered.
Female Voice: I kind of agree with Cathy, and the reason I do is because I think we want to be ultra-careful here. If that means a little bit of redundancy and revisiting some of these right-on-the-edge measures, I think it's better to be cautious and thorough than to say, well, in the interest of time, we just pushed on. Moreover, there's at least one of these measures that has kind of an ancillary set of issues around it that I think it would be useful for folks to know about, and that's the second one that you mentioned, number 27, the postpartum care visit.
There is a brand new program that the President has proposed for postpartum care home visitation, and there's quite a lot of money associated with it, and it's moving as part of health reform in both the House and the Senate. Now, who knows where we're going with health reform, but the point is that there's a lot of mandatory money resources associated with that initiative, in addition to the fact that there are many States right now that have smaller programs that support this kind of postpartum care visit. So I just think the landscape is moving in that direction, so on that particular measure, it strikes me that the time is right.
Jeffrey Schiff: Other comments on these measures?
Rita Mangione-Smith: Jeff and I talked about this actually, trying out our little voting thing, that could be our first vote as a group, as to whether or not to put these back on the table or not.
Jeffrey Schiff: Okay, Cathy.
Cathy Caldwell: Jeff, could you repeat which measures they are. I didn't get them all.
Jeffrey Schiff: Oh, sure. Oh, there's a caller on the phone too, okay.
Rita Mangione-Smith: Yes. There are Lisa Simpson and Paul Miles on the phone.
Jeffrey Schiff: Okay. So Lisa or Paul, we'll get to you in a second. Just to review, I'm just going to give you these—the first are all prevention and health promotion and there—I'm going to give them to you in the order that I got them, which is really how they are on the sheet, how they would be added: 26 is percent low birth weight; 27 is postpartum care; 11 is adolescent well-child check; 22-A is in oral health, one dental visit; 35 is an oral access measure; and then there's an acute care measure, number 10 which is ER utilization.
So we have somebody on the phone. Do we have a way to patch them into the speaker system? Are they—go ahead.
Cathy Caldwell: I think there was also one—AS1 on access to primary—I mean it falls into my category which is it was a 6 on validity and 7 on both feasibility and importance, so that was AS1.
Jeffrey Schiff: AS1. Okay. All right, we will add that to this list as well. Now, on the phone, we're listening.
Lisa Simpson: I'm Lisa.
Jeffrey Schiff: Hi, Lisa.
Lisa Simpson: Good morning everyone.
Jeffrey Schiff: Lisa, do you have a comment or question about adding these measures back in?
Lisa Simpson: No, nothing, thank you.
Jeffrey Schiff: And is Paul here? Paul Miles, are you on the phone? So, please just raise your hand if you want to include these measures back into the set.
Jeffrey Schiff: Okay, so we have affirmed that we're going to add these back in.
Male Voice: The other question I had, and this may be something for the framing discussion: I think there is a second section to the legislation besides the 401 section that we're sort of focusing the quality measure piece on, and that section has specific requirements for retention data and access and CAHPS® (Consumer Assessment of Healthcare Providers and Systems) measurement. And so when we think about balancing, we may want to think about whether we're including things that score well and that we all like but are already required reporting.
And so I just want to put that out on the table for us to think about because, again, if we're going to wind up with 10 things, and if we vote for three things that are going to be required reporting in the statute, we're actually going to wind up with seven things. So just to think about that as we go forward, and again, this could be part of our balancing discussion.
Rita Mangione-Smith: So I think one of the things that we also need to keep in the forefront is that what we're doing is making a recommendation. So our recommendation may not match the requirements of the legislation exactly, especially where we didn't find a good measure and chose to leave an empty chair, and we've been pretty clear about that, I think, since July. So I don't want to say we're freewheeling and don't pay attention to the legislation at all; certainly not, that's why we have the balancing grid.
But I don't think we already have, by requirement, three measures that we have to include. I think, based on what's happened so far with some of the measures you've mentioned, we really don't have an access measure that we all felt met our criteria, so to me that may end up being one of those empty chairs. CAHPS, on the other hand, is one where I think a lot of us voted, on the Delphi process, such that it probably will be in our recommended set. So it's that we aren't deciding; we're recommending.
Male Voice: I guess my question also in part is, once we've done the rating that we will do and I think the process is a good one and we're going to come out with a good result here, do we also want to identify the best available measures and some of the categories that go beyond the consensus and Delphi process that also will address some of those other quality reporting requirements in the rules, or is that someone else's work from this panel's recommendations?
Denise Dougherty: [Inaudible]
Jeffrey Schiff: All right, so I think we've affirmed our process. We've affirmed putting these back in. The next part of our conversation is really to talk a little bit more about the Delphi results and the measures that passed, and then the criteria for the balancing set.
Okay. So we wanted to spend a little bit of time here talking about—just to reaffirm what was really the conversation we had last time. A few people are new, a few people aren't here, but we have been going around the country a little bit to the NAC and then we presented at the AHRQ meeting, and I just want to spend a few minutes seeing if this still makes sense to everybody a month and a half or 2 months later, that our job is really to work on this grounded set.
We'll get some measures that won't make the cut but actually exist; those could be intermediate measures, as distinct from aspirational measures, and I want to be clear to make this distinction. The aspirational measures are measures that need to be developed, as opposed to what we may choose as grounded measures, which are not going to be the easiest to measure but still exist, can be used today, and will be recommended. So I hope I'm not confusing things, because I think this is what we affirmed last time. I just want to get a sense of the group about whether this still makes sense as far as the logic of what we're doing. Some nods. This will be way easier; we'll finish early.