To identify a comprehensive, valid, and feasible set of primary care practice accountability measures, the researchers followed three key steps (Figure B-1). This appendix provides details of how each step was performed. Appendix C provides detailed results of the measure selection process using these methods.
Step 1. Narrow the Field of Care Coordination Measures
Given the number of measures included in the Care Coordination Measures Atlas and the resources required to assemble a comprehensive, valid, and feasible set of these measures, the process of creating the measure set began by narrowing the field of candidate measures based on a set of inclusion and exclusion criteria. These criteria are meant to focus the set on the measures most applicable to the primary care setting. Figure B-2 provides an overview of this process.
Two key inclusion criteria were used for identifying measures eligible to be included in the final measure set: (1) primary care setting and (2) patient age group (adult and pediatric). For the adult measure set, a third criterion (patients with chronic conditions) was used because nearly all Atlas measures met the first two criteria. Definitions of these criteria are detailed below. In determining the applicable setting, patient age, and patient condition groups for a measure, we relied on information from published Atlas measure sources about the populations in which the measure has been used and any intended patient population. As part of development efforts surrounding the Atlas, measure developers were also contacted and asked to provide feedback on how measures were categorized.
Primary Care Setting
Given the focus on measures of care coordination as it is carried out by primary care practices, selection of measures was limited to those that are designed for or have been used in primary care facilities. Measures that are not setting specific were also included, as they may be used to assess care coordination in any setting.
- Primary Care Facility—Any setting described as primary care or settings providing care by generalists or practitioners in internal medicine, family practitioners, general pediatricians or general practice providers. This includes settings described as a medical or health care home or PCMH.
- Not Setting Specific—The measure application is not limited to a particular type of setting, or the setting was not specified in measure development or application publications.
Measures for adult and pediatric populations were assessed separately. The pediatric measure set includes measures applicable to children, measures that are not age specific, and measures where the patient age is not applicable (i.e., the measure focuses on health care providers or a practice, not patients). The adult measure set includes measures applicable to adults and those that are not age specific or where the patient age is not applicable.
- Children—Measure is targeted toward or has been used in a patient population described as pediatric, children, or parents/caretakers of children receiving health care.
- Adults—Measure is targeted toward or has been used in an adult population. This includes measures applicable to older adults.
- Not Age Specific—Purpose states measure is intended for application to patients of all ages, or no information is available on the ages of patients to whom the measure has been applied.
- Not Applicable—Measure does not focus on patients.
Some measures were included in both reviews (e.g., measures that are not age specific or where age is not applicable).
For the adult measure set, we further limited inclusion to measures that are applicable to patients with chronic conditions, or that are not condition specific or where the patient condition is not applicable (i.e., the measure focuses on health care providers or a practice, not patients).
- General Chronic Conditions—Patients who are described as having chronic conditions, chronic diseases, or chronic illnesses without specifying particular conditions. A chronic condition is a disease or condition of long duration and typically slow progression.
- General Population or Not Condition Specific—Measure is targeted toward or has been applied to the general population or to a patient group not limited by condition. Validation or application of the measure is not limited to particular patient disease or condition groups, or the disease/condition of interest was not specified.
- Not Applicable—Measure does not focus on patients.
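The inclusion criteria above amount to a filter over categorized measures. The following is a minimal sketch of that filter, assuming each measure has already been assigned one or more category labels for setting, age, and condition. The category labels come from the definitions above; the `Measure` class, its field names, and the `eligible` function are hypothetical and are not part of the Atlas.

```python
from dataclasses import dataclass

# Category labels follow the definitions above. The Measure class and
# its field names are hypothetical, introduced only for illustration.
SETTINGS_OK = {"Primary Care Facility", "Not Setting Specific"}
AGES_OK = {
    "adult": {"Adults", "Not Age Specific", "Not Applicable"},
    "pediatric": {"Children", "Not Age Specific", "Not Applicable"},
}
CONDITIONS_OK = {
    "General Chronic Conditions",
    "General Population or Not Condition Specific",
    "Not Applicable",
}


@dataclass(frozen=True)
class Measure:
    name: str
    settings: frozenset
    ages: frozenset
    conditions: frozenset


def eligible(measure, measure_set):
    """Return True if the measure passes the inclusion criteria.

    `measure_set` is "adult" or "pediatric".
    """
    if not (measure.settings & SETTINGS_OK):
        return False
    if not (measure.ages & AGES_OK[measure_set]):
        return False
    # The condition criterion applies to the adult measure set only.
    if measure_set == "adult" and not (measure.conditions & CONDITIONS_OK):
        return False
    return True
```

Because a measure can carry the label Not Age Specific or Not Applicable, the same measure can pass the filter for both the adult and the pediatric set.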
In each of these age groups, we narrowed the field of measures using three additional exclusion criteria: (1) measures not focused entirely on coordination performed by a primary care provider or practice, (2) disease-specific measures, and (3) prioritization groups (after inclusion/exclusion) based on feasibility and degree of focus on care coordination.
Measures Not Focused Entirely on Coordination Performed by Primary Care Provider or Practice
These represent a subset of the broader group of measures that were included in the Primary Care Facility or Not Setting Specific categories.[i]
In assessing applicability of measures for primary care evaluation, we relied on the measure instrument. We reviewed the content and wording of all measure items that mapped to an Atlas care coordination domain, or in the case of non-survey measures, we reviewed the detailed measure specifications. We also reviewed instructions and other introductory materials that accompanied survey-based measure instruments. We then assessed whether the instrument reflected coordination as performed by a primary care provider or a primary care practice.
Measures falling into any of the following categories were excluded from further review:
- Measures that assess coordination in the entire health care system, or overall experiences of care in any part of the health care system.
- Measures that assess how well other health care entities, such as hospitals or behavioral health facilities, coordinate care with primary care providers or practices.
- Measures that assess how well non-primary care providers coordinate care with providers in other settings or specialties, including primary care.
- Inpatient Discharge Measures. When measures focused specifically on assessing coordination at the time of discharge from an inpatient facility, we excluded those that assessed coordination as performed by the discharging facility, such as quality or adequacy of discharge planning or complete and timely transfer of a discharge summary to the appropriate primary care provider or practice.
Disease-Specific Measures
For the purposes of this measure set, condition-specific measures were excluded, such as those that are applicable only to patients with diabetes, schizophrenia, or HIV/AIDS. We did not exclude measures that are applicable to patients with any chronic condition. Indeed, measures focused on general chronic conditions were of particular interest, especially for the adult population.
Prioritization Groups (After Inclusion/Exclusion) Based on Feasibility and Degree of Focus on Care Coordination
An important consideration in creating a useful measure set is the feasibility of the measures included. Feasibility concerns include the availability of data and the burden of obtaining data or data collection if data are not readily available. Almost all measures under review require some amount of data collection, and a large majority used a survey format. However, few of the measure sources reviewed addressed feasibility, and information on typical completion times for survey instruments was rarely available.
As a proxy for the time burden of data collection, the total number of survey items included in a measure was reviewed. This method was not suitable for assessing feasibility of the limited number of measures that do not use a survey format. Those measures were assessed on a case-by-case basis using any information available on data sources and data collection burden from among the measures' Atlas profile sources. If no information was available on the feasibility of non-survey measures, they were included in the highest priority group for further review.
Recognizing that feasibility must be balanced with the benefits of a measure, we assessed the degree to which a particular measure focuses on care coordination. As an initial gauge of this focus, we reviewed the percent of measure items that mapped to any care coordination domain from the Care Coordination Measures Atlas framework, out of the total number of measure items. As with the proxy measure for feasibility, this method was not suitable for measures that did not contain multiple aspects (e.g., a single process measure). Degree of focus on care coordination was assessed on a case-by-case basis for those measures, as was done for feasibility.
These two criteria were combined to categorize each measure into one of four priority groups for further review (Table B-1).
Short measures with a strong focus on care coordination (Priority Group 1) are clearly of interest for the measure set and are highest priority for further review and inclusion. Lengthy measures with many mapped items (Priority Group 2) are less desirable than shorter measures from a user feasibility perspective, but their density of relevant items could be attractive to some users. Therefore, these measures were moved to the next stage of review. Measures that are short but with few mapped items (Priority Group 3) might be useful alternatives if higher priority measures are inadequate (for example, none have adequate validity or reliability). Measures that are long and have a minimal focus on care coordination (Priority Group 4) are lowest priority for inclusion in the measure set, and were reviewed only if no other measures filled a particular measurement need.
We recognize that strict cut-points in separating surveys by length or focus are arbitrary. Before finalizing measure selection procedures, we confirmed that shifting the cut-points slightly would not change which measures were selected for any of the measure sets.
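The prioritization logic and the cut-point check can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the cut-point values in `BASE_CUTS` are hypothetical placeholders (the actual values belong to Table B-1 and are not reproduced here), the function names are invented, and the check reports changes in priority group assignment as a stand-in for changes in the final selection.

```python
# Hypothetical cut-points for illustration only; the actual values
# are in Table B-1 and are not reproduced here.
# (max items for a "short" instrument, min percent for a "strong focus")
BASE_CUTS = (30, 50.0)


def focus_percent(mapped_items, total_items):
    """Percent of instrument items that map to any care coordination domain."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * mapped_items / total_items


def priority_group(mapped_items, total_items, cuts=BASE_CUTS):
    """Assign Priority Group 1 to 4 from instrument length and focus."""
    max_short, min_focus = cuts
    short = total_items <= max_short
    strong = focus_percent(mapped_items, total_items) >= min_focus
    if short and strong:
        return 1
    if strong:  # long, but dense in mapped items
        return 2
    if short:  # short, but few mapped items
        return 3
    return 4


def reassigned(measures, shifted_cuts, base_cuts=BASE_CUTS):
    """Names of measures whose group changes when the cut-points shift.

    `measures` maps a measure name to (mapped_items, total_items).
    """
    return sorted(
        name
        for name, (mapped, total) in measures.items()
        if priority_group(mapped, total, base_cuts)
        != priority_group(mapped, total, shifted_cuts)
    )
```

An empty list from `reassigned` for each slightly shifted pair of cut-points would indicate that the grouping is insensitive to the exact thresholds. Non-survey measures, which were assessed case by case, fall outside this sketch.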
Step 2. Identify the Most Comprehensive Measures
A key goal of the final measure set is that the included instruments measure care coordination comprehensively. Comprehensiveness was defined as mapping to all of the activity domains[ii] from the Atlas measurement framework. A list of those activity domains and their definitions may be found in Appendix A.
Starting with the group of measures remaining after narrowing the field based on setting, age, condition, applicability to primary care, feasibility, and focus on care coordination (Step 1), we mapped the measures by domain and perspective. We then identified those measures that mapped to the most activity domains for a particular perspective. In considering breadth of domain coverage, we grouped the Communicate and Interpersonal Communication domains together, because they differ only in whether the mode of communication (personal interactions vs. any other mode) was specified. We considered the Information Transfer domain to be separate from this combined communication domain because this domain is distinguished by its focus on transmission of data through a variety of channels.
The goal of Step 2 was to identify a set of measures to undergo detailed assessment of validity, reliability, links to outcomes, and any further information on feasibility and use. Thus, when more than one measure offered broad domain coverage, all such measures were included in the set to undergo detailed review. Choices among these measures were then determined based on the detailed assessment (Step 3).
We repeated this process for each age group (pediatric and adult) and each perspective (patient/family, health care professional, and system representative).
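The domain coverage comparison in Step 2 can be sketched as a count of distinct activity domains per measure and perspective, with the two communication domains merged. This is a simplified illustration: the label `Communication` for the merged domain and the function names are assumptions, and the sketch keeps only measures tied for the highest count, whereas the review retained any measure judged to offer broad coverage.

```python
# Communicate and Interpersonal Communication count as one domain, as
# described above; Information Transfer stays separate. The merged
# label "Communication" is an assumption made for this sketch.
MERGED = {
    "Communicate": "Communication",
    "Interpersonal Communication": "Communication",
}


def coverage(domains):
    """Number of distinct activity domains after merging communication."""
    return len({MERGED.get(d, d) for d in domains})


def broadest(mappings):
    """For each perspective, the measures covering the most activity domains.

    `mappings` maps (measure_name, perspective) to a set of domain names.
    Ties are all kept so that each can go on to detailed review.
    """
    best = {}
    for (name, perspective), domains in mappings.items():
        n = coverage(domains)
        top_n, names = best.get(perspective, (0, []))
        if n > top_n:
            best[perspective] = (n, [name])
        elif n == top_n and n > 0:
            names.append(name)
            best[perspective] = (n, names)
    return {p: sorted(names) for p, (_, names) in best.items()}
```

Running the function once per age group, on the measures that survived Step 1 for that group, mirrors the repetition across age groups and perspectives described above.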
Step 3. Conduct Detailed Measure Assessment and Select Final Measures
Our primary focus in the detailed assessment was a review of evidence of measures' reliability and validity. Details of how we conducted that assessment are provided below.
In addition to assessing reliability and validity, we also examined any additional information available on feasibility, past or suggested uses of the measure (quality improvement, research, or accountability/recognition), the unit of analysis in past applications of the measure, the degree of focus on care coordination (i.e., percent of total instrument items that map to any care coordination domain), and the depth of domain focus (i.e., number of items that map to each domain). Information for the detailed assessments was obtained from the sources in the Atlas profile for the measure. To supplement the sources cited in the Atlas measure profiles, we performed a search of references that cited the initial development, validation, or testing publication for each measure. When no published source was available, we performed a search of the measure title using Google Scholar and reviewed resulting sources published in peer-reviewed journals.
It is rarely possible to directly compare results of reliability or validity testing from one study with another, due to differences in study design, analytic methods, and the measures themselves. Therefore, to summarize the weight of evidence in support of each measure that underwent a detailed review, we examined the kinds of testing done and the conclusions drawn from that testing. Two broad categories of conclusions were used—evidence that raises concerns and evidence that does not raise concerns (including supporting evidence)—based on critical assessment of the interpretation and discussion of the results in the published source.
We also noted when evidence was mixed, such as when two sources reported conflicting results, or when a single source reported multiple analyses that addressed the same element of reliability or validity but reported conflicting results. Any concerning results from a single analysis were categorized as raising concerns. For example, if a reliability test of a measure reporting a composite score showed that one sub-domain had low reliability while several other sub-domains used in the composite score had good reliability, this was scored as a single test that raises reliability concerns.
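The categorization rules in the two preceding paragraphs can be sketched as follows. The data layout and function names are hypothetical; the sketch only encodes the stated rules that any concerning result within a single analysis makes that analysis concerning, and that conflicting conclusions for the same type of test are recorded as mixed.

```python
from collections import defaultdict

RAISES = "raises concerns"
NO_CONCERNS = "does not raise concerns"
MIXED = "mixed"


def score_analysis(component_results):
    """Score one analysis from its component results.

    `component_results` is a list of booleans, True where a component
    (for example, one sub-domain of a composite score) gave a
    concerning result. Any concerning component makes the whole
    analysis concerning.
    """
    return RAISES if any(component_results) else NO_CONCERNS


def summarize(analyses):
    """Summarize the evidence for each type of test.

    `analyses` is a list of (test_type, component_results) pairs, one
    per analysis, possibly drawn from several sources.
    """
    by_type = defaultdict(set)
    for test_type, component_results in analyses:
        by_type[test_type].add(score_analysis(component_results))
    return {
        test_type: MIXED if len(conclusions) > 1 else next(iter(conclusions))
        for test_type, conclusions in by_type.items()
    }
```

Test types with no analyses simply do not appear in the result, which corresponds to a gap in testing rather than to evidence in either direction.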
We approached the review of sources systematically to describe the evidence available, gaps in testing, and areas where a specific type of test is not applicable. Several types of tests of reliability and validity were identified:
Evidence of Reliability
Internal Consistency—Tests of the reliability of a total instrument score, typically using Cronbach's alpha. Not applicable to measures that do not generate composite scores based on multiple items. In summarizing evidence, we focused on tests of the entire instrument rather than tests of subscales, but did note reliability of subscales that were particularly relevant to care coordination.
Inter-rater Reliability—Reliability of the measure when rated by multiple observers, typically assessed through a Kappa statistic. This is not applicable for measures that assess personal experience using a single rater.
Test-retest Reliability—Assessment of reliability when the measure is completed by the same raters or methods over two or more time periods when change would not be expected.
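For readers less familiar with the statistics named above, the following sketch computes Cronbach's alpha and a kappa statistic from raw data using their standard textbook formulas. It is an illustration only: the reviewed sources reported these statistics themselves, Cohen's kappa for two raters is assumed here as one common form of kappa, and the function names and data layout are hypothetical.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, a list of respondents' item-score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same subjects.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

The sketch does not guard against degenerate inputs, such as total scores with zero variance or raters who both use a single category, where the statistics are undefined.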
Evidence of Validity
Factor Analysis/Principal Components Analysis (PCA). Analyses to identify or confirm the relatedness of items within the instrument. This is typically performed to validate sub-scales within an instrument and is not applicable if no sub-scales are used or it is not anticipated that a measure captures multiple concepts.
Construct Validity—Analyses performed to assess whether expected distributions are observed, or expected relationships with structures, processes or outcomes are observed. This relationship suggests an underlying "construct"—in this case, what we assume to be "care coordination." We distinguish two levels of construct validity testing:
- Univariate or bivariate testing—description of response distribution or relationship with structures, processes or outcomes using bivariate statistical tests, such as correlations or t-tests. This is a weaker level of evidence than multivariate testing.
- Multivariate testing—assessment of relationships with structures, processes or outcomes using multivariate statistical techniques that control for potential confounding factors, such as multivariate regression analyses.
Convergent Validity—Responses or score on the measure are similar to the score on another validated measure of the same or related concept.
Content Validity—Measure is examined by subject matter experts to assess whether the measure reflects the underlying concept it was designed to capture. This includes expert panel review, Delphi or Nominal Group techniques, and focus groups. Revision following review and feedback is considered evidence supporting validity of the revised measure.
Indirect—Evidence that an earlier or related version of the measure is valid. This includes validation of the instrument outside a health care setting. This is a weaker category of evidence than evidence relating to the version and application of the measure under review.
In some cases, a particular test is not applicable to a measure. For example, assessing inter-rater reliability is not appropriate for measures of personal experience. Similarly, factor analyses are not appropriate for measures that do not group items into subsets.
A table was used to summarize the type of testing conducted and the conclusions of that testing for each measure that underwent detailed assessment (Table B-2).
This schema was used for summarizing reliability and validity to provide an overview of testing and evidence for measures that underwent detailed assessment, and to highlight gaps where more testing may be needed. We did not use the reliability and validity profile to score measures in a quantitative way, because the weight of evidence is not equal among these categories of evidence. For example, multivariate analyses demonstrating construct validity are much stronger evidence of measure validity than indirect evidence pertaining to a previous or related version of a measure.
When making choices among measures, we based that choice on the details of reliability and validity testing and evidence linking measure results to outcomes.
[i] In the setting criterion used for the initial measure inclusion, measures were included in the Primary Care Facility category if they had been used in any primary care setting, or assessed a transition that included primary care, ambulatory care in general, or patients receiving care in the community. Measures were included in the Not Setting Specific category if their design or purpose clearly stated application to any health care setting, or if information on settings where the measure has been applied was not available in the Atlas sources reviewed. These definitions include some measures that focus on assessing coordination for a number of entities outside the scope of the primary care practice accountability/recognition measure set; those measures were thus excluded.
[ii] We did not consider mapping to the broad approaches domains from the Atlas framework as part of the domain coverage review because these domains reflect complex care delivery models rather than discrete coordination activities.