The Transcranial Doppler Ultrasonography (TCD) Screening among Children with Sickle Cell Anemia (SCA) measure establishes a claims-based method for identifying receipt of TCD screening among children with SCA. The measure was created by the Quality Measurement, Evaluation, Testing, Review, and Implementation Consortium (QMETRIC) operating through the University of Michigan, a Pediatric Quality Measures Program grantee. This measure was initially endorsed by the National Quality Forum in May 2016 and has maintained endorsement since that time. The Measure Applications Partnership also recommended the measure for inclusion in the 2016, 2017, 2018, and 2019 Child Core Sets, although it was not ultimately adopted.
Specifically, the measure calculates the percentage of children ages 2 through 15 years old with SCA (Hemoglobin [Hb] SS or HbSβ0-thalassemia) who received at least one TCD screening within the measurement year. A higher proportion indicates better performance as reflected by appropriate testing. The measure specifications are reflective of the guidelines from the National Heart, Lung, and Blood Institute (NHLBI), and the performance scores calculated through this measure identify areas in need of improvement.
The QMETRIC technical measure specification (PDF, 133.6 KB) has been validated at the health system, health plan, and state levels. The change from ICD-9-CM to ICD-10-CM coding required re-specification of the measure and extensive testing to ensure reliability and validity at the health plan and state levels. It also required the development of a valid claims-based case definition. Ultimately, a definition with >90% sensitivity and specificity was validated, ensuring the feasibility of using only administrative claims for this measure.
Availability of Data
This measure exclusively uses administrative claims data, so accurate and complete data are generally available. A lag in claims availability at the state level may delay the use of Medicaid claims. QMETRIC found that health plans have more ready access to their own claims and can obtain measure performance data more rapidly.
Specification Variation at Multiple Levels
This measure is endorsed by the National Quality Forum and has been rigorously assessed for reliability and validity at all levels. The detailed measure specification may be used on administrative claims in any context. This measure can be easily aligned across state Medicaid programs, health plans, and health systems, and as it is claims-based, there is very little cost or effort required to collect measure performance data.
QMETRIC’s state and health plan partners had no difficulty implementing the measure and obtaining performance scores. The unmet challenge was getting some health plans and health systems to act on the poor performance identified by the measure and engage in meaningful quality improvement (QI) activities. QMETRIC found that, for some health plans, available QI and case management resources are directed at other conditions that (1) have an extrinsic force prompting action (e.g., Core Set, HEDIS) or (2) are high cost to the plan. Health systems had significant difficulty locating some of the patients attributed to them because of limited case management resources.
Validity and Reliability Testing
This measure was originally tested for validity and reliability during the first PQMP grant to QMETRIC using ICD-9-CM codes. Children with sickle cell anemia (SCA) were identified through the presence of at least three separate healthcare encounters related to SCA (defined as hemoglobin [Hb]SS) within the measurement year. SCA-related healthcare encounters were identified through the following ICD-9-CM codes: 282.61 (Hb-SS disease w/o crisis) and 282.62 (Hb-SS disease with crisis). Children ages 2 through 15 years are included within the target population (i.e., must not have a 2nd or 16th birthday within the measurement year).
It is important to note that accurate calculation of this measure requires that the target population be selected from among children whose health services for the measurement year are fully captured in the administrative claims data set. For children with dual enrollment in other health plans, claims may be incomplete because some of their health services may have been paid for by another plan; including such children would understate the measure. Consequently, children must not only be continuously enrolled in the health plan from which claims are available, but the enrollment files must also be assessed to determine whether other forms of health insurance existed during the measurement year. Children with evidence of other insurance during the measurement year (i.e., coordination of benefits) must be excluded from the target population.
Performance Rate Calculation
- Identify the denominator: Determine the eligible population using administrative claims. The eligible population is all individuals who satisfy all specified criteria, including age, continuous enrollment, and diagnosis requirements within the measurement year.
- Identify the numerator: Identify numerator events using administrative claims for all individuals in the eligible population (denominator) within the measurement year.
- Calculate the rate (numerator / denominator).
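The steps above can be sketched in code. This is a minimal illustration only: the claim records, member identifiers, and the TCD procedure code are invented, and the ≥3-claim SCA rule and coordination-of-benefits exclusion follow the descriptions in this document rather than the full technical specification.

```python
from collections import Counter

# Hypothetical claim records; a real implementation would use the full
# value sets from the QMETRIC technical specification.
SCA_DX = {"282.61", "282.62"}   # ICD-9-CM HbSS codes named in the text
TCD_PROC = {"93886"}            # illustrative TCD procedure code (assumption)

claims = [  # (member_id, dx_code, proc_code)
    ("A", "282.61", None), ("A", "282.62", None), ("A", "282.61", None),
    ("A", None, "93886"),
    ("B", "282.61", None),            # only one SCA claim: not eligible
]
enrollment = {  # member_id -> (age, continuously_enrolled, other_insurance)
    "A": (6, True, False),
    "B": (10, True, False),
    "C": (4, True, True),             # coordination of benefits: excluded
}

# Step 1 - denominator: age 2-15, continuous enrollment, no other coverage,
# and at least three SCA-related claims within the measurement year.
sca_counts = Counter(m for m, dx, _ in claims if dx in SCA_DX)
denominator = {
    m for m, (age, enrolled, cob) in enrollment.items()
    if 2 <= age <= 15 and enrolled and not cob and sca_counts[m] >= 3
}

# Step 2 - numerator: eligible children with >= 1 TCD screening claim.
screened = {m for m, _, proc in claims if proc in TCD_PROC}
numerator = denominator & screened

# Step 3 - rate.
rate = len(numerator) / len(denominator)
print(f"TCD screening rate: {rate:.0%}")   # -> 100% for this toy data
```

In this toy data only member A qualifies for the denominator and also has a TCD claim, so the rate is 100%.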
Data Sets and Data Elements
This measure was tested by QMETRIC using the following data sets:
- Michigan Medicaid administrative claims data provided by the Michigan Department of Health and Human Services (MDHHS): Consisted of all Medicaid claims for Medicaid enrollees within the state of Michigan.
- Medicaid Analytic eXtract (MAX) administrative claims data for 6 state Medicaid programs provided by the Centers for Medicare & Medicaid Services (CMS): Consisted of Medicaid claims reported to CMS for Medicaid enrollees within 6 state Medicaid programs with moderate to high prevalence of SCA: Florida, Illinois, Louisiana, Michigan, South Carolina, and Texas.
- Medical record data from three Michigan medical centers: Children’s Hospital of Michigan (CHM), Detroit, Michigan; Hurley Medical Center (HMC), Flint, Michigan; and University of Michigan Health Services (UMHS), Ann Arbor, Michigan. These three large medical centers are located in urban areas which are reflective of the residence of the vast majority of children with SCA living in Michigan.
- Michigan Newborn Screening (NBS) Results: Consisted of all births within Michigan.
The primary information needed for this measure includes a unique member identifier, health plan enrollment information, date of birth, dates of service, diagnosis codes, and procedure codes. These data are widely available, although obtaining them may require a restricted-use data agreement. For multiple-state comparisons, national Medicaid data are available from CMS. When the measure is used at the single-state level, state health departments can use their own Medicaid data.
QMETRIC testing determined that this measure is feasible using existing data from administrative claims systems. While QMETRIC testing efforts support the feasibility of implementing this measure, the testing process demonstrated the technical challenges that may exist when identifying SCA cases from very large administrative claims files, such as MAX data. This measure was also tested using Medicaid administrative claims data acquired directly from the state of Michigan. Acquisition of data directly from state Medicaid programs requires the cooperation of those jurisdictions, as well as modification of the statistical programming code developed for use with MAX files. Such modifications are necessary given the unique structure of the data files obtained directly from state Medicaid programs.
Reliability of MAX Data
MAX data from Florida, Illinois, Louisiana, Michigan, South Carolina, and Texas were used to test the reliability of this measure. The reliability of MAX data to evaluate TCD screening is of high importance since this is the only national source of state Medicaid data available upon which state-to-state comparisons may be conducted. The reliability of this measure was calculated using a signal-to-noise analysis. The signal-to-noise analysis was focused on assessing the reliability to confidently distinguish the performance of one state’s Medicaid program from that of another state. For this approach, reliability was estimated with a beta-binomial model (RAND Corporation, TR-653-NCQA, 2009).
State-specific reliability was very good; observed reliability was consistently greater than 0.95. In general, reliability scores can range from 0.0 (all variation is attributable to measurement error) to 1.0 (all variation is caused by real differences). While there is not a clear cut-off for minimum reliability level, values above 0.7 are considered sufficient to distinguish differences between some states and the mean; reliability values above 0.9 are considered sufficient to see differences between states (RAND Corporation, TR-653-NCQA, 2009). The median reliability observed across states was 0.98 (range: 0.96-0.99), which is consistent with a high degree of reliability.
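The signal-to-noise idea can be illustrated with a simplified, method-of-moments version of the calculation; the actual analysis fit a beta-binomial model (Adams, 2009), and the state rates and denominators below are invented for illustration.

```python
# Reliability = between-entity variance / (between + within-entity variance),
# where the within-entity (noise) variance of a rate p based on n children
# is approximated by p(1 - p) / n. All numbers below are invented.
states = {"FL": (0.30, 900), "IL": (0.35, 600), "LA": (0.25, 700),
          "MI": (0.40, 500), "SC": (0.28, 400), "TX": (0.33, 1200)}

rates = [p for p, _ in states.values()]
mean_rate = sum(rates) / len(rates)
between_var = sum((p - mean_rate) ** 2 for p in rates) / (len(rates) - 1)

def reliability(p, n):
    """Signal-to-noise reliability for one entity's observed rate."""
    within_var = p * (1 - p) / n          # sampling (noise) variance
    return between_var / (between_var + within_var)

for state, (p, n) in states.items():
    print(f"{state}: reliability = {reliability(p, n):.3f}")
```

Larger denominators shrink the noise term, which is why reliability rises with the number of eligible children in a state.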
In addition, the reliability of the data element abstracted from the medical chart was assessed by identifying a subset of the charts to be re-abstracted by another trained medical record abstractor; the results of the two abstractors were compared using percent agreement and kappa. Ten charts were chosen for evaluation of inter-rater reliability; the two trained abstractors had 100% agreement with each other for abstracting receipt of TCD screening from the medical records, resulting in a kappa of 1.00. A kappa of greater than .81 is considered almost perfect agreement (Landis and Koch, 1977).
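For two raters and a binary item (TCD screening received: yes/no), percent agreement and Cohen's kappa can be computed as below. The ten chart values are invented, but complete agreement reproduces a kappa of 1.00.

```python
def percent_agreement_and_kappa(r1, r2):
    """Percent agreement and Cohen's kappa for two binary raters."""
    n = len(r1)
    agree = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance-expected agreement from each rater's marginal "yes" rate.
    p1, p2 = sum(r1) / n, sum(r2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return agree, (agree - expected) / (1 - expected) if expected < 1 else 1.0

# Ten re-abstracted charts with complete agreement, as in the text:
charts = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
agree, kappa = percent_agreement_and_kappa(charts, list(charts))
print(agree, kappa)   # -> 1.0 1.0
```

Kappa discounts agreement expected by chance, which is why it is reported alongside raw percent agreement.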
Validity of Critical Data Elements
Numerator: The accuracy of administrative claims in identifying receipt of TCD screening was assessed through comparison to the gold standard of medical charts. An audit was conducted by trained medical record abstractors to compare administrative claims data with corresponding medical records data. Medical records were abstracted for all children meeting the TCD screening measure specification criteria; agreement between the medical records and the administrative claims was assessed using kappa. Furthermore, the sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of administrative claims for receipt of TCD screening were calculated; the medical charts were the gold standard for comparison.
For this comparison, children with SCA who were enrolled within Michigan Medicaid were successfully matched with their Michigan Medicaid administrative claims data. Among these children, by comparing administrative claims data with medical records QMETRIC determined that TCD screening was identified in both for approximately 50% of cases. Similarly, approximately 45% of cases were classified as not having a TCD in both data sources, yielding an overall agreement of 96.7% (kappa = 0.93, 95% confidence interval (CI): 0.86, 1).
Using administrative claims to identify receipt of TCD screening resulted in a sensitivity of 94% (95% CI: 83%-99%), a specificity of 100% (95% CI: 91%-100%), a NPV of 93% (95% CI: 81%-99%), and a PPV of 93% (95% CI: 92%-100%) compared with the gold standard of medical records.
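The four statistics can be reproduced from a 2x2 table in a few lines; the counts below are invented for illustration and are not QMETRIC's actual audit numbers.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix,
    with the gold standard (medical record) defining true status."""
    return {
        "sensitivity": tp / (tp + fn),   # claims detect true screenings
        "specificity": tn / (tn + fp),   # claims correctly rule out
        "ppv": tp / (tp + fp),           # claim-positives truly screened
        "npv": tn / (tn + fn),           # claim-negatives truly unscreened
    }

# Invented counts for illustration only:
m = diagnostic_metrics(tp=47, fp=0, fn=3, tn=45)
print(m)
```

Note that with zero false positives the PPV is necessarily 100%, mirroring the relationship between a perfect specificity and PPV in the audit described above.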
Denominator: The accuracy of the case definition using ICD-9-CM codes (at least 3 claims for SCA (Hemoglobin SS) within the measurement year) to identify children with SCA was assessed through comparison to the gold standard of newborn screening results for the state of Michigan for children enrolled in Michigan Medicaid in 2010 and 2011 with at least one SCD-related healthcare claim within their enrollment year(s). The area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, PPV, and NPV were calculated for the case definition. As a comparison, these values were also calculated for those with a minimum of at least 1 or 2 HbSS claims within each year.
A sensitivity of over 90% and a specificity of approximately 80%, as well as the reliability across years, allow QMETRIC to conclude that the denominator, using ICD-9-CM codes, is valid for accurately identifying children with SCA within administrative claims. These results indicate that the chosen case definition has a very high ability to correctly identify true cases and a somewhat lower ability to rule out non-cases (i.e., it admits some false positives). However, other, less stringent case definitions resulted in substantially more misclassification than the chosen definition of at least 3 HbSS claims within the measurement year.
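The comparison of candidate definitions can be sketched as a threshold sweep. The claim counts and newborn-screening labels below are invented, but they illustrate the sensitivity/specificity trade-off the paragraph describes: relaxing the threshold raises sensitivity at the cost of specificity.

```python
# For each candidate definition (>= k HbSS claims in the year), classify
# children and score against the newborn-screening gold standard.
children = [  # (hbss_claim_count, has_sca_per_newborn_screening) - invented
    (5, True), (4, True), (3, True), (1, True),
    (2, False), (1, False), (0, False), (0, False),
]

for k in (1, 2, 3):
    tp = sum(c >= k and sca for c, sca in children)
    fn = sum(c < k and sca for c, sca in children)
    fp = sum(c >= k and not sca for c, sca in children)
    tn = sum(c < k and not sca for c, sca in children)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    print(f">= {k} claims: sensitivity {sens:.2f}, specificity {spec:.2f}")
```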
Empirical Validity Testing of Performance Measure
Although a state would typically have direct access to its own Medicaid data, it is unlikely that a state would have similar access to other states’ data for comparison. However, CMS develops and maintains standardized MAX data for public use using administrative claims submitted by each state Medicaid program. The MAX data are the only national, person-level administrative claims dataset available for the Medicaid program. As a consequence, MAX data, rather than data acquired directly from individual Medicaid programs, are likely to be used to perform cross-state comparisons of TCD screening among children with SCA. Since states submit their Medicaid data to CMS for conversion into the MAX datasets, a state’s own Medicaid data can be considered the authoritative source for administrative claims.
QMETRIC’s empirical validity testing of this performance measure compared the MAX data for the state of Michigan (obtained from CMS) to the gold standard of Michigan Medicaid data (obtained directly from Michigan’s claims data warehouse) for the same time period (2007-2009). Note that the testing time period was constrained to align with the most recent MAX data available from CMS at the time of this analysis. Rates of TCD screening using each source of data were calculated and compared using z-tests for two proportions; for these tests, the null hypothesis was that the rate in each year would be the same in both Michigan Medicaid data and MAX data. Additionally, the correlation coefficient and squared correlation coefficient were calculated to identify the extent of the linear relationship between the two data sources.
The comparison of rates of TCD screening from the Michigan Medicaid data as compared to MAX data showed that the number of TCD cases among children with SCA ranged from 45 to 114 screenings in the claims acquired directly from the Medicaid data warehouse, versus a range of 26 to 93 screenings from MAX data for the same time period.
These results suggest that, compared with the Michigan Medicaid data, MAX data have a very high degree of validity. When TCD screening was assessed for the same state (Michigan) from these two data sources for the same time period (2007-2009), no differences in rates were observed (all p-values >0.20). Additionally, correlation coefficients greater than 0.70 indicate a strong positive linear relationship, and the squared correlation coefficient of 0.96 indicates that 96% of the variability in the MAX data from CMS for the state of Michigan can be explained by variation in the data received directly from the Michigan Medicaid program. The relationship between the two data sources is therefore extremely strong.
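The two-proportion z-test used for these comparisons can be written compactly; the counts below are invented stand-ins for one measurement year's Michigan Medicaid vs. MAX comparison, not the actual study numbers.

```python
from math import sqrt, erf

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented screening counts: 110/320 in state warehouse data vs. 100/310 in MAX.
z, p = two_proportion_ztest(110, 320, 100, 310)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value above 0.05 would, as in the study, fail to reject the hypothesis that the two data sources yield the same rate.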
Face Validity of Performance Measure Score
The face validity of this measure was established by a panel of national experts and advocates for families of children with sickle cell disease (SCD) convened by QMETRIC. This expert panel included nationally recognized experts in SCD, representing hematology, pediatrics, and SCD family advocacy. In addition, this measure’s validity was considered by experts in state Medicaid program operations, health plan quality measurement, health informatics, and health care quality measurement. In total, the QMETRIC SCD panel included 14 experts, providing a comprehensive perspective on SCD management and the measurement of quality metrics for states and health plans. The expert panel assessed whether the performance of the measure would result in improved quality of care for children with SCD. Specifically, with respect to TCD screening, the panel weighed evidence to determine whether the performance of TCD as outlined in the measure would improve the quality of care provided to patients. The voting process to prioritize the measure was based on the ability of the measure to distinguish good from poor quality.
The expert panel concluded that this measure has a very high degree of face validity through a detailed review of concepts and metrics considered to be essential to effective SCD management and treatment. Concepts and draft measures were rated by this group for their relative importance. This measure was highly rated, receiving an average score of 8.5 (with 9 as the highest possible score). In addition, the expert panel concluded that the performance of TCD as outlined in this measure would improve the quality of care provided to patients, and that the measure would be able to distinguish good from poor quality.
Demographic, Clinical and Social Risk Adjustment
This measure was not risk adjusted.
During testing using ICD-9-CM codes, no gender disparities in TCD screening among children with SCA were identified (chi-square=1.2, p-value=0.28). Because all children in the performance data were enrolled in state Medicaid programs, no disparities by insurance or socioeconomic status were identified. Younger children were more likely to receive TCD screening than older children (chi-square=99.01, p-value<0.0001): 36% of those ages 2 to 6 years received a TCD screen, compared with 31% of those ages 7 to 11 years and 25% of those ages 12 to 15 years.
A benchmark was not established for this measure for two primary reasons: (1) a lack of any real variation among the measured stakeholders and (2) an absence of high performers among the measured entities. The QMETRIC team was not able to establish a standard for groups to aspire to with regard to performance; given the lack of variation and the uniformly poor performance, any benchmark would have been arbitrary.
Transition from ICD-9-CM to ICD-10-CM Codes
Because the measure had been validated and endorsed by NQF using ICD-9-CM codes, it was necessary to revalidate it using ICD-10-CM codes. The re-specification had to be completed manually because the “automated” conversion tools did not perform in a valid manner, making this a laborious and time-consuming endeavor. However, it was essential to producing a valid and reliable claims-based measure using ICD-10-CM codes.
As a first step in re-specifying the measure using ICD-10-CM codes, it was necessary to develop a valid claims-based case definition for identifying children with SCA. A manuscript has been published by the QMETRIC team that provides a complete description of the process used to develop, test, and validate a new case definition (Reeves, et al., 2020).
A summary of this process is provided in the following excerpt from the Reeves et al. manuscript:
Using specific SCA-related (D5700, D5701, and D5702) and nonspecific (D571) diagnosis codes, 23 SCA case definitions were applied to Michigan Medicaid claims (2016) to identify children with SCA. Measures of performance (sensitivity, specificity, area under the ROC curve) were calculated using newborn screening results as the gold standard. A parallel analysis was conducted using New York State Medicaid claims and newborn screening data. In Michigan Medicaid, 1597 children had ≥1 D57x claim; 280 (18%) were diagnosed with SCA. Measures of performance varied, with sensitivities from 0.02 to 0.97 and specificities from 0.88 to 1.0. The case definition of ≥1 outpatient visit with a SCA-related or D571 code had the highest area under the ROC curve, with a sensitivity of 95 percent and specificity of 92 percent. The same definition also had the highest performance in New York Medicaid (n = 2454), with a sensitivity of 94% and specificity of 86%. Children with SCA can be accurately identified in administrative claims using this straightforward case definition. This methodology can be used to monitor trends and use of health services after transition to ICD-10-CM.
The development of this new valid case definition ensured the feasibility of the use of this claims-only measure.
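The published ICD-10-CM case definition (≥1 outpatient visit with an SCA-specific or D57.1 code) is simple enough to sketch directly; the claim tuples, member identifiers, and place-of-service values below are invented for illustration.

```python
# Code families from the Reeves et al. (2020) case definition.
SCA_SPECIFIC = {"D5700", "D5701", "D5702"}   # HbSS with/without crisis
NONSPECIFIC = {"D571"}                       # sickle-cell disease, nonspecific
DEFINITION_CODES = SCA_SPECIFIC | NONSPECIFIC

claims = [  # (member_id, place_of_service, dx_code) - invented examples
    ("A", "outpatient", "D5701"),
    ("B", "inpatient",  "D5700"),   # inpatient only: not flagged by this rule
    ("C", "outpatient", "D571"),
    ("D", "outpatient", "D578"),    # other SCD code, outside the definition
]

# Flag children with >= 1 outpatient visit carrying a qualifying code.
flagged = {
    m for m, pos, dx in claims
    if pos == "outpatient" and dx in DEFINITION_CODES
}
print(sorted(flagged))   # -> ['A', 'C']
```

The outpatient restriction and the inclusion of the nonspecific D57.1 code are the two features that gave this definition the highest area under the ROC curve in the published testing.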
The re-specification continued with a translation of all other codes required for measure implementation which included a manual review to ensure the accuracy of this process. The new ICD-10-CM measure specification was then provided to each of three partner Medicaid health plans who were asked to determine their performance scores using their claims data. The new specification was also provided to the New York Medicaid program so that they could also determine the performance scores for their state. This was done for both 2018 and 2019 measurement years.
Using Michigan Medicaid administrative claims (available with permission via a data use agreement), the overall performance score for the state of Michigan was calculated. State Medicaid claims were then sorted by health plan and a performance score for each partner health plan was determined. QMETRIC then developed an algorithm to assign sickle cell patients to specific health systems within the State of Michigan. Again, using the Michigan Medicaid claims, patients were sorted to each partner health system and a performance score for the patients assigned to each health system was determined.
Very high reliability was found in the performance scores that were calculated using Michigan Medicaid claims for each partner health plan as compared with the scores they calculated using their own claims data.
The re-specified measure proved valid at all levels at which it was used. The lag in claims availability at the state level creates a delay in the use of Medicaid claims. Health plans have more ready access to their own claims and can use the measure on a more rapid basis.
Adams, John L., The Reliability of Provider Profiling: A Tutorial. Santa Monica, CA: RAND Corporation, 2009. Available at: https://www.rand.org/pubs/technical_reports/TR653.html.
Landis, J.R. and Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977; 33(1): 159-174. DOI: 10.2307/2529310.
Reeves SL, et al. Performance of ICD-10-CM diagnosis codes for identifying children with SCA. Health Serv Res 2020; 00: 1–8. DOI: 10.1111/1475-6773.13257.