Bone Morphogenetic Protein: The State of the Evidence for On-Label and Off-Label Use

Disposition of Comments

Comments received from draft review on Bone Morphogenetic Protein: The State of the Evidence for On-Label and Off-Label Use.

Project ID: BMPE0109

 

Table 1: Invited Peer Reviewer Comments

Reviewer¹ | Section² | Reviewer Comments | Author Response³

Reviewer 1 | Executive summary

Reviewer Comments:
 
The summary is long and hard to follow. It presents an overview of the methods rather than findings. The executive summary table lacks any synthesis of study size, source, quality, and outcomes. The text within the “conclusion column” re-iterates methods and states a general conclusion for each key question, without giving constructive information and basis for the conclusion.
 
Author Response:

The Executive Summary – like any summary or abstract – synopsizes methods to let the reader know how the assessment was done. The conclusions column in the summary table provides a high-level synthesis of the evidence review for each Key Question based on the AHRQ-modified GRADE framework. This takes into account study size, source, outcomes, and quality based on the USPSTF criteria.

Reviewer 1 | Introduction/Background

Reviewer Comments:
  • The introduction provides good general context for BMP. The assertion that age influences fracture healing needs to be supported by citations (page 13, paragraph 5).
  • Animal studies supporting bone formation properties of BMP are not discussed.
  • The FDA approval studies and data would be helpful, such as FDA summary and effectiveness data and specific FDA prescription, training, and labeling information as an appendix.
Author Response:
  • A reference citation was provided on page 13, paragraph 5 regarding patient age as a factor in bone healing.
  • Animal studies are outside the scope of the assessment.
  • Published data describing results from the FDA pivotal trials are included and assessed in the report.

Reviewer 1 | Methods

Reviewer Comments:
  • The search strategy was extensive. The selected time window of 1998-2009 is not explained but seems reasonable, given the FDA approval dates.
  • The patient population description is vague. Again, reference to the populations studied in the FDA approval application may be helpful.
  • Separating fracture and spine studies is critical, but the methods do not describe this or explain why not.
  • “DDD” is not a skeletal bone defect, and neither is an arthrodesis procedure.
  • The discussion of radiographic outcomes for both fractures and spinal fusion needs more details and referencing. FDA definitions would be helpful, both for radiographic success and clinical success.
  • Neurological status outcomes description is not specific enough for application to literature review and it contains no references.
  • Methods described for harms assessment (key questions 7 and 8) are particularly problematic. Assessment of harms is very limited. This aspect of the review is perhaps the most important from patient perspective, and the information presented does not synthesize well the safety characteristics of BMP. McMaster and AHRQ are cited as the source for the harms ascertainment, but even these modified questions are not addressed in the body of the report or in tables 36 and 37. How were harms defined and ascertained in the FDA studies? How were they defined and ascertained in the published studies? How do they compare across BMP-products? How do they compare in on-label vs. off-label applications? These questions are not answered by the report. Although Table 36 and Table 37 contain a lot of information, the information is not well-organized. The data are not synthesized in any structured way. The systematic harms ascertainment methods advocated by AHRQ would have been very helpful (AHRQ series paper 4: assessing harms when comparing medical interventions: AHRQ and the effective health-care program. Chou R, Aronson N, Atkins D, Ismaila AS, Santaguida P, Smith DH, Whitlock E, Wilt TJ, Moher D. J Clin Epidemiol. 2010 May;63(5):502-12. Epub 2008 Sep 26. PMID: 18823754).
  • The methods for data abstraction, verification, and creation of evidence table should be more specific and clear. Exactly what constituted primary data for the different study designs? How were accuracy checks performed?
  • Use of USPSTF study quality criteria in Appendix 5 is very helpful; condensing the criteria into “good-fair-poor” categories is less useful and makes it hard to evaluate the studies in the body of the report. The low rate of reporting on how harms were identified needs to be highlighted in the body of the report; if this is not described, the remaining five characteristics are not really relevant.
  • Assessment of applicability is difficult to interpret based on the description in the methods. Rather than describing the EPICOT framework in a general sentence, it would be more useful to compare the study populations to the patients enrolled in the FDA approval studies. Also, separating spine and tibia studies would provide a clearer applicability assessment. The report has minimal figures; the two on decision modeling are clear.
Author Response:
  • The search was designed to take into account the FDA approval dates.
  • The population description was broad: patients with a bony defect that requires repair. However, the KQs address specific indications for on-label uses, leaving the off-label uses less specific because it was unclear at the beginning what would be found in the literature.
  • Spine and fracture results are reported separately in the Results.
  • The reviewer is correct in that “DDD” is not a skeletal bone defect nor is an arthrodesis procedure. These errors are corrected in the text.
  • Radiographic outcomes reported from included articles conform to FDA definitions and are in accord with standard use.
  • The neurological status outcomes description is reported as it is used in the Neck Disability Index instrument.
  • A major conclusion of this report is that the quality of reporting of harms in the literature is inconsistent. Our team systematically culled out harms data from every included comparative study and looked at noncomparative studies for those data (compiled in the Appendix). We included noncomparative studies because of the known limitations of harms reporting in RCTs. The results we compile represent what was actually reported (Table 36). The results of our modified McHarms survey highlight the limitations of the reporting (Table 37), which was the purpose of KQ8. Given the inconsistency and lack of comparability across studies, quantitative synthesis of the data is not valid. We state that the absence of harms reporting in a study does not necessarily provide evidence of the absence of harms. We agree about the importance of harms to patients, and believe our assessment underscores the weakness of evidence in the literature and the need for better collection and analysis of such data using validated methods.
  • Data abstraction and analysis methods are presented on page 26 of the draft. Study selection criteria are described on page 25. Accuracy of abstracted data was verified by a second investigator with differences resolved by discussion or third party intervention as needed.
  • We systematically abstracted all BMP-related harms from all included articles, comparative and noncomparative. The quality of harms reporting was addressed by KQ8 using a modified McHarms survey. The USPSTF quality ratings are presented in the Appendix with annotation; overall ratings are carried over into the report tables and text.
  • It is unclear why comparing the included study populations to those in the FDA approval studies would make the report “more useful”. The results of FDA pivotal trials are included in the assessment. Spine and tibia studies are reported separately in the assessment.

 
Reviewer 1 | Results

Reviewer Comments:
  • Appendix 1 contains an amazing collation of very important information. The research team has done an outstanding job of assembling this information in Tables A through P. However, this information is not captured or synthesized well in the results section. The text is difficult to follow. Interventions, populations, benefits, and harms are interjected variably.
Author Response:
  • We appreciate the reviewer’s kind words about the collation job we have performed. We disagree with the reviewer: we do not believe quantitative synthesis of the data is applicable; in most cases it would be inappropriate because of interstudy heterogeneity and differences in study size. The Appendix data are qualitatively synthesized according to KQ and indications in the body of the report.

Reviewer 1 | Key Questions

Reviewer Comments:
  • The text does not directly answer the question. For example, the outcomes in Table 6 should be evaluated quantitatively, and the column headings defined for each of the three studies. No comparison is made to the FDA premarket approval studies. The text addresses study methods, patient demographics, benefits, and harms, without directly comparing and contrasting these features succinctly. In part, this is due to lack of explicit definition of success in the pre-specified work plan.
Author Response:
  • Upon review of the draft, it became evident that the paper by Dawson et al, 2009 reported an off-label use. It was moved to the appropriate section and the text and tables were adjusted throughout the assessment to account for this. As a result, we were left with two studies, one much larger than the second, which would have overwhelmed if not negated the value of any qualitative analysis. The assessment compiles the data systematically, synthesizes it according to the AHRQ-modified GRADE convention, and reports it in that context.

Reviewer 1 | Cost-Effectiveness Analysis

Reviewer Comments:

This section of the report is the most detailed, well written, and clear. My only concern is the potential for bias in the source studies, charge/cost estimates, and poor quality of source data for transition probabilities.

Author Response:

The following sentences have been added to the Discussion and Conclusion section:

There was a limited evidence base for both open tibial fracture and spinal fusion, each consisting of a single randomized controlled trial. Biases may have existed in the source studies, for example possibly biased assessment of outcomes would result in inaccurate transition probabilities.

Reviewer 1 | Summary and conclusions

Reviewer Comments:
  • Conclusions are justified by the data. However, the potential for sponsorship bias should be mentioned.
  • Table 59 is again filled with large sections of text rather than clear summary of numbers. More specific numbers rather than general estimates such as “low/moderate” would help better understand the answers to key questions. Study limitations are not addressed adequately in the summary.
Author Response:
  • Sponsorship bias is not characterized. While the high proportion of industry sponsorship among the included studies suggests potential bias exists, it was not systematically investigated or quantified.
  • Table 59 is the same as in the Executive Summary, and all comments on it were covered in the text above pertaining to that section of the report.

Reviewer 2 | General

Reviewer Comments:

I found the extensive use of abbreviations to make reading difficult, as I was constantly looking back for definitions. I would recommend keeping the most familiar or obvious abbreviations (BMP, RCT, FDA, QALY), but simply writing out for each use the terms that are abbreviated with less familiar abbreviations (unless the meaning is reiterated in every section where they are used). Examples that threw me were AGB, ALGB, ICBG, FRA, DBM, HA-TCP, DSP, and the like.

Author Response:

Revisions will be made throughout the text to limit abbreviations.

Reviewer 2 | Executive Summary | Good | No response
Reviewer 2 | Introduction/Background | Good | No response
Reviewer 2 | Methods | Good | No response

Reviewer 2 | Results

Reviewer Comments:

Generally good.

It would have helped me to identify exactly what made certain trials off-label use. The reason is that, at least theoretically, BMP might be effective for one off-label use, but not for others. Here, all off-label uses seem to be treated as one. For example, in off-label uses of BMP2 in the lumbar spine, were studies off-label because they were used for more than one spinal level, or because they were not for DDD, or for other reasons? Could we separate trials by the different reasons they were off-label? Maybe BMP works for spinal stenosis, say, but not for multi-level fusions. We can't get a sense of that here.

Author Response:

Off-label uses in RCTs in the lumbar-sacral spine (table 23) were explicated in the table, and text was adjusted throughout to reflect changes in table. Trials were not otherwise separated. There are several different reasons the trials in table 23 were deemed off-label. These include use of a nonapproved formulation, or matrix, in conjunction with the approved rhBMP2 (InFuse®); use of a non-anterior surgical approach with InFuse®; use of InFuse® with a nonapproved interbody entity; and, use in multi-level fusion. While the trials differed in rhBMP2 use, they were generally consistent in direction of effect, with statistically significant findings for radiographic success favoring BMP in three, including the two largest RCTs. This suggests the off-label factor(s) does not affect the result. BMP appeared to have benefit in this setting despite differences among the studies.

Reviewer Comments:

On page 33, there's a funny typo: below table 5, discussing lumbar fusions, there is a sentence that all patients had symptomatic single level DDD, but includes arm pain as a symptom. Not for the lumbar spine, I don't think.

Author Response:

Reference to arm pain was deleted.

Reviewer Comments:

In the cost-effectiveness analysis, it took me a while to tumble to the baseline assumption that the costs of fusions with and without BMP were the same, thanks to DRG bundling. It would be nice to make this point more explicitly on page 81, since it is so counter-intuitive.

Author Response:

The last sentence below was added to make this clearer:

Analyses included direct health care costs reported as Medicare payments from free publicly available sources, valued in 2007 U.S. dollars (Tables 44–49). Cost categories included initial hospitalization (hospital and physician costs) and secondary interventions (hospital/outpatient surgical center and physician costs). It was assumed that initial hospitalization was paid according to the diagnosis-related groups (DRG) system. Thus, base case analyses assume identical initial hospitalization costs whether BMP was used or not.

Reviewer 2 | Discussion/Conclusion

Reviewer Comments:

Discussion/Conclusion: Again, it would be helpful if conclusions regarding off-label use could be itemized by indication (the reason for being off-label).

Author Response:

See above.

Reviewer 2 | Tables, Figures, Appendices | Good | No response

Reviewer 2 | References

Reviewer Comments:

I could easily have missed it, but I didn't see mention of the Cahill article on complications of BMP that appeared in JAMA during the search period (Cahill KS, et al. Prevalence, complications, and hospital charges associated with use of bone-morphogenetic proteins in spinal fusion procedures. JAMA 2009; July 1; 302: 58-66). Is there some reason?

Author Response:

Cahill et al presents a retrospective overview of complications associated with BMP use in spinal fusion, based on the Nationwide Inpatient Sample Database, a 20% sample of US community hospitals. It does not separate data according to BMP product, nor is it necessarily representative of BMP use in the US. It was excluded according to our predefined study inclusion criteria.

1 Peer reviewers are not listed in alphabetical order.
2 If listed, page number, line number, or section refers to the draft report.
3 If listed, page number, line number, or section refers to the final report.

 

Current as of September 2012
Internet Citation: Bone Morphogenetic Protein: The State of the Evidence for On-Label and Off-Label Use: Disposition of Comments. September 2012. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/research/findings/ta/comments/bmpetab1.html