Disposition of Comments
Project ID: BMPE0109
Table 1: Invited Peer Reviewer Comments
The summary is long and hard to follow. It presents an overview of the
methods rather than findings. The executive summary table lacks any synthesis
of study size, source, quality, and outcomes. The text within the “conclusion
column” re-iterates methods and states a general conclusion for each
key question, without giving constructive information and basis for the conclusion.
The Executive Summary – like any summary or abstract – synopsizes
methods to let the reader know how the assessment was done. The conclusions
column in the summary table provides a high-level synthesis of the evidence
review for each Key Question based on the AHRQ-modified GRADE framework.
This takes into account study size, source, outcomes, and quality based on
the USPSTF criteria.
- The introduction provides good general context for BMP. The assertion
that age influences fracture healing needs to be supported by citations
(page 13, paragraph 5).
- Animal studies supporting bone formation properties of BMP are
- The FDA approval studies and data would be helpful, such as FDA
summary and effectiveness data and specific FDA prescription, training,
and labeling information as an appendix.
- A reference citation was provided on page 13, paragraph 5 regarding
patient age as a factor in bone healing.
- Animal studies are outside the scope of the assessment.
- Published data describing results from the FDA pivotal trials
are included and assessed in the report.
- The search strategy was extensive. The selected time
window of 1998-2009 is not explained but seems reasonable, given the FDA
- The patient population description is vague. Again, reference
to the populations studied in the FDA approval application may be helpful.
- Separating fracture and spine studies is critical, but the methods do
not describe this or explain why not.
- “DDD” is not a skeletal
bone defect, and neither is an arthrodesis procedure.
- The discussion of
radiographic outcomes for both fractures and spinal fusion needs more details
and referencing. FDA definitions would be helpful, both for radiographic
success and clinical success.
- Neurological status outcomes description
is not specific enough for application to literature review and it contains
- Methods described for harms assessment (key questions
7 and 8) are particularly problematic. Assessment of harms is very limited.
This aspect of the review is perhaps the most important from patient perspective,
and the information presented does not synthesize well the safety characteristics
of BMP. McMaster and AHRQ are cited as the source for the harms ascertainment,
but even these modified questions are not addressed in the body of the
report or in tables 36 and 37. How were harms defined and ascertained in
the FDA studies? How were they defined and ascertained in the published
studies? How do they compare across BMP-products? How do they compare in
on-label vs. off-label applications? These questions are not answered by
the report. Although Table 36 and Table 37 contain a lot of information,
the information is not well-organized. The data are not synthesized in
any structured way. The systematic harms ascertainment methods advocated
by AHRQ would have been very helpful (AHRQ series paper 4: assessing harms
when comparing medical interventions: AHRQ and the effective health-care
program. Chou R, Aronson N, Atkins D, Ismaila AS, Santaguida P, Smith DH,
Whitlock E, Wilt TJ, Moher D. J Clin Epidemiol. 2010 May;63(5):502-12.
Epub 2008 Sep 26. PMID: 18823754).
- The methods for data abstraction, verification, and creation
of evidence table should be more specific and clear. Exactly what constituted
primary data for the different study designs? How were accuracy checks
- Use of USPSTF study quality criteria in Appendix 5 is very helpful;
condensing the criteria into “good-fair-poor” categories in the body of
the report is less useful and makes it hard to evaluate the studies.
The low rate of reporting on how harms were identified needs to be highlighted
in the body of the report; if this is not described, the remaining five
characteristics are not really relevant.
- Assessment of applicability is difficult to interpret based on
the description in the methods. Rather than describing the EPICOT framework
in a general sentence, it would be more useful to compare the study populations
to the patients enrolled in the FDA approval studies. Also, separating
spine and tibia studies would provide a clearer applicability assessment.
The report has minimal figures; the two on decision modeling are clear.
- The search was designed to take into account the FDA approval
- The population description was broad: patients with a bony
defect that requires repair. However, the KQs address specific indications
for on-label uses, leaving the off-label uses less specific because it
was unclear at the outset what would be found in the literature.
- Spine and fracture results are reported separately in the Results.
- The reviewer is correct that “DDD” is not a skeletal bone defect,
nor is an arthrodesis procedure. These errors are corrected
in the text.
- Radiographic outcomes reported from included articles
conform to FDA definitions and are in accord with standard use.
- The neurological status outcomes description is reported as it is used in the
Neck Disability Index instrument.
- A major conclusion of this report
is that the quality of reporting of harms in the literature is inconsistent.
Our team systematically culled out harms data from every included comparative
study and looked at noncomparative studies for those data (compiled in
the Appendix). We included noncomparative studies because of the known
limitations of harms reporting in RCTs. The results we compile represent
what was actually reported (Table 36). The results of our modified McHarms
survey highlight the limitations of the reporting (Table 37), which was
the purpose of KQ8. Given the inconsistency and lack of comparability across
studies, quantitative synthesis of the data is not valid. We state that
the absence of harms reporting in a study does not necessarily provide
evidence of the absence of harms. We agree about the importance of harms
to patients, and believe our assessment underscores the weakness of evidence
in the literature and the need for better collection and analysis of such
data using validated methods.
- Data abstraction and analysis methods
are presented on page 26 of the draft. Study selection criteria are described
on page 25. Accuracy of abstracted data was verified by a second investigator
with differences resolved by discussion or third party intervention as
- We systematically abstracted all BMP-related harms from all
included articles, comparative and noncomparative. The quality of harms
reporting was addressed by KQ8 using a modified McHarms survey. The USPSTF
quality ratings are presented in the Appendix with annotation; overall
ratings are carried over into the report tables and text.
- It is
unclear why comparing the included study populations to those in the FDA
approval studies would make the report “more useful”.
The results of FDA pivotal trials are included in the assessment. Spine
and tibia studies are reported separately in the assessment.
- Appendix 1 contains an amazing collation of very important information.
The research team has done an outstanding job of assembling this information
in Tables A through P. However, this information is not captured or synthesized
well in the results section. The text is difficult to follow. Interventions,
populations, benefits, and harms are interjected variably.
- We appreciate the reviewer’s kind words about the collation
job we have performed. We disagree with the reviewer on synthesis: we
believe quantitative synthesis of the data would in most cases be
inappropriate because of interstudy heterogeneity and differences in
study size. The Appendix data are qualitatively synthesized
according to KQ and indications in the body of the report.
- The text does not directly answer the question. For example,
the outcomes in Table 6 should be evaluated quantitatively, and the column
headings defined for each of the three studies. No comparison is made to
the FDA premarket approval studies. The text addresses study methods, patient
demographics, benefits, and harms, without directly comparing and contrasting
these features succinctly. In part, this is due to lack of explicit
definition of success in the pre-specified work plan.
- Upon review of the draft, it became evident that the paper by Dawson et
al, 2009 reported an off-label use. It was moved to the appropriate section
and the text and tables were adjusted throughout the assessment to account
for this. As a result, we were left with two studies, one much larger than
the second, which would have overwhelmed if not negated the value of any
qualitative analysis. The assessment compiles the data systematically, synthesizes
it according to the AHRQ-modified GRADE convention, and reports it in that
||This section of the report is the most detailed, well written, and clear.
My only concern is the potential for bias in the source studies, charge/cost
estimates, and poor quality of source data for transition probabilities
The following sentences have been added to the Discussion and Conclusion
There was a limited evidence base for both open tibial fracture and spinal
fusion, each consisting of a single randomized controlled trial. Biases
may have existed in the source studies; for example, possibly biased assessment
of outcomes would result in inaccurate transition probabilities.
||Summary and conclusions
- Conclusions are justified by the data. However, the potential
for sponsorship bias should be mentioned.
- Table 59 is again filled with large sections of text rather than
a clear summary of numbers. More specific numbers rather than general estimates
such as “low/moderate” would help better understand the answers
to key questions. Study limitations are not addressed adequately in the
- Sponsorship bias is not characterized. While the high proportion
of industry sponsorship among the included studies suggests potential bias
exists, it was not systematically investigated or quantified.
- Table 59 is the same as in the Executive Summary, and all comments
on it were covered in the text above pertaining to that section of the
||I found the extensive use of abbreviations to make reading difficult, as I was constantly looking back for definitions. I would recommend keeping the most familiar or obvious abbreviations (BMP, RCT, FDA, QALY), but simply writing out for each use the terms that are abbreviated with less familiar abbreviations (unless the meaning is reiterated in every section where they are used). Examples that threw me were AGB, ALGB, ICBG, FRA, DBM, HA-TCP, DSP, and the like.
||Revisions will be made throughout the text to limit abbreviations.
It would have helped me to identify exactly what made certain trials off-label
use. The reason is that, at least theoretically, BMP might be effective
for one off-label use, but not for others. Here, all off-label uses seem
to be treated as one. For example, in off-label uses of BMP2 in the lumbar
spine, were studies off-label because they were used for more than one
spinal level, or because they were not for DDD, or for other reasons? Could
we separate trials by the different reasons they were off-label? Maybe
BMP works for spinal stenosis, say, but not for multi-level fusions. We
can’t get a sense of that here.
|Off-label uses in RCTs in the lumbar-sacral spine (table 23) were explicated
in the table, and text was adjusted throughout to reflect changes in the table.
Trials were not otherwise separated. There are several different reasons
the trials in table 23 were deemed off-label. These include use of a nonapproved
formulation, or matrix, in conjunction with the approved rhBMP2 (InFuse®);
use of a non-anterior surgical approach with InFuse®; use of InFuse® with
a nonapproved interbody entity; and use in multi-level fusion. While the
trials differed in rhBMP2 use, they were generally consistent in direction
of effect, with statistically significant findings for radiographic success
favoring BMP in three, including the two largest RCTs. This suggests the
off-label factor(s) do not affect the result. BMP appeared to have benefit
in this setting despite differences among the studies.
|On page 33, there’s a funny typo: below table 5, discussing lumbar fusions,
there is a sentence that all patients had symptomatic single level DDD, but
includes arm pain as a symptom. Not for the lumbar spine, I don’t think.
||Reference to arm pain was deleted.
|In the cost-effectiveness analysis, it took me a while to tumble to the
baseline assumption that the costs of fusions with and without BMP were the
same, thanks to DRG bundling. It would be nice to make this point more explicitly
on page 81, since it is so counter-intuitive.
The last sentence below was added to make this clearer:
direct health care costs reported as Medicare payments from free publicly
available sources, valued in 2007 U.S. dollars (Tables 44–49). Cost categories
included initial hospitalization (hospital and physician costs) and secondary
interventions (hospital/outpatient surgical center and physician costs).
It was assumed that initial hospitalization was paid according to the diagnosis-related
groups (DRG) system. Thus, base case analyses assume identical initial hospitalization
costs whether BMP was used or not.
||Discussion/Conclusion: Again, it would be helpful if conclusions regarding off-label use could be itemized by indication (the reason for being off-label).
||Tables, Figures, Appendices
||I could easily have missed it, but I didn’t see mention of the Cahill article on complications of BMP that appeared in JAMA during the search period (Cahill KS, et al. Prevalence, complications, and hospital charges associated with use of bone-morphogenetic proteins in spinal fusion procedures. JAMA 2009; July 1; 302: 58-66). Is there some reason?
||Cahill et al. present a retrospective overview of complications associated with BMP use in spinal fusion, based on the Nationwide Inpatient Sample Database, a 20% sample of US community hospitals. It does not separate data according to BMP product, nor is it necessarily representative of BMP use in the US. It was excluded according to our predefined study inclusion criteria.
1 Peer reviewers are not listed in alphabetical order.
2 If listed, page number, line number, or section refers to the draft report.
3 If listed, page number, line number, or section refers to the final report.