Romosozumab, an anti-sclerostin antibody, has been approved in the USA, Europe, Japan, and South Korea to treat osteoporosis in patients at high risk of fracture. The Active-Controlled Fracture Study in Postmenopausal Women with Osteoporosis at High Risk (ARCH) trial randomly assigned women to receive alendronate or romosozumab for 12 months. It found a difference between the alendronate and romosozumab groups in the incidence of serious cardiovascular (CVD) adverse events and of adjudicated major adverse cardiovascular events (MACE) [1]. After 12 months, participants in the alendronate group continued alendronate, and those assigned to romosozumab switched to alendronate, for the remainder of the trial (median follow-up, 33 months).

When a randomized trial with balanced baseline characteristics finds a difference in the rates of events, such as CVD events, between two active drugs, such as alendronate and romosozumab, there are three possible explanations: (1) the difference is attributable to chance, (2) alendronate reduced the rate of CVD events, or (3) romosozumab increased the rate of CVD events. Because the probabilities of these alternatives must sum to 1.0, a decrease in the probability of one explanation increases the probability that one of the others is true.
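The renormalization described above can be sketched with Bayes' rule over three mutually exclusive, exhaustive hypotheses. The priors and likelihoods below are hypothetical values chosen only to show the arithmetic; they are not estimates for ARCH, and the function name posterior is illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive hypotheses."""
    if set(priors) != set(likelihoods):
        raise ValueError("priors and likelihoods must cover the same hypotheses")
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    # Dividing by the total forces the posteriors to sum to 1.0.
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical equal priors for the three explanations.
priors = {"chance": 1 / 3, "alendronate": 1 / 3, "romosozumab": 1 / 3}
# Hypothetical: outside evidence is 5 times less likely if alendronate truly
# reduced CVD than under the other two explanations.
likelihoods = {"chance": 1.0, "alendronate": 0.2, "romosozumab": 1.0}

post = posterior(priors, likelihoods)
for h, p in post.items():
    print(f"{h:12s} {p:.3f}")
```

With these made-up inputs, the probability for the alendronate explanation falls from 0.333 to about 0.091, and the probability of each of the other two rises to about 0.455.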

Importantly, the likelihood that any one of these explanations is true depends on other information. This Bayesian approach to interpreting trials is similar to the Bayesian interpretation of diagnostic tests, in which the result of a test must be interpreted in the context of other information [2]. For instance, the probability that a positive test result is a true positive depends on the prior probability, based on other information, that the patient has the disease. If the prior probability is low, a positive result is more likely to be a false positive. Similarly, the probability that the result of a trial is true depends critically on other information. For example, the probability that alendronate truly reduced CVD rates in ARCH depends on the prior probability, based on other randomized trials, that alendronate reduces CVD rates. Although these probabilities are difficult to quantify, other information is critical to judging which alternative explanation is true.
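The diagnostic-test analogy can be made concrete with Bayes' theorem for the positive predictive value. The sketch below applies the same hypothetical test (90% sensitivity and 90% specificity, values assumed for illustration) at a high and a low prior probability of disease.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test applied at two different prior probabilities.
for prior in (0.50, 0.01):
    ppv = positive_predictive_value(prior, sensitivity=0.90, specificity=0.90)
    print(f"prior {prior:.2f} -> P(disease | positive) = {ppv:.2f}")
```

At a prior of 0.50 a positive result is correct 90% of the time, but at a prior of 0.01 only about 8% of positive results are true positives, even though the test itself is unchanged.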

Alternative explanations for the difference in rates of CVD events in ARCH

Consider the alternative that the difference is attributable to chance. The numbers of serious CVD events and participants were 38/2014 in the alendronate group and 50/2040 in the romosozumab group, with a hazard ratio (HR) of 1.32, a 95% confidence interval (CI) of 0.87 to 2.01 [1, 3], and a non-significant p value of 0.20. The data were also analyzed for a traditional subgroup of events, called “Major Adverse Cardiovascular Events (MACE)”, formed by removing non-coronary vascular events and heart failure. There is no apparent biological mechanism of either drug to support analyzing any subgroup of CV events. However, because non-coronary vascular events and heart failure were more numerous in the alendronate group than in the romosozumab group, the MACE counts were 22/2014 vs. 41/2040, with an HR of 1.87 (95% CI 1.11 to 3.17) [3] and a p value of 0.02. These confidence intervals and p values must be put into perspective: the probability that the difference is due to chance must be interpreted in the context of other information about the alternative explanations.
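To show how nominal confidence intervals and p values follow from the counts above, the sketch below computes a crude risk ratio (romosozumab vs. alendronate) with a Wald confidence interval on the log scale and a two-sided normal-approximation p value. This is an assumed simplification: the trial reported hazard ratios from a time-to-event analysis, so these crude figures are close to, but not identical with, the published ones.

```python
import math

def risk_ratio_ci(events_b, n_b, events_a, n_a, z=1.96):
    """Crude risk ratio of group B vs. group A, with a Wald 95% CI on the
    log scale and a two-sided normal-approximation p value."""
    rr = (events_b / n_b) / (events_a / n_a)
    se = math.sqrt(1 / events_b - 1 / n_b + 1 / events_a - 1 / n_a)
    log_rr = math.log(rr)
    lo, hi = math.exp(log_rr - z * se), math.exp(log_rr + z * se)
    p = math.erfc(abs(log_rr / se) / math.sqrt(2))  # two-sided p value
    return rr, lo, hi, p

# Counts cited in the text: romosozumab (B) 2040 women, alendronate (A) 2014 women.
for label, rom, aln in [("serious CVD", 50, 38), ("MACE", 41, 22)]:
    rr, lo, hi, p = risk_ratio_ci(rom, 2040, aln, 2014)
    print(f"{label:12s} RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}), p = {p:.2f}")
```

The crude ratios are about 1.30 for all serious CVD events and about 1.84 for MACE. Only the MACE subgroup crosses the nominal 0.05 threshold, which is why the choice of subgroup matters.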

Consider the alternative that alendronate reduced the rate of CVD events. A prior meta-analysis of randomized trials found no effect of bisphosphonates on the risk of CVD events (relative risk [RR] = 1.03, 95% CI 0.91 to 1.17) [4]. The largest trial of alendronate that reported CVD events, FIT-I, found no effect of alendronate on CVD events over 3 years (RR = 0.99, 95% CI 0.80 to 1.22). Importantly, these results imply that if a comparison of alendronate with placebo yields a relative risk of CVD outside those confidence limits, the result is probably due to chance. Combining the FRAME and ARCH data, the FDA performed a network meta-analysis that estimated that 12 months of alendronate reduced the risk of MACE by 45% (HR = 0.55, 95% CI 0.27 to 1.14) after adjustment for potential confounders [5]. This hazard ratio is well below the lower 95% confidence limits of the previous meta-analysis (0.91) and of the FIT-I trial (0.80), strongly indicating that the lower rate of MACE or CVD events in the alendronate group of ARCH is attributable to chance.
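One way to make this reasoning quantitative is a precision-weighted (normal-normal conjugate) Bayesian update on the log scale, treating the earlier meta-analysis as the prior and the FDA estimate as new data. This is an illustrative sketch only: it assumes approximate normality of log ratios, treats RR and HR as comparable, and ignores that the prior covers all bisphosphonates and all CVD events rather than alendronate and MACE. It is not an analysis reported by the FDA or in the cited studies.

```python
import math

Z = 1.96  # normal quantile for a 95% interval

def log_normal_from_ci(point, lower, upper):
    """Recover mean and standard error on the log scale from a ratio and its 95% CI."""
    return math.log(point), (math.log(upper) - math.log(lower)) / (2 * Z)

def combine(prior, new):
    """Normal-normal conjugate update: inverse-variance weighting on the log scale."""
    (m0, s0), (m1, s1) = prior, new
    w0, w1 = 1 / s0 ** 2, 1 / s1 ** 2
    mean = (w0 * m0 + w1 * m1) / (w0 + w1)
    se = math.sqrt(1 / (w0 + w1))
    return math.exp(mean), math.exp(mean - Z * se), math.exp(mean + Z * se)

prior = log_normal_from_ci(1.03, 0.91, 1.17)  # bisphosphonate meta-analysis [4]
new = log_normal_from_ci(0.55, 0.27, 1.14)    # FDA network meta-analysis [5]
rr, lo, hi = combine(prior, new)
print(f"posterior ratio = {rr:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

Under these assumptions the posterior ratio is about 1.01 (roughly 0.89 to 1.14). The much more precise prior dominates, and the 45% reduction almost disappears, which is the Bayesian form of the argument that the estimate is probably due to chance.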

Other data suggest that long-term bisphosphonate treatment might reduce the risk of vascular events. A placebo-controlled trial of zoledronate, a bisphosphonate, reported trends toward fewer myocardial infarctions (odds ratio = 0.61, 95% CI 0.36 to 1.02) and fewer vascular events (odds ratio = 0.76, 95% CI 0.52 to 1.09) than in the placebo group (hazard ratio 0.60 [95% CI, 0.36 to 1.00]); however, the effect emerged over 6 years, not 12 months [6]. There is evidence that alendronate reduces arterial calcification, which could manifest as a sustained reduction in the rate of CV events [4]. However, there is no evidence that alendronate would rapidly but only transiently reduce the risk of CVD events.

Consider the alternative that romosozumab increased the rate of CVD events. The large placebo-controlled FRAME trial found no difference in the rates of adjudicated MACE: 46 (1.3%) in the placebo group and 46 (1.3%) in the romosozumab group [7]. Mechanistic studies by Amgen and UCB found no biological mechanism by which romosozumab could affect atherosclerosis or thromboembolic events [3]. These data reduce the probability that romosozumab increases the risk of CVD events.

The pattern of events over time

The temporal pattern of events during the ARCH trial also provides information that influences the probability that each alternative is true. Figure 1 illustrates the general pattern expected if one treatment (A or B) influences the rate of an event from the onset of treatment and throughout the treatment period: the cumulative incidences in the two groups diverge, while the ratio of the two event rates remains constant.
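The idealized pattern in Fig. 1 can be reproduced with a minimal sketch that assumes constant (exponential) hazards in both groups. The hazard values are hypothetical; group B is simply given twice the hazard of group A throughout.

```python
import math

def cumulative_incidence(hazard_per_month, months):
    """Cumulative incidence under a constant (exponential) hazard."""
    return 1 - math.exp(-hazard_per_month * months)

# Hypothetical hazards: group B has twice the hazard of group A at all times.
hazard_a = 0.001
hazard_ratio = 2.0
hazard_b = hazard_ratio * hazard_a

for month in (3, 6, 12, 24, 36):
    ci_a = cumulative_incidence(hazard_a, month)
    ci_b = cumulative_incidence(hazard_b, month)
    print(f"month {month:2d}: A {ci_a:.4f}  B {ci_b:.4f}  gap {ci_b - ci_a:.4f}")
```

Because the hazard ratio is fixed, the curves separate from the start and the absolute gap keeps widening over follow-up, rather than opening early and then closing.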

Fig. 1

Idealized representation of the results of a trial comparing the cumulative incidence of events in two groups, when the groups differ by a constant ratio of events in group A to group B

Figures from the FDA advisory committee hearing on romosozumab reveal the pattern of MACE in the first 12 months and over the entire 36 months of follow-up [3]. The ratio of MACE in the two groups varies over time. If attributed to alendronate, the pattern of CVD events in ARCH would imply that alendronate provided a potent, immediate, but transient reduction in CVD events (Fig. 2). There are essentially no events in the alendronate group during the first 3 months, and there are fewer events than in the romosozumab group for the remainder of the 12 months. Thereafter, the rate of CVD events in the alendronate group increases steadily, roughly in parallel with, or converging on, the rate in the group receiving alendronate after romosozumab (Fig. 3). As noted, there is no precedent or mechanism of action that would account for a potent, immediate, but transient cardioprotective effect of alendronate. This further decreases the probability that the difference is due to alendronate.
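Whether the ratio of events stays constant, as in Fig. 1, or varies over time can be checked by converting cumulative counts into counts of new events within each interval. The sketch below assumes roughly equal numbers at risk in the two groups (approximately true in ARCH early on, with 2014 and 2040 participants), so a ratio of counts approximates a rate ratio. The function name interval_ratios and the counts in the example are hypothetical and were not read from the FDA figures.

```python
def interval_ratios(times, cum_a, cum_b):
    """Ratio of new events in group B to group A within each interval,
    computed from cumulative event counts at successive time points.
    Assumes similar numbers at risk in both groups."""
    ratios = []
    prev_t, prev_a, prev_b = 0, 0, 0
    for t, a, b in zip(times, cum_a, cum_b):
        new_a, new_b = a - prev_a, b - prev_b
        # The ratio is undefined when group A has no new events in the interval.
        ratio = new_b / new_a if new_a > 0 else None
        ratios.append((prev_t, t, new_a, new_b, ratio))
        prev_t, prev_a, prev_b = t, a, b
    return ratios

# Hypothetical cumulative counts at 3, 6, and 12 months (NOT ARCH data).
times = [3, 6, 12]
cum_a = [1, 6, 16]
cum_b = [8, 16, 30]
for start, end, new_a, new_b, ratio in interval_ratios(times, cum_a, cum_b):
    shown = "undefined" if ratio is None else f"{ratio:.1f}"
    print(f"months {start}-{end}: A {new_a}, B {new_b}, B/A {shown}")
```

In this invented example the interval ratio falls from 8.0 to 1.6 to 1.4. A large early ratio that then shrinks is the kind of non-constant pattern described above, in contrast to the constant ratio expected from a sustained drug effect.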

Fig. 2

Cumulative 12-month incidence of MACE in the alendronate and romosozumab groups in the ARCH trial. Accessed at https://www.fda.gov/advisory-committees/advisory-committee-calendar/january-16-2019-meeting-bone-reproductive-and-urologic-drugs-advisory-committee-meeting-announcement; Amgen presentation: Cardiovascular Safety

Fig. 3

Cumulative 36-month incidence of MACE in the alendronate and romosozumab groups in the ARCH trial. Accessed at https://www.fda.gov/advisory-committees/advisory-committee-calendar/january-16-2019-meeting-bone-reproductive-and-urologic-drugs-advisory-committee-meeting-announcement; Amgen presentation: Cardiovascular Safety. The overlying lines project the cumulative incidence in each group, assuming that the rates observed in the first 12 months continue for 36 months after the transition of the romosozumab group to alendronate
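A projection like the overlying lines in Fig. 3 can be sketched from the 12-month MACE counts cited earlier (22/2014 with alendronate, 41/2040 with romosozumab). How the lines in the figure were actually drawn is not stated, so this sketch assumes a constant hazard and uses crude proportions that ignore censoring. The function name project_constant_rate is hypothetical.

```python
def project_constant_rate(events, n, months_observed, months_projected):
    """Project cumulative incidence assuming the risk per period observed
    over months_observed persists unchanged (constant hazard)."""
    risk_observed = events / n
    periods = months_projected / months_observed
    return 1 - (1 - risk_observed) ** periods

# 12-month MACE counts cited in the text.
for label, events, n in [("alendronate", 22, 2014), ("romosozumab", 41, 2040)]:
    at_12 = events / n
    at_36 = project_constant_rate(events, n, 12, 36)
    print(f"{label:12s} 12 mo {100 * at_12:.1f}%  projected 36 mo {100 * at_36:.1f}%")
```

Under this assumption the projections are about 3.2% for alendronate and 5.9% for romosozumab at 36 months. At such low event rates, a simple linear extrapolation (three times the 12-month incidence) gives nearly the same values.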

When participants in the romosozumab group were switched to alendronate, the rate of CVD events did not change (Fig. 3), further decreasing the probability that the difference in CVD events during the 12 months of treatment is attributable to romosozumab. The patterns of CVD events with alendronate were also inconsistent: starting alendronate after romosozumab did not reproduce the acute and substantial reduction in CVD events that was observed during the first 12 months in the alendronate group (Fig. 3). This inconsistency in the pattern of events with alendronate reduces the probability that the difference during the first 12 months is attributable to alendronate.

In summary, which explanation for the difference in CVD rates in ARCH is most likely depends on other data. Limited data suggest that alendronate might reduce CVD; however, they provide no support for an acute and transient benefit of alendronate. The difference in all serious CVD adverse events was not statistically significant. The estimated 45% reduction in the MACE subgroup for alendronate vs. placebo lies outside the confidence limits of previous trials, strongly suggesting that the difference with alendronate in ARCH is due to chance and that the nominal p values and confidence limits for the difference underestimate the probability that the difference is due to chance. The absence of an effect in a placebo-controlled trial of romosozumab and the fact that rates of CVD did not change when romosozumab was discontinued substantially reduce the probability that the difference is attributable to romosozumab and further increase the probability that it is due to chance. Together, these other data indicate that the difference in rates of CVD events between alendronate and romosozumab in the ARCH trial is probably due to chance.