Combined evidence from multiple outcomes in a clinical trial
Introduction
If a single outcome measure is clearly more important than all other outcome measures, then this most important measure may decide which treatment should be chosen for future patients. In some clinical trials, however, treatment groups are compared with respect to several outcome measures that are about equally important to patients. In that case the main outcome measures may be replaced by a single summary measure that totals favorable and unfavorable results. Although this article focuses on quantitative outcome measures, this introduction also considers dichotomous outcomes to clarify the line of thought.
A zero–one variable, also called a dummy or indicator variable, may be used to indicate the absence (0) or presence (1) of a certain outcome. If there are two or more important dichotomous outcomes, these outcomes may be combined in several ways. A new dichotomy may indicate whether or not at least one of the outcomes is present in the patient under consideration, or a sum score may be used, or some ordinal scale may be adequate. This is explained in the next paragraphs.
As an example, consider a randomized clinical trial that compares two methods of treating the appendix stump after appendectomy. If the possible outcomes “wound infection” and “postoperative ileus” are considered equally important, the two outcome measures “infection” and “ileus” may be replaced by the single outcome measure “infection or ileus,” defined as the presence of infection or ileus or both, that is, the presence of at least one of the unfavorable outcomes. The combined outcome measure “infection or ileus” should indicate which treatment is best. If one treatment reduces the risk of infection from 8% to 4% and the risk of ileus from 2% to 1%, compared to the other treatment, then the combined risk is reduced from about 10% to about 5%. In this case, the combined outcome has better power than each of the separate outcomes; in other words, the combined risk requires a slightly smaller study size. However, if the risk of ileus is 2% for both treatments, the combined risk is reduced from about 10% to about 6%, which gives less power than the risk of infection alone; in other words, the combined risk requires a slightly larger study size. If it is known beforehand that both treatments have the same risk of ileus, ileus should not be considered an important outcome and should not be incorporated into a combined outcome measure, since this outcome would not help to choose between treatments. Of course, the combined outcome should be defined in the research protocol, before any data are gathered.
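The arithmetic in this example can be checked with a short sketch. It rests on two assumptions that are not stated in the text: infection and ileus are treated as independent events when the combined risk is computed, and the study size comes from a common normal-approximation formula for comparing two proportions (two-sided alpha of .05 and power of .80 are illustrative choices). The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def combined_risk(*risks):
    """Risk of at least one outcome, ASSUMING the outcomes are independent."""
    prob_none = 1.0
    for risk in risks:
        prob_none *= 1.0 - risk
    return 1.0 - prob_none


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per group for comparing two proportions.

    Normal-approximation formula (an assumption, not taken from the article):
    n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z * z * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


# Infection alone: 8% versus 4%.
n_infection = n_per_group(0.08, 0.04)

# Both outcomes improve: infection 8% -> 4%, ileus 2% -> 1%.
n_both_improve = n_per_group(combined_risk(0.08, 0.02), combined_risk(0.04, 0.01))

# Ileus unchanged at 2%: only infection improves.
n_ileus_equal = n_per_group(combined_risk(0.08, 0.02), combined_risk(0.04, 0.02))

print(n_infection, n_both_improve, n_ileus_equal)
```

Under these assumptions the combined outcome needs fewer patients than infection alone when both risks are halved, and more patients when the ileus risk is the same in both groups, which matches the direction of the argument in the text.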
As another example, consider a randomized clinical trial where patients with a high risk of stroke are treated with either aspirin or placebo. It is expected that aspirin substantially reduces the risk of a major cerebral infarction, but it is also expected that aspirin slightly increases the risk of a major cerebral bleeding. The combined risk of infarction or bleeding has lower statistical power than the risk of infarction alone. But it is more honest to compare treatments regarding the combined risk, that is, infarction or bleeding, thus simultaneously taking account of advantages and disadvantages in a single statistical test (logrank or chi-square).
A sum score may be better if important outcomes are present in many patients. Absence or presence may be coded as zero or one. Such zero–one variables may be totalled to create a sum score that represents the number of important outcomes (events) in a patient, within a certain time period; a sum score may be used in a nonparametric statistical test.
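As a minimal sketch of this idea, the code below totals zero-one indicators into a sum score per patient and computes a Mann-Whitney U statistic as one possible nonparametric comparison. The text does not name a specific test, so the choice of Mann-Whitney is an assumption, and the patient data are invented for illustration.

```python
def sum_score(indicators):
    """Number of important outcomes (events) present in one patient."""
    return sum(indicators)


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical data: three zero-one outcomes per patient, 1 = event present.
group_a = [[1, 0, 0], [0, 0, 0], [1, 1, 0]]
group_b = [[1, 1, 1], [0, 1, 1], [1, 0, 1]]

scores_a = [sum_score(p) for p in group_a]
scores_b = [sum_score(p) for p in group_b]

# U ranges from 0 to len(scores_a) * len(scores_b); values near either end
# suggest that one group tends to have more events than the other.
u_stat = mann_whitney_u(scores_a, scores_b)
print(scores_a, scores_b, u_stat)
```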
If a treatment has serious side effects, the following ordered outcome scores may be used to evaluate each patient in a global way.
- Score 1: Serious side effects and no beneficial effect, or side effects are much more important than beneficial effects.
- Score 2: Side effects are (slightly) more important than beneficial effects.
- Score 3: Balance. Side effects and beneficial effects are (about) equally important, or both are absent.
- Score 4: Beneficial effects are (slightly) more important than side effects.
- Score 5: Great beneficial effect and no side effect, or beneficial effects are much more important than side effects.
Of course, in a particular trial this ordinal scale may be adapted and clarified according to the research question in that trial. The combined outcome measure should be defined in the research protocol, in a way that best reflects what is important for patients admitted to the trial. Moreover, it must be decided whether the outcome is best assessed by the patient or the physician or both.
It may be unavoidable that some patients withdraw before the end of the planned follow-up period. Withdrawals for reasons that are certainly not related to the outcome or to the treatment (moving away, not randomized, incorrectly admitted to the trial) can be excluded from the statistical analyses, since their exclusion would not bias the treatment comparison [1], [2], [3]. Withdrawals for reasons that may be related to outcome or to treatment, however, should be included in the statistical analysis. Patients who withdraw because of serious side effects (or unpleasant trial procedures, or death) or lack of any beneficial therapeutic effect demonstrate the worst possible outcome [1]: Score 1 in the previous paragraph, or even Score 0 as an extension of the scoring system. Patients who withdraw because of early recovery demonstrate the best possible outcome [1]: Score 5 in the previous paragraph, or even Score 6 as another extension of the scoring system. In some trials, however, the scoring system may be reduced to a dichotomy that just distinguishes treatment successes from treatment failures [3]. If an investigator insists on comparing group means of a quantitative outcome measure in study completers, this investigator should also compare group proportions of withdrawals due to unfavorable reasons [2].
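The handling of withdrawals described above can be sketched as a small scoring rule. The reason labels and the function name are hypothetical, and the sketch uses Scores 1 and 5 rather than the extended Scores 0 and 6; a real trial would define the categories in its protocol.

```python
# Hypothetical withdrawal-reason labels, grouped as in the paragraph above.
UNRELATED = {"moved_away", "not_randomized", "incorrectly_admitted"}
UNFAVORABLE = {"side_effects", "unpleasant_procedures", "death", "no_benefit"}
FAVORABLE = {"early_recovery"}


def analysis_score(observed_score, withdrawal_reason=None):
    """Return the score used in the analysis, or None if the patient is excluded.

    Completers keep their observed score (1 to 5). Withdrawals unrelated to
    outcome or treatment are excluded; unfavorable withdrawals get the worst
    score (1) and withdrawals for early recovery get the best score (5).
    """
    if withdrawal_reason is None:
        return observed_score
    if withdrawal_reason in UNRELATED:
        return None
    if withdrawal_reason in UNFAVORABLE:
        return 1
    if withdrawal_reason in FAVORABLE:
        return 5
    raise ValueError(f"unknown withdrawal reason: {withdrawal_reason!r}")


# Hypothetical patients: (observed score or None, withdrawal reason or None).
patients = [(4, None), (None, "side_effects"), (None, "moved_away"), (None, "early_recovery")]
scores = [analysis_score(score, reason) for score, reason in patients]
included = [s for s in scores if s is not None]
print(included)
```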
The remainder of this article considers only quantitative outcome measures. These quantitative measures may have different standard deviations and, therefore, should not be totalled in a straightforward manner. Moreover, there may be clusters of similar outcomes that are highly correlated, which should be taken into account.
A simple procedure consists of the following four steps: (1) the primary outcome measures are stated in the research protocol; (2) similar measures are replaced by their average; (3) each outcome measure is properly standardized to take account of different standard deviations; and (4) a mean summary measure is computed for each patient in the trial and then used in a two-sample t-test. An example demonstrates the computational simplicity of the procedure.
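The four steps can be sketched as follows. This excerpt does not spell out the standardization, so the sketch assumes that each outcome is divided by its pooled within-group standard deviation and that all outcomes are oriented so that higher values are better. The helper names and the data are hypothetical, and only the t statistic and its degrees of freedom are computed.

```python
from math import sqrt
from statistics import mean, variance


def average_cluster(patient, cluster):
    """Step 2: replace a cluster of similar outcomes (by index) with their average."""
    rest = [v for j, v in enumerate(patient) if j not in cluster]
    return rest + [mean(patient[j] for j in cluster)]


def pooled_sd(a, b):
    """Pooled within-group standard deviation of two samples."""
    num = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return sqrt(num / (len(a) + len(b) - 2))


def summary_scores(group1, group2):
    """Steps 3 and 4: standardize each outcome (ASSUMED: pooled SD), then average per patient."""
    k = len(group1[0])
    sds = [pooled_sd([p[j] for p in group1], [p[j] for p in group2]) for j in range(k)]

    def summarize(patient):
        return mean(patient[j] / sds[j] for j in range(k))

    return [summarize(p) for p in group1], [summarize(p) for p in group2]


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    se = pooled_sd(a, b) * sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se, len(a) + len(b) - 2


# Hypothetical data: three outcomes per patient; outcomes 1 and 2 form a cluster.
raw1 = [[10, 0.8, 1.2], [12, 1.4, 1.6], [14, 1.9, 2.1]]
raw2 = [[8, 0.4, 0.6], [9, 0.9, 1.1], [10, 1.6, 1.4]]
group1 = [average_cluster(p, {1, 2}) for p in raw1]
group2 = [average_cluster(p, {1, 2}) for p in raw2]

s1, s2 = summary_scores(group1, group2)
t_stat, df = two_sample_t(s1, s2)
print(round(t_stat, 3), df)
```

Dividing by the standard deviation without subtracting a mean leaves the group difference in the summary measure unchanged up to a constant shift, so the t statistic is the same as it would be after full standardization.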
The summary measure combines all the evidence and it has great power if the separate outcome measures show about the same treatment effect. But the summary measure may have poor power if some outcome measures show a much smaller treatment effect than other outcome measures. If the outcome measures point in different directions, the overall conclusion may be that the investigated treatments are about equivalent. A sample size formula is presented that takes account of the mean treatment effect on the outcomes in the summary measure, the number of outcomes, and their mean correlation. If conclusions are drawn regarding separate outcomes, there is a multiple comparisons problem, and P-values may be adjusted according to Bonferroni or Hochberg.
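For conclusions about separate outcomes, the two adjustments named above can be sketched as adjusted P-values. The implementation below follows the usual textbook definitions of the Bonferroni and Hochberg step-up procedures; the P-values are invented for illustration and are not results from the article.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P-values: multiply each by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def hochberg(pvalues):
    """Hochberg step-up adjusted P-values, returned in the original order."""
    m = len(pvalues)
    # Visit P-values from largest to smallest; the largest gets multiplier 1,
    # the next multiplier 2, and so on, with a running minimum applied.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, (rank + 1) * pvalues[i])
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for three separate outcome measures.
p_raw = [0.01, 0.04, 0.03]
p_bonf = bonferroni(p_raw)
p_hoch = hochberg(p_raw)
print(p_bonf, p_hoch)
```

Hochberg-adjusted values are never larger than the Bonferroni-adjusted values, which is why the step-up procedure is the less conservative of the two.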
Example
Seventy-two patients with acute lateral distortions of the ankle (sprained ankle) were treated with an ointment [4]. In a randomized study, n1 = 36 patients received the active treatment and another n2 = 36 patients were treated with a placebo. The summary measure procedure consists of four steps. The present section contains the first two steps, which are basic and very generally applicable. The next section contains the third and fourth steps.
Summary measure procedure
In step 2, each cluster of highly correlated outcomes is replaced by a new outcome measure that represents the cluster; step 2 may be omitted if there are no such clusters. In steps 3 and 4 the new outcome measures are combined into a global summary measure that is used in a two-sample t-test.
Smallest clinically relevant treatment effect
It is assumed that there is no bias in a well-designed experiment (randomization, blinding, analysis by intention to treat, and so on). Regarding a certain outcome measure, μ1 and μ2 denote the expected means in the treatment groups, and μ1−μ2 denotes the true difference in effectiveness between the treatments; for Pain this may be 5 points on the visual analogue scale, which is about half the standard deviation. The standardized difference δ = (μ1−μ2)/σ is the expected difference in treatment
Discussion
The ultimate goal of a clinical trial is to decide which treatment should be chosen for future patients. It may be hard to make a scientifically valid choice if there are many important outcome measures that may create a diffuse picture. The research protocol should describe a clearly structured statistical analysis, with probability .05 (or less) of a false significant result and sufficient power under sensible hypothesized treatment effects. The recommended first step is that the research
References (17)
- et al. Use of factor analysis to consolidate multiple outcome measures in chronic obstructive pulmonary disease. J Clin Epidemiol (1991)
- et al. Extensions of multiple testing procedures based on Simes' test. J Stat Plan Inference (1995)
- A new approach to the analysis of clinical drug trials with withdrawals. Biometrics (1980)
- et al. Testing for treatment differences with dropouts present in clinical trials: a composite approach. Stat Med (1997)
- Clinical trials: a practical approach. (1983)
- et al. Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics (1991)
- Exact t and F-tests for analyzing studies with multiple endpoints. Biometrics (1996)
- Planning group sizes in clinical trials with a continuous outcome and repeated measures. Stat Med (1999)