Combined evidence from multiple outcomes in a clinical trial

doi:10.1016/S0895-4356(00)00238-9

Journal of Clinical Epidemiology

Volume 53, Issue 11, November 2000, Pages 1137-1144

https://doi.org/10.1016/S0895-4356(00)00238-9

Abstract

Clinical investigators are encouraged to apply recently developed statistical methodology. For each patient in a trial, favorable and unfavorable results from multiple outcomes may be summarized in a suitable summary measure. This summary measure may be used in a two-sample t-test to decide which treatment is best. An example illustrates how the evidence from the main outcome criteria may be combined. The required study size depends on the mean treatment effect on the outcomes in the summary measure. When separate outcomes are considered, there is a multiple comparisons problem, for which Hochberg offered a simple solution. Evaluation of a single-summary measure may require a larger or a smaller study size than evaluation of separate outcomes, depending on whether treatment effects are about the same or very different.

Introduction

If a single outcome measure is clearly more important than all other outcome measures, then this most important measure may decide which treatment should be chosen for future patients. In some clinical trials, however, treatment groups are compared with respect to several outcome measures that are about equally important to patients. In that case the main outcome measures may be replaced by a single-summary measure that totals favorable and unfavorable results. Although this article focuses on quantitative outcome measures, in this introduction dichotomous outcomes also are considered to clarify the line of thought.

A zero–one variable, also called a dummy or indicator variable, may be used to indicate the absence (0) or presence (1) of a certain outcome. If there are two or more important dichotomous outcomes, these outcomes may be combined in several ways. A new dichotomy may indicate whether or not at least one of the outcomes is present in the patient under consideration, or a sum score may be used, or some ordinal scale may be adequate. This is explained in the next paragraphs.

As an example, consider a randomized clinical trial that compares two methods of treatment of the appendix stump after appendectomy. If the possible outcomes “wound infection” and “postoperative ileus” are considered equally important, the two outcome measures “infection” and “ileus” may be replaced by the single outcome measure “infection or ileus,” defined as the presence of infection or ileus or both, that is, the presence of at least one of the unfavorable outcomes. The combined outcome measure “infection or ileus” should indicate which treatment is best. If one treatment reduces the risk of infection from 8% to 4% and the risk of ileus from 2% to 1%, compared to the other treatment, then the combined risk is reduced from about 10% to about 5%. In this case, the combined outcome has better power than each of the separate outcomes; in other words, the combined risk requires a slightly smaller study size. However, if the risk of ileus is 2% for both treatments, the combined risk is reduced from about 10% to about 6% and has less power than the risk of infection alone; in other words, the combined risk requires a slightly larger study size. If it is known beforehand that both treatments have the same risk of ileus, ileus should not be considered an important outcome and it should not be incorporated into a combined outcome measure, since this outcome would not help to choose between treatments. Of course, the combined outcome should be defined in the research protocol, before any data are gathered.
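The arithmetic in this example can be checked with a short sketch. It rests on two assumptions that the article does not state: infection and ileus are treated as independent events, and the study size is approximated with the usual normal-approximation formula for comparing two proportions (two-sided α = 0.05, power 0.80). The function names are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def combined_risk(*risks):
    """Risk of at least one event, ASSUMING the events are independent."""
    p_none = 1.0
    for r in risks:
        p_none *= 1.0 - r
    return 1.0 - p_none


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per group to detect p1 versus p2 (two-sided)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)


# Risks taken from the text: infection 8% vs 4%, ileus 2% vs 1% (or 2% vs 2%).
n_infection = n_per_group(0.08, 0.04)
n_both_halved = n_per_group(combined_risk(0.08, 0.02), combined_risk(0.04, 0.01))
n_ileus_equal = n_per_group(combined_risk(0.08, 0.02), combined_risk(0.04, 0.02))

print(n_both_halved, n_infection, n_ileus_equal)
```

Under these assumptions the three sizes come out in the order described in the text: the combined outcome needs fewer patients than infection alone when both risks are halved, and more patients when the risk of ileus is the same in both groups.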

As another example, consider a randomized clinical trial where patients with a high risk of stroke are treated with either aspirin or placebo. It is expected that aspirin substantially reduces the risk of a major cerebral infarction, but it is also expected that aspirin slightly increases the risk of major cerebral bleeding. The combined risk of infarction or bleeding has lower statistical power than the risk of infarction alone. But it is more honest to compare treatments regarding the combined risk, that is, infarction or bleeding, thus simultaneously taking account of advantages and disadvantages in a single statistical test (logrank or chi-square).

A sum score may be better if important outcomes are present in many patients. Absence or presence may be coded as zero or one. Such zero–one variables may be totalled to create a sum score that represents the number of important outcomes (events) in a patient, within a certain time period; a sum score may be used in a nonparametric statistical test.
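A sum score of this kind is easy to compute. The sketch below uses made-up patients and, as one possible nonparametric comparison, the Mann-Whitney U statistic; the article does not say which nonparametric test is meant, so that choice is an assumption.

```python
def sum_score(indicators):
    """Number of outcomes present in one patient (each indicator is 0 or 1)."""
    return sum(indicators)


def mann_whitney_u(scores_a, scores_b):
    """U for group A: pairs where A scores higher, ties counted as 1/2."""
    u = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical patients: each tuple holds three zero-one outcome indicators.
group_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0)]
group_b = [(1, 1, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1)]

scores_a = [sum_score(p) for p in group_a]  # [0, 1, 1, 0]
scores_b = [sum_score(p) for p in group_b]  # [2, 2, 1, 3]
u_a = mann_whitney_u(scores_a, scores_b)
```

A value of U near 0 or near the number of pairs (here 4 × 4 = 16) indicates that one group tends to have more events per patient; a P-value would come from the exact or approximate null distribution of U, which is not computed here.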

If a treatment has serious side effects, the following ordered outcome scores may be used to evaluate each patient in a global way.

Score 1: Serious side effects and no beneficial effect, or side effects are much more important than beneficial effects.
Score 2: Side effects are (slightly) more important than beneficial effects.
Score 3: Balance. Side effects and beneficial effects are (about) equally important, or both are absent.
Score 4: Beneficial effects are (slightly) more important than side effects.
Score 5: Great beneficial effect and no side effect, or beneficial effects are much more important than side effects.

Of course, in a particular trial this ordinal scale may be adapted and clarified according to the research question in that trial. The combined outcome measure should be defined in the research protocol, in a way that best reflects what is important for patients admitted to the trial. Moreover, it must be decided whether the outcome is best assessed by the patient or the physician or both.

It may be unavoidable that some patients withdraw before the end of the planned follow-up period. Withdrawals for reasons that are certainly not related to the outcome or to the treatment (moving away, not randomized, incorrectly admitted to the trial) can be excluded from the statistical analyses, since their exclusion would not bias treatment comparison [1], [2], [3]. Withdrawals for reasons that may be related to outcome or to treatment, however, should be included in the statistical analysis. Patients who withdraw because of serious side effects (or unpleasant trial procedures, or death) or lack of any beneficial therapeutic effect demonstrate the worst possible outcome [1]: Score 1 in the previous paragraph, or even Score 0 as an extension of the scoring system. Patients who withdraw because of early recovery demonstrate the best possible outcome [1]: Score 5 in the previous paragraph, or even Score 6 as another extension of the scoring system. In some trials, however, the scoring system may be reduced to a dichotomy that just distinguishes treatment successes and treatment failures [3]. If an investigator insists on comparing group means of a quantitative outcome measure in study completers, this investigator should also compare group proportions of withdrawals due to unfavorable reasons [2].
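The handling of withdrawals described above amounts to a small decision rule, sketched below. The reason codes are hypothetical labels chosen for illustration; a real protocol would define its own categories and decide in advance whether the extended scores 0 and 6 are used.

```python
# Hypothetical reason codes; a real protocol would define its own categories.
UNRELATED = {"moved_away", "not_randomized", "incorrectly_admitted"}
UNFAVORABLE = {"serious_side_effects", "unpleasant_procedures", "death",
               "no_benefit"}
FAVORABLE = {"early_recovery"}


def withdrawal_score(reason, extended=False):
    """Global score for a withdrawn patient, or None if the patient is excluded.

    extended=False uses Scores 1 and 5 of the five-point scale;
    extended=True uses the extra Scores 0 and 6 mentioned in the text.
    """
    if reason in UNRELATED:
        return None  # excluded: unrelated to outcome and treatment
    if reason in UNFAVORABLE:
        return 0 if extended else 1  # worst possible outcome
    if reason in FAVORABLE:
        return 6 if extended else 5  # best possible outcome
    raise ValueError(f"unclassified withdrawal reason: {reason!r}")
```

Raising an error for an unclassified reason is a design choice in this sketch: it forces every withdrawal to be classified explicitly instead of being dropped silently.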

The remainder of this article considers only quantitative outcome measures. These quantitative measures may have different standard deviations and, therefore, should not be totalled in a straightforward manner. Moreover, there may be clusters of similar outcomes that are highly correlated, which should be taken into account.

A simple procedure consists of the following four steps: (1) the primary outcome measures are stated in the research protocol; (2) similar measures are replaced by their average; (3) each outcome measure is properly standardized to take account of different standard deviations; and (4) a mean summary measure is computed for each patient in the trial and then used in a two-sample t-test. An example demonstrates the computational simplicity of the procedure.
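The four steps can be sketched in a few lines. The details below are assumptions, not the article's own definitions: similar measures within a cluster are taken to share a scale, standardization divides each outcome by its pooled within-group standard deviation, all outcomes are oriented in the same direction, and the data and measure names are invented.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(x1, x2):
    """Pooled within-group standard deviation of two samples."""
    n1, n2 = len(x1), len(x2)
    return sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2))
                / (n1 + n2 - 2))


def summary_measure(group1, group2, clusters):
    """Return one summary score per patient for each group.

    group1, group2: lists of patients, each a dict {measure name: value}.
    clusters: lists of measure names (step 1); a list with several names
    is a cluster of similar measures.
    """
    def cluster_means(patients):
        # Step 2: replace each cluster of similar measures by its average.
        return [[mean(p[m] for m in c) for c in clusters] for p in patients]

    rows1, rows2 = cluster_means(group1), cluster_means(group2)
    # Step 3: one pooled SD per outcome (ASSUMED form of standardization).
    sds = [pooled_sd([r[j] for r in rows1], [r[j] for r in rows2])
           for j in range(len(clusters))]

    def summarize(rows):
        # Step 4a: mean of the standardized outcomes for each patient.
        return [mean(v / s for v, s in zip(row, sds)) for row in rows]

    return summarize(rows1), summarize(rows2)


def two_sample_t(x1, x2):
    """Step 4b: pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    se = pooled_sd(x1, x2) * sqrt(1 / n1 + 1 / n2)
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2


# Hypothetical data, oriented so that lower values are better.
active = [
    {"pain_rest": 20, "pain_walk": 30, "swelling": 1.0},
    {"pain_rest": 25, "pain_walk": 35, "swelling": 1.4},
    {"pain_rest": 15, "pain_walk": 28, "swelling": 0.8},
    {"pain_rest": 22, "pain_walk": 31, "swelling": 1.1},
]
placebo = [
    {"pain_rest": 35, "pain_walk": 45, "swelling": 1.9},
    {"pain_rest": 30, "pain_walk": 42, "swelling": 1.6},
    {"pain_rest": 40, "pain_walk": 50, "swelling": 2.2},
    {"pain_rest": 33, "pain_walk": 44, "swelling": 1.7},
]
clusters = [["pain_rest", "pain_walk"], ["swelling"]]

s_active, s_placebo = summary_measure(active, placebo, clusters)
t_stat, df = two_sample_t(s_active, s_placebo)
```

Because each outcome is divided by its own standard deviation, changing the unit of one outcome (for example, recording swelling in millimetres instead of centimetres) leaves the summary measure and the t statistic unchanged.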

The summary measure combines all the evidence and it has great power if the separate outcome measures show about the same treatment effect. But the summary measure may have poor power if some outcome measures show a much smaller treatment effect than other outcome measures. If the outcome measures point in different directions, the overall conclusion may be that the investigated treatments are about equivalent. A sample size formula is presented that takes account of the mean treatment effect on the outcomes in the summary measure, the number of outcomes, and their mean correlation. If conclusions are drawn regarding separate outcomes, there is a multiple comparisons problem, and P-values may be adjusted according to Bonferroni or Hochberg.
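Both ideas in this paragraph can be illustrated with a rough sketch; it is not necessarily the formula presented in the article. For the study size, it assumes k standardized outcomes with unit variance and mean pairwise correlation ρ, so that their mean has variance (1 + (k − 1)ρ)/k, and then applies the usual normal-approximation formula for a two-sample comparison of means. For separate outcomes, it computes Hochberg step-up adjusted P-values. Function and parameter names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_summary(mean_delta, k, mean_rho, alpha=0.05, power=0.80):
    """Approximate patients per group for a t-test on the summary measure.

    mean_delta: mean standardized treatment effect over the k outcomes.
    mean_rho:   mean pairwise correlation between the k outcomes.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var_summary = (1 + (k - 1) * mean_rho) / k  # variance of the mean score
    return ceil(2 * z ** 2 * var_summary / mean_delta ** 2)


def hochberg_adjust(p_values):
    """Hochberg step-up adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest P-value down; the j-th largest is multiplied by j.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order, start=1):
        running_min = min(running_min, rank * p_values[idx])
        adjusted[idx] = running_min
    return adjusted
```

With a mean standardized effect of 0.5, a single outcome needs about 63 patients per group under this approximation, whereas four outcomes with mean correlation 0.5 need about 40; if the four outcomes were perfectly correlated, nothing would be gained over a single outcome. For the P-values 0.01, 0.04, and 0.03, the Hochberg adjustment gives approximately 0.03, 0.04, and 0.04, compared with 0.03, 0.12, and 0.09 for Bonferroni.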

Example

Seventy-two patients with acute lateral distortions of the ankle (sprained ankle) were treated with an ointment [4]. In a randomized study, n₁ = 36 patients received the active treatment and another n₂ = 36 patients were treated with a placebo. The summary measure procedure consists of four steps. The present section contains the first two steps, which are basic and very generally applicable. The next section contains the third and fourth steps.

Summary measure procedure

In step 2, each cluster of highly correlated outcomes is replaced by a new outcome measure that represents the cluster; step 2 may be omitted if there are no such clusters. In steps 3 and 4 the new outcome measures are combined into a global summary measure that is used in a two-sample t-test.

Smallest clinically relevant treatment effect

It is assumed that there is no bias in a well-designed experiment (randomization, blinding, analysis by intention to treat, and so on). Regarding a certain outcome measure, μ₁ and μ₂ denote the expected means in the treatment groups, and μ₁−μ₂ denotes the true difference in effectiveness between the treatments; for Pain this may be 5 points on the visual analogue scale, which is about half the standard deviation. The standardized difference δ = (μ₁−μ₂)/σ is the expected difference in treatment

Discussion

The ultimate goal of a clinical trial is to decide which treatment should be chosen for future patients. It may be hard to make a scientifically valid choice if there are many important outcome measures that may create a diffuse picture. The research protocol should describe a clearly structured statistical analysis, with probability .05 (or less) of a false significant result and sufficient power under sensible hypothesized treatment effects. The recommended first step is that the research

References (17)

A.L. Ries et al. Use of factor analysis to consolidate multiple outcome measures in chronic obstructive pulmonary disease. J Clin Epidemiol (1991)
Y. Hochberg et al. Extensions of multiple testing procedures based on Simes' test. J Stat Plan Inference (1995)
A.L. Gould. A new approach to the analysis of clinical drug trials with withdrawals. Biometrics (1980)
W.J. Shih et al. Testing for treatment differences with dropouts present in clinical trials—a composite approach. Stat Med (1997)
S.J. Pocock. Clinical trials: a practical approach (1983)
W. Lehmacher et al. Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics (1991)
J. Läuter. Exact t and F-tests for analyzing studies with multiple endpoints. Biometrics (1996)
H.J.A. Schouten. Planning group sizes in clinical trials with a continuous outcome and repeated measures. Stat Med (1999)

There are more references available in the full text version of this article.
