The statistician's page
What is the Value of a p Value?
Section snippets
A p-Value Primer
Statistics is not a unified science [6]. There are fundamentally different approaches whose advocates argue, on philosophical and epistemological grounds, about their relative merits [5, 7, 8]. The typical study collects data to investigate a possible difference in an outcome variable that is caused by a risk factor or intervention. The statistical conclusions are reached indirectly—using inductive reasoning—by “disproving” a null hypothesis. The null hypothesis usually states that there is no
A True Story
We are among those fortunate biostatisticians who regularly interact with scientifically sophisticated research surgeons. So, when one of them makes a statistical misstatement, we assume that many other clinicians would make the same mistake and consider it an opportunity for an educational effort. Recently, it happened again: on a conference call in which we participated, one of our senior surgeons was discussing a statistical test of significance that resulted in a p value of 0.08, and he
Conditional Probability
A conditional probability is one that is modified by an “if …” or a “given that …” condition. The p value is a conditional probability: It is the probability of observing the observed data (plus other data that is at least as extreme as that observed) given that the null hypothesis (Ho) is true. This can be written in a compact notation: p value = Prob(data | Ho), where Prob means probability, and the vertical line means given. In words, this equation says that “the p value equals the
A Fictional Story
Probability concepts are often illustrated with examples from games of chance—not inappropriately, because the desire for gambling success sponsored the birth of probability theory [14]. In the familiar coin toss experiment, a fair coin is defined as one with a 50% probability of heads (Ho—the null hypothesis). Suppose you undertook an experiment to determine whether a particular coin was fair by tossing it 10 times, and the result was 9 heads. Is that enough evidence to reject this hypothesis
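The coin toss can be worked through numerically. The sketch below computes Prob(data | Ho) for 9 heads in 10 tosses, counting the observed result plus every result at least as extreme. The function name `binomial_tail_p` is ours, for illustration only, and the two-sided value is obtained by simple doubling, which is appropriate only because a fair coin gives a symmetric distribution.

```python
from math import comb


def binomial_tail_p(n: int, k: int, p0: float = 0.5) -> float:
    """One-sided p value: Prob(at least k heads in n tosses | Prob(heads) = p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Observed data: 9 heads in 10 tosses; Ho: the coin is fair (p0 = 0.5).
one_sided = binomial_tail_p(10, 9)      # counts 9 or 10 heads
two_sided = min(1.0, 2 * one_sided)     # also counts 0 or 1 heads (symmetry under Ho)

print(f"one-sided p = {one_sided:.4f}")  # 0.0107
print(f"two-sided p = {two_sided:.4f}")  # 0.0215
```

Note what the calculation does and does not deliver: it is the probability of the data given a fair coin, not the probability that the coin is fair given the data.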
Diagnostic Testing for Coronary Artery Disease
Because we must abandon the coin toss example, where shall we turn to continue our evaluation (devaluation) of the p value? We have intimated that Bayesian analysis is required to produce the desired inverse probability. So let us move to a well-accepted clinical application of Bayesian reasoning—diagnostic testing for coronary artery disease (CAD)—and take advantage of its close connection with hypothesis testing [16, 17]. We will examine a series of patient scenarios to determine the
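The Bayesian arithmetic behind diagnostic testing can be sketched in a few lines. The sensitivity, specificity, and pretest probabilities below are hypothetical values chosen for illustration, not figures from the article, and `post_test_probability` is our own helper name.

```python
def post_test_probability(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Prob(disease | positive test), from Bayes' theorem."""
    true_pos = sensitivity * prevalence                # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics (assumed, not from the article).
SENS, SPEC = 0.80, 0.90

# The same positive result means very different things at different pretest probabilities.
for prevalence in (0.05, 0.50, 0.90):
    post = post_test_probability(prevalence, SENS, SPEC)
    print(f"pretest {prevalence:.2f} -> post-test {post:.3f}")
```

The same positive test result leaves a low-risk patient still unlikely to have CAD, while it nearly confirms the diagnosis in a high-risk patient; the test result alone, like a p value alone, does not give the probability of interest.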
Bayes to the Rescue
Besides offering the proper paradigm for interpreting diagnostic tests, the Bayesian approach is equally essential in evaluating the results of clinical studies. The main objection to it is that it contains a subjective element: a prior distribution (the analogue of prevalence) must be specified for the variable of interest before the study begins, and one study's prior might differ from another's, yet good science should be completely objective. But Bayes' methods are more practical, more
Operative Mortality With Aprotinin
The essence of the Bayesian approach is that an experimental study is meant to modify current beliefs, not to be interpreted in complete isolation from preexisting knowledge and experience. We are allowed (indeed obligated) to interpret current study findings in light of previous knowledge, just as we all do in everyday life when we interpret new evidence in light of prior experience.
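One simple way to see how new data modify a prior belief is the Beta-binomial update for an event rate such as operative mortality. This is a generic textbook sketch under our own assumptions, not necessarily the method used in the article, and every number in it is invented for illustration rather than taken from any study.

```python
from fractions import Fraction


def update_beta(a: int, b: int, events: int, n: int) -> tuple[int, int]:
    """Combine a Beta(a, b) prior with `events` out of `n` patients (binomial data)."""
    return a + events, b + (n - events)


def beta_mean(a: int, b: int) -> Fraction:
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


# Hypothetical prior: about 2% mortality, carrying the weight of roughly 100 patients.
prior = (2, 98)
# Hypothetical new study: 6 deaths among 100 patients (observed rate 6%).
posterior = update_beta(*prior, events=6, n=100)

print("prior mean:    ", float(beta_mean(*prior)))      # 0.02
print("posterior mean:", float(beta_mean(*posterior)))  # 0.04
```

The posterior mean falls between the prior belief and the observed rate, which is the sense in which a study modifies, rather than replaces, what was believed beforehand.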
To exemplify, we will reexamine the results of a recent study of the antifibrinolytic drug aprotinin
Comment
Even before you started reading this exposé of the overrated p value, you must have wondered about some of its readily apparent shortcomings:
1. Any small difference, no matter how clinically unimportant, will be statistically significant (p < 0.05) if the sample size is large enough.
2. Any large difference, no matter how clinically important, will not be statistically significant (p > 0.05) if the sample size is too small.
3. Because of 1 and 2, a low p value in a small study is more evidential than
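The first two shortcomings are easy to demonstrate. The sketch below uses a pooled two-proportion z test with a normal approximation (crude for the small sample, but adequate for illustration); the event counts are hypothetical and `two_proportion_p` is our own helper name.

```python
from math import erfc, sqrt


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p value from a pooled two-proportion z test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Shortcoming 1: a tiny difference (5.0% vs 5.2%) with 200,000 patients per group.
small_diff_p = two_proportion_p(10_000, 200_000, 10_400, 200_000)

# Shortcoming 2: a large difference (10% vs 30%) with only 20 patients per group.
large_diff_p = two_proportion_p(2, 20, 6, 20)

print(f"tiny difference, huge study:   p = {small_diff_p:.4f}")
print(f"large difference, small study: p = {large_diff_p:.4f}")
```

With these made-up counts the clinically trivial difference is "significant" and the clinically large one is not, so the p value by itself says little about the size or importance of an effect.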
References (35)
- et al. Collective contributions of women to cardiothoracic surgery: a perspective review. Ann Thorac Surg (2001)
- Believability of clinical trials: a diagnostic testing perspective. J Thorac Cardiovasc Surg (2006)
- et al. Sensitivity and specificity should be de-emphasized in diagnostic accuracy studies. Acad Radiol (2003)
- et al. What are the odds? Ann Thorac Surg (2007)
- The alternative hypothesis: one-sided or two-sided? J Clin Epidemiol (1989)
- et al. A farewell to P-values. Crit Care Resusc (2004)
- Two cheers for P-values? J Epidemiol Biostat (2001)
- et al. Why P values are not a useful measure of evidence in statistical significance testing. Theor Psychol (2008)
- From statistics to statistical science. Statistician (1999)
- Comment on Bayarri and Berger. Bayesian Stat (1999)