Commentary: Clarifications on the application and interpretation of the test for excess significance and its extensions
Nomenclature
Francis uses the terms consistency and inconsistency and defines the test as examining the consistency of a set of reported experiments (Francis, 2013). I am afraid that these terms may create some confusion in the literature. The terms “consistency” and “inconsistency” are used interchangeably with the terms “homogeneity” and “heterogeneity” in the field of meta-analysis (Higgins, Thompson, Deeks, & Altman, 2003), and TES is typically applied when many studies and meta-analyses thereof are
Definition of body of evidence
Francis has typically applied the test to probe for bias in sets of multiple experiments published by the same team in the same paper. The experiments are not necessarily the same, but may deviate in important aspects that may or may not also induce differences in the genuine effect sizes. The number of studies included in such bodies of evidence is usually relatively small, often <10. Nevertheless, TES always shows that there are too many significant results, because in the examples that
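The core of the calculation Francis applies to such a set of experiments can be sketched as follows: under independence, the probability that every experiment in the set reaches significance is the product of the per-study power estimates, and a small value flags excess significance. The power values and the 0.1 flagging threshold below are illustrative assumptions, not figures from any particular analysis.

```python
# Minimal sketch of an excess-significance check over a small body of
# evidence. All power values here are hypothetical.
from math import prod

def p_all_significant(powers):
    """Probability that every experiment in the set yields a significant
    result, assuming independence: the product of per-study powers."""
    return prod(powers)

# Hypothetical set of five experiments, each with modest (~55-62%) power:
powers = [0.60, 0.55, 0.62, 0.58, 0.60]
p = p_all_significant(powers)  # ~0.071

# A common convention is to flag the set when this probability falls
# below some threshold, e.g. 0.1:
excess_significance = p < 0.1  # True for this hypothetical set
```

Even with individually decent power, an all-significant set of five studies is improbable, which is why such small bodies of evidence can still trigger the test.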
Definition of plausible effect size
TES results depend on the assumptions about the plausible effect size, since these directly affect the power estimates for each study. This is a clear limitation, but, as Francis shows, the conclusions tend to be fairly robust when different assumptions are made about the plausible effect size within a sensible range. I would like to add here some additional considerations. First, it is possible to perform power calculations assuming a distribution of a plausible effect instead of a
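The first consideration above — power computed under a distribution of plausible effects rather than a single point value — can be sketched with a normal approximation to a two-sample test. The effect-size grid, the prior centered at d = 0.5, and the sample size are all illustrative assumptions.

```python
# Sketch: power under a point effect size versus power averaged over a
# distribution of plausible effects (normal approximation to a
# two-sided two-sample test). All numeric assumptions are illustrative.
from math import sqrt
from statistics import NormalDist

def power(d, n_per_group, alpha=0.05):
    """Approximate power for standardized effect size d with
    n_per_group subjects per group (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)  # approximate noncentrality
    return 1 - NormalDist().cdf(z_crit - ncp)

# Point assumption: a single plausible effect d = 0.5, n = 30 per group.
p_point = power(0.5, 30)

# Distribution assumption: weight power over a grid of plausible effect
# sizes using a normal prior centered at 0.5 (sd 0.15, assumed).
prior = NormalDist(0.5, 0.15)
grid = [0.2 + 0.01 * i for i in range(61)]  # d from 0.2 to 0.8
weights = [prior.pdf(d) for d in grid]
p_dist = sum(w * power(d, 30) for d, w in zip(grid, weights)) / sum(weights)
```

Because power is a nonlinear function of the effect size, the averaged value `p_dist` generally differs from the point estimate `p_point`, which is the substantive reason to consider a distributional assumption.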
Definition of nominal statistical significance threshold
Francis has used the p < 0.05 threshold to separate “positive” from “negative” results. This threshold acts as an attractor for investigators in many fields (Bakker et al., 2012, Simmons et al., 2011), but it is not absolute. Some fields increasingly require more stringent thresholds and/or use multiplicity corrections, some investigators may bias the results of their analysis too much and strive to get p-values much below 0.05, and investigators occasionally make leaps to claim significance for
Separating mechanisms of reporting bias
There are many mechanisms of selective reporting. I agree with Francis that fabrication bias, i.e. clear fraud, is unlikely to be a major player in most scientific fields. However, I also doubt that classic publication bias is the main explanation for excess significance in most fields. Classic publication bias means that “negative” results disappear entirely (suppressed by authors and/or editors/reviewers). The prevalence of this bias may vary across different scientific fields, proportional to the ease
Proteus phenomenon and complex bias patterns
The notion that reporting biases always favor “positive” over “negative” results is an over-simplification. Incentives for reporting (or not) specific types of results may vary. Occasionally “negative” results may be more attractive to obtain and publish than “positive” results. For example, if a study publishes a prominent observation in a major journal, other scientists may wish to contradict it. A strong contradiction may also be attractive to editors and reviewers. This generates the
Post-test probability of bias
The probability of bias in a body of evidence depends not only on the results of the TES but also on the prior probability of such bias. TES can be seen as a diagnostic test with some sensitivity (Se) and specificity (Sp). The post-test odds of bias are: post-test odds = pre-test odds × LR+ (when TES is “positive”) or pre-test odds × LR− (when TES is “negative”), where LR+ = Se/(1 − Sp) and LR− = (1 − Se)/Sp are the positive and negative likelihood ratios, respectively.
As shown in Table 1, extrapolating
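The post-test calculation above can be sketched directly. The prior probability, sensitivity, and specificity values in the example are hypothetical, chosen only to illustrate the odds-conversion arithmetic.

```python
# Sketch of the post-test probability of bias given a TES result.
# Prior, sensitivity, and specificity values are hypothetical.
def post_test_probability(prior, sensitivity, specificity, tes_positive=True):
    """Convert a prior probability of bias into a post-test probability
    via the likelihood ratio of a 'positive' or 'negative' TES."""
    if tes_positive:
        lr = sensitivity / (1 - specificity)      # LR+ = Se / (1 - Sp)
    else:
        lr = (1 - sensitivity) / specificity      # LR- = (1 - Se) / Sp
    pre_odds = prior / (1 - prior)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Example: prior probability of bias 0.30, Se = 0.6, Sp = 0.9,
# TES "positive" (LR+ = 6):
p = post_test_probability(0.30, 0.6, 0.9)  # -> 0.72
```

A “positive” TES with these (assumed) operating characteristics would raise the probability of bias from 0.30 to 0.72, illustrating how strongly the conclusion depends on the prior.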
Correcting for bias
Assuming that TES is “positive” and bias does exist, the true effect is likely to be smaller than what is observed, but it is not necessarily null. Empirical pragmatic examples, based on exchanges of information between Francis and authors whose papers were tagged as “positive” by TES, show that TES did indeed pick up the presence of biases, at a minimum classic publication bias with some studies being unpublished. Selective analysis and related questionable research practices may be more difficult to
Acknowledgment
I am grateful to Greg Francis for supplying the raw values for Figures 4 and 5 of his paper.
References (33)
- et al. Perceived information gain from randomized trials correlates with publication in high-impact factor journals. Journal of Clinical Epidemiology (2012).
- Replication, statistical consistency, and publication bias. Journal of Mathematical Psychology (2013).
- et al. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. Journal of Clinical Epidemiology (2005).
- et al. Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. Journal of Clinical Epidemiology (2011).
- et al. The rules of the game called psychological science. Perspectives on Psychological Science (2012).
- et al. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA (2010).
- et al. Replicating genotype-phenotype associations. Nature (2007).
- et al. Comprehensive field synopsis and systematic meta-analyses of genetic association studies in cutaneous melanoma. Journal of the National Cancer Institute (2011).
- et al. Comparison of protocols and registry entries to published reports for randomised controlled trials. Cochrane Database of Systematic Reviews (2011).
- “Positive” results increase down the hierarchy of the sciences. PLoS ONE (2010).