Introduction

Causal inference in traditional observational epidemiological studies is hampered by the possibility of confounding and reverse causation [1]. Mendelian randomization (MR) is a method that can be used to uncover causal relationships between an exposure and outcome in the presence of such limitations. MR is a form of instrumental variable analysis, where genetic variants are used as proxies for the exposure of interest [2]. As Mendel’s Laws of Inheritance dictate, alleles segregate randomly from parents to offspring. Thus, offspring genotypes are unlikely to be associated with confounders in the population. In addition, germ-line genotypes are fixed at conception and therefore temporally precede the variables under observation, avoiding issues of reverse causation. The MR method involves finding genetic variants which are associated with an exposure, and then testing the association between these variants and the outcome. The causal “de-confounded” relationship between exposure and outcome can then be estimated when the necessary conditions are satisfied (Fig. 1a).

Fig. 1

Design strategies for Mendelian randomization. a Standard MR: The causal relationship between an exposure variable (X) and an outcome (Y) is estimated using genetic variants (Z) as an instrument, regardless of the presence of variables (C) that may confound the observational association between the exposure and outcome. One method of estimation is the Wald ratio (see the Burgess review paper [3] for a description of the various instrumental variable (IV) estimators available), where the causal estimate (\( {\widehat{\beta}}_{IV} \)) is derived by dividing the estimated regression coefficient of the outcome on the single nucleotide polymorphism (SNP) (\( {\widehat{\beta}}_{YZ} \)) by the estimated regression coefficient of the exposure on the SNP (\( {\widehat{\beta}}_{XZ} \)). b Two-sample MR. c Bidirectional MR. d Mediation and two-step MR. e Multivariable MR. f Factorial MR
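The Wald ratio described in the caption can be illustrated with a minimal Python sketch. The function name, the argument names, and the numbers are hypothetical and chosen only for illustration. The standard error uses a delta-method approximation that ignores any covariance between the two estimates, which is an assumption of this sketch rather than something specified in the text.

```python
import math

def wald_ratio(beta_yz, se_yz, beta_xz, se_xz=0.0):
    """Wald ratio estimate with an approximate (delta-method) standard error.

    beta_yz / se_yz: SNP-outcome regression coefficient and its standard error.
    beta_xz / se_xz: SNP-exposure regression coefficient and its standard error.
    """
    if beta_xz == 0:
        raise ValueError("SNP-exposure estimate must be non-zero")
    beta_iv = beta_yz / beta_xz
    # Delta-method variance, assuming the two estimates are uncorrelated.
    var_iv = se_yz**2 / beta_xz**2 + (beta_yz**2 * se_xz**2) / beta_xz**4
    return beta_iv, math.sqrt(var_iv)

# Made-up summary statistics for a single SNP (illustration only).
beta_iv, se_iv = wald_ratio(beta_yz=0.02, se_yz=0.005, beta_xz=0.10, se_xz=0.01)
print(f"causal estimate = {beta_iv:.3f}, SE = {se_iv:.3f}")
```

Setting se_xz to zero recovers the simpler approximation in which the SNP-exposure association is treated as known without error.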

In understanding how MR works, it can be useful to think of an MR study as being analogous to a randomized controlled trial (RCT), except that genotypes are used to randomize participants into different levels of the exposure/treatment. However, it is important to realize that this analogy is not perfect. For example, RCTs typically involve treatments over a short duration, whereas an individual’s genetics influences their biology from conception, meaning that many causal estimates from MR studies might reflect life-long exposures as well as developmental compensation that may arise from inheriting these mutations [4••].

Although initial applications of MR mostly focused on estimating the causal effect of environmental exposures on medically relevant outcomes, in recent years MR has found utility across a wide range of domains including the development of pharmaceutical agents (i.e., drug target validation, drug target repurposing, and side effect identification) and in the interpretation of high-dimensional omics studies. Table 1 lists several recent studies illustrating how MR has been used successfully across a wide variety of different contexts [537•, 106].

Table 1 Recent Mendelian randomization studies

Core Assumptions Underlying Causal Inference in Mendelian Randomization Studies

In order for a genetic variant to qualify as a valid instrument for causal inference in an MR study, it must satisfy three core assumptions:

  • Assumption 1: The genetic variant must be truly associated with the exposure (NB the SNP need not be the functional variant responsible for the SNP-exposure association). Typically, SNPs which pass genome-wide significance (P < 5 × 10⁻⁸) and have been replicated in an independent sample are used as instruments in MR studies. The use of weak instruments can bias MR estimates towards the confounded observational estimate in one-sample MR settings and towards the null in two-sample MR settings (with non-overlapping samples). As common genetic variants frequently explain only a small proportion of a trait’s variance, it may be useful to combine the effects of many SNPs together in an allelic score and use this as an instrument in MR studies.

  • Assumption 2: The genetic variant should not be associated with confounders of the exposure-outcome relationship. Although it is technically impossible to prove that this assumption holds in an MR study, it may be possible to disprove it by examining the association between the variant and known confounders of the exposure-outcome relationship.

  • Assumption 3: The genetic variant should only be related to the outcome of interest through the exposure under study. This is commonly referred to as the “no pleiotropy” assumption or the exclusion restriction criterion. Horizontal pleiotropy, where a SNP is associated with multiple traits independently of the exposure of interest, potentially violates this assumption. While it is not possible to prove that this assumption holds in an MR study, various extensions of the basic MR design can be used to detect its presence, and estimate the causal effect of the exposure even in the presence of such violation of the assumption (see below).

Even when these core assumptions have been met, MR has a number of limitations which need to be considered (summarized in Table 2), and which have been discussed at length elsewhere [2, 45,47,48,49,49, 50•, 51•, 52•].

Table 2 Potential pitfalls in the interpretation of MR Studies and suggestions for dealing with these

Design of Mendelian Randomization Studies

The term MR covers a variety of approaches that use genetic variants to make inferences about the causal relationship between traits of interest [45, 52•]. Figure 1 illustrates some extensions to the basic MR design which are described in more detail in the paragraphs below.

Two-Sample Mendelian Randomization

Prior to 2011, most MR analyses were conducted with the genetic instruments, exposure, and outcome of interest all measured in the same sample of individuals (this is termed one-sample MR or single-sample MR). In such a scenario, the causal effect of the exposure on the outcome was typically estimated using two-stage least-squares (2SLS) regression [53] (Fig. 1a). However, it is also possible to use MR to estimate causal effects where data on the exposure and outcome have been measured in different (or only partially overlapping) samples. This is known as two-sample MR [54•] (Fig. 1b). Two-sample MR is particularly advantageous in situations where it is difficult and/or expensive to measure the exposure and outcome in the same set of individuals (e.g., studies involving molecular gene expression data). It also greatly increases the scope of MR analysis: for example, two-sample MR analyses can be performed on publicly available genome-wide association study (GWAS) summary data, a fact that has been taken advantage of by web software (and R packages) like MR-Base [55••]. The design is understandably becoming increasingly popular in the research community: the percentage of all MR studies that used the two-sample design framework rose from close to 0% in 2011 to around 40% in 2016 [56].
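To make the summary-data setting concrete, the following Python sketch combines per-SNP estimates from two separate GWAS using a fixed-effect inverse-variance weighted (IVW) estimate, one common choice for two-sample MR. The function and variable names are hypothetical, the numbers are invented, and the sketch assumes independent SNPs whose effects have already been harmonized to the same allele.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    beta_x: SNP-exposure estimates (from the exposure GWAS).
    beta_y: SNP-outcome estimates (from the outcome GWAS).
    se_y:   standard errors of the SNP-outcome estimates.
    """
    # Equivalent to a weighted regression of beta_y on beta_x through the origin.
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)

# Invented summary statistics for three independent SNPs.
beta_x = [0.10, 0.20, 0.30]
beta_y = [0.05, 0.10, 0.15]
se_y = [0.01, 0.01, 0.01]
estimate, se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate = {estimate:.3f} (SE {se:.3f})")
```

Because only summary statistics are needed, the exposure and outcome columns can come from entirely different studies.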

Bidirectional Mendelian Randomization

In bidirectional MR, instruments for both exposure and outcome are used to evaluate whether the “exposure” variable causes the “outcome” or whether the “outcome” variable causes the “exposure” (Fig. 1c) [57•]. For example, in explaining the observational relationship between low levels of LDL cholesterol and risk of cancer, it may not be clear whether low levels of LDL cholesterol are causal for cancer, whether the presence of (undetected) cancer has a negative effect on LDL cholesterol, or whether the correlation between the two is due to latent confounding [58]. Bidirectional MR can help tease apart these relationships. MR analysis is first performed in one direction (i.e., “exposure” to “outcome”), and then performed in the opposite direction (i.e., “outcome” to “exposure”) using the SNPs robustly associated with each trait in the separate GWASs. The approach assumes that the causal association works through an underlying mechanism where it is possible to determine a single causal temporal direction. However, the complexity of biological systems, such as the existence of feedback loops between exposure and outcome variables, may make interpretation of the results of such analyses difficult [52•]. In these situations, it may be possible to use structural equation modeling to estimate feedback loops, although the properties of such approaches have yet to be examined thoroughly [51•].

Two-Step Mendelian Randomization

Two-step MR is used to assess whether an intermediate trait acts as a causal mediator between an exposure and an outcome [59•]. As shown in Fig. 1d, in the first step of the procedure, genetic instruments for the exposure are used to estimate the causal effect of the exposure variable on the potential mediator. In the second step of the procedure, genetic instruments for the potential mediator are used to assess the causal effect of the mediator on the outcome. Evidence of association in both steps implies some degree of mediation of the association between the exposure and the outcome by the intermediate variable. The magnitude of the direct effect (which is the effect of exposure on the outcome independent of the mediator) and indirect effect (which is the effect of the exposure on the outcome via the mediator) can be estimated separately by this method [60•]. However, this does require the assumptions of linearity and homogeneity for both the exposure-mediator and exposure-outcome relationships and no statistical interaction between exposure and mediator [60•]. Two-step MR and two-sample MR can be combined to facilitate the investigation of causal mediation in very large samples of individuals [50•].
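As a rough illustration of how direct and indirect effects might be separated, the sketch below uses a product-of-coefficients calculation. This is an assumption of the sketch, one of several possible approaches, and it relies on the linearity and no-interaction conditions noted above. All names and numbers are hypothetical.

```python
import math

def two_step_mediation(beta_xm, se_xm, beta_my, se_my, beta_total):
    """Split a total effect into indirect and direct parts.

    beta_xm:    step 1, causal estimate of the exposure on the mediator.
    beta_my:    step 2, causal estimate of the mediator on the outcome.
    beta_total: MR estimate of the total effect of the exposure on the outcome.
    """
    indirect = beta_xm * beta_my
    # Delta-method standard error for a product of two independent estimates.
    se_indirect = math.sqrt(beta_xm**2 * se_my**2 + beta_my**2 * se_xm**2)
    direct = beta_total - indirect
    return indirect, se_indirect, direct

# Invented estimates for illustration only.
indirect, se_indirect, direct = two_step_mediation(
    beta_xm=0.5, se_xm=0.1, beta_my=0.4, se_my=0.05, beta_total=0.3
)
print(f"indirect = {indirect:.2f} (SE {se_indirect:.3f}), direct = {direct:.2f}")
```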

Multivariable Mendelian Randomization

In some situations, genetic variants are pleiotropically associated with multiple correlated phenotypes. For example, genetic variants associated with lipoprotein metabolism rarely correlate with only one specific lipid fraction [61, 62•]. Single variable MR is likely to result in misleading conclusions regarding causality due to the presence of this horizontal pleiotropy. Multivariable MR is able to overcome this problem by using instruments associated with multiple exposures to jointly estimate the independent causal effect of each of the risk factors on the outcome (Fig. 1e) [63•, 64, 65]. For example, multivariable MR has recently been successfully employed in examining the relationship between high-density lipoprotein cholesterol and coronary heart disease. Univariate MR analyses, which ignore potential pleiotropic effects from other lipid fractions, suggest that increasing HDL levels lowers the risk of coronary heart disease. However, multivariable MR, which is able to account for SNPs’ pleiotropic effects through low-density lipoprotein and triglyceride levels [21•, 22], indicates that HDL is not causal for coronary heart disease, consistent with much of the evidence from randomized controlled trials [66,68,68].
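The joint estimation can be sketched for the two-exposure case as a weighted regression of the SNP-outcome estimates on both sets of SNP-exposure estimates, with no intercept. This is a minimal summary-data sketch under assumed inverse-variance weights. The function name and the toy numbers are hypothetical and are not taken from any of the cited studies.

```python
def mvmr_two_exposures(beta_x1, beta_x2, beta_y, se_y):
    """Jointly estimate the effects of two exposures on one outcome.

    Solves the weighted normal equations for
    beta_y = theta1 * beta_x1 + theta2 * beta_x2, with weights 1 / se_y**2.
    """
    w = [1.0 / s**2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12**2
    if det == 0:
        raise ValueError("SNP effects on the two exposures are collinear")
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2

# Toy data: the outcome depends only on the second exposure.
beta_x1 = [0.10, 0.20, 0.05, 0.30]
beta_x2 = [0.20, 0.05, 0.30, 0.10]
beta_y = [0.5 * b for b in beta_x2]
se_y = [0.01] * 4
theta1, theta2 = mvmr_two_exposures(beta_x1, beta_x2, beta_y, se_y)
print(f"theta1 = {theta1:.3f}, theta2 = {theta2:.3f}")
```

In the toy data the first exposure has no independent effect once the second is accounted for, mirroring the kind of conclusion described for HDL above.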

Factorial Mendelian Randomization

The manner by which causes of disease act together to increase disease risk can have important public health implications, as above-additive effects act together to generate a greater burden of disease in the population [69]. Factorial MR can be used to determine the combined causal effects of the co-occurrence of two or more risk factors for disease [6•, 45] (Fig. 1f). In order to conduct factorial MR, individual level genotype data are required. For example, Ference et al. conducted a factorial MR study in order to investigate the effects of HMGCR and PCSK9 inhibition on coronary heart disease (CHD) risk. In this study, a weighted genetic score for PCSK9 inhibition was constructed (with the weighting based on each SNP’s effect on LDL cholesterol levels) and participants were allocated into either a high or low inhibition group based on the median value of the PCSK9 score. A genetic score for HMGCR inhibition was then constructed and the individuals were further allocated into groups based on the median value of the HMGCR score (Fig. 1f). The causal estimates for PCSK9 and HMGCR inhibition, and the combined effect of the two on CHD, could then be determined. Results from this factorial analysis suggested that HMGCR and PCSK9 inhibition have independent effects on CHD, and act together in an additive manner to reduce CHD risk [16]. Another example of a factorial MR suggested that CETP inhibitors and statins were associated with decreased LDL-C and apoB levels and reduced risk of cardiovascular events. The reduction in cardiovascular disease (CVD) risk was proportional to the apoB reduction but less than expected for the LDL-C reduction [70].
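The allocation step of a factorial design can be sketched as follows. This is a simplified illustration, not the procedure used in the cited studies. The function name and the toy data are hypothetical, individuals exactly at the median are assigned to the "low" group by assumption, and a real analysis would compare risk across the cells with a regression model rather than raw proportions.

```python
from collections import defaultdict
from statistics import mean, median

def factorial_cells(score_a, score_b, outcome):
    """Dichotomize two genetic scores at their medians and summarize each cell.

    Returns the mean outcome (e.g., proportion of cases) in each of the
    2 x 2 groups, keyed as (score_a group, score_b group).
    """
    med_a, med_b = median(score_a), median(score_b)
    cells = defaultdict(list)
    for a, b, y in zip(score_a, score_b, outcome):
        key = ("high" if a > med_a else "low", "high" if b > med_b else "low")
        cells[key].append(y)
    return {key: mean(values) for key, values in cells.items()}

# Toy data for eight individuals: two scores and a binary outcome.
score_a = [1, 2, 3, 4, 5, 6, 7, 8]
score_b = [1, 8, 2, 7, 3, 6, 4, 5]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
for cell, rate in sorted(factorial_cells(score_a, score_b, outcome).items()):
    print(cell, rate)
```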

Recent Developments

Resources for Performing Mendelian Randomization Analyses

MR, and in particular two-sample MR, provides a powerful, cost-efficient, and simple way to investigate potential causal relationships between many different human traits. Usefully, many GWAS consortia have made the results of their meta-analyses publicly available, greatly facilitating the running of such analyses [71,73,73]. For example, Phenoscanner [74], a centralized GWAS data resource, can be used to search for genetic associations across a large number of phenotypes. In addition, Ben Neale’s group has recently provided GWAS results for more than 2400 human traits based on up to 337,000 individuals from the latest UK Biobank release, enabling two-sample MR analyses on a very large number of individuals (data can be downloaded from http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank). Several large-scale biobanks, such as the UK Biobank [75], the China Kadoorie Biobank [76], and the HUNT study [77], allow researchers to apply for (a certain level of) genotype and phenotype information on large numbers of participants. These data sources can be used in one- or two-sample MR analyses when combined with other datasets. This idea led to the development of MR-Base [55••], which retrospectively collected, harmonized, and centralized complete GWAS summary datasets from the public domain. The curated summary data correspond to 135 diseases, almost 2000 phenotypes in 1.5 million individuals, and up to 4 billion SNP-trait associations, and are integrated with a software infrastructure (web interface, R package, and API) for automating MR analyses. MR-Base therefore greatly increases the accessibility of GWAS summary results to other researchers and accelerates the identification (discovery strand), prioritization (evidence synthesis strand), and evaluation (translational strand) of intervention targets.

Hypothesis-Free Investigations and “Mining the Phenome”

While there is obvious value in using MR to investigate the relationship between phenotypes for which causality has already been hypothesized, there is also an interest in detecting novel causal relationships. Hypothesis-free study designs such as genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have shown tremendous success in recent years, and there are some instances where this strategy has shown promise in detecting putative causal relationships between phenotypes [38•, 51•, 78].

In a recent “one exposure to many outcomes” MR application, Haycock et al. systematically examined the association between telomere length and 22 cancers and 32 primary non-neoplastic diseases. The results suggested that longer telomeres were generally associated with increased risk for site-specific cancers but reduced risk for some non-neoplastic diseases, including cardiovascular diseases. This study highlighted the power of hypothesis-free MR in building a phenome-wide picture of traits of interest as opposed to the traditional “one exposure to one outcome” MR approach [28•].

Automation and data repositories provide solutions to some of the challenges involved in hypothesis-free MR. They trivialize the process of performing the analysis itself, and go some way towards improving reliability by (a) reducing human error [56] and (b) promoting the use of appropriate sensitivity analyses [3, 42••, 43•, 79, 80•, 81•]. However, many challenges still remain. Statistical power in MR is an issue even in the hypothesis-driven case, but hypothesis-free MR comes with a multiple testing burden that may be highly problematic. The nature of the data used in hypothesis-free MR is quite different from other hypothesis-driven study designs. There is often only a single consortium providing summary data for any one disease or trait which means that replication of a putative association in independent samples can be impossible. The emergence of large biobanks [75] may go some way to avoid this problem for many complex traits, but specific diseases for which cases need to be ascertained will still pose a challenge.

Another practical issue surrounds selecting those results from a hypothesis-free scan that are worthy of follow-up. Horizontal pleiotropy can manifest in many different patterns, which means that knowing the appropriate MR method to use for any particular pair of traits is difficult. Relying on a single method could lead to missed associations through being overly conservative when there is no pleiotropy, or result in too many false positives because of misspecifying the pleiotropic model. One method that has been developed recently to address this issue is MR-MoE (MR mixture of experts), which seeks to predict the most appropriate model based on the characteristics of the summary data [82•].

Another potential analytical strategy to mine the phenome would be to screen large publicly available disease and multi-omic GWAS summary results for evidence of genetic correlation using LD score regression via LD hub [83••, 84•]. Here, if traits are causally related and have non-zero heritability then there should be non-zero genetic correlations. However, genetic correlations can arise due to genetic confounding and horizontal pleiotropy and do not provide evidence on the direction of causality. Those disease-omic pairs showing evidence of genetic correlation could be followed up by conducting formal MR analyses [85]. One potential drawback of this approach is that the statistical efficiency of LD score regression may not be as high as that of MR in many cases, so selecting the appropriate scenarios in which to apply this as a screening method is important and warrants a priori power calculations.

The Role of MR in Disease Progression and Treatment

To date, the large majority of GWAS identify genetic variants (SNPs) associated with incidence or risk of disease. Such variants are informative for disease prevention, but not necessarily for treatment aimed at influencing disease progression [86, 87•]. For example, only ~8% of genetic association hits in the GWAS Catalog (p < 1 × 10⁻⁵) were reported by studies that have attempted to identify variants associated with disease progression or severity, and most of these GWAS have limited statistical power owing to small sample sizes (90% have N < 5000) [87•]. In a systematic search of the literature, Paternoster et al. were able to identify only 27 genetic studies that have used MR to identify risk factors influencing disease progression [87•], which leaves massive scope to extend MR methodologies and applications in this area. The introduction of collider bias when studying a selected (e.g., case only) group of individuals [87•] is a particular challenge when studying disease progression (more details of collider bias are given in Tables 3 and 4) [44•, 104].

Table 3 Databases and bioinformatic toolkits for performing MR
Table 4 Methods for dealing with limitations of MR

The Development of Approaches to Detect and Correct for Horizontal Pleiotropy in MR Analysis

The possibility of horizontal pleiotropy and the consequent violation of the exclusion restriction criterion are widely seen as the greatest threat to the validity of MR studies. Over the last few years, investigators have developed a suite of approaches that relax the strict requirement that genetic instruments exhibit no horizontal pleiotropy yet still produce causal effect estimates that are asymptotically consistent. These approaches often rely on different sets of assumptions to each other, meaning that if the results from all of these different analyses are largely consistent, then the investigator can be more confident in drawing conclusions regarding causality.

MR-Egger regression [42••] is one such approach where, given a set of genetic variants that proxy an exposure variable of interest, estimates of the SNP-outcome association are regressed on estimates of the SNP-exposure association (this can be done in a one- or two-sample MR framework), with each data point weighted by the precision of the SNP-outcome coefficients. The slope of the weighted regression is an estimate of the causal effect of the exposure on the outcome. The intercept in this regression is free to vary, and the degree to which it departs from zero is a function of the degree of directional pleiotropy present in the data.

The MR-Egger approach relaxes the requirement of no horizontal pleiotropy among the SNPs. Instead it assumes that there is no correlation between the gene-exposure association and the direct effect of the genetic variants on the outcome. This is referred to as the InSIDE assumption (Instrument Strength Independent of Direct Effect) and is a weaker requirement than the stricter exclusion restriction criterion. A drawback of the MR-Egger method is that it tends to suffer from low statistical power and is particularly susceptible to bias from weak instruments.
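The weighted regression underlying MR-Egger can be sketched in a few lines of Python. The sketch returns point estimates only and omits standard errors. The function name and toy data are hypothetical, the weights are taken as the inverse variance of the SNP-outcome estimates, and the orientation step (coding every SNP so that its exposure estimate is positive) follows a common convention that is not spelled out in the text above.

```python
def mr_egger(beta_x, beta_y, se_y):
    """Weighted regression of SNP-outcome on SNP-exposure estimates with a free intercept.

    Returns (intercept, slope): the intercept reflects average directional
    pleiotropy and the slope is the causal effect estimate.
    """
    if len(beta_x) < 3:
        raise ValueError("need at least three SNPs")
    # Orient each SNP so that its exposure estimate is positive.
    oriented = [(-bx, -by) if bx < 0 else (bx, by) for bx, by in zip(beta_x, beta_y)]
    xs = [bx for bx, _ in oriented]
    ys = [by for _, by in oriented]
    w = [1.0 / s**2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0:
        raise ValueError("SNP-exposure estimates must vary across SNPs")
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy data with a constant pleiotropic offset of 0.05 and a causal effect of 0.5.
beta_x = [0.1, 0.2, 0.3, 0.4]
beta_y = [0.05 + 0.5 * bx for bx in beta_x]
intercept, slope = mr_egger(beta_x, beta_y, se_y=[0.01] * 4)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In the toy data a regression forced through the origin would absorb the offset into the slope, whereas the free intercept captures it separately.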

The weighted median estimator [94•] is a complementary method that permits up to 50% of the information in the MR analysis to come from SNPs that are invalid instruments. The mode-based estimate (MBE) further relaxes the assumption required for the weighted median approach and can estimate the causal effect when the most common pleiotropy value across instruments is zero [80•]. In addition, Bayesian modeling alternatives, such as Bayesian model averaging [81•], are under development, and may provide a framework to model pleiotropic effects and further relax MR assumptions, extending the scope of MR analysis.
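A minimal sketch of a weighted median calculation is given below. It takes the per-SNP ratio estimates, weights each by an approximate inverse variance, and interpolates to the 50% point of the cumulative weight distribution. The interpolation details, the choice of weights, and all names are assumptions of this sketch, and no standard error is computed.

```python
def weighted_median(beta_x, beta_y, se_y):
    """Weighted median of per-SNP ratio estimates (point estimate only)."""
    if len(beta_x) < 2:
        raise ValueError("need at least two SNPs")
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    # Approximate inverse-variance weight of each ratio estimate.
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    r = [ratios[j] for j in order]
    w = [weights[j] for j in order]
    total = sum(w)
    # Cumulative weight up to the midpoint of each ordered estimate.
    cum, p = 0.0, []
    for wj in w:
        cum += wj
        p.append((cum - 0.5 * wj) / total)
    candidates = [j for j, pj in enumerate(p) if pj < 0.5]
    if not candidates or candidates[-1] == len(r) - 1:
        raise ValueError("a single SNP carries all of the weight")
    below = candidates[-1]
    frac = (0.5 - p[below]) / (p[below + 1] - p[below])
    return r[below] + (r[below + 1] - r[below]) * frac

# Toy data: three SNPs agree on a ratio of 0.5, two are pleiotropic outliers.
beta_x = [0.1] * 5
beta_y = [0.05, 0.05, 0.05, 0.20, -0.10]
print(round(weighted_median(beta_x, beta_y, se_y=[0.01] * 5), 3))
```

In the toy data the two outlying SNPs carry less than half of the total weight, so the estimate stays at the value shared by the majority.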

In some circumstances, effect estimates are not consistent across independent instruments (e.g., with some genetic instruments showing unexpectedly large or small effects on the outcome, given the magnitude of their exposure effect), which could be indicative of horizontal pleiotropy. Formal statistical tests for heterogeneity can be used to assess this, such as Cochran’s Q statistic (for the inverse-variance weighted (IVW) method) and Rücker’s Q (for MR-Egger) [98, 103].
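For the IVW case, a heterogeneity statistic of this kind can be computed directly from the summary data, as in the sketch below. The names and numbers are hypothetical, and the weights are a first-order approximation to the inverse variance of each ratio estimate. Under the null hypothesis of no heterogeneity, Q is typically compared against a chi-squared distribution with the returned degrees of freedom, a step that is left out here to keep the example dependency-free.

```python
def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q for per-SNP ratio estimates around the IVW estimate.

    Returns (Q, degrees_of_freedom).
    """
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1

# Toy data: two SNPs that disagree strongly about the causal effect.
q, df = cochran_q(beta_x=[0.1, 0.1], beta_y=[0.0, 0.1], se_y=[0.01, 0.01])
print(f"Q = {q:.1f} on {df} degree(s) of freedom")
```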

In addition to the above-mentioned methods, visual inspection can be helpful to identify pleiotropic variants (e.g., outlier detection). For example, funnel plots are used to display the MR estimate of individual genetic variants against their precision. Asymmetry in the funnel plot may arise due to some genetic variants having unusually strong effects on the outcome, which is indicative of directional pleiotropy [42••]. In addition, heterogeneous effects can be visualized by scatterplots of the gene-outcome and gene-exposure associations [94•] and forest plots of Wald ratios for each independent genetic instrument. In the leave-one-out plot, one SNP is removed at a time and the overall effect estimate is recalculated so that influential individual SNPs can be identified. As sensitivity analyses, all the above visualization methods are implemented in the MR-Base R package [55••]. Other graphical approaches have been proposed recently, such as the radial plot [99] and Q-contribution plots [103], which can further help to assess heterogeneity across genetic variants and detection of pleiotropic variants.
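The leave-one-out idea can be sketched without any plotting: each SNP is dropped in turn and the overall estimate is recomputed from the rest. The sketch uses a fixed-effect IVW estimate as the overall estimator, which is an assumption, and the function names and toy data are hypothetical. It is not the MR-Base implementation.

```python
def ivw(beta_x, beta_y, se_y):
    """Fixed-effect IVW point estimate from summary statistics."""
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den

def leave_one_out(beta_x, beta_y, se_y):
    """Recompute the IVW estimate with each SNP removed in turn."""
    estimates = []
    for k in range(len(beta_x)):
        keep = [j for j in range(len(beta_x)) if j != k]
        estimates.append(
            ivw([beta_x[j] for j in keep],
                [beta_y[j] for j in keep],
                [se_y[j] for j in keep])
        )
    return estimates

# Toy data: the last SNP has an unusually large effect on the outcome.
beta_x = [0.1, 0.2, 0.3, 0.2]
beta_y = [0.05, 0.10, 0.15, 0.40]
se_y = [0.01] * 4
print("all SNPs:", round(ivw(beta_x, beta_y, se_y), 3))
for k, est in enumerate(leave_one_out(beta_x, beta_y, se_y)):
    print(f"without SNP {k}: {est:.3f}")
```

An estimate that shifts markedly when one particular SNP is removed flags that SNP as influential.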

The Development of Approaches to Assess Instrument Strength

It is important to assess instrument strength in order to avoid weak instrument bias in MR analysis. When weak instruments are estimated in GWAS with small sample sizes, MR approaches can violate the “NO Measurement Error” (NOME) assumption, which assumes that the SNP-exposure associations (weights of the regression) are estimated without measurement error [43•]. For IVW, weak instruments that violate the NOME assumption can be reliably detected using the mean F-statistic [102]. For MR-Egger, the degree of violation of the NOME assumption can be quantified using the \( I^2 \) statistic (\( {I}_{GX}^2 \)), a number ranging between 0 and 1, with higher values indicating less dilution of the causal effect estimate [43•].
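Both diagnostics can be approximated from SNP-exposure summary statistics alone, as in the following sketch. The per-SNP F-statistic is approximated by the squared ratio of the estimate to its standard error, and the heterogeneity-style statistic is computed in its unweighted form and truncated at zero. These simplifications, and all names and numbers, are assumptions of the sketch.

```python
def mean_f_statistic(beta_x, se_x):
    """Mean of the approximate per-SNP F-statistics, (beta / SE) squared."""
    f_stats = [(bx / s) ** 2 for bx, s in zip(beta_x, se_x)]
    return sum(f_stats) / len(f_stats)

def i_squared_gx(beta_x, se_x):
    """Unweighted I-squared statistic for the SNP-exposure estimates.

    Values near 1 suggest little dilution of the MR-Egger estimate.
    """
    w = [1.0 / s**2 for s in se_x]
    mu = sum(wi * bx for wi, bx in zip(w, beta_x)) / sum(w)
    q = sum(wi * (bx - mu) ** 2 for wi, bx in zip(w, beta_x))
    if q == 0:
        return 0.0
    return max(0.0, (q - (len(beta_x) - 1)) / q)

# Invented SNP-exposure estimates for three instruments.
beta_x = [0.1, 0.2, 0.3]
se_x = [0.01, 0.01, 0.01]
print(f"mean F = {mean_f_statistic(beta_x, se_x):.1f}")
print(f"I2_GX  = {i_squared_gx(beta_x, se_x):.2f}")
```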

Conclusion

MR is a flexible and robust statistical method which uses genetic variants as instrumental variables to detect and quantify causal relationships in observational epidemiological studies. In this review, we have endeavored to illustrate promising new findings and potential pitfalls of MR. The design strategies, assumptions, limitations, and potential of MR have been discussed. Given the growing availability of large-scale genetic resources and automated toolkits for implementing these methods, such as MR-Base and LD hub, we are now able to analyze all pairwise relationships within large multidimensional data sets in a hypothesis-free manner, producing evidence that can then be followed up in subsequent in-depth investigations.