Genetics: The next step in Mendelian randomization
Understanding how variations in our genome influence our susceptibility to diseases is one of the most compelling research topics in the life sciences. Researchers have used genome-wide association studies – experiments that analyze the DNA sequences of multiple individuals – to identify statistical relationships between genetic variants and specific human traits, such as susceptibility to a disease or various body parameters.
Despite the success of this approach, major challenges persist. First, associations between variants that are located close to each other within the genome can make it difficult to determine which of these genetic changes are responsible for the phenotype of interest (a problem called linkage disequilibrium). Second, even if specific variants can be identified, it is often not straightforward to determine the molecular mechanism by which they impact the trait (Tam et al., 2019).
To overcome these difficulties, studies often include information about other modalities such as transcriptomes, proteins and metabolites (Emilsson et al., 2008; Fraser and Xie, 2009; Nicolae et al., 2010; Wainberg et al., 2019; Schadt, 2009; Suhre et al., 2011). Some ‘multi-omic’ studies use one modality, or ‘layer’, to confirm changes to another, such as confirming changes in levels of mRNA by measuring the respective protein product. However, there is a shortage of examples of mechanistic links between the different layers (Buccitelli and Selbach, 2020; Wörheide et al., 2021). Now, in eLife, Zoltán Kutalik, Eleonora Porcu and colleagues from the Swiss Institute of Bioinformatics and the University of Lausanne – including Chiara Auwerx as first author – report a new approach that uses a technique called Mendelian randomization to reveal a chain of molecular connections between the transcriptome, metabolome, and high-level physiological traits such as biomarkers associated with kidney health (Auwerx et al., 2023).
Mendelian randomization is considered to be an ‘experiment of nature’, as it uses variations already present in the genetic code to determine if exposure to certain conditions (such as the amount of cholesterol in the blood, or the expression level of a gene) affects a specific trait (for instance, increased susceptibility to heart disease). The genetic variants act as a proxy, or ‘instrument’, for exposures that are difficult or impossible to manipulate in the population being studied. Mediation analysis can then be applied to ask if the exposure is responsible for the effects of the instrumental variable on the trait of interest. However, it is necessary to proceed carefully (Sanderson et al., 2022): for example, the instrumental variable being used should not affect the trait of interest through any other mediator.
The computational framework presented by Auwerx et al. integrates results from genome-wide association studies with data on genetic variants that affect the level of transcripts or the composition of metabolites. These variants are typically referred to as eQTL (short for expression quantitative trait loci) and mQTL (metabolite QTL), and can be derived from separate population cohorts, allowing researchers to tap into the vast resources of information that are already available.
First, causal links between transcripts and metabolites are established using overlapping mQTL and eQTL as instrumental variables. Causal effects of metabolites on traits of interest are then determined in the same manner using mQTL and genetic variants identified in genome-wide association studies. The next step in the framework is purely based on this established causality: transcripts that causally affect trait-modifying metabolites have to be causally linked to the same trait, resulting in transcript-metabolite-trait triplets (Figure 1). A statistical calculation, known as multivariate Mendelian randomization, is then performed on these triplets using the metabolite-associated variants as the instrumental variable. This determines what proportion of change in the outcome is a result of the transcript directly (or via unknown mediators) impacting the trait, and what proportion is the result of changes in the level of the metabolite mediating the relationship between them.
Auwerx et al. highlight an intriguing example of genetic variants affecting the transcription of a citrate-exporting protein encoded by a gene called ANKH that has been implicated in mineralization disorders. The resulting change to the export of citrate seems to affect the level of calcium present in the serum of individuals – a connection that was not detected when only transcript levels were correlated with the calcium trait.
By extending the Mendelian randomization approach to include two modalities (transcripts and metabolites), this new framework can detect causal relationships that could not be identified by comparing the genome wide association data to a single modality only. It also provides new insights into how the transcript impacts the phenotype through metabolic changes. With multi-omics studies increasing further in size, it is highly probable that even more advanced statistical approaches may become feasible in the future.
References
-
MRNAs, proteins and the emerging principles of gene expression controlNature Reviews Genetics 21:630–644.https://doi.org/10.1038/s41576-020-0258-4
-
Common polymorphic transcript variation in human diseaseGenome Research 19:567–575.https://doi.org/10.1101/gr.083477.108
-
Mendelian randomizationNature Reviews Methods Primers 2:e00092-5.https://doi.org/10.1038/s43586-021-00092-5
-
Benefits and limitations of genome-wide association studiesNature Reviews Genetics 20:467–484.https://doi.org/10.1038/s41576-019-0127-1
-
Multi-omics integration in biomedical research - a metabolomics-centric reviewAnalytica Chimica Acta 1141:144–162.https://doi.org/10.1016/j.aca.2020.10.038
Article and author information
Author details
Publication history
Copyright
© 2023, Weith and Beyer
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,679
- views
-
- 196
- downloads
-
- 16
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Developmental Biology
- Genetics and Genomics
Alcohol consumption in pregnancy can affect genome regulation in the developing offspring but results have been contradictory. We employed a physiologically relevant murine model of short-term moderate prenatal alcohol exposure (PAE) resembling common patterns of alcohol consumption in pregnancy in humans. Early moderate PAE was sufficient to affect site-specific DNA methylation in newborn pups without altering behavioural outcomes in adult littermates. Whole-genome bisulfite sequencing of neonatal brain and liver revealed stochastic influence on DNA methylation that was mostly tissue-specific, with some perturbations likely originating as early as gastrulation. DNA methylation differences were enriched in non-coding genomic regions with regulatory potential indicative of broad effects of alcohol on genome regulation. Replication studies in human cohorts with fetal alcohol spectrum disorder suggested some effects were metastable at genes linked to disease-relevant traits including facial morphology, intelligence, educational attainment, autism, and schizophrenia. In our murine model, a maternal diet high in folate and choline protected against some of the damaging effects of early moderate PAE on DNA methylation. Our studies demonstrate that early moderate exposure is sufficient to affect fetal genome regulation even in the absence of overt phenotypic changes and highlight a role for preventative maternal dietary interventions.
-
- Evolutionary Biology
- Genetics and Genomics
The nearly neutral theory of molecular evolution posits variation among species in the effectiveness of selection. In an idealized model, the census population size determines both this minimum magnitude of the selection coefficient required for deleterious variants to be reliably purged, and the amount of neutral diversity. Empirically, an ‘effective population size’ is often estimated from the amount of putatively neutral genetic diversity and is assumed to also capture a species’ effectiveness of selection. A potentially more direct measure of the effectiveness of selection is the degree to which selection maintains preferred codons. However, past metrics that compare codon bias across species are confounded by among-species variation in %GC content and/or amino acid composition. Here, we propose a new Codon Adaptation Index of Species (CAIS), based on Kullback–Leibler divergence, that corrects for both confounders. We demonstrate the use of CAIS correlations, as well as the Effective Number of Codons, to show that the protein domains of more highly adapted vertebrate species evolve higher intrinsic structural disorder.