The missing link between genetic association and regulatory function

  1. Noah J Connally (corresponding author)
  2. Sumaiya Nazeen
  3. Daniel Lee
  4. Huwenbo Shi
  5. John Stamatoyannopoulos
  6. Sung Chun
  7. Chris Cotsapas (corresponding author)
  8. Christopher A Cassa (corresponding author)
  9. Shamil R Sunyaev (corresponding author)
  1. Department of Biomedical Informatics, Harvard Medical School, United States
  2. Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, United States
  3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, United States
  4. Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, United States
  5. Department of Epidemiology, Harvard T.H. Chan School of Public Health, United States
  6. Altius Institute, United States
  7. Division of Pulmonary Medicine, Boston Children’s Hospital, United States
  8. Department of Neurology, Yale Medical School, United States
  9. Department of Genetics, Yale Medical School, United States
3 figures, 4 tables and 4 additional files

Figures

Figure 1 with 2 supplements
Putatively causative genes identified by each method category.

The leftmost column in each half of the plot displays the entire group of putatively causative genes for our Mendelian set of genes and for our Backman et al., 2021 set of genes, respectively, and notes how many genes are unique to each set or shared between the two sets. The second column in each half indicates how many genes from each set have a nearby GWAS peak, or have both a nearby GWAS peak and an expression QTL (eQTL). The remaining columns indicate how many genes were identified through colocalization, transcriptome-wide association studies (TWAS), or chromatin methods, and note how many of these genes are unique to, or shared between, the Mendelian and Backman sets.

Figure 1—figure supplement 1
Enrichment of Mendelian genes near GWAS peaks.

(A) As the window around GWAS peaks shrinks, the enrichment of Mendelian genes within the window becomes increasingly significant, while the enrichment of non-matching trait pairs used as controls (gray lines; permutation test described in 'Materials and methods') is not consistently increased. Some controls achieve nominal significance (dotted horizontal line), but none reach significance once multiple testing is corrected for (solid horizontal line). (B) As above, but for genes from Backman et al., 2021. (C) The combined gene lists from parts (A) and (B). Note that, accounting for multiple test correction (based on the total number of tests across all panels), height does not reach significance using the Mendelian gene list, while T2D is barely significant using the Backman list. However, combining the lists increases power and demonstrates significance for all traits.
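The window-based enrichment described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the permutation scheme used here (random gene sets of equal size drawn from a background list) is an assumption, since the actual test is described in the paper's 'Materials and methods'. The function names `genes_near_peaks`, `permutation_enrichment`, and `bonferroni_threshold`, the single-chromosome coordinates, and the toy data are all hypothetical.

```python
import random


def genes_near_peaks(gene_pos, peaks, window):
    """Return the genes whose position lies within `window` bp of any peak.

    Assumption: all genes and peaks sit on one chromosome, so a position
    is a single integer coordinate.
    """
    return {
        gene
        for gene, pos in gene_pos.items()
        if any(abs(pos - peak) <= window for peak in peaks)
    }


def permutation_enrichment(gene_set, gene_pos, peaks, window,
                           n_perm=1000, seed=0):
    """Empirical p-value for the number of `gene_set` genes near peaks.

    The null distribution is built from random gene sets of the same
    size, drawn without replacement from all genes in `gene_pos`.
    """
    rng = random.Random(seed)
    near = genes_near_peaks(gene_pos, peaks, window)
    observed = len(near & gene_set)
    background = sorted(gene_pos)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        sample = set(rng.sample(background, len(gene_set)))
        if len(near & sample) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Toy data (hypothetical): 200 genes spaced 100 kb apart, three GWAS peaks.
toy_gene_pos = {f"g{i}": i * 100_000 for i in range(200)}
toy_peaks = [500_000, 1_200_000, 3_000_000]
toy_gene_set = {"g5", "g12", "g30", "g77"}

if __name__ == "__main__":
    for window in (1_000_000, 500_000, 100_000):
        observed, p_value = permutation_enrichment(
            toy_gene_set, toy_gene_pos, toy_peaks, window
        )
        print(window, observed, p_value)
```

Shrinking `window` reduces the number of background genes that count as "near" a peak, which is one way the enrichment of a well-chosen gene set can become sharper at smaller windows. The horizontal lines in the figure correspond to comparing each p-value against a nominal threshold and against a multiple-testing threshold such as the one returned by `bonferroni_threshold`.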

Figure 1—figure supplement 2
Change in coloc hits when adjusting expression QTL (eQTL) statistics using the Multivariate Adaptive Shrinkage Method (MASH).

By using the Bayesian method MASH to update our measurements of eQTLs based on tissues with similar expression patterns, we increased the number of colocalizations found. However, even in tissues in which the number of genes identified increased substantially, we did not meaningfully increase the number of putatively causative genes identified.

Figure 2
Genes identified as associated with a complex trait by each method.

Columns 'Mend' and 'Backman' indicate whether a gene is from the Mendelian set of putatively causative genes, the Backman et al. set, or both. Subsequent columns indicate whether a gene was identified as a hit using each of our methods: JLIM, coloc, eCAVIAR, transcriptome-wide association studies (TWAS), and chromatin analysis.

Figure 3
Chromatin-based causative gene identification.

Following the fine-mapping of GWAS variants, three parallel methods were used. The first identified fine-mapped variants falling within regions annotated as enhancers by ChromHMM. The second identified variants within histone modification features and evaluated their relevance using an activity-by-distance (ABD) score that combined the strength of the feature (i.e., the strength of the acetylation or methylation peak) with its genomic distance to the gene of interest ('Materials and methods'). The third repeated both of these—checking for fine-mapped variants within a region and calculating the ABD score—for DNase I hypersensitivity sites.
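As a rough illustration of how an activity-by-distance score might rank fine-mapped variants, here is a minimal Python sketch. The caption states only that the score combines feature strength with genomic distance to the gene; the exact formula is given in the paper's 'Materials and methods' and is not reproduced here. The ratio `strength / distance`, the `min_distance` floor, and the names `Feature`, `abd_score`, and `score_variants` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A chromatin feature (e.g. a histone mark peak or DNase I site)."""
    start: int
    end: int
    strength: float  # peak signal; units depend on the assay


def abd_score(feature, gene_tss, min_distance=1_000):
    """Assumed activity-by-distance form: strength divided by distance.

    Distance is measured from the feature midpoint to the gene's
    transcription start site and floored at `min_distance` so that
    features overlapping the TSS do not divide by zero.
    """
    midpoint = (feature.start + feature.end) // 2
    distance = max(abs(midpoint - gene_tss), min_distance)
    return feature.strength / distance


def score_variants(variant_positions, features, gene_tss):
    """Map each variant inside a feature to its best ABD score.

    Variants that fall outside every feature are omitted from the result.
    """
    scores = {}
    for pos in variant_positions:
        overlapping = [f for f in features if f.start <= pos <= f.end]
        if overlapping:
            scores[pos] = max(abd_score(f, gene_tss) for f in overlapping)
    return scores


if __name__ == "__main__":
    # Hypothetical features of equal strength at different distances.
    features = [Feature(10_000, 12_000, 50.0), Feature(90_000, 91_000, 50.0)]
    print(score_variants([11_500, 90_200, 500_000], features, gene_tss=0))
```

Under this assumed form, two features of equal strength are ranked by proximity to the gene, and a variant outside every annotated feature receives no score at all. The same scoring could be applied to histone modification features or to DNase I hypersensitivity sites by changing what `strength` measures.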

Figure 3—source data 1

Gene-level results for linked expression and traits.

https://cdn.elifesciences.org/articles/74970/elife-74970-fig3-data1-v2.txt

Tables

Table 1
Putatively causative Mendelian genes.

Each gene includes reference(s) to the known biological role of its coding variants, as established in familial studies, in vitro experiments, and/or animal models. Genes from Backman et al., 2021 are not included here, but can be found in Figure 2.

Phenotype | Genes
Low-density lipoprotein | APOB Soria et al., 1989; Pullinger et al., 1995
APOC2 Hegele et al., 1991
APOE de Knijff et al., 1994
LDLR Brown and Goldstein, 1976
LPL Heizmann et al., 1991; Clee et al., 2001
PCSK9 Abifadel et al., 2003
High-density lipoprotein | ABCA1 Brooks-Wilson et al., 1999; Bodzioch et al., 1999; Rust et al., 1999; Ordovas et al., 1986
APOA1 Ordovas et al., 1986
CETP Glueck et al., 1975
LIPC Isaacs et al., 2004; Grarup et al., 2008; Iijima et al., 2008
LIPG Yamakawa-Kobayashi et al., 2003
PLTP Jiang et al., 1999
SCARB1 Tai et al., 2003; McCarthy et al., 2003
Height | ANTXR1 Stránecký et al., 2013; Bayram et al., 2014
ATR O’Driscoll et al., 2003; Ogi et al., 2012
BLM Ellis et al., 1995; Foucault et al., 1997
CDC6 Bicknell et al., 2011a
CDT1 Bicknell et al., 2011a; Guernsey et al., 2011
CENPJ AlDosari et al., 2010
COL1A1 Wallis et al., 1990
COL1A2 Spotila et al., 1992; De Paepe et al., 1997
COMP Briggs et al., 1995; Mabuchi et al., 2003
CREBBP Menke et al., 2016; Menke et al., 2018; Angius et al., 2019
DNA2 Shaheen et al., 2014
EP300 Woods et al., 2014; Tsai et al., 2011
EVC Polymeropoulos et al., 1996; Ruiz-Perez et al., 2003
EVC2 Ruiz-Perez et al., 2003; Galdzicka et al., 2002
FBN1 Faivre et al., 2003; Le Goff et al., 2011; Horn and Robinson, 2011; Takenouchi et al., 2013
FGFR3 Hyland et al., 2003; Toydemir et al., 2006; Makrythanasis et al., 2014
FKBP10 Alanay et al., 2010; Kelley et al., 2011; Barnes et al., 2013
GHR Berg et al., 1993; Woods et al., 1996; Goddard et al., 1995; Ayling et al., 1997
KRAS Aoki et al., 2005; Schubbert et al., 2006; Carta et al., 2006
NBN Varon et al., 1998; Tanzanella et al., 2003
NIPBL Tonkin et al., 2004; Krantz et al., 2004
ORC1 Bicknell et al., 2011a; Guernsey et al., 2011; Bicknell et al., 2011b
ORC4 Guernsey et al., 2011; Bicknell et al., 2011b
ORC6L Bicknell et al., 2011a; de Munnik et al., 2012
PCNT Rauch et al., 2008; Griffith et al., 2008; Piane et al., 2009
PLOD2 van der Slot et al., 2003; HaVinh et al., 2004; Puig-Hervás et al., 2012
PTPN11 Tartaglia et al., 2001; Maheshwari et al., 2002; Kosaki et al., 2002
RAD21 Deardorff et al., 2012; Kruszka et al., 2019; Goel and Parasivam, 2020
RAF1 Pandit et al., 2007; Razzaque et al., 2007
RECQL4 Lindor et al., 2000; Beghini et al., 2003; Wang et al., 2003
RIT1 Aoki et al., 2013; Bertola et al., 2014; Gos et al., 2014
ROR2 Afzal et al., 2000; van Bokhoven, 2000; Tufan et al., 2005
SLC26A2 Hästbacka et al., 1993; Rossi and Superti-Furga, 2001; Barreda-Bonis et al., 2018
SMAD4 Le Goff et al., 2012; Caputo et al., 2012; Lindor et al., 2012
SRCAP Hood et al., 2012; Le Goff et al., 2013
WRN Yu et al., 1996; Goto et al., 1997; Yu, 1997
Crohn disease | ATG16L1 Hampe et al., 2007
CARD9 Rivas et al., 2011
IL10 Fowler, 2005
IL10RA Gasche et al., 2003; Mao et al., 2012
IL10RB Glocker et al., 2009; Begue et al., 2011
IL23R Duerr et al., 2006; Libioulle et al., 2007; Glas et al., 2007
IRGM McCarroll et al., 2008; Craddock et al., 2010; Prescott et al., 2010
NOD2 Ogura et al., 2001; Hugot et al., 2001
PRDM1 Ellinghaus et al., 2013
PTPN22 Diaz-Gallo et al., 2011
Ulcerative colitis | ATG16L1 Fowler et al., 2008
CARD9 Rivas et al., 2011
IL23R Glas et al., 2007; Fisher et al., 2008
IRGM McCarroll et al., 2008
PRDM1 Ellinghaus et al., 2013
PTPN22 Diaz-Gallo et al., 2011
RNF186 Rivas et al., 2016; Beaudoin et al., 2013
Type II diabetes | ABCC8 Reis et al., 2000
BLK Borowiec et al., 2009
CEL Bengtsson-Ellmark et al., 2004; Raeder et al., 2006
EIF2AK3 Harding et al., 2001; Brickwood et al., 2003; Durocher et al., 2006
GATA4 Shaw-Smith, 2014
GATA6 Yorifuji et al., 2012; De Franco et al., 2013
GCK Froguel et al., 1993
GLIS3 Senée et al., 2006
HNF1A Yamagata et al., 1996b; Vaxillaire et al., 1997
HNF1B Horikawa et al., 1997; Lindner et al., 1999
HNF4A Yamagata et al., 1996a; Stoffel and Duncan, 1997
IER3IP1 Poulton et al., 2011; Abdel-Salam et al., 2012; Shalev et al., 2014
INS Støy et al., 2007
KCNJ11 Hani et al., 1998; Gloyn et al., 2004
KLF11 Neve et al., 2005
LMNA Cao and Hegele, 2000
NEUROD1 Malecki et al., 1999
NEUROG3 Gradwohl et al., 2000; Rubio-Cabezas et al., 2011; Pinney et al., 2011
PAX4 Shimajiri et al., 2001; Mauvais-Jarvis et al., 2004; Plengvidhya et al., 2007
PDX1 Stoffers et al., 1997; Macfarlane et al., 1999; Hani et al., 1999
PPARG Deeb et al., 1998; Savage et al., 2002
PTF1A Sellick et al., 2004
RFX6 Smith, 2010; Sansbury et al., 2015
SLC19A2 Labay et al., 1999; Oishi et al., 2002; Shaw-Smith et al., 2012
SLC2A2 Laukkanen et al., 2005; Sansbury et al., 2012
WFS1 Strom et al., 1998; Hardy et al., 1999; Khanim et al., 2001
ZFP57 Mackay et al., 2008; Boonen et al., 2013
Breast cancer (selected using MutPanning; Dietlein et al., 2020) | AKT1
ARID1A
ATM
BRCA1
BRCA2
CBFB
CDH1
CDKN1B
CHEK2
CTCF
ERBB2
ESR1
FGFR2
FOXA1
GATA3
GPS2
HS6ST1
KMT2C
KRAS
LRRC37A3
MAP2K4
MAP3K1
NCOR1
NF1
NUP93
PALB2
PIK3CA
PTEN
RB1
RUNX1
SF3B1
STK11
TBX3
TP53
ZFP36L1
Table 2
Tissue-trait pairs.

Tissues were selected for each trait based on a priori knowledge of disease biology.

Mendelian trait | GWAS trait | Tissues examined
Breast cancer | Breast cancer | Breast mammary tissue
Crohn disease | Crohn disease | Small intestine terminal ileum; Colon sigmoid; Colon transverse
Ulcerative colitis | Ulcerative colitis | Small intestine terminal ileum; Colon sigmoid; Colon transverse
Dyslipidemia; Hyperlipidemia; Tangier’s disease | High-density lipoprotein | Liver; Adipose (subcutaneous); Whole blood
Dyslipidemia; Hyperlipidemia | Low-density lipoprotein | Liver; Adipose (subcutaneous); Whole blood
Mendelian short stature | Height | Skeletal muscle
Monogenic diabetes | Type II diabetes | Pancreas; Skeletal muscle; Adipose (subcutaneous); Small intestine terminal ileum
Table 3
Proposed explanations for negative results under the unembellished model.

Many explanations have been proposed for GWAS associations that are not explained by cis-QTLs. This table details the explanations that are inconsistent with our results; each is stated in the left column and addressed on the right. Explanations involving more detailed models of gene regulation can be found in Table 4. Two of the explanations addressed here involve violations of the assumptions underlying our study and other expression-based complex trait studies. If coding and non-coding variants affect fundamentally different biological pathways, or if trait associations rarely depend on cis-eQTLs, our methods of mapping regulation to traits would have nothing to uncover. Even in the presence of eQTL-driven trait associations, insufficient power to detect trait associations, to detect eQTL associations, or to link the two would produce predominantly negative results.

Violated assumptions
Genes implicated via coding variants are irrelevant for non-coding associations
  • Our genes are enriched for GWAS associations even after removing the effects of coding variants

  • Loss-of-function variants, which underlie many Mendelian-trait genes, can be thought of as large-effect eQTLs

  • Genes identified from Backman et al., 2021 are not based on cognate phenotypes, but the same complex phenotypes as GWAS

Regulatory mechanisms other than cis-eQTLs
  • Splice QTLs are consistently found to explain less phenotypic variance than eQTLs, and they cannot explain the many GWAS associations that fall within intergenic regions

  • Trans-eQTLs are believed to rely on their effects as cis-eQTLs for other genes; the few exceptions to this model (e.g., CTCF binding sites) are not broadly applicable

Insufficient power
Lack of GWAS power
  • GWAS have been shown to have sufficient power to identify small effects even in rare variants

  • 2/3 of the genes we used have nearby GWAS associations, reflecting a strong enrichment and indicating that GWAS discovery is not a limiting factor

  • Our analysis is conditioned on the presence of GWAS associations

Lack of eQTL mapping power
Lack of power for colocalization and TWAS methods
eQTL = expression QTL; TWAS = transcriptome-wide association studies.

Table 4
Explaining negative results with more nuanced models of gene regulation.

Reconciling an expression-based model with our observations requires us both to explain the absence of trait-linked eQTLs and to explain away the inconsequence of eQTLs for trait-linked genes. The left-hand side lists additions or changes to the unembellished model, while the right-hand side contains explanations of the models and current relevant research.

Extended models of gene regulation
Context dependency:
a context-specific eQTL, invisible in bulk tissues analyzed to date, replaces or supplements the bulk tissue homeostatic eQTL
Cell type Dobbyn et al., 2018; Zhang et al., 2018; Schmiedel et al., 2018; Glastonbury et al., 2019; Rai et al., 2020; Findley et al., 2021; Neavin et al., 2021; Ota et al., 2021; Patel et al., 2021; Bryois et al., 2021; Arvanitis et al., 2022; Oelen et al., 2022; Perez et al., 2022; Schmiedel et al., 2022; Yazar et al., 2022
  • Only a subset of cell types in the tissue contribute to the GWAS phenotype.

  • An eQTL specific to such a cell type is causative for the phenotype.

  • The eQTL either cannot be detected in bulk tissue because of the cell type’s low prevalence, or the appropriate cellular or anatomical context has not yet been analyzed

Developmental timing Dobbyn et al., 2018; Strober et al., 2019; Cuomo et al., 2020; Bonder et al., 2021; Jerber et al., 2021; Aygün et al., 2022; Elorbany et al., 2022
  • The GWAS phenotype depends on a specific point in cell/tissue development or differentiation

  • eQTLs present at the correct interval contribute to phenotype, but eQTLs observed at other points do not

Cell state or environment Findley et al., 2021; Ota et al., 2021; Oelen et al., 2022; Schmiedel et al., 2022; Huh and Paulsson, 2011; Knowles et al., 2017; Kim-Hellmuth et al., 2017; Balliu et al., 2021; Mu et al., 2021; Ward et al., 2021; Nathan et al., 2022; Baca et al., 2022
  • The causative eQTL has effects that are undetectable in steady-state expression under normal conditions

  • It may activate only in response to a specific environmental condition, such as immune activation or a metabolic shift

Nonlinear or non-homeostatic:
the relationship between eQTL and genotype is indirect
Nonlinearity Fu et al., 2009; Dori-Bachash et al., 2011; Ghazalpour et al., 2011; Pai et al., 2012; Vogel and Marcotte, 2012; Khan et al., 2013; Wu et al., 2013; McManus et al., 2014; Albert and Kruglyak, 2015; Bader et al., 2015; Battle et al., 2015; Cenik et al., 2015; McManus et al., 2015; Pai et al., 2015; Schafer et al., 2015; Chick et al., 2016; Liu et al., 2016; Schaefke et al., 2018; Buccitelli and Selbach, 2020; Wang et al., 2020a; Kusnadi et al., 2022
  • There may be buffering that prevents a change in expression from producing a change in protein levels

  • Expression below a certain level may not influence phenotype, rendering small eQTLs irrelevant

Steady-state expression may be a poor model Pedraza and Paulsson, 2008; Raj and van Oudenaarden, 2008; Shahrezaei and Swain, 2008; Larson et al., 2009; Raj and van Oudenaarden, 2009; Suter et al., 2011; Dar et al., 2012; Viñuelas et al., 2013; Kumar et al., 2015; Nicolas et al., 2017; Qiu et al., 2019; Wang et al., 2020c
  • Phenotype may depend on the kinetics of expression, which could be cyclical or follow some other pattern

  • Expression may be stochastic, such that only a random subset of cells display the relevant expression pattern at any one time

eQTL = expression QTL.

Additional files

Supplementary file 1

Roadmap epigenomics aliases of tissue types used for functional genomic analysis.

Tissue types from the Roadmap Epigenomics Consortium do not perfectly match those from GTEx. However, there is overlap, and as with GTEx, we analyzed trait-relevant tissues.

https://cdn.elifesciences.org/articles/74970/elife-74970-supp1-v2.csv
Supplementary file 2

Tissue types and biosamples from the DNase I hypersensitive sites index used for functional genomic analysis.

Meuleman et al., 2020 assess DNase I hypersensitive sites across 438 cell and tissue types; we selected the above based on their relevance to our complex traits.

https://cdn.elifesciences.org/articles/74970/elife-74970-supp2-v2.csv
Supplementary file 3

TOPMed URLs used.

https://cdn.elifesciences.org/articles/74970/elife-74970-supp3-v2.zip
Transparent reporting form
https://cdn.elifesciences.org/articles/74970/elife-74970-transrepform1-v2.docx


Noah J Connally, Sumaiya Nazeen, Daniel Lee, Huwenbo Shi, John Stamatoyannopoulos, Sung Chun, Chris Cotsapas, Christopher A Cassa, Shamil R Sunyaev (2022) The missing link between genetic association and regulatory function. eLife 11:e74970. https://doi.org/10.7554/eLife.74970