Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification

  1. Zhanying Feng
  2. Zhana Duren
  3. Jingxue Xin
  4. Qiuyue Yuan
  5. Yaoxi He
  6. Bing Su
  7. Wing Hung Wong  Is a corresponding author
  8. Yong Wang  Is a corresponding author
  1. CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
  2. School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
  3. Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, United States
  4. Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, United States
  5. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, China
  6. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, China
  7. Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China

Abstract

Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes’ relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.

Editor's evaluation

In this article, the authors develop a method to identify potentially causal tissues and cell types for complex diseases by performing heritability enrichment estimation using information from gene regulatory networks. This article is of significant interest to geneticists and biologists interested in unraveling the molecular basis of disease. The key claims of the article are well supported by the data. The work has the potential to inform our understanding of the genetics of complex diseases.

https://doi.org/10.7554/eLife.82535.sa0

Introduction

Genome-wide association studies (GWAS) have gained a great success to identify thousands of genetic variants significantly associated with a variety of human complex phenotypes. Interpretation of those genetic variants holds the key to biological mechanism discovery and personalized medicine practice. However, this task is hindered by the genetic architecture that the heritability is distributed across SNPs of the whole genome with linkage disequilibrium, which cumulatively affect complex traits. By quantifying the contribution of true polygenic signal considering linkage disequilibrium, LD Score regression (LDSC) provides a widely appreciated method to estimate heritability (Bulik-Sullivan et al., 2015b) and genetic correlation (Bulik-Sullivan et al., 2015a) from GWAS summary statistics.

Another obstacle to genetic variant interpretation is that SNPs contribute to phenotype through gene regulatory networks in certain cellular contexts, that is, causal tissues or cell types. Those tissues are characterized by different types of epigenetic data, which give the active regions of the genome that interact with transcription factors (TFs) to regulate gene expression. Stratified LDSC extends LDSC and can estimate the partitioned heritability enrichment in the functional categories (Finucane et al., 2015). The categories can be nonspecific genome annotations (such as coding, UTR, promoter, and intronic regions) and context-specific regulatory regions called from chromatin data of different cell types, such as DNase-I hypersensitive sites from DNase-seq data, accessible peaks from ATAC-seq data, histone marker, or TF binding sites from ChIP-seq data (AAP and SAP). Using expression data, the functional categories can be alternatively constructed by the 100 kb windows around the transcribed regions of specifically expressed genes (SEGs) (Finucane et al., 2018). Essentially, these strategies summarize the high-dimensional SNP signals from the whole genome into partitioned heritability enrichments of functional categories and successfully identify relevant cellular tissues for many phenotypes (Finucane et al., 2015).

The rapid increase of multi-modal data resources, especially matched gene expression, chromatin states, and TF binding sites (i.e., measured on the same sample), offers an exciting opportunity to construct better functional categories for estimating context-specific heritability enrichment. One efficient way is to integrate large-scale epigenomic and transcriptomic data spanning diverse human contexts to infer regulatory networks (Duren et al., 2017). Those regulatory networks provide rich context-specific information and usually comprise TFs, regulatory elements (REs), and target genes (TGs). Recently, we developed the PECA2 model to infer regulatory network from paired expression and chromatin accessibility data (Duren et al., 2017; Duren et al., 2020). The inferred regulatory networks have been used to identify the master regulators in stem cell differentiation (Li et al., 2019) and interpret conserved regions for the nonmodel organisms (Xin et al., 2020). Noncoding genetic variants can be interpreted in the regulatory networks on how they cooperatively affect complex traits through gene regulation in certain tissues or cell types. For example, genetic variants in the regulatory network of cranial neural crest cells (CNCC) are elucidated on how they affect human facial morphology (Feng et al., 2021). Regulatory networks can help identify two kinds of relevant cell types to COVID19 severity (Feng et al., 2022b). RSS-NET utilizes gene regulatory networks of multiple contexts and shows better tissue enrichment estimation by decomposing the total effect of an SNP through TF-TG regulations (Zhu et al., 2021) and HiChIP RE-TG regulations (Ma et al., 2022). The phenotype-associated SNPs are revealed to be functional in a tissue- or cell-type-specific manner (Westra and Franke, 2014). The advances in constructing regulatory networks and interpreting genetic variants with regulatory networks enlighten us to (1) assemble a more comprehensive context-specific regulatory network atlas by using paired expression and accessibility data across diverse cellular contexts; (2) build context-specific regulatory categories by focusing on RE’s specificity compared with other contexts; and (3) systematically identify enriched tissues or cell types, relevance correlation, and the underlying SN-associated regulations.

Specifically, we propose SpecVar to first leverage the publicly available paired expression and chromatin accessibility data in ENCODE and ROADMAP to systematically construct context-specific regulatory networks of 77 human contexts, which cover major cell types and germ layer lineages. This atlas serves as a valuable resource for genetic variants interpretation in multicellular contexts. SpecVar then constructs regulatory categories in the genome with this atlas, which can significantly improve the heritability enrichment. Based on the heritability enrichment and p-value in our regulatory categories, SpecVar defines the relevance score to give the context-specific representation of the GWAS. In this article, we use six well-studied phenotypes, large-scale facial morphology, and UKBB phenotypes to show that, for a single phenotype, the relevance score of SpecVar can identify relevant tissues more efficiently; and for multiple phenotypes, SpecVar can reveal relevance correlation by common relevant tissues, and underlying shared SNP-associated regulations. Compared to the existing methods, SpecVar shows novelty in three aspects: (1) SpecVar integrates paired gene expression data and chromatin, which are two types of easily accessible data with rich information, into regulatory networks. The gene expression and chromatin accessibility in the regulatory network are complementary to each other to reveal the high-quality active regulatory elements and genes for certain tissues or cell types to interpret genetic variants; (2) SpecVar highlights the comparison with other contexts by specificity to narrow down the regulatory molecules; and (3) SpecVar is more interpretable because it can explain the relevance to tissues by SNP-associated regulatory networks and interpret phenotype correlation through common relevant tissues and shared SNP-associated regulatory network. These results show that SpecVar serves as a promising tool for post-GWAS analysis.

Results

Overview of the SpecVar method

SpecVar assembled a context-specific regulatory network atlas and built the context-specific representation (relevance score and SNP-associated regulatory network) of GWAS summary statistics based on heritability enrichment. Figure 1 summarizes the major steps of SpecVar to construct the context-specific regulatory network atlas and regulatory categories, calculate heritability enrichment and SNP-associated regulatory network, and investigate interpretable relevant tissues and relevance correlation.

Figure 1 with 1 supplement see all
Overview of SpecVar.

(a) SpecVar constructs an atlas of context-specific regulatory networks and regulatory categories. Then SpecVar represents genome-wide association studies (GWAS) summary statistics into relevance scores and SNP-associated regulatory subnetworks. (b) For a single phenotype, SpecVar can use relevance score and SNP-associated regulatory subnetworks to identify and interpret relevant tissues. (c) For multiple phenotypes, based on relevance score, SpecVar can reveal relevance correlation, common relevant tissues, and shared regulations.

We first constructed regulatory networks of M (M = 77 in this work) contexts. Each network is represented by a set of relations between TF and RE and between RE and TG. The M contexts included samples from all three germ layers, such as ‘frontal cortex’ (ectoderm), ‘fetal thymus’ (mesoderm), and ‘body of pancreas’ (endoderm), which ensured the wide coverage and system-level enrichment (Figure 1—figure supplement 1). The context-specific regulatory networks were extracted based on the specificity of REs compared with other contexts, considering the hierarchical relationship of M contexts (Materials and methods, Supplementary file 1a). The REs in the ith context-specific regulatory network were pooled to form a regulatory category Ci in the genome, which restricted the annotation to context-specific REs associated with active binding TFs and nearby regulated TGs (Figure 1a). Our atlas led to M regulatory categories, C1,C2,,CM of SpecVar. Given GWAS summary statistics, the M regulatory categories allowed partitioned heritability enrichment analysis by stratified LDSC. For a phenotype, stratified LDSC modeled genome-wide polygenic signals, partitioned SNPs into categories with different contributions for heritability, and considered SNP’s linkage disequilibrium with the following polygenic model:

(1) Eχj2=Niτilj,i+Na+1

Here, χj2 is the marginal association of SNP j from GWAS summary statistics; N is the sample size; lj,i=kCirjk2 is the LD score of SNP j in the ith regulatory category Ci , where rjk is the genotype correlation between SNP j and SNP k in population; a measures the contribution of confounding biases; and τi represents the heritability enrichment of SNPs in Ci . Stratified LDSC estimated the p value pi for the heritability enrichment τi by block Jackknife (Finucane et al., 2015).

We defined the relevance score (Ri) of this phenotype to ith context (Figure 1a) as follows by combining the heritability enrichment and statistical significance (p-value):

(2) Ri=τi-logpi

The relevance score (R score) provided a decision trade-off between the heritability enrichment and p-value resulting from a hypothesis test. It offered a robust means to rank and select relevant tissues for a given phenotype (Xiao et al., 2014).

Meanwhile, SpecVar associated SNPs with context-specific regulatory networks for biological interpretation. We defined the association score (A score) to prioritize the REs by combining its regulatory strength and association significance with the phenotype (averaged -log(p-value) of SNPs located near the RE and downweighted by their LD scores and distances to this RE). We extracted the REs with significant A scores (FDR0.05), as well as their directly linked upstream TFs, downstream TGs, and associated SNPs, to form the SNP-associated regulatory subnetwork (Figure 1a, Materials and methods). Given GWAS summary statistics of a phenotype, SpecVar obtained M SNP-associated regulatory subnetworks, G1,G2,,GM , allowing us to interpret relevant tissues by SNP’s regulation mechanism.

The relevance score to diverse human contexts and SNP-associated regulatory networks allowed SpecVar to perform post-GWAS analysis. For a single phenotype, the R scores indicated the relevance of this phenotype to M contexts, which can be used to identify relevant tissues. Then in the relevant tissues, we can investigate the SNP-associated regulatory subnetwork to interpret the relevance (Figure 1b, Materials and methods). For multiple phenotypes, we can correlate the R score vectors in multiple contexts to define relevance correlation (Finucane et al., 2018). The relevance correlation might give insights into the association of phenotypes since SpecVar can further interpret the relevance correlation between two phenotypes by common relevant tissues and the shared SNP-associated regulatory subnetwork (Figure 1c, Materials and methods).

Context-specific regulatory networks improve heritability enrichment

We first designed experiments to show that context-specific regulatory networks could improve heritability enrichment. We collected GWAS summary statistics of six phenotypes, including two lipid phenotypes (Willer et al., 2013): low-density lipoprotein (LDL) and total cholesterol; two human intelligential phenotypes (Lee et al., 2018): educational attainment and cognitive performance; and two craniofacial bone phenotypes: brain shape (Naqvi et al., 2021) and facial morphology (Xiong et al., 2019). We used these six phenotypes as a benchmark since their relevant tissues have been previously studied and partially known: lipid phenotypes are associated with the liver for its key role in lipid metabolism (Nguyen et al., 2008); human intelligential phenotypes are associated with brain tissues (Goriounova and Mansvelder, 2019); facial morphology and brain shape have shared heritability in cranial neural crest cells (Naqvi et al., 2021). We compared our context-specific regulatory networks with four alternative methods of functional categories: all regulatory elements (ARE), all accessible peaks (AAP), specifically accessible peaks (SAP) (Finucane et al., 2015), and specifically expressed genes (Finucane et al., 2018) (SEG) (Materials and methods).

First, we showed that SpecVar could achieve higher heritability enrichment in the relevant tissues than other methods (Supplementary file 1b). For LDL, SpecVar obtained a heritability enrichment of 678.91 in the ‘right lobe of liver’, while ARE, SAP, AAP, and SEG gave heritability enrichment of 113.34, –42.09, 50.95, and 4.47, respectively. We conducted Welch’s t-test to assess the significance of the difference between SpecVar and other methods and found that the heritability enrichment of SpecVar was significantly higher than ARE (p=6.9×104), SAP (p=1.4×104), AAP (p=3.4×104), and SEG (p=2.1×104) (Figure 2a). For total cholesterol, SpecVar also gave significantly higher heritability enrichment in ‘right lobe of liver’ than ARE (p=5.7×104), SAP (p=4.4×105), AAP (p=1.6×104), and SEG (p=7.7×105) (Figure 2b). For educational attainment and cognitive performance, they were relevant to brain tissues: ‘frontal cortex’, ‘cerebellum’, ‘caudate nucleus’, ‘Ammon’s horn’, and ‘putamen’. SpecVar obtained the highest averaged heritability enrichment in brain tissues among these methods (Figure 2—figure supplement 1a and b). In the ‘frontal cortex’, SpecVar had significantly higher heritability enrichment than ARE (p=1.2×105), SAP (p=2.0×106), AAP (p=3.0×106), and SEG (p=3.0×106) for educational attainment (Figure 2c). And for cognitive performance in ‘frontal cortex’, SpecVar also had significantly higher heritability enrichment than ARE (p=9.0×106), SAP (p=2.0×106), AAP (p=1.0×106), and SEG (p=1.0×106) (Figure 2d). For brain shape, SpecVar obtained a significantly higher heritability enrichment in its relevant context ‘CNCC’ than the other four methods (ARE p=5.9×104 , SAP p=6.7×104 , AAP p=7.5×104 , and SEG p=8.1×105 , Figure 2e). For facial morphology, SpecVar also gave a much higher heritability enrichment in ‘CNCC’ than the other four methods (ARE p=9.0×106 , SAP p=1.0×106 , AAP p=7.0×106 , and SEG p=1.0×106 , Figure 2e). Second, except for the known relevant tissues, these complex traits may be relevant to other contexts. So, for every method, we ranked the heritability enrichment to get the top 10 contexts and used these top contexts’ heritability enrichment to compare the ability of these five methods to explain heritability. SpecVar also showed the best performance of heritability enrichment among the five methods (Figure 2g). Taking brain shapeas an example, SpecVar achieved significantly higher heritability enrichment (averaged heritability enrichment 96.13) than ARE (averaged heritability enrichment 26.77, t-test p=3.4×103), SAP (42.92, p=1.9×102), AAP (20.34, p=1.8×103), and SEG (2.25, p=3.1×104). We found that specificity could significantly improve the heritability enrichment. Among the five methods we compared, SpecVar and SAP are based on the specificity of ARE and AAP, respectively. SpecVar showed significantly higher heritability enrichment than ARE and SAP showed significantly higher heritability enrichment than AAP (Figure 2g). For brain shape, SpecVar obtained averaged heritability enrichment of 96.31 of the top 10 contexts, which was significantly higher than ARE (averaged heritability enrichment 26.77, p=3.4×103); SAP obtained average heritability enrichment of 42.92, and AAP’s averaged heritability enrichment was 20.34 (p=2.7×103). The other five phenotypes showed a similar improvement (Figure 2g).

Figure 2 with 2 supplements see all
Comparison of heritability enrichment between SpecVar and four alternate methods: all regulatory elements (ARE), all accessible peaks (AAP), specifically accessible peaks (SAP), and specifically expressed genes (SEG).

(a) The heritability enrichment of low-density lipoprotein (LDL) in the ‘right lobe of liver’. (b) The heritability enrichment of total cholesterol in the ‘right lobe of liver’. (c) The heritability enrichment of educational attainment in the ‘frontal cortex’. (d) The heritability enrichment of cognitive performance in the ‘frontal cortex’. (e) The heritability enrichment of brain shape in cranial neural crest cell (CNCC). (f) The heritability enrichment of facial morphology in ‘CNCC’. The sample size of error bars for (a-f) is 200. (g) Boxplot of top 10 contexts’ heritability enrichment of six phenotypes for five methods.

To explore the heritability enrichment improvement of SpecVar, we conducted ablation analysis to study the contribution of two important parts of SpecVar: (1) regulatory network by integrating gene expression and chromatin accessibility data, and (2) specificity by comparing with other contexts. Figure 2—figure supplement 1c shows the relationship and difference of the five methods: SEG is the combination of gene expression and specificity; AAP is only from chromatin accessibility; SAP is the combination of chromatin accessibility and specificity; ARE integrates gene expression and chromatin accessibility; and SpecVar considers integration of gene expression, chromatin accessibility, and specificity. To analyze the effect of the regulatory network in heritability enrichment, we compared SpecVar with SAP and showed the effect of gene expression data. We compared SpecVar with SEG and showed the effect of chromatin accessibility data. We compared SpecVar with ARE and showed the contribution of specificity (Figure 2—figure supplement 1d). To quantify the effect of each component, we caculated the fold change of different methods’ heritability enrichments. We found that chromatin accessibility, which was part of the regulatory network, showed the highest effect in improving heritability enrichments for all six phenotypes. This is consistent with the fact that most genetic variants are located in the noncoding regulatory regions (Claussnitzer et al., 2015; Kumar et al., 2012; Smemo et al., 2014) and chromatin accessibility gives the direct functional evidence for genetic variants. The specificity in SpecVar also contributed at least four-fold improvement in heritability enrichment (Figure 2—figure supplement 1e).

In summary, the experiment on six phenotypes’ GWAS summary statistics proved that SpecVar achieved the best performance in explaining the heritability of phenotypes. These results demonstrated the power of integrating expression and chromatin accessibility data and considering contexts’ specificity.

SpecVar can accurately reveal relevant tissues for phenotypes

After establishing that SpecVar could use the context-specific regulatory networks to improve heritability enrichment, we next showed that for a given phenotype, SpecVar could use R scores to identify relevant tissues more accurately than other methods. In this experiment, we also used the above six phenotypes with their known relevant tissues as a benchmark and first compared SpecVar with the other two specificity-based methods: SAP and SEG (Materials and methods).

For two lipid phenotypes, SpecVar revealed that both LDL and total cholesterol were most relevant to the ‘right lobe of liver’ (Figure 3a and b, Table 1), which was consistent with the existing reports that the liver plays a central role in lipid metabolism, serving as the center for lipoprotein uptake, formation, and export to the circulation (Jha et al., 2018; Nguyen et al., 2008). SpecVar found that LDL and the total cholesterol were also significantly relevant to the ‘fetal adrenal gland’ and the adrenal cortex has been revealed to play an important role in lipid mentalism (Boyd et al., 1983). However, SAP and SEG failed to prioritize liver tissue as the significant relevant tissue. For LDL, SAP identified the ‘frontal cortex’ to be the most relevant tissue. SEG identified the most relevant tissue to be ‘HepG2’, which was human hepatoma cell lines, but the relevance score was relatively lower (Figure 3a, Supplementary file 1c). For total cholesterol, SAP identified the ‘fetal adrenal gland’ and SEG obtained ‘HepG2’ as the most relevant tissues (Figure 3b, Supplementary file 1c).

Figure 3 with 5 supplements see all
Comparison of identifying proper relevant tissues between SpecVar and two other LD Score regression (LDSC)-based method: specifically accessible peaks (SAP) and specifically expressed genes (SEG).

The top five relevant tissues ranked by the relevant score of SpecVar, SAP, and SEG for (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology. The sample size of error bars for (a-f) is 100.

Table 1
The total sample size, number of significant SNP associations, and SpecVar-identified relevant tissues of six phenotypes.

For each relevant tissue, we have two numbers in the bracket: the first is the R score and the second is its false discovery rate (FDR) q-value.

TraitSample sizeSNP associationRelevant tissues (R score and its FDR q-value)
Low-density lipoprotein173,0823077Right lobe of liver (722.74, 1.2e-3), frontal cortex (146.54, 3.3e-4), gastrocnemius medialis (128.02, 4.7e-4), fetal adrenal gland (123.52, 9.5e-4)
Total cholesterol187,3654169Right lobe of liver (714.75, 1.0e-2), fetal adrenal gland (216.74, 4.3e-17), H7-hESC (130.73, 2.1e-3)
Educational attainment1070,75130,519Frontal cortex (167.23, 3.7e-7)
Cognitive performance257,84113,732Frontal cortex (216.62, 7.0e-28),
Ammon’s horn (107.25, 2.3e-10)
Brain shape19,64438,630CNCC (512.56, 2.7e-44), trophoblast (144.15, 3.8e-9)
Facial morphology10,115495CNCC (134.95, 8.0e-10), fibroblast (128.81, 3.8e-26)
  1. CNCC, cranial neural crest cell.

For two human intelligential phenotypes, SpecVar prioritized the ‘frontal cortex’ to be the most relevant tissue for both educational attainment and cognitive performance (Figure 3c and d, Table 1). ‘Frontal cortex’ is the cerebral cortex covering the front part of the frontal lobe and is implicated in planning complex cognitive behavior, personality expression, decision-making, and moderating social behavior (Gabrieli et al., 1998; Yang and Raine, 2009). There were five tissues (‘frontal cortex’, ‘Ammon’s horn’, ‘cerebellum’, ‘putamen’, and ‘caudate nucleus’) from the brain in our atlas and they were significantly higher ranked by SpecVar’s relevance score than nonbrain tissues for educational attainment (Wilcoxon rank-sum test, p=6.1×107 , Figure 3c) and cognitive performance (p=8.0×106, Figure 3d). In comparison, for educational attainment, SAP prioritized brain tissues to be higher ranked than nonbrain tissues, but with a less significant p-value (p=2.3×103, Figure 3c, Supplementary file 1c). SEG could not rank brain tissues to be higher than nonbrain tissues (p=6.4×101, Figure 3c, Supplementary file 1c). For cognitive performance, SAP failed to rank brain tissues as the more relevant tissues (p=6.0×102, Figure 3d, Supplementary file 1c), and SEG identified brain tissues to be more relevant than nonbrain tissues but with a less significant p-value (P=3.2×103, Figure 3d, Supplementary file 1c).

For both facial morphology and brain shape, SpecVar identified CNCC as the most relevant context (Figure 3e and f, Table 1). CNCC is a migratory cell population in early human craniofacial development that gives rise to the peripheral nervous system and many non-neural tissues such as smooth muscle cells, pigment cells of the skin, and craniofacial bones, which make it much more related to facial morphology and brain shape than the other 76 contexts (Cordero et al., 2011; Barlow et al., 2008). Facial morphology and brain shape were also revealed to share heritability in CNCC (Naqvi et al., 2021). But the other two methods failed to identify CNCC as the most relevant context. For brain shape, SAP identified ‘H1-hESC’ and SEG identified ‘tibial nerve’ to be the most relevant tissue (Figure 3e, Supplementary file 1c). For facial morphology, SAP and SEG identified ‘foreskin’ and ‘sigmoid colon’ to be the most relevant tissues, respectively (Figure 3f, Supplementary file 1c).

We next compared with other relevant tissue identification methods that were not based on LDSC. First, we compared SpecVar to CoCoNet, which was based on gene co-expression networks. CoCoNet is built with 38 tissues’ co-expression networks from GTEx, and we applied it to our six phenotypes. We could see CoCoNet identified ‘Breast’ as the most relevant tissue for LDL (Figure 3—figure supplement 1a) and ‘Brain_other’ as the most relevant tissue for total cholesterol (Figure 3—figure supplement 1b). ‘Breast’ was the most relevant tissue for educational attainment (Figure 3—figure supplement 1c), and ‘Stomach’ was the most relevant tissue for cognitive performance (Figure 3—figure supplement 1d). Since there is no CNCC sample in GTEx, CoCoNet revealed ‘Prostate’ as the most relevant tissue for brain shape and facial morphology. These results seemed less reasonable than SpecVar because CoCoNet did not identify liver tissues for LDL and total cholesterol and did not reveal brain tissues for educational attainment and cognitive performance. We also compared with RolyPoly, which was a non-network-based method for discovering relevant tissues. We fitted the RolyPoly model with gene expression profiles of our 77 human contexts and applied it to GWAS of LDL, total cholesterol, educational attainment, cognitive performance, and facial morphology. RolyPoly prioritized the ‘HepG2’ cell line as the most relevant tissue for LDL and total cholesterol (Figure 3—figure supplement 2a and b). ‘HepG2’ is also reasonable to be relevant to lipid phenotypes because it is the nontumorigenic cell with high proliferation rates and epithelial-like morphology that performs many differentiated hepatic functions. For educational attainment, RolyPoly did not identify the five brain tissues as the top-ranked tissue and only include ‘fetal spinal cord’ in the top five relevant tissues (Figure 3—figure supplement 2c). For cognitive performance, there were no brain tissues in the top five tissues (Figure 3—figure supplement 2d). And for facial morphology, RolyPoly failed to identify ‘CNCC’ as relevant tissues (Figure 3—figure supplement 2e). The comparison with two non-LDSC-based methods again showed the superiority of SpecVar to identify proper relevant tissues.

After identifying the relevant tissues, SpecVar could further interpret the relevance by extracting SNP-associated regulatory subnetwork (Materials and methods). For example, we obtained the brain shape’s SNP-associated regulatory subnetwork in CNCC (Figure 4a). There were 62 SNPs associated with 24 REs, 73 TFs, and 52 TGs. The TGs were tightly involved with brain development. For example, POU3F3 is a well-known TF involved in the development of the central nervous system and is related to many neurodevelopmental disorders (Snijders Blok et al., 2019). EMX2 is expressed in the developing cerebral cortex and is involved in the patterning of the rostral brain (Cecchi and Boncinelli, 2000). FOXC2 is a member of the FOX family, which are modular competency factors for facial cartilage (Xu et al., 2018), and its mutation is linked to the cleft palate (Bahuau et al., 2002). By GWAS study, FOXC2 was previously found to be associated with brain shape by its nearest significant SNP ‘16:86714715’ (Naqvi et al., 2021). However, in CNCC, we did not find any accessible peaks that overlapped with this SNP. Instead, we found a CNCC-specific RE that regulated FOXC2 in a locus of the 650k downstream (Figure 4b). GWAS revealed that the SNPs in this region had a strong association with brain shape and had high LD with each other. Our CNCC-specific regulations further prioritized only two SNPs (‘16:87237097’, ‘16:87236947’) located in this CNCC-specific RE, which may influence the expression of FOXC2 and the brain shape phenotypes. We checked the chromatin loops in the database of HiChIP (Zeng et al., 2022) and found that this SNP-associated regulation of FOXC2 was supported by a HiChIP loop in brain tissues to link this SNP locus and FOXC2 promoter. This example showed the power of SpecVar to interpret the genetic variants’ association with phenotypes by detailed regulatory networks in relevant tissues.

Figure 4 with 1 supplement see all
SpecVar uses SNP-associated regulation to interpret relevance to tissues.

(a) The brain shape’s SNP-associated regulatory subnetwork in cranial neural crest cell (CNCC). The dash arrows indicate significant SNPs that are not located in regulatory elements (RE) but near this RE. (b) SNP-associated regulation of FOXC2. There is a group of significant SNPs of brain shape that is located in the 650k downstream of FOXC2, and they are with high linkage disequilibrium. SpecVar prioritizes SNPs located in a CNCC-specific RE as causal genetic variants affecting brain shape through the regulation of FOXC2. The SNP locus and promoter of FOXC2 are linked by a HiChIP loop of the brain tissues.

The SNP-associated regulatory networks can facilitate the fine mapping of GWAS signals. Since there are eQTL data of the liver and brain tissues in the GTEx database, we collected the significant SNP-gene pairs of liver and brain tissues from GTEx to validate the identified SNP-associated regulations. For educational attainment, SpecVar revealed 7611 SNP-TG pairs in its SNP-associated regulatory network of ‘frontal cortex’, and we found that 3693 SNPs (40.1%) of these SNP-TG pairs were also SNPs of eQTL in brain tissues. Among the 2862 SNPs, 788 SNPs (10.4%) had the same TGs with the eGenes in the eQTL database (hypergeometry test, p=7.0×10121). For cognitive performance, there were 7494 SNP-TG pairs in its SNP-associated regulatory network of ‘frontal cortex’. And 3569 of them (47.6%) were also SNPs of eQTL in brain tissues. SpecVar further revealed that 988 SNPs (13.2%) had the same TGs with the eGenes in the eQTL database (p=3.2×10224). For LDL, there were 556 SNP-TGs pairs in its SNP-associated regulatory network of ‘right lobe of liver’, and we found that 45 of the SNP-TG pairs (8.1%) were also SNPs of eQTL. Two of the SNPs had the same TGs with the eGenes in the eQTL database (p=5.0×102). For total cholesterol, there were 461 SNP-TGs pairs in its SNP associated regulatory network of ‘right lobe of liver’. We found that 16 of the SNP-TG pairs (3.5%) were also SNPs of eQTL. There were two of the SNPs that had the same TGs with the eGenes in the eQTL database (p=3.0×102).

In summary, we evaluated SpecVar’s ability to identify relevant tissues using six well-studied phenotypes as the gold standard by comparison with SAP, SEG, CoCoNet, and RolyPoly. The results showed that SpecVar could identify relevant tissues more accurately and stably. Meanwhile, SpecVar provided detailed regulations to interpret the relevance to tissues and help the fine mapping of GWAS signals.

SpecVar reveals the association of multiple phenotypes by relevance correlation

SpecVar’s ability to accurately and robustly identify relevant tissues enlightens us to define the relevance correlation of two phenotypes by Spearman correlation of their R scores (Materials and methods). The relevance correlation may approximate phenotypic correlation since if two phenotypes are correlated, their relevance to human contexts will also be correlated. We used two GWAS datasets with phenotypic correlation computed from individual phenotypic data as the gold standard and compared SpecVar with two other methods SAP and SEG.

The first dataset was GWAS of 78 distances on the human face (Xiong et al., 2019). Based on summary statistics, we computed the relevance correlations of 3003 pairs of distances with SpecVar, SAP, and SEG. We compared the relevance correlation to phenotypic correlation from individual phenotypic data and computed the Pearson correlation coefficient (PCC, Materials and methods) to evaluate the performance of these three methods. SpecVar’s relevance correlation showed the best performance in approximating phenotypic correlation (Figure 5a and b, PCC = 0.522), which outperformed the other two methods: SAP PCC = 0.467 (Figure 5b) and SEG PCC = 0.405 (Figure 5b). We also evaluated the ability to approximate the phenotypic correlation of highly correlated phenotypes. By setting the threshold of phenotypic correlation to 0.4, we obtained the 363 highly correlated phenotype pairs of facial landmark distances and compared the three methods based on their performance on these 363 pairs of phenotypes. We found that SpecVar also performed best with PCC 0.467, which was the largest among the three methods: SAP PCC = 0.454 and SEG PCC = 0.245 (Figure 5c). We also used the mean square error (Figure 5—figure supplement 1a and b) and mutual information (Figure 5—figure supplement 1c and d) as metrics to evaluate the performance (Materials and methods) and SpecVar was the best among these three methods.

Figure 5 with 4 supplements see all
SpecVar defines relevance correlation to reveal association of phenotypes.

(a) The scatter plot of true phenotypic correlation and relevance correlation by SpecVar. Each point means a pair of facial distances. (b) For all phenotype pairs of facial distances, the Pearson correlation coefficient (PCC) between phenotypic correlation and relevance correlation of three methods. (c) For highly correlated phenotype pairs of facial distances, the PCC between phenotypic correlation and relevance correlation of three methods. (d) For all pairs of UKBB phenotypes, the PCC between phenotypic correlation and relevance correlation of three methods. (e) For highly correlated pairs of UKBB phenotypes, the PCC between phenotypic correlation and relevance correlation of three methods. (f) For UKBB phenotype pairs with 25 different heritability thresholds, the PCC between phenotypic correlation and relevance correlation of four methods.

The second GWAS dataset is from UK Biobank. There were 4313 GWAS in UK Biobank, from which we selected 206 high-quality GWAS summary statistics of 12 classes (Supplementary file 1d, Materials and methods). We applied SpecVar and the other two methods to obtain the relevance correlations among these 206 phenotypes and used the phenotypic correlations computed from individual data as validation. First, SpecVar performed best in the approximation of phenotypic correlation (PCC = 0.360), followed by SAP (PCC = 0.315) and SEG (PCC = 0.285) (Figure 5d). For highly correlated phenotypes, SpecVar’s relevance correlation was also closest to phenotypic correlation (Figure 5e). SpecVar’s outperformance of estimating phenotypic correlation was reproduced by using mean square error (Figure 5—figure supplement 2a and b) and mutual information (Figure 5—figure supplement 2c and d) as metrics. We found that the heritability of these 206 phenotypes was quite variable. For example, ‘rose wine intake’ had a heritability of 6.5×103 and ‘corneal resistance factor right’ had a heritability of 0.336. So, we checked whether the heritability would influence the quality of relevance correlation. To do this, we set different thresholds of heritability and obtained a subset of phenotypes for each threshold. Then for the phenotype subset of each heritability threshold, we computed the PCC between relevance correlation and phenotypic correlation. For almost all the thresholds of heritability, SpecVar showed the best performance of PCC and mean square error (Figure 5f, Figure 5—figure supplement 2e). SpecVar also had the smallest variance regarded heritability among these three methods (Figure 5—figure supplement 2f and g). This means that the relevance correlation of SpecVar could estimate phenotypic correlation more accurately and robustly.

SpecVar can interpret the relevance correlation by the common relevant tissues and shared SNP-associated regulations of two phenotypes (Supplementary file 1e). For example, ‘body mass index’ and ‘leg fat-free mass (right)’ were correlated with a phenotypic correlation of 0.697. SpecVar obtained a relevance correlation of 0.602, while SAP obtained a relevance correlation of 0.342 and SEG gave a relevance correlation of 0.437. SpecVar further revealed that these two phenotypes were correlated because they were both most relevant to the ‘frontal cortex’ (Figure 6a). Body mass index has been reported to be related to frontal cortex development (Laurent et al., 2020) and relevant to the reduced and thin frontal cortex (Islam et al., 2018; Shaw et al., 2018). Obesity and fat accumulation are also revealed to be associated with the frontal cortex (Gluck et al., 2017; Kakoschke et al., 2019). SpecVar then extracted these two phenotypes’ SNP-associated regulatory networks in the ‘frontal cortex’ and found that their SNP-associated networks were significantly shared. The significant overlapping was observed at SNP, RE, TG, and TF levels: p=8.2×1063 for SNPs, p=1.4×1047 for REs, p=6.0×1025 for TGs, and p=8.2×1025 for TFs (Figure 6b). The shared regulatory network was involved with body weight and obesity. For example, in the brain, SH2B1 enhances leptin signaling and leptin’s antiobesity action, which is associated with the regulation of energy balance, body weight, and glucose metabolism (Rui, 2014). We found that one common significant SNP ‘16:2896143’of these two phenotypes was located in the specific REs of the ‘frontal cortex’ at the 90k downstream of SH2B1. Even though this RE was near the promoter of NFATC2IP, SpecVar revealed that it regulated the expression of SH2B1, which was supported by a HiChIP loop in brain tissues to associate the SNP-associated RE and promoter of SH2B1 (Figure 6c). These results indicated that one shared SNP was located in the ‘frontal cortex’-specific RE and might regulate the expression of SH2B1 to influence two phenotypes: ‘body mass index’ and ‘leg fat-free mass (right)’.

SpecVar uses common relevant tissues and shared SNP-associated regulatory network to interpret relevance correlation.

(a) Scatter plot of R scores across 77 human contexts of ‘body mass index’ and ‘leg fat-free mass (right)’. (b) The SNP-associated regulatory network of ‘body mass index’ and ‘leg fat-free mass (right)’ are significantly shared. (c) SNP-associated regulation of SH2B1. There is a shared significant SNP of ‘body mass index’ and ‘leg fat-free mass (right)’ that is located at the 90k downstream of SH2B1, and there is a HiChIP loop linking this locus to the promoter of SH2B1. SpecVar prioritizes SNP located in a ‘frontal cortex’-specific regulatory elements (RE) as potential causal genetic variant affecting both ‘body mass index’ and ‘leg fat-free mass (right)’ through regulation of SH2B1.

Through the application of relevance correlation to two datasets with the gold standard of phenotypic correlation, we concluded that SpecVar can use the accurate relevance score to define relevance correlation, which could better estimate phenotypic correlation. SpecVar could further reveal common relevant tissues and shared SNP-associated regulatory networks to interpret correlation of phenotypes.

Discussion

In this article, we introduce the context-specific regulatory network, which integrate paired gene expression and chromatin accessibility data, to construct context-specific regulatory categories for better interpretation of GWAS data. SpecVar is developed as a tool to interpret genetic variants based on GWAS summary statistics. The key message is that integrating chromatin accessibility and gene expression data into context-specific regulatory networks can provide better regulatory categories for heritability enrichment (Gazal et al., 2019). SpecVar is based on the popular model stratified LDSC (Finucane et al., 2015), which includes 52 function categories as the baseline model. In addition, we show that extending the functional categories from noncontext-specific regions to context-specific regions can improve the heritability enrichment, which is consistent with other studies based on gene expression (Finucane et al., 2018) and ChIP-seq (van de Geijn et al., 2020) data.

Our main contribution in SpecVar is from (1) integrating paired gene expression data and chromatin, which are two types of most easily accessible data, into regulatory networks; (2) highlighting the comparison with other contexts by the specificity of the regulatory network to narrow down the genes and REs, which will help us explain how the phenotype-associated SNPs influence the tissue or cell types in a specific way; and (3) more interpretability than existing methods because SpecVar gives more biological insights in explaining the relevance to tissues by SNP-associated regulatory networks and interpreting phenotype correlation through common relevant tissues and shared SNP-associated regulatory network. To study the contribution of two important parts of SpecVar to the improvement of heritability enrichment: (1) regulatory network by integrating gene expression and chromatin accessibility data, and (2) specificity by comparing with other contexts, we conducted ablation analysis. First, we sorted out the main components of SpecVar (Figure 2—figure supplement 1c) and illustrated the relationship and differences of the five methods. Then we compared SpecVar with ARE, SAP, and SEG to show the effect of specificity, gene expression annotations, and chromatin accessibility (Figure 2—figure supplement 1d), respectively. We conducted these comparisons for six phenotypes and computed fold change of heritability enrichment to measure the effect to improve heritability enrichment. We found that the annotation from chromatin accessibility contributed most to improve heritability enrichment (Figure 2—figure supplement 1e). This meant that the regulatory networks of SpecVar, which integrated chromatin accessibility with gene expression data, played a more important role in improving heritability enrichment. And specificity also improved heritability enrichment by four-fold.

SpecVar outperformed the existing methods in three points. First, SpecVar defined the relevance score based on both heritability enrichment and p-value. Because of the variability in the number of REs in the regulatory categories (Supplementary file 1f), using only heritability enrichment or p-value will not give a stable estimation of the relevance of phenotype to tissues. For example, in the experiment of identifying six phenotypes’ relevant tissues, heritability enrichment could select the most relevant tissues for LDL and total cholesterol to be the ‘right lobe of liver’ but failed to get proper tissues for other four phenotypes (Figure 3—figure supplement 3). P value could obtain proper tissues for cognitive performance (‘frontal cortex’) and brain shape (CNCC) but failed to get relevant tissues for LDL, total cholesterol, educational attainment, and facial morphology (Figure 3—figure supplement 4). By combining heritability enrichment and p-value into R score, SpecVar could prioritize proper relevant tissues for all the six phenotypes (Figure 3). Like the R score-based relevance correlation, we could use the heritability enrichment and p-value to compute relevance correlation (Figure 5—figure supplement 3a and b). We found relevance correlation based on heritability enrichment and p-value would give larger MSE (Figure 5—figure supplement 3c and e) and lower PCC (Figure 5—figure supplement 3d and f) than the R score, which showed that SpecVar’s R score can achieve a better approximation of phenotypic correlation. Those comparisons showed that the R score was a good metric to evaluate tissue’s relevance to the phenotype. Second, SpecVar’s regulatory categories had advantages over the existing functional categories to explain heritability. The context-specific regulatory networks formed regulatory categories that enable better heritability enrichment than other methods (Figure 2). The improved heritability enrichment was also observed when we pooled regulatory categories of all the methods together to fit stratified LDSC and re-estimated the heritability enrichment of each context for each method (Figure 2—figure supplement 2). The regulatory categories of SpecVar can be used to calculate R scores to identify relevant tissues more accurately than other methods (Figure 3). And the R score of SpecVar can also be used to compute relevance correlation to better approximate phenotypic correlation than other methods when we do not have comprehensive phenotype measurement in each individual (Figure 5). Third, with the constructed context-specific regulatory network atlas, SpecVar could further interpret the relevant tissue by SNP-associated regulatory networks (Figure 4) and interpret relevance correlation by common relevant tissues and shared SNP-associated regulations in relevant tissues (Figure 6). These three aspects made SpecVar an interpretable tool for heritability enrichment, identifying relevant tissues, and accessing associations of phenotypes.

Identification of relevant contexts can be conducted at different resolutions. First, in this article, SpecVar identified relevant contexts at the tissue level because most of the 77 contexts were tissues or cell lines. Second, we could identify relevant organs or groups of tissues. For example, our 77 human contexts can be hieratically clustered into 36 groups. We pooled the REs in the same groups together to form 36 group-level RE sets. Then we defined an RE to be a group-specific RE if it was not overlapped with REs in other groups. This procedure would form 36 sets of group-specific REs. Finally, we fitted the stratified LDSC model with these 36 sets of group-specific regulatory categories and obtained heritability enrichment, p-value, and R score for each group by SpecVar. We tested the group-based SpecVar on our six phenotypes. For LDL and total cholesterol, SpecVar identified ‘Liver’ tissues to be the most relevant tissue (Figure 3—figure supplement 5a and b). For educational attainment and cognitive performance, SpecVar revealed ‘brain’ tissues to be the most relevant tissues (Figure 3—figure supplement 5c and d). And for brain shape and facial morphology, SpecVar found ‘CNCC’ to be the most relevant tissue (Figure 3—figure supplement 5e and f). Third, we can identify relevant contexts at the cell-type level since the amount of single-cell multimodal data, such as single-cell RNA-seq and single-cell ATAC-seq data, are increasing in recent years (Han et al., 2020). The paired single-cell multi-omics data, which means single-cell data profiled from the same context, enable us to identify cell types and infer regulatory networks for these cell types (Duren et al., 2018; Zeng et al., 2019). It will be promising to construct an atlas of context-specific regulatory networks at cell-type level and build SpecVar model based on these cell-type-specific regulatory networks. This extension of SpecVar to single-cell level holds the promise to identify more detailed relevant cell types for given phenotypes.

Based on the accurate and highly interpretable relevant tissue identification, the relevance correlation of SpecVar provides us with another perspective of associations between two phenotypes: if two phenotypes are correlated, their relevance to human contexts will also be correlated. This rationale is independent of genetic correlation, which is the proportion of variance that two phenotypes share due to genetic causes and can be estimated with GWAS summary statistics by LDSC-GC (Bulik-Sullivan et al., 2015a). When using measured phenotype value correlation as the gold standard of phenotype correlation, we found that SpecVar performed better when the heritability of phenotype was low while LDSC-GC performed better when the heritability was high (Figure 5—figure supplement 4a and b). This indicated that the integration of relevance correlation and genetic correlation might give a better estimation of phenotypic correlation. We validated this idea by regressing phenotypic correlation on relevance correlation and genetic correlation in two GWAS datasets. For the phenotypes of facial distances, if we only used relevance correlation to regress phenotypic correlation, the coefficient of determination (R2) was 0.2720; if we only used genetic correlation, the R2 was 0.0002; if we used the linear combination of relevance correlation and genetic correlation to regress phenotypic correlation, the R2 was 0.2765, which was significantly higher than that only with SpecVar (F-test of R2 increase, p1.8×105) or only with LDSC-GC (p5.3×10213); and if we used a product (nonlinear combination) of relevance correlation and genetic correlation, the R2 was much higher: 0.2911 (Figure 5—figure supplement 4c and d). And for 206 phenotypes of UK Biobank, if we only used relevance correlation, the R2 was 0.1289; if we only used genetic correlation, the R2 was 0.5614; if we used the linear combination of relevance correlation and genetic correlation to regress phenotypic correlation, the R2 was 0.5927, which was significantly higher than that only with SpecVar (p2.2×1016) or only with LDSC-GC (p2.2×1016); and if we used a product of relevance correlation and genetic correlation, the R2 was 0.7375, which was much more improved (Figure 5—figure supplement 4e and f). These results showed that relevance correlation and genetic correlation revealed the association of phenotypes in a complementary way.

Our work can be improved in several aspects. The usage of context-specific regulatory networks contributed most to the improvement of SpecVar. But the context-specific regulatory networks can only cover part of the regulatory elements and genetic variants, which are highly essential and representative. Higher-quality and more comprehensive regulatory networks will help obtain better representation. Currently, we built the atlas of regulatory networks of 77 human contexts and only included CNCC in the early developmental stage, which was far from complete. We expect more developmental stages will be included with multi-omics data from ENCODE (Snyder et al., 2020) and GTEx (Consortium, 2020). On the other hand, SpecVar is based on stratified LDSC, which may not perform well on admixed populations. Thus, considering the effect of mixed ancestries will help SpecVar handle more circumstances. For example, Luo. et al. have developed cov-LDSC to adjust the LDSC model to be suitable for admixed populations (Luo et al., 2021). Building SpecVar model based on cov-LDSC will be promising to perform well on GWAS with mixed populations. Lastly, it will be useful to extend the current approach using a model based on individual whole-genome sequencing data (Li et al., 2020).

Materials and methods

Regulatory network inference with paired expression and chromatin accessibility data by PECA2

Request a detailed protocol

The regulatory networks were inferred by the PECA2 (Duren et al., 2020) model with paired expression and chromatin accessibility data. First, we collected paired expression and chromatin accessibility data of 76 human tissue or cell lines from ENCODE and ROADMAP (Supplementary file 1a). Then with paired expression and accessibility data of each context, PECA2 calculated two scores. One was the trans-regulatory score. Specifically, PECA2 hypothesized that TF regulated the downstream TG by binding at REs. The trans-regulatory score was calculated by integrating multiple REs bound by a TF to regulate TG to quantify the regulatory strength of this TF on the TG. And PECA2 also considered a prior TF-TG correlation across external public data from ENCODE database. In detail, the trans-regulatory score TRSij of ith TF and jth TG was quantified as

(3) TRSij=kBikOkIkj×2Rij×TFiTGj

Here, TFi and TGj are the expressions of the ith TF and jth TG. Bik is the motif binding strength of ith TF on kth RE, which was defined as the sum of the binding strength of all the binding sites of ith TF on kth RE. Ok is the measure of accessibility for kth RE. Ikj represents the interaction strength between kth RE and jth TG, which was learned from the PECA model on diverse ENCODE cellular contexts (Duren et al., 2017; Duren et al., 2018). Rij is the expression correlation of ith TF and jth TG across diverse ENCODE samples. The significance of the TRS was obtained by a background of randomly selected TF-TG pairs and the threshold of the TRS was decided by controlling the false discovery rate (FDR) at 0.001.

The other one was the cis-regulatory score to measure the regulatory strength of RE on a TG. The cis-regulatory score CRSkj of kth RE on jth TG was quantified as

(4) CRSkj=iBikTRSij×Ikj×Ok

We approximated the distribution of log21+CRSkj by a normal distribution and predicted RE-TG associations by selecting the RE-TG pairs that had p-value≤0.05.

The output of PECA2 was a regulatory network with TFs, REs, and TGs as nodes and the regulations among them as edges. This procedure was applied to 76 human contexts with paired expression and chromatin accessibility data and obtained 76 regulatory networks. We noted that the regulatory network of the early development stage CNCC was reconstructed recently (Feng et al., 2021), and we included the regulatory network of CNCC to form our regulatory network atlas of 77 human contexts.

Construction of context-specific regulatory network atlas and regulatory categories

Request a detailed protocol

The context-specific regulatory network was obtained based on the specificity of REs. In detail, we had 77 regulatory networks, and each regulatory network had a set of REs REi,1i77. Firstly, we hierarchically clustered 77 contexts’ regulatory networks into 36 groups by trans-regulatory scores (Supplementary file 1a). Then for a given context, a RE was defined as a context-specific RE if it was not overlapped with REs of other contexts. Formally, the context-specific RE set of ith context Ci was defined as

(5) Ci=REikREi|REikREj,ji

Here, REikREj means REik was not overlapped with any REs in REj:

(6) REikREjREik is not overlapped with any REjl in REj

And we defined ‘overlapped’ (1) for REs from contexts of the different groups, two REs were overlapped if their overlapping base ratio was over 50%; (2) for REs from contexts of the same group, two REs were overlapped if their overlapping base ratio was over 60%. The reason we used different ‘overlapped’ criteria for REs from the same group and different groups was to retain group-specific REs. For example, we had five cell types for the brain tissues: ‘Ammon’s horn’, ‘caudate nucleus’, ‘cerebellum’, ‘frontal cortex’, and ‘putamen’. If we defined RE’s specificity with stringent conditions among these five brain cell types, many REs of brain lineage would be lost.

Finally, the context-specific regulatory network was formed by specific REs and their directly linked upstream TFs and downstream TGs. And the context-specific RE sets Ci,1i77 gave the regulatory categories of SpecVar.

Heritability enrichment and R score of GWAS summary statistics by SpecVar

Request a detailed protocol

SpecVar used stratified LDSC (Finucane et al., 2015) to compute partitioned heritability enrichment. Under the linear additive model, stratified LDSC models the causal SNP effect on phenotype as drawn from a distribution with mean zero and variance

(7) Varβj=iτi1jCi

And with the assumption that the LD of a category that is enriched for heritability will increase the χ2 statistic of an SNP more than the LD of a category that does not contribute to heritability, the expected χ2 statistic is modeled as follows:

(8) Eχj2=NCiτilj,i+Na+1

where N is the sample size; Ci denotes the regulatory category formed by the ith context-specific regulatory network; χj2 is the marginal association of SNP j from GWAS summary statistics; rjk2 is the LD score of SNP j in the ith category, where rjk2 was the genotype correlation between SNP j and SNP k; a measures the contribution of confounding biases; and τi represents heritability enrichment of SNPs in Ci . Stratified LDSC estimates standard error with a block jackknife and uses the standard error to calculate the p-value pi for the heritability enrichment τi (Finucane et al., 2015).

To make a trade-off between the heritability enrichment score and p-value resulting from a hypothesis test, we combined heritability enrichment and statistical significance (p-value) to define the relevance score (Ri) of this phenotype to ith context as follows:

(9) Ri=τi-logpi

The relevance score (R score) offered a new robust means to rank and select relevant tissue for a given phenotype (Xiao et al., 2014; Figure 3—figure supplements 3 and 4, and Figure 5—figure supplement 3).

Significance testing of R score and identification of relevant contexts

Request a detailed protocol

We used block jackknife to estimate the standard error, p-value, and FDR for the R score. Specifically, we computed the R score Ri of a phenotype to ith context. Then we estimated the standard error, p value, and FDR of Ri following the procedures:

  1. For the ith context, we divided its specific RE into 100 folds.

  2. One subsample was generated by removing one of the 100 folds, and we generated 100 subsamples of ith context.

  3. These 100 subsamples of specific REs would form new regulatory categories for fitting stratified LDSC. For each subsample, we obtained heritability enrichment, p-value, and R score by SpecVar.

  4. With the 100 background R scores of the 100 subsamples, we could estimate the standard error (SDi) of Ri for ith context.

  5. We computed the z scores of ith context: Zi=Ri/SDi~N0,1 , and estimated the p-value and FDR q-value.

The R scores and their FDR q-values could be used to select relevant tissues. We used R score ≥100 and FDR ≤0.01 as the threshold to pick up relevant tissues for a phenotype. For six well-studied phenotypes in this article, their relevant tissues, R scores, and FDR are summarized in Table 1.

Four alternative methods to construct representations of GWAS summary statistics

Request a detailed protocol

Based on expression and chromatin accessibility data, there were four alternative methods for constructing regulatory categories: AAP, SAP, SEG, and ARE.

The AAP method used all the chromatin-accessible peaks of each context to form a genome functional category, which was used for partitioned heritability enrichment analysis. The SAP method used the same rules of SpecVar above to obtain specifically accessible peaks of each context, and the context-specific peaks sets of M contexts formed functional categories of SAP. The SEG method was constructed by following the procedure in Finucane et al., 2018. First, the t-statistics for differential expression of each gene in each of the M contexts were calculated. Then for each context, the top 10% genes ranked by t-statistic were selected, and the 100 kb windows around those top 10% genes were used to form a functional category. For the ARE method, we obtained all REs in the regulatory network of a context to be a functional category, and the RE sets of M contexts formed regulatory categories of ARE.

We could conclude the difference and relationship between the five methods (Figure 2—figure supplement 1c): SEG was the combination of gene expression and specificity; AAP was only from chromatin accessibility; SAP was the combination of chromatin accessibility and specificity; ARE integrated gene expression and chromatin accessibility; and SpecVar considered gene expression, chromatin accessibility, and specificity. Supplementary file 1f shows the number and size of genomic regions of each method and the overlapping among these methods.

After obtaining functional categories with these four alternate methods, we could also use stratified LDSC to obtain heritability enrichment and define the R score representation of GWAS summary statistics with Equations 8 and 9. We called them AAP, SAP, SEG, and ARE, respectively. We compared these four alternate methods with SpecVar.

Relevance correlation analysis of SpecVar

Request a detailed protocol

SpecVar defined relevance correlation based on R scores. The R scores to M contexts could be aggregated into a context-specific vector representation of GWAS summary statistics:

(10) R=R1,R2,,RM

For two phenotypes, such as phenotype p and phenotype q, we obtained their R score representations:

Rp=R1p,R2p,,RMp
(11) Rq=R1q,R2q,,RMq

Then the Spearman correlation of their R score representation was used to define the relevance correlation:

(12) ρg=ρRp,Rq=1-6irRip-rRiq2M*M2-1

Here, rRip and rRiq are the ranks of ith context by the R score for the two phenotypes.

For two other specificity-based regulatory categories SAP and SEG, we also used their functional categories to compute heritability enrichment and p-value and defined the R score with Equations 7–9. The R scores of SAP and SEG were used to compute relevance correlation.

Evaluation of relevant tissue identification and relevance correlation

Request a detailed protocol

To evaluate the performance of the SpecVar and other methods, we used different datasets as the gold standard.

For the application to identify relevant tissues, we used six well-studied phenotypes that we had knowledge of the relevant tissues: two lipid phenotypes (LDL and total cholesterol) were relevant to the liver; two human intelligential phenotypes (educational attainment and cognitive performance) were relevant to the brain; two craniofacial bone phenotypes (facial morphology and brain shape) were relevant to CNCC. We used different methods to identify relevant tissues of these six phenotypes and checked whether they obtained the proper relevant tissues.

For relevance correlation, we used the phenotypic correlation computed with individual phenotypic data as the gold standard. First, we computed the PCC between relevance correlation and phenotypic correlation:

(13) PCC=i,jP(pijp¯)(pijp¯)i,jP(pijp¯)2i,jP(pijp¯)2

Here, P is the set of phenotypes, and N is the number of phenotype pairs; pij is the phenotypic correlation computed with individual phenotypic data, and pij` is the relevance correlation; p¯ is the average of pij , and p¯ is the average of pij` . A larger PCC indicated better performance in approximating phenotypic correlation.

We also used the mean square error (MSE) between relevance correlation and phenotypic correlation:

(14) MSE=i,jPpij-pij`2N

A smaller MSE indicated better performance in approximating phenotypic correlation.

The last metric we used to evaluate the performance of approximating phenotypic correlation was mutual information, which was a nonlinear metric. We used the ‘k-nearest neighbor’ stratagem proposed by Kraskov et al. to estimate the mutual information of two vectors (Kraskov et al., 2004).

Extracting SNP-associated regulatory subnetworks in relevant tissues

Request a detailed protocol

Given a phenotype’s GWAS summary statistics and a context, SpecVar identified SNPs-associated regulatory subnetwork by considering the following two factors: (1) the cis-regulatory score of SNP-associated RE should be large enough to indicate its importance in the regulatory network; (2) the risk signal of SNPs (i.e., p-value) on or near this RE should be large to indicate its association with phenotype. We combined these two factors to define the association score (A score) of SNP-associated REs.

First, the regulatory strength of kth RE was measured by the maximum cis-regulatory score of this RE. Formally,

(15) Ck=maxjCRSkj

Here, CRSkj is the cis-regulatory score of kth RE on jth TG. For the kth RE, the larger Ck was, the more important role this RE played in the regulatory network. Second, the risk score of GWAS Sk for kth RE was defined as the average of the -log(p-value) of SNPs located on or near this RE, which were downweighted by their LD scores and distances to RE:

(16) Sk=1|Pk|lPkωllog(pl)edlkd0

Here, Pk is the set of SNPs whose distances were less than 50 kb to the kth RE, and Pk is the total number of this SNP set; ωl (the reciprocal of LD score, downloaded at https://data.broadinstitute.org/alkesgroup/LDSCORE/) is the weight of the lth SNP; pl is the p-value of the lth SNP in summary statistics; dlk is the base pair distance of the lth SNP to kth RE and d0 is a constant, which was set to 5000 as default. For the kth RE, a larger value of Sk indicated a stronger association with the given phenotype.

Finally, we obtained the association score (A score) of kth RE by combining these two factors:

(17) Ak=Ck*Sk

Every RE in the context-specific regulatory network was qualified by the A score. We used the GWAS of six phenotypes to analyze the distribution of A scores and found that the A scores followed a Gaussian distribution (Figure 4—figure supplement 1). So, we hypothesized the distribution of A scores was Gaussian distribution and selected the REs associated with the given phenotype by A scores’ FDR threshold of 0.05. The prioritized REs, as well as their directly linked upstream TFs, downstream TGs, and the associated SNPs, formed the SNP-associated regulatory subnetwork.

GWAS summary statistics of UK Biobank

Request a detailed protocol

The GWAS summary statistics of UK Biobank were downloaded at http://www.nealelab.is/uk-biobank. There were 4176 phenotypes and 11,372 GWAS summary statistics. We selected 206 GWAS summary statistics (Supplementary file 1d) based on the following conditions.

  1. Excluding sex-specific and ‘raw’-type GWAS.

  2. Sample size condition: N50,000 and Ncontrol , Ncase10,000 for binary and categorical phenotypes.

  3. Significant SNP number condition: the number of SNPs that pass the threshold of 5×10-8 was not less than 500.

  4. Manually curation: removing duplicated phenotypes, ‘job’, ‘parent’, and ‘sibling’-associated phenotypes.

Data availability

Codes and regulatory network resources are available at https://github.com/AMSSwanglab/SpecVar, (copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829, Feng et al., 2022a). Expression and chromatin accessibility data were summarized in Supplementary file 1a. GWAS data used: GWAS summary statistics of LDL and TC were downloaded at http://csg.sph.umich.edu/willer/public/lipids2013/; GWAS summary statistics of EA (GCST006442), CP (GCST006572), BrainShape (GCST90012880-GCST90013164), and Face (GCST009464) were downloaded at GWAS catalog https://www.ebi.ac.uk/gwas/summary-statistics; GWAS summary statistics of UK-Biobank were downloaded at http://www.nealelab.is/uk-biobank. The LDSC genetic correlation and phenotypic correlation computed from individual phenotypic data were downloaded at https://ukbb-rg.hail.is/.

References

    1. Kraskov A
    2. Stögbauer H
    3. Grassberger P
    (2004) Estimating mutual information
    Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 69:066138.
    https://doi.org/10.1103/PhysRevE.69.066138
    1. Willer CJ
    2. Schmidt EM
    3. Sengupta S
    4. Peloso GM
    5. Gustafsson S
    6. Kanoni S
    7. Ganna A
    8. Chen J
    9. Buchkovich ML
    10. Mora S
    11. Beckmann JS
    12. Bragg-Gresham JL
    13. Chang H-Y
    14. Demirkan A
    15. Den Hertog HM
    16. Do R
    17. Donnelly LA
    18. Ehret GB
    19. Esko T
    20. Feitosa MF
    21. Ferreira T
    22. Fischer K
    23. Fontanillas P
    24. Fraser RM
    25. Freitag DF
    26. Gurdasani D
    27. Heikkilä K
    28. Hyppönen E
    29. Isaacs A
    30. Jackson AU
    31. Johansson Å
    32. Johnson T
    33. Kaakinen M
    34. Kettunen J
    35. Kleber ME
    36. Li X
    37. Luan J
    38. Lyytikäinen L-P
    39. Magnusson PKE
    40. Mangino M
    41. Mihailov E
    42. Montasser ME
    43. Müller-Nurasyid M
    44. Nolte IM
    45. O’Connell JR
    46. Palmer CD
    47. Perola M
    48. Petersen A-K
    49. Sanna S
    50. Saxena R
    51. Service SK
    52. Shah S
    53. Shungin D
    54. Sidore C
    55. Song C
    56. Strawbridge RJ
    57. Surakka I
    58. Tanaka T
    59. Teslovich TM
    60. Thorleifsson G
    61. Van den Herik EG
    62. Voight BF
    63. Volcik KA
    64. Waite LL
    65. Wong A
    66. Wu Y
    67. Zhang W
    68. Absher D
    69. Asiki G
    70. Barroso I
    71. Been LF
    72. Bolton JL
    73. Bonnycastle LL
    74. Brambilla P
    75. Burnett MS
    76. Cesana G
    77. Dimitriou M
    78. Doney ASF
    79. Döring A
    80. Elliott P
    81. Epstein SE
    82. Ingi Eyjolfsson G
    83. Gigante B
    84. Goodarzi MO
    85. Grallert H
    86. Gravito ML
    87. Groves CJ
    88. Hallmans G
    89. Hartikainen A-L
    90. Hayward C
    91. Hernandez D
    92. Hicks AA
    93. Holm H
    94. Hung Y-J
    95. Illig T
    96. Jones MR
    97. Kaleebu P
    98. Kastelein JJP
    99. Khaw K-T
    100. Kim E
    101. Klopp N
    102. Komulainen P
    103. Kumari M
    104. Langenberg C
    105. Lehtimäki T
    106. Lin S-Y
    107. Lindström J
    108. Loos RJF
    109. Mach F
    110. McArdle WL
    111. Meisinger C
    112. Mitchell BD
    113. Müller G
    114. Nagaraja R
    115. Narisu N
    116. Nieminen TVM
    117. Nsubuga RN
    118. Olafsson I
    119. Ong KK
    120. Palotie A
    121. Papamarkou T
    122. Pomilla C
    123. Pouta A
    124. Rader DJ
    125. Reilly MP
    126. Ridker PM
    127. Rivadeneira F
    128. Rudan I
    129. Ruokonen A
    130. Samani N
    131. Scharnagl H
    132. Seeley J
    133. Silander K
    134. Stančáková A
    135. Stirrups K
    136. Swift AJ
    137. Tiret L
    138. Uitterlinden AG
    139. van Pelt LJ
    140. Vedantam S
    141. Wainwright N
    142. Wijmenga C
    143. Wild SH
    144. Willemsen G
    145. Wilsgaard T
    146. Wilson JF
    147. Young EH
    148. Zhao JH
    149. Adair LS
    150. Arveiler D
    151. Assimes TL
    152. Bandinelli S
    153. Bennett F
    154. Bochud M
    155. Boehm BO
    156. Boomsma DI
    157. Borecki IB
    158. Bornstein SR
    159. Bovet P
    160. Burnier M
    161. Campbell H
    162. Chakravarti A
    163. Chambers JC
    164. Chen Y-DI
    165. Collins FS
    166. Cooper RS
    167. Danesh J
    168. Dedoussis G
    169. de Faire U
    170. Feranil AB
    171. Ferrières J
    172. Ferrucci L
    173. Freimer NB
    174. Gieger C
    175. Groop LC
    176. Gudnason V
    177. Gyllensten U
    178. Hamsten A
    179. Harris TB
    180. Hingorani A
    181. Hirschhorn JN
    182. Hofman A
    183. Hovingh GK
    184. Hsiung CA
    185. Humphries SE
    186. Hunt SC
    187. Hveem K
    188. Iribarren C
    189. Järvelin M-R
    190. Jula A
    191. Kähönen M
    192. Kaprio J
    193. Kesäniemi A
    194. Kivimaki M
    195. Kooner JS
    196. Koudstaal PJ
    197. Krauss RM
    198. Kuh D
    199. Kuusisto J
    200. Kyvik KO
    201. Laakso M
    202. Lakka TA
    203. Lind L
    204. Lindgren CM
    205. Martin NG
    206. März W
    207. McCarthy MI
    208. McKenzie CA
    209. Meneton P
    210. Metspalu A
    211. Moilanen L
    212. Morris AD
    213. Munroe PB
    214. Njølstad I
    215. Pedersen NL
    216. Power C
    217. Pramstaller PP
    218. Price JF
    219. Psaty BM
    220. Quertermous T
    221. Rauramaa R
    222. Saleheen D
    223. Salomaa V
    224. Sanghera DK
    225. Saramies J
    226. Schwarz PEH
    227. Sheu WH-H
    228. Shuldiner AR
    229. Siegbahn A
    230. Spector TD
    231. Stefansson K
    232. Strachan DP
    233. Tayo BO
    234. Tremoli E
    235. Tuomilehto J
    236. Uusitupa M
    237. van Duijn CM
    238. Vollenweider P
    239. Wallentin L
    240. Wareham NJ
    241. Whitfield JB
    242. Wolffenbuttel BHR
    243. Ordovas JM
    244. Boerwinkle E
    245. Palmer CNA
    246. Thorsteinsdottir U
    247. Chasman DI
    248. Rotter JI
    249. Franks PW
    250. Ripatti S
    251. Cupples LA
    252. Sandhu MS
    253. Rich SS
    254. Boehnke M
    255. Deloukas P
    256. Kathiresan S
    257. Mohlke KL
    258. Ingelsson E
    259. Abecasis GR
    260. Global Lipids Genetics Consortium
    (2013) Discovery and refinement of loci associated with lipid levels
    Nature Genetics 45:1274–1283.
    https://doi.org/10.1038/ng.2797

Decision letter

  1. Charles Farber
    Reviewing Editor; University of Virginia, United States
  2. David E James
    Senior Editor; University of Sydney, Australia

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and David James as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. Please improve the description of the advance of the approach relative to other publications. How does this combination of an existing network method and an existing heritability enrichment method constitute substantial progress in the field?

2. Reporting of SpecVar-based heritability enrichment point estimates should be accompanied by appropriate measures of statistical significance (p-values, FDR-adjusted q-values, confidence intervals, etc.).

3. The comparisons between SpecVar and other approaches (AAP, ARE, SAP, SEG) are not sufficiently rigorous.

Reviewer #1 (Recommendations for the authors):

The approach taken in the study is conceptually interesting, however, it's not clear how novel the approach is. Additionally, there are several issues related to reporting results that need to be addressed. In particular, the authors do not report the statistical significance of enrichments using the new annotation aggregation method. Therefore, it is not possible to evaluate whether the higher overall heritability enrichment of the aggregate annotations they report compared to existing annotation approaches is meaningful.

1. Describe the novelty of the approach relative to other publications. How does this combination of an existing network method and an existing heritability enrichment method warrant a substantial advance in the field?

2. Reporting of SpecVar-based heritability enrichment point estimates should be accompanied by appropriate measures of statistical significance (p-values, FDR-adjusted q-values, confidence intervals, etc.).

3. The comparisons between SpecVar and other approaches (AAP, ARE, SAP, SEG) are not sufficiently rigorous.

a. First, please provide additional context. What is the amount of the genome covered by the regions defined by each approach? What is the overlap between these regions? For example, Table S4 shows SpecVar sets typically include substantially fewer regions than SAP. Are SpecVar regions primarily a subset of SAP regions or do they contain distinct/non-overlapping territory in the genome?

b. Second, please provide a way to evaluate whether differences in enrichment between the methods are statistically meaningful.

c. Fitting a joint LDSC model with multiple annotation sets would provide a more rigorous assessment of the performance of SpecVar relative to the other approaches.

4. Y-axes for bar plots in Figure 5 should start at 0. The floating y-axis origin is highly misleading.

5. The approach to defining context-specific REs based on overlap (>50% of bases overlapping for cross-group comparisons, and >60% of bases overlapping for within-group comparisons) seems quite stringent and likely excludes a lot of lineage-specific REs. Calculating enrichments for "group-specific" regulatory networks, where REs are considered group-specific if they do not overlap any of the other 35 groups resulting from the hierarchical clustering (ignoring whether an RE overlaps REs from other contexts within the same group), might provide more biologically relevant region sets.

Reviewer #2 (Recommendations for the authors):

– The authors only show the performance of the SpecVar method on 6 traits, however, it is not clear whether these are representative of all of the traits in UKBiobank. For traits with fewer SNP associations and lower sample numbers it appears that LDSC-SAP produces much high trait relevance scores, however, the authors use different tissues in each method so it may not be a fair comparison.

– The authors use the Pearson correlation coefficient to evaluate the performance of the SpecVar method, however, they should also consider other nonlinear metrics.

– The authors should consider a comparison with other regulatory network-based heritability methods such as CoCoNet, which is based on co-regulated genes. They should also compare to other non-network-based methods.

– The highlighted example SNP-associated network for the FOXC2 variants is interesting, however, the authors should demonstrate whether there are chromatin interactions (HiC or HiChIP) in brain tissues linking these variants in ATAC-seq peaks to the FOXC2 promoter. It would also be helpful to highlight another example using distinct UKBB phenotype and tissue datasets.

– It will be more interesting to adapt this to paired multimodal single-cell data if possible or expand on this in the Discussion.

– Lastly, LDSC-based methods may not perform well on admixed populations, thus some discussion on how this could be adapted using covariate-adjusted approaches (e.g. cov-LDSC) would be helpful.

Reviewer #3 (Recommendations for the authors):

The study would benefit from investigating the following questions:

Which part of the new annotation drives the extreme heritability enrichment from SpecVar, compared to other LDSC-based methods?

How to use the R score for a formal hypothesis test, rather than subjectively picking a few top-ranked tissues for the trait? How to evaluate the false positive rate for the test?

How to explain the inconsistent findings with previous studies, e.g., the association of the liver to LDL?

Are the GWAS signal in context-specific RE colocalised with context-specific eQTL signals?

How to link relevance score correlation across tissues to shared heritability between traits?

https://doi.org/10.7554/eLife.82535.sa1

Author response

Essential revisions:

1. Please improve the description of the advance of the approach relative to other publications. How does this combination of an existing network method and an existing heritability enrichment method constitute substantial progress in the field?

We have followed the editors’ and reviewers’ suggestions to add more description about our novelty. We further conducted the ablation analysis to demonstrate the complementary contribution of two components (regulatory network and specificity) of SpecVar. The ablation analysis showed integrating paired gene expression and chromatin accessibility data into regulatory network played an essential role in improving heritability enrichment for SpecVar (Figure 2-figure supplement 1). Together with additional description and discussion, we highlighted SpecVar’s novelty in the integration of multi-omics data, specificity, and interpretability. We thank the editor and reviewers for their kind suggestions, which help us to greatly clarify the novelty and contribution of our work.

2. Reporting of SpecVar-based heritability enrichment point estimates should be accompanied by appropriate measures of statistical significance (p-values, FDR-adjusted q-values, confidence intervals, etc.).

We thank the reviewers and editor for the constructive suggestion for improving our paper. In the revision, we provided more detailed information on heritability enrichment, including p-values, FDR-adjusted q-values, and confidence. We also improved the statistical rigorousness of SpecVar by block Jackknife stratagem to estimate standard error, p-value, and q value of R scores (Figure 3).

3. The comparisons between SpecVar and other approaches (AAP, ARE, SAP, SEG) are not sufficiently rigorous.

According to the suggestions, we performed a more comprehensive comparison with other methods. (1). We conducted Welch's t-test and showed that the differences between SpecVar and other methods were statistically significant (Figure 2). (2). We fitted a joint LDSC model with multiple annotation sets of all methods and provided a more rigorous assessment of SpecVar’s performance to estimate heritability enrichment (Figure 2-figure supplement 2) and identify relevant tissues (Author response image 1) relative to the other approaches. (3). We further compared with relevant tissue identification methods that were not based on stratified LDSC. We chose a network-based method CoCoNet (Figure 3-figure supplement 1) and a non-network-based method RolyPoly (Figure 3-figure supplement 2). The results also showed SpecVar’s outperformance over other methods. (4). We discussed different levels of relevant tissues: we built tissue groups based SpecVar to show identification of relevant tissue groups or organs (Figure 3-figure supplement 5); our current SpecVar revealed relevant contexts were tissues or cell lines; single-cell-based methods would help identify relevant cell types. (5). In evaluating the performance to approximate phenotypic correlation, we used not only the linear metrics (PCC, MSE) but also the non-linear metric (mutual information) to show SpecVar’s relevance correlation was better to estimate phenotype correlation (Figure 5-figure supplement 1, 2).

Author response image 1
The top 5 relevant tissues of SpecVar, SAP, and SEG ranked by the relevant score estimated by “pooled genome partition” for (a) LDL, (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology.

Reviewer #1 (Recommendations for the authors):

The approach taken in the study is conceptually interesting, however, it's not clear how novel the approach is. Additionally, there are several issues related to reporting results that need to be addressed. In particular, the authors do not report the statistical significance of enrichments using the new annotation aggregation method. Therefore, it is not possible to evaluate whether the higher overall heritability enrichment of the aggregate annotations they report compared to existing annotation approaches is meaningful.

We thank the reviewers for the interest in SpecVar. We have followed your suggestion to highlight the novelty and conducted an ablation analysis to understand the contribution from two components of SpecVar (multi-omics data integration into regulatory network and specificity). We added the necessary standard error, p-value, and q value to assess the statistical significance of heritability enrichments. We performed Welch's t-test and fitted stratified LDSC with pooled annotation of all methods to show the improved heritability enrichment was statistically meaningful.

1. Describe the novelty of the approach relative to other publications. How does this combination of an existing network method and an existing heritability enrichment method warrant a substantial advance in the field?

We are sorry that we did not make the novelty of our approach clear in the manuscript. As summarized by the reviewer, our method aims to combine the regulatory network with existing heritability enrichment methods to improve heritability enrichment and help identify relevant tissues. The novelty of our approach over existing stratified LDSC and LDSC-SEG lies in three aspects:

1) Our method integrates paired gene expression data and chromatin, which are two types of most easily accessible data now, into the regulatory network. Regulatory networks are involved with sets of macromolecules, mostly proteins and RNAs, that interact to control the level of expression of various genes in the genome. The gene expression and chromatin accessibility in the regulatory network exert constraints on each other to find the truly active genes and regulatory elements for certain tissues or cell types, which give more accurate biological molecules for interpreting genetic variants. In our previous studies, the regulatory networks have been successfully used to identify the master regulators in stem cell differentiation (Li et al., 2019) and to interpret conserved regions for the non-model organisms (Xin et al., 2020). Regulatory networks have also been used to interpret the non-coding genetic variants in certain tissues or cell types (Feng et al., 2021; Ma et al., 2022; Zhu et al., 2021).

2) Our method highlights the comparison to other contexts by specificity. There are some previous findings that the phenotype-associated SNPs often function in a tissue- or cell-type-specific manner (Westra and Franke, 2014). Considering the specificity of the regulatory network helps us to narrow down the genes and REs to be context-specific genes and REs, which will help us explain how the phenotype-associated SNPs influence the tissue or cell types in a specific way.

3) Our method is more interpretable than existing methods. Through context-specific regulatory networks, SpecVar can improve heritability enrichment, achieve accurate relevance measurement to tissues, and reveal correlated phenotypes by relevance correlation. Besides, SpecVar gives more biological insights in explaining the relevance to tissues by SNP-associated regulatory networks and interpreting phenotype correlation through common relevant tissues and shared SNP-associated regulatory networks.

Except for the conceptional description, we conducted additional ablation analysis to study the contribution of two important components of SpecVar to the improvement of heritability enrichment: regulatory network by integrating gene expression and chromatin accessibility data, and specificity by comparing with other contexts. First, we used (Figure 2—figure supplement 1C) to illustrate the relationship and differences of the five methods: SEG was the combination of gene expression and specificity; AAP was only from chromatin accessibility; SAP was the combination of chromatin accessibility and specificity; ARE integrated gene expression and chromatin accessibility; SpecVar considered gene expression, chromatin accessibility, and specificity. Then we compared SpecVar to ARE to show the effect of specificity, compared SpecVar to SAP to show the effect of gene expression annotations, and compared SpecVar to SEG to show the effect of chromatin accessibility (Figure 2—figure supplement 1d). We conducted these comparisons on six phenotypes and computed fold change of different methods’ heritability enrichment to measure the effect. We found that the annotation from chromatin accessibility give the largest effect to improve heritability (Figure 2—figure supplement 1e). This meant that the regulatory networks of SpecVar, which integrated chromatin accessibility with gene expression data, played a more important role in heritability enrichment. And specificity can also improve heritability enrichment.

We thank the reviewer’s suggestion to highlight the novelty of our method and we have added the description of our novelty and the additional ablation analysis to our revised manuscript.

Excerpt from Manuscript: (Page 3) …… Compared to the existing methods, SpecVar shows novelty in three aspects: (1) SpecVar integrates paired gene expression data and chromatin, which are two types of easily accessible data with rich information, into regulatory networks. The gene expression and chromatin accessibility in the regulatory network are complementary to each other to reveal the high quality active regulatory elements and genes for certain tissues or cell types to interpret genetic variants; (2) SpecVar highlights the comparison to other contexts by specificity to narrow down the regulatory molecules; (3) SpecVar is more interpretable because it can explain the relevance to tissues by SNP-associated regulatory networks and interpret phenotype correlation through common relevant tissues and shared SNP-associated regulatory network ……

(Page 6) …… To explore the heritability enrichment improvement of SpecVar, we conducted ablation analysis to study the contribution of two important parts of SpecVar: (i) regulatory network by integrating gene expression and chromatin accessibility data, and (ii) specificity by comparing with other contexts. Figure 2—figure supplement 1c shows the relationship and difference of the five methods: SEG is the combination of gene expression and specificity; AAP is only from chromatin accessibility; SAP is the combination of chromatin accessibility and specificity; ARE integrates gene expression and chromatin accessibility; and SpecVar considers integration of gene expression and chromatin accessibility, and specificity. To show the effect of the regulatory network in heritability enrichment, we compared SpecVar with SAP and showed the effect of gene expression data. We compared SpecVar with SEG and showed the effect of chromatin accessibility data. We compared SpecVar with ARE and showed the contribution of specificity (Figure 2—figure supplement 1d). To quantify the effect of each component, we computed the fold change of different methods’ heritability enrichments. We found chromatin accessibility, which was part of the regulatory network, showed the highest effect in improving heritability enrichments for all six phenotypes. This is consistent with the fact that most genetic variants are located in the non-coding regulatory regions (Claussnitzer et al., 2015; Kumar et al., 2012; Smemo et al., 2014) and accessibility gives the direct functional evidence for genetic variants. The specificity in SpecVar also contributed at least 4-fold improvement for heritability enrichment (Figure 2—figure supplement 1e) ……

2. Reporting of SpecVar-based heritability enrichment point estimates should be accompanied by appropriate measures of statistical significance (p-values, FDR-adjusted q-values, confidence intervals, etc.).

We have followed your suggestions and added confidence intervals (standard error) of heritability enrichment as error bars into figures of our manuscript (Figure 2). In addition, we provided detailed standard error, p-value, and q-value in Supplementary file 1b. The standard error, p-value, and q-value were estimated by block jackknife.

3. The comparisons between SpecVar and other approaches (AAP, ARE, SAP, SEG) are not sufficiently rigorous.

a. First, please provide additional context. What is the amount of the genome covered by the regions defined by each approach? What is the overlap between these regions? For example, Table S4 shows SpecVar sets typically include substantially fewer regions than SAP. Are SpecVar regions primarily a subset of SAP regions or do they contain distinct/non-overlapping territory in the genome?

We have followed your suggestions and provided additional information about the genomic region sets of five methods in Supplementary file 1f. For each context, we presented the amount of genome covered by each method. Since genomic regions of SpecVar were a subset of ARE and genomic regions of SAP were a subset of AAP, we only provided the overlapping among SpecVar, SAP, and SEG for each context. These three methods cover distinct territories in the genome and have overlapping.

Excerpt from Manuscript: (Page 17) …… Supplementary file 1f showed the number and size of genomic regions of each method and the overlapping among these methods ……

b. Second, please provide a way to evaluate whether differences in enrichment between the methods are statistically meaningful.

We thank the reviewers for the advice to statistically evaluate the difference of heritability enrichment to show the advantage of SpecVar. The standard error of heritability enrichment was estimated by block jackknife, which allowed us to conduct a t-test for the significance of the heritability enrichment difference. For example, SpecVar gave LDL’s heritability enrichment in the “right lobe of liver” to be 678.91 with a standard error of 392.43 and ARE gave a heritability enrichment of 113.34 with a standard error of 30.45. We conducted Welch's t-test and revealed the p-value of their difference to be 7.0e-4. We computed all the p-values of difference between SpecVar and the other four methods and showed them in the figure of our manuscript.

Excerpt from Manuscript: (Page 5) …… First, we showed that SpecVar could achieve higher heritability enrichment in the relevant tissues than other methods (Supplementary file 1b). Taking LDL for example, SpecVar obtained a heritability enrichment of 678.91 in the “right lobe of liver”, while ARE, SAP, AAP, and SEG gave heritability enrichment of 113.34, -42.09, 50.95, 4.47, respectively. We conducted Welch’s t-test to assess the significance of the difference between SpecVar and other methods and found heritability enrichment of SpecVar was significantly higher than ARE (p=6.9×104), SAP (p=1.4×104), AAP (p=3.4×104), and SEG (p=2.1×104) (Figure 2a). For total cholesterol, SpecVar also gave significantly higher heritability enrichment in “right lobe of liver” than ARE (p=5.7×104), SAP (p=4.4×105), AAP (p=1.6×104), and SEG (p=7.7×105) (Figure 2b). For educational attainment and cognitive performance, they were relevant to brain tissues: “frontal cortex”, “cerebellum”, “caudate nucleus”, “Ammon’s horn” and “putamen”. SpecVar obtained the highest averaged heritability enrichment in brain tissues among these methods (Figure 2—figure supplement 1a, b). In the “frontal cortex”, SpecVar had significantly higher heritability enrichment than ARE (p=1.2×105), SAP (p=2.0×106), AAP (p=3.0×106), and SEG (p=3.0×106) for educational attainment (Figure 2c). And for cognitive performance in “frontal cortex”, SpecVar also had significantly higher heritability enrichment than ARE (p=9.0×106), SAP (p=2.0×106), AAP (p=1.0×106), and SEG (p=1.0×106) (Figure 2d). For brain shape, SpecVar obtained a significantly higher heritability enrichment in its relevant context “CNCC” than the other four methods (ARE p=5.9×104, SAP p=6.7×104, AAP p=7.5×104, SEG p=8.1×105, Figure 2e). For facial morphology, SpecVar also gave a much higher heritability enrichment in “CNCC” than the other four methods (ARE p=9.0×106, SAP p=1.0×106, AAP p=7.0×106, SEG p=1.0×106, Figure 2e) ……

c. Fitting a joint LDSC model with multiple annotation sets would provide a more rigorous assessment of the performance of SpecVar relative to the other approaches.

We have followed the reviewer’s suggestions and fitted a joint LDSC model with multiple annotation sets. This provided a more rigorous assessment of the performance of SpecVar relative to the other approaches. In detail, we had 77 genomic region sets for SpecVar, ARE, SAP, AAP, and SEG. We pooled them together to form a genome partition of 385 genomic region sets. Then we fitted the stratified LDSC model with this pooled genome partition and obtained heritability enrichments and R scores.

First, we compared the heritability enrichment of five methods in relevant tissues. From Figure 2-figure supplement 2, we could see that for all six phenotypes, SpecVar achieved higher heritability enrichment than other methods. We also used Welch's t-test to evaluate the significance of the heritability difference between SpecVar and other methods and found SpecVar’s heritability enrichments were significantly higher than that of other methods (Figure 2-figure supplement 2).

Second, we used the pooled genome partition to compare three specificity-based methods to show if they could identify relevant tissue for six phenotypes. As in our manuscript, we used heritability enrichment and their p-values to define the R score. We found SpecVar can still identify relevant tissues for these six phenotypes (Author response image 1). For example, SpecVar ranked the “right lobe of liver” to be the most relevant tissue for both LDL and total cholesterol. SAP identified the “Hela-S3” cell line for LDL and total cholesterol. SEG identified “HepG2” as the most relevant tissue for LDL and total cholesterol. And for educational attainment and cognitive performance, SpecVar obtained brain tissues to be top-ranked tissues, such as “the frontal cortex”, “Ammon’s horn”, and “putamen” in the top 5 tissues for educational attainment and “Ammon’s horn”, “putamen” in the top 5 tissues for cognitive performance. But for SAP, it only identified “Ammon’s horn” in the top 5 tissues for educational attainment and identified no brain tissues for cognitive performance. And for SEG, no tissues in the top 5 tissues for educational attainment and cognitive performance were from the brain. Finally, for brain shape and facial morphology, SpecVar could still rank “CNCC” to be the most relevant tissue but SAP and SEG couldn’t identify “CNCC”.

In summary, these results showed SpecVar could improve heritability enrichment and identify relevant tissues for GWAS summary statistics. We have discussed these results in our manuscript.

Excerpt from Manuscript: (Page 12) …… The improved heritability enrichment was also observed when we pooled regulatory categories of all the methods together to fit stratified LDSC and re-estimated the heritability enrichment of each context for each method (Figure 2—figure supplement 2) ……

4. Y-axes for bar plots in Figure 5 should start at 0. The floating y-axis origin is highly misleading.

We are sorry for the misleading y-axis origin, and we have changed them to be from 0 in Figure 5.

5. The approach to defining context-specific REs based on overlap (>50% of bases overlapping for cross-group comparisons, and >60% of bases overlapping for within-group comparisons) seems quite stringent and likely excludes a lot of lineage-specific REs. Calculating enrichments for "group-specific" regulatory networks, where REs are considered group-specific if they do not overlap any of the other 35 groups resulting from the hierarchical clustering (ignoring whether an RE overlaps REs from other contexts within the same group), might provide more biologically relevant region sets.

We agree that our approach to defining context-specific REs is stringent and we follow the reviewer’s suggestion to define specificity at the “group” level. In detail, we have classified the 77 regulatory networks into 36 groups in our manuscript. First, we pooled the REs in the same groups together to form 36 group-level RE sets. Then we defined a RE to be a group-specific RE if it was not overlapped with REs in other groups. This procedure would form 36 sets of group-specific REs. Finally, we fitted the stratified LDSC model with these 36 sets of group-specific genomic regions and obtained heritability enrichment, p-value, and R score for each group.

Figure 3-figure supplement 5 shows the top 5 groups of tissues ranked by their R scores. We can see that SpecVar can also reveal relevant tissues. For LDL and total cholesterol, SpecVar identified Liver to be the most relevant tissue. For educational attainment and cognitive performance, SpecVar revealed brain tissues to be the most relevant tissues. And for brain shape and facial morphology, our method found CNCC to be the most relevant tissue (Figure 3—figure supplement 5a-f).

We also found group based SpecVar gives a larger difference between relevant tissue and irrelevant tissues. We used the fold change between the first ranked relevant group/tissue and the second ranked relevant group/tissue to evaluate the difference between relevant tissue and irrelevant tissues. For LDL, group-based SpecVar gave a fold change of 9.09 and the original SpecVar gave 4.78. For total cholesterol, group-based SpecVar gave a fold change of 8.93 and the original SpecVar gave 3.30. For educational attainment, group-based SpecVar gave a fold change of 14.46 and the original SpecVar gave 1.82. For cognitive performance, group-based SpecVar gave a fold change of 3.72 and the original SpecVar gave 2.01. For facial morphology, group-based SpecVar gave a fold change of 2.60 and the original SpecVar gave 3.56. For brain shape, group-based SpecVar gave a fold change of 1.57 and the original SpecVar gave 1.04 (Figure 3—figure supplement 5g). This showed that group-based SpecVar could give more biologically relevant region sets in most situations.

In summary, we used 36 groups of 77 contexts to build group-based SpecVar and the results also showed that SpecVar can identify the accurate relevant group of tissues. Group-based SpecVar gave larger differences between relevant groups and irrelevant groups with the price of resolution. For example, group-based SpecVar could only tell us educational attainment and cognitive performance were relevant to brain tissues and couldn’t prioritize more specific relevant parts of the brain. And the original SpecVar could reveal these two phenotypes were more relevant to the “frontal cortex”. We thank the reviewer for this suggestion again and we have discussed these results in our revised manuscript.

Excerpt from Manuscript: (Page 13) …… Second, we could identify relevant organs or groups of tissues. For example, our 77 human contexts can be hieratically clustered into 36 groups. We pooled the REs in the same groups together to form 36 group-level RE sets. Then we defined a RE to be a group-specific RE if it was not overlapped with REs in other groups. This procedure would form 36 sets of group-specific REs. Finally, we fitted the stratified LDSC model with these 36 sets of group-specific regulatory categories and obtained heritability enrichment, P-value, and R score for each group by SpecVar. We tested the group-based SpecVar on our six phenotypes. For LDL and total cholesterol, SpecVar identified “Liver” tissues to be the most relevant tissue (Figure 3—figure supplement 5a, b). For educational attainment and cognitive performance, SpecVar revealed “brain” tissues to be the most relevant tissues (Figure 3—figure supplement 5c, d). And for brain shape and facial morphology, SpecVar found “CNCC” to be the most relevant tissue (Figure 3—figure supplement 5e, f) ……

Reviewer #2 (Recommendations for the authors):

Specific comments

– The authors only show the performance of the SpecVar method on 6 traits, however, it is not clear whether these are representative of all of the traits in UKBiobank. For traits with fewer SNP associations and lower sample numbers it appears that LDSC-SAP produces much high trait relevance scores, however, the authors use different tissues in each method so it may not be a fair comparison.

We are sorry that we did not clarify the reason for choosing these 6 traits clear. Generally, it is a difficult task to select representative phenotypes with confirmed knowledge of relevant tissues as gold-standard for validation. We use the following three conditions to select gold standard traits:

1. These traits should be representative and from different aspects. Here LDL and total cholesterol are lipid molecular phenotypes. Educational attainment and cognitive performance are human intelligential traits by intelligence test. Brain shape and facial morphology are human craniofacial traits from pictures of human faces and MRI of human brains.

2. These traits should be well studied to know the relevant tissues. We have prior knowledge about the relevant tissue of these six phenotypes to some extent. Lipid phenotypes are associated with the liver for its key role in lipid metabolism (Nguyen et al., 2008); human intelligential phenotypes are associated with brain tissues (Goriounova and Mansvelder, 2019); facial morphology and brain shape had shared heritability in cranial neural crest cells (CNCC) (Naqvi et al., 2021).

3. These traits should have GWAS with enough power, which means they should have enough sample size and significant SNP associations. The sample size and number of significant SNPs are large for these six phenotypes’ GWAS.

We are also sorry we did not make the tissues used for each phenotype clear. For every phenotype, SpecVar computes its relevance scores, p-values, and q-values to all 77 contexts. And we use relevance scores and q-values to select relevant tissues. In Figure 3 of our manuscript, we only show the top 5 tissues among 77 tissues ranked by relevance scores of each method. The full list of 77 contexts with their relevance score were provided in the Supplementary file 1c.

– The authors use the Pearson correlation coefficient to evaluate the performance of the SpecVar method, however, they should also consider other nonlinear metrics.

We thank the reviewer’s suggestion, and we used mutual information (MI) as an additional nonlinear metric to evaluate the accuracy of estimating phenotypic correlation. As shown in Figure 5-figure supplement 1 and 2, SpecVar achieved larger MI than SAP and SEG for all phenotype pairs and highly correlated pairs in the studies of facial morphology and UKBB. We have added this new metric to our revised manuscript.

Excerpt from Manuscript: (Page 10) …… We also used the mean square error (Figure 5—figure supplement 1a, b) and mutual information (Figure 5—figure supplement 1c, d) as metrics to evaluate the performance (Methods) and SpecVar was the best among these three methods …… SpecVar’s outperformance of estimating phenotype correlation was reproduced by using mean square error (Figure 5—figure supplement 2a, b) and mutual information (Figure 5—figure supplement 2c, d) as metrics ……

– The authors should consider a comparison with other regulatory network-based heritability methods such as CoCoNet, which is based on co-regulated genes. They should also compare to other non-network-based methods.

We thank the reviewer’s suggestion to make additional comparisons with the network-based method and non-network-based method. For the network-based method, we chose CoCoNet, which was based on gene co-expression networks. And for the non-network-based method, we chose RolyPoly, which was based on regression on gene expression profiles.

First, we applied CoCoNet to our six phenotypes with 38 built-in GTEx co-expression networks. Figure 3-figure supplement 1 shows the top 5 tissues ranked by loglikelihood of CoCoNet. We could see CoCoNet identified “Breast” as the most relevant tissue for LDL and “Brain_other” as the most relevant tissue for total cholesterol. “Breast” was the most relevant tissue for educational attainment and “Stomach” was the most relevant tissue for cognitive performance. Since there was no CNCC sample in GTEx, CoCoNet revealed “Prostate” as the most relevant tissue for brain shape and facial morphology. These results seemed less reasonable than SpecVar because CoCoNet didn’t identify liver tissue for LDL and total cholesterol and brain tissues for educational attainment and cognitive performance.

Then we fitted the RolyPoly model with gene expression profiles of our 77 human contexts and applied it to GWAS used in our paper. Since we didn’t have β, standard error in the summary statistics of brain shape, we only applied RolyPoly to LDL, total cholesterol, educational attainment, cognitive performance, and facial morphology. Figure 3-figure supplement 2 shows the top 5 tissues ranked by -log(P-value) of RolyPoly. RolyPoly prioritized the “HepG2” cell line as the most relevant tissue for LDL and total cholesterol. HepG2 cells are nontumorigenic cells with high proliferation rates and epithelial-like morphology that perform many differentiated hepatic functions. For educational attainment, RolyPoly didn’t identify the 5 brain tissues as the top-ranked tissue and only included “fetal spinal cord” in the top 5 relevant tissues. For cognitive performance, there were no brain associated tissues in the top 5 tissues. And for facial morphology, RolyPoly also failed to identify “CNCC” as relevant tissue.

This comparison showed the strength of integrating both gene expression and chromatin accessibility into the regulatory network, which was the main contribution of SpecVar. We thank the reviewer for the suggestion of additional comparisons again and we have added this comparison to our revised manuscript.

– The highlighted example SNP-associated network for the FOXC2 variants is interesting, however, the authors should demonstrate whether there are chromatin interactions (HiC or HiChIP) in brain tissues linking these variants in ATAC-seq peaks to the FOXC2 promoter. It would also be helpful to highlight another example using distinct UKBB phenotype and tissue datasets.

We thank the reviewer’s interest in our example of FOXC2 to show that SNPs are located in the RE to regulate gene expression. We agree that we should check if there are chromatin interactions in brain tissues linking these variants in ATAC-seq peaks to the FOXC2 promoter as additional evidence of this SNP-associated regulation. To do this, we downloaded the HiChIP loops of brain tissue from the recently published HiChIP database (HiChIPdb) (Zeng et al., 2022). We searched the loops of FOXC2 and found there were 16 HiChIP loops linking REs and promoter of FOXC2. And we found this RE was linked by one of these loops to the FOXC2 promoter (Figure 4b), which showed that the SNP-associated regulation of FOXC2 was supported by independent HiChIP data.

It is a good idea to show another example using distinct UKBB phenotypes and tissues. Here we used SH2B1 as an example (Figure 6c). In our manuscript, “body mass index” and “leg fat-free mass (right)” were revealed to be correlated by SpecVar’s relevance correlation. SpecVar further revealed that these two phenotypes were correlated because they were both relevant to the “frontal cortex”. SH2B1 was a target gene in the shared SNP associated regulatory network in the “frontal cortex” of “body mass index” and “leg fat-free mass (right)”. SH2B1 enhances leptin signaling and leptin’s anti-obesity action, which is associated with the regulation of energy balance, body weight, and glucose metabolism (Rui, 2014). We found one common significant SNP located in the specific REs of the “frontal cortex” at the 90k bp downstream of SH2B1. Even though this RE was near the promoter of NFATC2IP, our regulatory network revealed that it regulated the expression of SH2B1, which indicated the common SNP might regulate the expression of SH2B1 to influence phenotypes of “body mass index” and “leg fat-free mass (right)”. We also checked the HiChIP loops of brain tissues in HiChIPdb and found this SNP-associated regulation was supported by a HiChIP loop in brain tissues.

We thank the reviewer’s advice again to add more evidence to our SNP-associated regulation and another example of shared SNP-associated regulation for two UKBB phenotypes. We have added the additional evidence and examples to our revised manuscript.

Excerpt from Manuscript: (Page 9) …… We checked the loops in the database of HiChIP (Zeng et al., 2022) and found this SNP-associated regulation of FOXC2 was supported by a HiChIP loop in brain tissues to link this SNP locus and FOXC2 promoter ……

(Page 11) …… For example, in the brain, SH2B1 enhances leptin signaling and leptin’s anti-obesity action, which is associated with the regulation of energy balance, body weight, and glucose metabolism (Rui, 2014). We found one common significant SNP of these two phenotypes was located in the specific REs of the “frontal cortex” at the 90k downstream of SH2B1. Even though this RE was near the promoter of NFATC2IP, SpecVar revealed that it regulated the expression of SH2B1, which was supported by a HiChIP loop in brain tissues to associate the SNP-associated RE and promoter of SH2B1 (Figure 6c). These results indicated that one shared SNP was located in the “frontal cortex” specific RE and might regulate the expression of SH2B1 to influence two phenotypes: “body mass index” and “leg fat-free mass (right)” ……

– It will be more interesting to adapt this to paired multimodal single-cell data if possible or expand on this in the Discussion.

We thank the reviewer’s suggestion, and we agree that adapting our idea to single cell data will give us a higher resolution of relevant contexts at the cell type level, which is promising in the future. We have discussed this idea in our revision.

Excerpt from Manuscript: (Page 13) …… Third, we can identify relevant contexts at the cell type level since paired multimodal single-cell data, such as single cell RNA-seq and single cell ATAC-seq data, are increasing in recent years (Han et al., 2020). The paired multi-omics data, which means single cell data profiled from the same context, enable us to identify cell types and infer regulatory networks for these cell types (Duren et al., 2018; Zeng et al., 2019). It will be promising to construct an atlas of context-specific regulatory networks at cell type level and build SpecVar model based on these cell-type-specific regulatory networks. This extension of SpecVar to single cell level holds the promise to identify more detailed relevant cell types for given phenotypes ……

– Lastly, LDSC-based methods may not perform well on admixed populations, thus some discussion on how this could be adapted using covariate-adjusted approaches (e.g. cov-LDSC) would be helpful.

We agree that it is important to consider the admixed population and our model should be better if we consider the mixed population. We discussed this potential combination of our idea and cov-LDSC in the “Discussion” section of our manuscript.

Excerpt from Manuscript: (Page 14) …… On the other hand, SpecVar is based on stratified LDSC, which may not perform well on admixed populations. Thus, considering the effect of mixed ancestries will help SpecVar handle more circumstances. For example, Luo. et al. has developed cov-LDSC to adjust the LDSC model to be suitable for admixed populations (Luo et al., 2021). Building SpecVar model based on cov-LDSC will be promising to perform well on GWAS with mixed populations ……

Reviewer #3 (Recommendations for the authors):

The study would benefit from investigating the following questions:

Which part of the new annotation drives the extreme heritability enrichment from SpecVar, compared to other LDSC-based methods?

We agree that SpecVar gives much higher heritability enrichment than the other methods in the relevant tissues. The heritability enrichment of LDL in the “right lobe of liver” was 678.91. This was an extremely high heritability enrichment because the mean and median of heritability enrichment for six phenotypes in 77 tissues were only 15.12 and 6.55, respectively.

First, we followed the reviewer’s suggestion and provided a standard error of each heritability enrichment in our revision (Figure 2). For example, the standard error of LDL’s heritability enrichment in “right lobe of liver” was 392.43, which was larger than the standard error of ARE (30.45), SAP (58.56), AAP (9.96), and SEG (0.83). For the other five traits, SpecVar also gave higher standard error than the other methods, which might be caused by the smaller proportion of the genome covered by SpecVar than the other methods. We could use the standard error to conduct a t-test to assess the heritability enrichment difference between SpecVar and other methods. We found SpecVar had significantly higher heritability enrichment than other methods. Taking LDL in the “right lobe of liver” as an example, SpecVar gave significantly higher heritability than ARE (P=6.9e-4), SAP (P=1.4e-4), AAP (3.4e-4), and SEG (2.1e-4). We also provided the p-values of the heritability enrichment comparison of our manuscript (Figure 2).

Second, this is true that the main difference between SpecVar and other methods is the annotated genomic regions. We totally agree that it is important to understand the difference between the SpecVar annotated SNPs and those from other methods and to understand where the extra heritability enrichment comes from. For LDL in the “right lobe of liver”, if we only used annotation of gene expression (SEG), the heritability enrichment was 4.47, and if we only used annotation of chromatin accessibility (AAP), the heritability enrichment was 50.95. We first built new annotation by integrating gene expression and chromatin accessibility into the regulatory network (ARE), which gave a heritability of 113.34. So, the SpecVar-specific annotation gave a 25-fold improvement in heritability enrichment. And SpecVar further considered the specificity of ARE to narrow down the annotated genomic regions, which gave another 6-fold improvement of heritability on ARE. From this example, we can hypothesize that the regulatory network (SpecVar-specific annotation) may play the main role in improving the heritability enrichment, and specificity (intersection narrowed down) plays a secondary role. To validate this hypothesis, we conducted the ablation analysis. First, we listed main components of SpecVar (Figure 2—figure supplement 1C) to illustrate the relationship and differences of the five methods: SEG was the combination of gene expression and specificity; AAP was only from chromatin accessibility; SAP was the combination of chromatin accessibility and specificity; ARE integrated gene expression and chromatin accessibility; SpecVar considered gene expression, chromatin accessibility, and specificity. Then we compared SpecVar to ARE to show the effect of specificity, compared SpecVar to SAP to show the effect of gene expression annotations, and compared SpecVar to SEG to show the effect of chromatin accessibility (Figure 2—figure supplement 1D). We conducted these comparisons on six phenotypes and computed fold change of heritability enrichment to measure the effect. We found that the annotation from chromatin accessibility gave the largest effect to improve heritability enrichment (Figure 2—figure supplement 1E). This meant that the regulatory networks of SpecVar, which integrated chromatin accessibility with gene expression data, played a more important role in heritability enrichment. And specificity could also improve heritability enrichment.

We thank the reviewer’s suggestion to help us analyze the contribution of each part of our method. We have added this analysis to our manuscript.

Excerpt from Manuscript: (Page 5) …… First, we showed that SpecVar could achieve higher heritability enrichment in the relevant tissues than other methods (Supplementary file 1b). Taking LDL for example, SpecVar obtained a heritability enrichment of 678.91 in the “right lobe of liver”, while ARE, SAP, AAP, and SEG gave heritability enrichment of 113.34, -42.09, 50.95, 4.47, respectively. We conducted Welch’s t-test to assess the significance of the difference between SpecVar and other methods and found heritability enrichment of SpecVar was significantly higher than ARE (p=6.9×104), SAP (p=1.4×104), AAP (p=3.4×104), and SEG (p=2.1×104) (Figure 2a). For total cholesterol, SpecVar also gave significantly higher heritability enrichment in “right lobe of liver” than ARE (p=5.7×104), SAP (p=4.4×105), AAP (p=1.6×104), and SEG (p=7.7×105) (Figure 2b). For educational attainment and cognitive performance, they were relevant to brain tissues: “frontal cortex”, “cerebellum”, “caudate nucleus”, “Ammon’s horn” and “putamen”. SpecVar obtained the highest averaged heritability enrichment in brain tissues among these methods (Figure 2—figure supplement 1a, b). In the “frontal cortex”, SpecVar had significantly higher heritability enrichment than ARE (p=1.2×105), SAP (p=2.0×106), AAP (p=3.0×106), and SEG (p=3.0×106) for educational attainment (Figure 2c). And for cognitive performance in “frontal cortex”, SpecVar also had significantly higher heritability enrichment than ARE (p=9.0×106), SAP (p=2.0×106), AAP (p=1.0×106), and SEG (p=1.0×106) (Figure 2d). For brain shape, SpecVar obtained a significantly higher heritability enrichment in its relevant context “CNCC” than the other four methods (ARE p=5.9×104, SAP p=6.7×104, AAP p=7.5×104, SEG p=8.1×105, Figure 2e). For facial morphology, SpecVar also gave a much higher heritability enrichment in “CNCC” than the other four methods (ARE p=9.0×106, SAP p=1.0×106, AAP p=7.0×106, SEG p=1.0×106, Figure 2e) ……

(Page 6) …… To explore the heritability enrichment improvement of SpecVar, we conducted ablation analysis to study the contribution of two important parts of SpecVar: (i) regulatory network by integrating gene expression and chromatin accessibility data, and (ii) specificity by comparing with other contexts. Figure 2—figure supplement 1c shows the relationship and difference of the five methods: SEG is the combination of gene expression and specificity; AAP is only from chromatin accessibility; SAP is the combination of chromatin accessibility and specificity; ARE integrates gene expression and chromatin accessibility; and SpecVar considers integration of gene expression and chromatin accessibility, and specificity. To show the effect of the regulatory network in heritability enrichment, we compared SpecVar with SAP and showed the effect of gene expression data. We compared SpecVar with SEG and showed the effect of chromatin accessibility data. We compared SpecVar with ARE and showed the contribution of specificity (Figure 2—figure supplement 1d). To quantify the effect of each component, we computed the fold change of different methods’ heritability enrichments. We found chromatin accessibility, which was part of the regulatory network, showed the highest effect in improving heritability enrichments for all six phenotypes. This is consistent with the fact that most genetic variants are located in the non-coding regulatory regions (Claussnitzer et al., 2015; Kumar et al., 2012; Smemo et al., 2014) and accessibility gives the direct functional evidence for genetic variants. The specificity in SpecVar also contributed at least 4-fold improvement for heritability enrichment (Figure 2—figure supplement 1e) ……

How to use the R score for a formal hypothesis test, rather than subjectively picking a few top-ranked tissues for the trait? How to evaluate the false positive rate for the test?

We thank the reviewer to raise the question about the formal hypothesis test and false positive rate of R scores. Here we used the block jackknife stratagem to estimate standard errors of R scores and used these standard errors to calculate z scores, p-values, and false discovery rates. The block jackknife method was also used in the stratified LDSC paper (Finucane et al., 2015).

Given a GWAS summary statistics, we had the following steps to conduct block jackknife to estimate the standard error of its R score to a context:

1) For this context, we divided the context-specific RE into 100 folds.

2) One sub-sample was generated by removing one of the 100 folds and we generated 100 sub-samples of the context-specific REs.

3) The 100 sub-samples of specific REs would form a new genome partition for fitting stratified LDSC.

4) For each sub-sample, we obtained heritability enrichment, p-value, and R score by SpecVar.

5) With the 100 background R scores of the 100 sub-samples, we could estimate the standard error (SD) of R scores for this context.

5) We computed the z score of R score: Z=R/SDN(0,1), and estimated the p-value and FDR q-value.

The R scores and their FDR q-values could be used to select relevant tissues. We used R score 100 and FDR 0.01 as the threshold to pick up relevant tissues for a phenotype. In our paper, the relevant tissues for six phenotypes were listed within Table 1.

How to explain the inconsistent findings with previous studies, e.g., the association of the liver to LDL?

We agree that there is some difference between our results and the LDSC-SEG paper. In the LDSC-SEG paper, there were three liver tissues that were from the GTEx dataset. And in their Figure 2, the -log(P-value) of the three liver tissues’ relevance to LDL was from 3.3-4.4. In our implementation, the -log(P-value) of the “right lobe of liver” to LDL was 4.1. So, there was minor difference between our implementation and the LDSC-SEG paper.

The minor difference between our results and the LDSC-SEG paper may be caused by the expression profiles of different tissues. The expression data of the LDSC-SEG paper was mainly from GTEx and our paper included expression data of more tissues from ENCODE, such as HepG2, adrenal gland, thoracic aorta, etc. Specifically expressed genes were different when we used different tissues for comparison. For example, for the liver tissue, 30% of the specific genes in our paper were different from the specific genes in the LDSC-SEG paper. This will make different heritability enrichment and p-value of liver tissue. And the extra tissues included in our paper also made it possible to select other reasonable samples to be relevant to LDL, such as “HepG2”, which are nontumorigenic cells with high proliferation rates and epithelial-like morphology that perform many differentiated hepatic functions. This made the “right lobe of liver” fail to be ranked to be top 5. Noting that there were expression data of 27 samples in our 77 contexts which were also from GTEx, we could fit LDSC-SEG with these 27 samples. And we found if we only used these 27 GTEx samples, the “right lobe of liver” came to be most relevant tissue (P=5.1e-6, Q=1.2e-4).

Are the GWAS signal in context-specific RE colocalised with context-specific eQTL signals?

We first thank the reviewer’s precise summary of our effort to highlight an example where SpecVar facilitates the interpretation of GWAS signals near FOXC2. We totally agree that more work can be done to consolidate the SNP-associated regulation of FOXC2.

eQTL is a good resource to validate the SNP association regulation. We collected eQTL of all brain tissues from GTEx and obtain 1,431,702 SNP-gene pairs. For FOXC2, we found two eQTLs (chr16:86511797-FOXC2 and chr16:86512080-FOXC2) and they were not in the locus we identified. This inconsistency might be caused by the context since our SNP-associated regulation was identified in CNCC and there are no eQTL data for CNCC. Then we tried another way to consolidate the SNP-associated regulation of FOXC2 by checking if there was a chromatin contact between the SNP locus and FOXC2 promoter. To do this, we downloaded the HiChIP loops of brain tissue from the recently published HiChIP database (HiChIPdb) (Zeng et al., 2022). We searched the loops of FOXC2 and found there were 16 HiChIP loops linking REs and promoter of FOXC2. And we found this RE was linked by one of these loops to the FOXC2 promoter (Figure 4b), which showed that the SNP-associated regulation of FOXC2 was supported by independent HiChIP data. On the other hand, the top GWAS signal in this locus was indeed on the left of the CNCC-specific RE, which meant that they were not active in CNCC. As we know, brain shape is a human complex trait, and it will be relevant to more than one tissues. Here the SNPs of brain shape were only studied in CNCC, and they might also be active in other tissues or at other developmental stages. Including more contexts and cell types may be promising to understand all the genetic variants of this complex trait.

We can used eQTL data to validate SNP-associated regulations of educational attainment, cognitive performance, LDL, and total cholesterol because there are eQTLs of brain and liver in GTEx. For educational attainment, SpecVar revealed that 7,611 SNP-TG pairs in its SNP associated regulatory network of “frontal cortex” and we found 3,693 SNPs (40.1%) of these SNP-TG pairs were also SNPs of eQTL in brain tissues. Among the 2,862 SNPs, 788 SNPs (10.4%) had the same TGs with the eGenes in the eQTL database (hypergeometry test p-value=7.0e-121). For cognitive performance, there were 7,494 SNP-TG pairs in its SNP associated regulatory network of “frontal cortex”. And 3,569 of them (47.6%) were also SNPs of eQTL in brain tissues. SpecVar further revealed that 988 SNPs (13.2%) had the same TGs with the eGenes in the eQTL database (p-value=3.2e-224). For LDL, there were 556 SNP-TGs pairs in its SNP associated regulatory network of “right lobe of liver”. We collected 323,428 eQTL of liver tissue from GTEx and found 45 of the SNP-TG pairs (8.1%) were also SNPs of eQTL. Two of the SNPs had the same TGs with the eGenes in the eQTL database (p-value=0.05). For total cholesterol, there were 461 SNP-TGs pairs in its SNP associated regulatory network of “right lobe of liver”. We found 16 of the SNP-TG pairs (3.5%) were also SNPs of eQTL. There were two of the SNPs that had the same TGs with the eGenes in the eQTL database (p-value=0.03).

We thank the reviewer again for the suggestion to add more evidence to support SpecVar-revealed SNP-associated regulations. We have added the HiChIP loop of FOXC2 and eQTL analysis to our revised manuscript.

Excerpt from Manuscript: (Page 9) …… We checked the loops in the database of HiChIP (Zeng et al., 2022) and found this SNP-associated regulation of FOXC2 was supported by a HiChIP loop in brain tissues to link this SNP locus and FOXC2 promoter……

(Page 9) …… The SNP-associated regulatory can facilitate the fine mapping of GWAS signals. Since there are eQTL data of the liver and brain tissues in the GTEx database, we collected the significant SNP-gene pairs of liver and brain tissues from GTEx to validate the identified SNP-associated regulations. For educational attainment, SpecVar revealed 7,611 SNP-TG pairs in its SNP associated regulatory network of “frontal cortex” and we found 3,693 SNPs (40.1%) of these SNP-TG pairs were also SNPs of eQTL in brain tissues. Among the 2,862 SNPs, 788 SNPs (10.4%) had the same TGs with the eGenes in the eQTL database (hypergeometry test, p=7.0×10121). For cognitive performance, there were 7,494 SNP-TG pairs in its SNP associated regulatory network of “frontal cortex”. And 3,569 of them (47.6%) were also SNPs of eQTL in brain tissues. SpecVar further revealed that 988 SNPs (13.2%) had the same TGs with the eGenes in the eQTL database (p=3.2×10224). Then we collected 323,428 eQTL of liver tissues from GTEx. For LDL, there were 556 SNP-TGs pairs in its SNP associated regulatory network of “right lobe of liver” and we found 45 of the SNP-TG pairs (8.1%) were also SNPs of eQTL. Two of the SNPs had the same TGs with the eGenes in the eQTL database (p=5.0×102). For total cholesterol, there were 461 SNP-TGs pairs in its SNP associated regulatory network of “right lobe of liver”. We found 16 of the SNP-TG pairs (3.5%) were also SNPs of eQTL. There were two of the SNPs that had the same TGs with the eGenes in the eQTL database (p=3.0×102) ……

How to link relevance score correlation across tissues to shared heritability between traits?

We thank the reviewer for the summary about SpecVar’s relevance correlation and we agree that it is interesting to study the observed phenotypic correlation by common genetic factors acting in the shared tissues/cell types/pathways/regulatory networks between traits. SpecVar could utilize the correlation of relevance scores to 77 contexts to estimate relevance correlation. Furthermore, SpecVar revealed two phenotypes were correlated because they were relevant to the same contexts. And in the common relevant context, SpecVar could identify shared SNP-associated regulatory network underlying their relevance correlation.

In the application to the UKBB dataset, SpecVar gave us the relevance correlation, the common relevant tissues, and the shared SNP-associated regulatory network among UKBB phenotypes (Supplementary file 1e). For example, SpecVar found “body mass index” and “leg fat-free mass (right)” were correlated with a relevance correlation of 0.602. SpecVar further revealed that these two phenotypes were correlated because they were both relevant to the “frontal cortex” (Figure 6a). Body mass index has been reported to be related to frontal cortex development (Laurent et al., 2020) and relevant to the reduced and thin frontal cortex (Islam et al., 2018; Shaw et al., 2018). Obesity and fat accumulation are also revealed to be associated with the frontal cortex (Gluck et al., 2017; Kakoschke et al., 2019). SpecVar further extracted these two phenotypes’ SNP-associated regulatory networks in the “frontal cortex” and found their SNP-associated networks were significantly overlapped. The significant overlap was observed at SNP, RE, TG, and TF levels: p=8.2×1063 for SNPs, p=1.4×1047 for REs, p=6.0×1025 for TGs, and p=8.2×1025 for TFs (Figure 6b). The shared regulatory network was involved with body weight and obesity. For example, in the brain, SH2B1 enhances leptin signaling and leptin’s anti-obesity action, which is associated with the regulation of energy balance, body weight, and glucose metabolism (Rui, 2014). We next studied the detailed SNP-associated regulation of SH2B1 (Figure 6c). We found that one common significant SNP was located in the specific REs of the “frontal cortex” at the 90k bp downstream of SH2B1. Even though this RE was near the promoter of NFATC2IP, our regulatory network revealed that it regulated the expression of SH2B1, which indicated the common SNP may regulate the expression of SH2B1 to influence phenotypes of “body mass index” and “leg fat-free mass (right)”. We also checked the loops of brain tissues in HiChIPdb and found this SNP-associated regulation was also supported by a HiChIP loop in brain tissues.

We have added the advantage of SpecVar to reveal the common genetic factors acting in the common relevant tissues and shared regulatory networks underlying the correlation between phenotypes to our revision.

Reference

Feng, Z. Y., Duren, Z. N., Xiong, Z. Y., Wang, S. J., Liu, F., Wong, W. H., and Wang, Y. (2021). hReg-CNCC reconstructs a regulatory network in human cranial neural crest cells and annotates variants in a developmental context. Communications Biology, 4(1). https://doi.org/ARTN 442

10.1038/s42003-021-01970-0

Finucane, H. K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P. R., Anttila, V., Xu, H., Zang, C. Z., Farh, K., Ripke, S., Day, F. R., Purcell, S., Stahl, E., Lindstrom, S., Perry, J. R. B., Okada, Y., Raychaudhuri, S., Daly, M. J.,... Consortium, R. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics, 47(11), 1228-+. https://doi.org/10.1038/ng.3404

Gluck, M. E., Viswanath, P., and Stinson, E. J. (2017). Obesity, Appetite, and the Prefrontal Cortex. Curr Obes Rep, 6(4), 380-388. https://doi.org/10.1007/s13679-017-0289-0

Goriounova, N. A., and Mansvelder, H. D. (2019). Genes, Cells and Brain Areas of Intelligence. Front Hum Neurosci, 13, 44. https://doi.org/10.3389/fnhum.2019.00044

Islam, A. H., Metcalfe, A. W. S., MacIntosh, B. J., Korczak, D. J., and Goldstein, B. I. (2018). Greater body mass index is associated with reduced frontal cortical volumes among adolescents with bipolar disorder. Journal of Psychiatry and Neuroscience, 43(2), 120-130. https://doi.org/10.1503/jpn.170041

Kakoschke, N., Lorenzetti, V., Caeyenberghs, K., and Verdejo-Garcia, A. (2019). Impulsivity and body fat accumulation are linked to cortical and subcortical brain volumes among adolescents and adults. Scientific Reports, 9. https://doi.org/ARTN 2580

10.1038/s41598-019-38846-7

Laurent, J. S., Watts, R., Adise, S., Allgaier, N., Chaarani, B., Garavan, H., Potter, A., and Mackey, S. (2020). Associations Among Body Mass Index, Cortical Thickness, and Executive Function in Children. JAMA Pediatrics, 174(2), 170-177. https://doi.org/10.1001/jamapediatrics.2019.4708

Li, L., Wang, Y., Torkelson, J. L., Shankar, G., Pattison, J. M., Zhen, H. H., Fang, F., Duren, Z., Xin, J., Gaddam, S., Melo, S. P., Piekos, S. N., Li, J., Liaw, E. J., Chen, L., Li, R., Wernig, M., Wong, W. H., Chang, H. Y., and Oro, A. E. (2019). TFAP2C- and p63-Dependent Networks Sequentially Rearrange Chromatin Landscapes to Drive Human Epidermal Lineage Commitment. Cell Stem Cell, 24(2), 271-284 e278. https://doi.org/10.1016/j.stem.2018.12.012

Ma, S., Chen, X., Zhu, X., Tsao, P. S., and Wong, W. H. (2022). Leveraging cell-type-specific regulatory networks to interpret genetic variants in abdominal aortic aneurysm. Proc Natl Acad Sci U S A, 119(1). https://doi.org/10.1073/pnas.2115601119

Naqvi, S., Sleyp, Y., Hoskens, H., Indencleef, K., Spence, J. P., Bruffaerts, R., Radwan, A., Eller, R. J., Richmond, S., Shriver, M. D., Shaffer, J. R., Weinberg, S. M., Walsh, S., Thompson, J., Pritchard, J. K., Sunaert, S., Peeters, H., Wysocka, J., and Claes, P. (2021). Shared heritability of human face and brain shape. Nature Genetics. https://doi.org/10.1038/s41588-021-00827-w

Nguyen, P., Leray, V., Diez, M., Serisier, S., Le Bloc'h, J., Siliart, B., and Dumon, H. (2008). Liver lipid metabolism. J Anim Physiol Anim Nutr (Berl), 92(3), 272-283. https://doi.org/10.1111/j.1439-0396.2007.00752.x

Rui, L. (2014). SH2B1 regulation of energy balance, body weight, and glucose metabolism. World J Diabetes, 5(4), 511-526. https://doi.org/10.4239/wjd.v5.i4.511

Shaw, M. E., Sachdev, P. S., Abhayaratna, W., Anstey, K. J., and Cherbuin, N. (2018). Body mass index is associated with cortical thinning with different patterns in mid- and late Life. International Journal of Obesity, 42(3), 455-461. https://doi.org/10.1038/ijo.2017.254

Westra, H. J., and Franke, L. (2014). From genome to function by studying eQTLs. Biochim Biophys Acta, 1842(10), 1896-1902. https://doi.org/10.1016/j.bbadis.2014.04.024

Xin, J., Hao, J., Chen, L., Zhang, T., Li, L., Chen, L., Zhao, W., Lu, X., Shi, P., and Wang, Y. (2020). ZokorDB: tissue specific regulatory network annotation for non-coding elements of plateau zokor. Quantitative Biology, 8(1), 43-50. https://doi.org/10.1007/s40484-020-0195-4

Zeng, W., Liu, Q., Yin, Q., Jiang, R., and Wong, W. H. (2022). HiChIPdb: a comprehensive database of HiChIP regulatory interactions. Nucleic Acids Res. https://doi.org/10.1093/nar/gkac859

Zhu, X., Duren, Z., and Wong, W. H. (2021). Modeling regulatory network topology improves genome-wide analyses of complex human traits. Nature Communications, 12(1), 2851. https://doi.org/10.1038/s41467-021-22588-0

https://doi.org/10.7554/eLife.82535.sa2

Article and author information

Author details

  1. Zhanying Feng

    1. CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
    2. School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5727-3929
  2. Zhana Duren

    Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
    Contribution
    Resources, Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Jingxue Xin

    Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
    Contribution
    Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Qiuyue Yuan

    Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
    Contribution
    Formal analysis, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Yaoxi He

    State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
    Contribution
    Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Bing Su

    1. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
    2. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
    Contribution
    Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Wing Hung Wong

    Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
    Contribution
    Formal analysis, Supervision, Methodology, Project administration, Writing – review and editing
    For correspondence
    whwong@stanford.edu
    Competing interests
    No competing interests declared
  8. Yong Wang

    1. CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
    2. School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
    3. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
    4. Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing – review and editing
    For correspondence
    ywang@amss.ac.cn
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0695-5273

Funding

National Key Research and Development Program of China (2022YFA1004800 2020YFA0712402)

  • Yong Wang

Strategic Priority Research Program of the Chinese Academy of Science (XDPB17)

  • Yong Wang

CAS Young Scientists in Basic esearch (YSBR-077)

  • Yong Wang

National Natural Science Foundation of China (12025107)

  • Yong Wang

National Natural Science Foundation of China (11871463)

  • Yong Wang

National Natural Science Foundation of China (11688101)

  • Yong Wang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge funding from the National Key Research and Development Program of China (2022YFA1004800 and 2020YFA0712402), Strategic Priority Research Program of the Chinese Academy of Sciences (XDPB17), CAS Project for Young Scientists in Basic Research, Grant No. R, and the National Natural Science Foundation of China (grants 12025107, 11871463, and 11688101).

Senior Editor

  1. David E James, University of Sydney, Australia

Reviewing Editor

  1. Charles Farber, University of Virginia, United States

Publication history

  1. Received: August 8, 2022
  2. Preprint posted: September 7, 2022 (view preprint)
  3. Accepted: December 13, 2022
  4. Accepted Manuscript published: December 16, 2022 (version 1)
  5. Version of Record published: January 3, 2023 (version 2)

Copyright

© 2022, Feng et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 273
    Page views
  • 40
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Zhanying Feng
  2. Zhana Duren
  3. Jingxue Xin
  4. Qiuyue Yuan
  5. Yaoxi He
  6. Bing Su
  7. Wing Hung Wong
  8. Yong Wang
(2022)
Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification
eLife 11:e82535.
https://doi.org/10.7554/eLife.82535

Further reading

    1. Computational and Systems Biology
    2. Neuroscience
    Zhe Chen, Garrett J Blair ... Hugh T Blair
    Tools and Resources

    Epifluorescence miniature microscopes ('miniscopes') are widely used for in vivo calcium imaging of neural population activity. Imaging data is typically collected during a behavioral task and stored for later offline analysis, but emerging techniques for online imaging can support novel closed-loop experiments in which neural population activity is decoded in real time to trigger neurostimulation or sensory feedback. To achieve short feedback latencies, online imaging systems must be optimally designed to maximize computational speed and efficiency while minimizing errors in population decoding. Here we introduce DeCalciOn, an open-source device for real-time imaging and population decoding of in vivo calcium signals that is hardware compatible with all miniscopes that use the UCLA Data Acquisition (DAQ) interface. DeCalciOn performs online motion stabilization, neural enhancement, calcium trace extraction, and decoding of up to 1024 traces per frame at latencies of <50 ms after fluorescence photons arrive at the miniscope image sensor. We show that DeCalciOn can accurately decode the position of rats (n=12) running on a linear track from calcium fluorescence in the hippocampal CA1 layer, and can categorically classify behaviors performed by rats (n=2) during an instrumental task from calcium fluorescence in orbitofrontal cortex (OFC). DeCalciOn achieves high decoding accuracy at short latencies using innovations such as field-programmable gate array (FPGA) hardware for real time image processing and contour-free methods to efficiently extract calcium traces from sensor images. In summary, our system offers an affordable plug-and-play solution for real-time calcium imaging experiments in behaving animals.

    1. Computational and Systems Biology
    2. Immunology and Inflammation
    Anastasia O Smirnova, Anna M Miroshnichenkova ... Alexander Komkov
    Tools and Resources

    High-throughput sequencing of adaptive immune receptor repertoires is a valuable tool for receiving insights in adaptive immunity studies. Several powerful TCR/BCR repertoire reconstruction and analysis methods have been developed in the past decade. However, detecting and correcting the discrepancy between real and experimentally observed lymphocyte clone frequencies is still challenging. Here we discovered a hallmark anomaly in the ratio between read count and clone count-based frequencies of non-functional clonotypes in multiplex PCR-based immune repertoires. Calculating this anomaly, we formulated a quantitative measure of V- and J-genes frequency bias driven by multiplex PCR during library preparation called Over Amplification Rate (OAR). Based on the OAR concept, we developed an original software for multiplex PCR-specific bias evaluation and correction named iROAR: Immune Repertoire Over Amplification Removal (https://github.com/smiranast/iROAR). The iROAR algorithm was successfully tested on previously published TCR repertoires obtained using both 5' RACE (Rapid Amplification of cDNA Ends)-based and multiplex PCR-based approaches and compared with a biological spike-in-based method for PCR bias evaluation. The developed approach can increase the accuracy and consistency of repertoires reconstructed by different methods making them more applicable for comparative analysis.