Haplotype function score improves biological interpretation and cross-ancestry polygenic prediction of human complex traits

eLife assessment

This valuable paper presents a new approach for association testing, using the output of neural networks that have been trained to predict functional changes from DNA sequences. As such, the approach is an interesting addition to statistical genetics, and the evidence for the presented method being able to identify trait-associations in regions where GWASs are typically underpowered is solid. A limitation is, however, that it is unclear how the quality of these associations compares to those detected using conventional methods. Additional work assessing this method's power and characterizing false positives / false negative regions would be critical to ensure that the method is broadly adopted by the field.

https://doi.org/10.7554/eLife.92574.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

We propose a new framework for human genetic association studies: at each locus, a deep learning model (in this study, Sei) is used to calculate the functional genomic activity score for two haplotypes per individual. This score, defined as the Haplotype Function Score (HFS), replaces the original genotype in association studies. Applying the HFS framework to 14 complex traits in the UK Biobank, we identified 3619 independent HFS–trait associations with a significance of p < 5 × 10⁻⁸. Fine-mapping revealed 2699 causal associations, corresponding to a median increase of 63 causal findings per trait compared with single-nucleotide polymorphism (SNP)-based analysis. HFS-based enrichment analysis uncovered 727 pathway–trait associations and 153 tissue–trait associations with strong biological interpretability, including ‘circadian pathway-chronotype’ and ‘arachidonic acid-intelligence’. Lastly, we applied least absolute shrinkage and selection operator (LASSO) regression to integrate HFS prediction score with SNP-based polygenic risk scores, which showed an improvement of 16.1–39.8% in cross-ancestry polygenic prediction. We concluded that HFS is a promising strategy for understanding the genetic basis of human complex traits.

eLife digest

Scattered throughout the human genome are variations in the genetic code that make individuals more or less likely to develop certain traits. To identify these variants, scientists carry out genome-wide association studies (GWAS), which compare the DNA variants of large groups of people with and without the trait of interest.

This method has been able to find the underlying genes for many human diseases, but it has limitations. For instance, some variations are linked together due to where they are positioned within DNA, which can result in GWAS falsely reporting associations between genetic variants and traits. This phenomenon, known as linkage disequilibrium, can be avoided by analyzing functional genomics, which looks at the multiple ways a gene’s activity can be influenced by a variation: for instance, how the gene is copied and decoded into proteins and RNA molecules, and the rate at which these products are generated.

Researchers can now use an artificial intelligence technique called deep learning to generate functional genomic data from a particular DNA sequence. Here, Song et al. used one of these deep learning models to calculate the functional genomics of haplotypes, groups of genetic variants inherited from one parent. The approach was applied to DNA samples from over 350 thousand individuals included in the UK Biobank. An activity score, defined as the haplotype function score (or HFS for short), was calculated for the two haplotypes of each individual, and then compared to various complex traits like height or bone density.

Song et al. found that the HFS framework was better at finding links between genes and specific traits than existing methods. It also provided more information on the biology that may be underpinning these outcomes. Although more work is needed to reduce the computer processing times required to calculate the HFS, Song et al. believe that their new method has the potential to improve the way researchers identify links between genes and human traits.

Introduction

Genome-wide association studies (GWAS) have witnessed remarkable advancements over recent years, both in terms of sample size and genetic discovery. However, the elucidation of downstream mechanisms and subsequent applications still face certain limitations (Visscher et al., 2017). One caveat is that the statistical power of GWAS on a variant relies on its population frequency (Li et al., 2020; Null et al., 2022; Zhou et al., 2022), whereas most variants with large effect size are rare (Zeng et al., 2021), leading to insufficient discoveries. Moreover, linkage disequilibrium (LD) among neighboring variants can significantly inflate false positive results (Nowbandegani et al., 2022). The variability of LD structure among different populations further compounds the challenges associated with training predictive models and discovering causal genes. Lastly, most trait-relevant variants reside in non-coding regions (Watanabe et al., 2019) and therefore lack the direct functional annotation available for coding variants. The prevalent approach to addressing this issue is to annotate each variant based on its location within functionally significant regions (Finucane et al., 2015; Grotzinger et al., 2022; Iotchkova et al., 2019; Weissbrod et al., 2020; Zheng et al., 2022), such as transcription factor-binding sites or enhancers. While this strategy has considerably advanced the analysis, it is not optimal, as a variant’s placement within a functionally important region does not inherently signify that the variant has substantial functional impacts.

The central dogma, proposing that DNA alterations’ effects on phenotype are mediated via RNA and protein changes, offers a novel strategy to address these challenges. More precisely, replacing the original genotypes in association studies with the aggregated impact of variants on transcription or functional genomics should, under the central dogma, preserve the majority of genetic information. This ‘aggregated impact’ offers several benefits for GWAS analysis: it provides direct biological interpretations, bypasses the effects of LD and population genetic history, and amalgamates information from both common and rare variants. One successful implementation of this strategy is Polygenic Transcriptome Risk Scores (PTRS) (Hu et al., 2022; Liang et al., 2022), which employ genetically determined transcription levels rather than genotypes to predict complex traits and have achieved remarkable portability. Nonetheless, the accuracy of imputing transcription levels from genotypes, given the sample size of currently available cohorts such as the Genotype-Tissue Expression project, GTEx (Aguet et al., 2020), remains limited (R² around 0.1 for most genes) (Barbeira et al., 2018). Thus, the performance of PTRS is yet to reach its optimal potential.

Following the success of PTRS, we took one step further and utilized functional genomics in this strategy. Compared with transcription levels, genetically determined functional genomic levels have been predicted with much higher accuracy in multiple recent deep learning (DL) studies (Avsec et al., 2021; Chen et al., 2022; Kelley, 2020; Yan et al., 2021; Zhou et al., 2018). These DL models utilize segments of the human reference genome as training samples, substantially increasing the sample size. Furthermore, functional genomics serve as a mediator between DNA and transcription, thus lessening the influence of non-genic factors such as the environment. Given these advancements, we propose that using the outputs of one of the state-of-the-art DL models, Sei (Chen et al., 2022), as the ‘aggregated impact’ in this novel strategy could effectively address the aforementioned challenges. Sei accepts a DNA sequence and computes multiple sequence class scores that represent different facets of the functional genomic activities of that sequence. Each score integrates impacts from all variants, even those as rare as singletons, into one continuous variable, and is, in theory, unaffected by LD. In line with this notion, a recent similar strategy called cistrome-wide association study integrated variant–chromatin activity and variant–phenotype association to boost the power of genetic studies of cancer (Baca et al., 2022).

In this study, we present an analytical framework founded on this strategy (Figure 1) and implement it on complex traits in the UK Biobank to pinpoint causal loci and genes, decipher biological mechanisms, and devise cross-ancestry prediction models. We segmented the human reference genome into multiple 4096 bp loci, generated DNA sequences for each locus for two haplotypes per individual, and employed Sei to compute the functional genomic activities of these sequences. We designated this activity score as the Haplotype Function Score (HFS) and analyzed the association between the HFS and each trait. Our findings confirm that the HFS framework offers a unique improvement in the biological interpretation and polygenic prediction of complex traits compared to classic SNP-based methods, thereby demonstrating its value in genetic association studies.

Figure 1

Flowchart of the study.

Ind: individual.

Results

Overview of genome-wide HFS

We used the HFS framework to analyze imputed genotype data from the UK Biobank (Figure 1). We segmented the human genome (hg38) into 617,378 discrete, non-overlapping loci, each 4096 base pairs long. Of these, 590,959 loci carried at least one non-reference haplotype in the UKB cohort (see Method and Supplementary file 1a). After quality control, these loci contained approximately 1.2 billion haplotypes, with a median count of 819 per locus (Figure 1—figure supplement 1). We then employed the DL framework, Sei (Chen et al., 2022), to compute sequence class scores for each haplotype. In its sequence mode, Sei accepts DNA sequences in fasta format and produces multiple distinct sequence class scores, 39 of which were included in our study (Method). Our analysis identified significant variation in sequence class scores across different loci. In fact, 49.7% of loci housed haplotypes whose sequence class (as defined by the maximum of the 39 sequence class scores) differed from the reference haplotype sequence class. Using the reference sequence class as a benchmark, we noted that 16.8% of loci showed a difference between the maximum and minimum haplotype scores that surpassed the score of the reference haplotype. Moreover, the correlation between sequence class scores of adjacent loci was low, with a median R² value of 0.013 (Figure 1—figure supplement 2), effectively reducing the impact of LD in association studies. Further evaluation indicated that this low LD was driven by two factors: integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the median LD to 0.14 (Method; Figure 1—figure supplement 2C). Secondly, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p = 1.76 × 10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p < 2.2 × 10⁻¹⁶).
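The segmentation step can be illustrated with a minimal sketch. It assumes 0-based, half-open coordinates and drops any trailing bases that do not fill a whole window; neither convention is stated in the text, and the function name `segment_chromosome` is hypothetical rather than part of the authors' pipeline.

```python
LOCUS_LEN = 4096  # locus length in base pairs, as stated in the text


def segment_chromosome(chrom_len, locus_len=LOCUS_LEN, step=None):
    """Return (start, end) windows along one chromosome.

    Coordinates are 0-based and half-open (an assumption).
    step=None yields non-overlapping loci; a step smaller than
    locus_len (for example 2048) yields overlapping windows.
    Trailing bases that do not fill a full window are dropped
    (also an assumption).
    """
    step = locus_len if step is None else step
    last_start = chrom_len - locus_len
    return [(s, s + locus_len) for s in range(0, last_start + 1, step)]


# Toy chromosome of 10,000 bp (hypothetical length)
loci = segment_chromosome(10_000)
overlapping = segment_chromosome(10_000, step=2048)
```

Each window would then be turned into one DNA sequence per haplotype and passed to the deep learning model; that step is not sketched here because it depends on the model's own input format.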

Expanding on the sequence class scores, we defined HFS for each locus. Specifically, we computed the mean sequence class score of two haplotypes per individual, reflecting an additive model. We selected the score corresponding to the sequence class of the reference sequence as the HFS of the corresponding locus, and its association with each trait was computed using a generalized linear model. Simulation analysis revealed that when a non-reference sequence class score was associated with the trait, the reference class score could still capture a median of 70% of the HFS–trait association R². We applied this framework to 14 polygenic traits in the UKB British ancestry training set (n = 350,587; Supplementary file 1b and Method), identifying 16,597 significant HFS–trait associations at a threshold of p < 5 × 10⁻⁸ (n = 15 for insomnia, n = 7573 for height; Supplementary file 1b), equating to roughly 3619 independent associations. The most significant associations were between the ‘promoter’ score of chr7:121327898–121331994 (WNT16) and bone mineral density (BMD; regression beta = −0.02, p < 10⁻³⁰⁰), and the ‘promoter’ score of chr9:4760952–4765048 (AK3) and platelet count (beta = 3.20, p = 2.79 × 10⁻²⁶²; Supplementary file 1c).
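As a rough illustration of the additive HFS definition and a single-locus association test, the sketch below averages two haplotype scores per individual and fits an ordinary least-squares regression of the trait on the HFS. It is a simplification: the study used a generalized linear model, covariates are omitted here, and all names and toy values are hypothetical.

```python
import math


def haplotype_function_score(score_hap1, score_hap2):
    """Additive model: HFS is the mean of the two haplotype scores."""
    return 0.5 * (score_hap1 + score_hap2)


def simple_linear_regression(x, y):
    """Least-squares fit of y on x; returns (beta, t_statistic)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / standard_error


# Toy sequence class scores for the two haplotypes of five individuals
hap1 = [1.0, 2.0, 3.0, 4.0, 5.0]
hap2 = [1.0, 2.0, 3.0, 4.0, 7.0]
trait = [1.1, 1.9, 3.2, 3.9, 6.1]

hfs = [haplotype_function_score(a, b) for a, b in zip(hap1, hap2)]
beta, t_stat = simple_linear_regression(hfs, trait)
```

A p-value would follow from comparing the t statistic with a t distribution with n − 2 degrees of freedom.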

When comparing HFS association with the standard SNP-based GWAS on the same data, we found that 98% of significant HFS loci also harbored a significant SNP. There were a few cases (n = 0–5) where significant HFS loci did not harbor even a marginal SNP association (GWAS p > 0.01), which was due to the lack of common SNPs in these loci. The HFS association p-value was higher than the GWAS p-value in 95% of significant loci, suggesting that HFS did not improve power to detect marginal effects. The genomic control inflation factor (λ_GC) for the HFS association test varied between 0.99 for asthma and 1.50 for height, closely resembling the SNP GWAS (Pearson correlation coefficient [PCC] = 0.91, paired t-test p = 0.16; Method and Figure 1—figure supplement 3). We concluded that HFS-based association tests had adequate power and did not introduce additional p-value inflation.
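For readers unfamiliar with λ_GC, the following sketch shows the conventional definition: the median of the observed one-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). It assumes the test statistics are available as z-scores, and it is not necessarily the exact procedure used in the study; the function name and toy values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Conventional genomic control factor from association z-scores."""
    chi2_stats = [z * z for z in z_scores]
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Toy z-scores whose squares sit at the null median (hypothetical data)
null_like = [0.6744897, -0.6744897, 0.6744897]
lambda_null = genomic_inflation(null_like)

# Inflating every chi-square statistic by 1.5 should give lambda near 1.5
inflated = [z * 1.5 ** 0.5 for z in null_like]
lambda_inflated = genomic_inflation(inflated)
```

Values near 1 indicate no inflation; for highly polygenic traits such as height, values well above 1 can arise from true polygenic signal as well as from confounding.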

Fine-mapping based on HFS

Based on these data, we applied SUSIE to fine-map the causal loci that were associated with each of the 14 traits. We divided the hg38 genome into 1361 independent blocks as defined by MacDonald et al., 2022, and applied SUSIE (Wang et al., 2020) to the HFS of loci in each of these blocks (number of loci per block = 4–2392). As shown in Figure 2 and Supplementary file 1d, we identified a total of 2699 causal loci–trait associations at the threshold of posterior inclusion probability (PIP) >0.95, hereafter referred to as ‘causal loci’. Compared with the SNP-based, functionally aware fine-mapping methods PolyFun (Weissbrod et al., 2020) and SbayesRC (Zheng et al., 2022), HFS-based SUSIE detected −11 to 334 more causal signals (median = 63, Supplementary file 1e) for each trait. We caution that these methods use summary statistics as input and are by nature less sensitive than individual data-based methods. Yet, we suggest that this impact would be mild, since we used an in-sample LD reference (from the UKB European sample).

Figure 2

Fine-mapping result summary.

Gray bars indicate the number of loci with posterior inclusion probability (PIP) >0.95 in Haplotype Function Score (HFS) + SUSIE (causal loci). Black bars indicate the number of SNPs with PIP >0.95 in PolyFun or SbayesRC analysis (the larger number is shown). Each cell of the heatmap shows the odds ratio of loci of each sequence class being causal loci for each trait. ‘All_OR’ indicates the odds ratio when pooling all traits together. Enh: enhancer. TF: transcription factor-binding site.

Among these causal loci, only 22% were also lead loci in association analysis (loci with the lowest p-value in a 200 kb region), and 58% had association p-value >5 × 10⁻⁸. In line with previous SNP-based analysis (Weissbrod et al., 2020), this result highlighted the importance of using causal signals instead of lead signals in post-GWAS analysis. We found 67 causal loci showing pleiotropic effects on at least two independent traits, including the ‘CTCF-Cohesin’ score of chr9:89596537–89600633, which was associated with age at menarche, body mass index (BMI) and height (PIP >0.97; Supplementary file 1d). We also found that rare variants played an important role in the good fine-mapping performance of HFS: when variants with MAF <0.01 were removed, 55.3% of the causal signals were missed in HFS + SUSIE analysis.

When looking at the reference sequence class of loci, those with functional importance were more likely to be causal loci, including ‘Promoter’ (odds ratio [OR] = 2.33, p = 1.41 × 10⁻¹⁴), ‘Bivalent stem cell enhancer’ (OR = 2.22, p = 1.11 × 10⁻⁸), and ‘Transcribed region 1’ (OR = 1.71, p = 1.581 × 10⁻¹⁰, Figure 2). Such functional enrichment was even higher for pleiotropic loci (‘Promoter’: OR = 7.20, p = 3.35 × 10⁻⁵). We also observed trait-specific patterns of such sequence class enrichment, such as ‘CEBPB-binding site’ (Insomnia: OR = 5.25, p = 0.01) and ‘FOXA1/AR/ESR1-binding site’ (intelligence: OR = 4.69, p = 0.01, Figure 2 and Supplementary file 1f). These results demonstrated the expected functional patterns of causal loci, and indicated that HFS-based fine-mapping was biologically interpretable and reliable.
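The sequence class enrichment reported above rests on a 2 × 2 contingency table (causal vs non-causal loci, in vs outside a sequence class). The sketch below computes the odds ratio and a one-sided Fisher's exact p-value for such a table. Which test the authors used for this particular enrichment is an assumption here, and the counts are toy numbers, not values from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null with fixed margins."""
    row1 = a + b          # e.g. number of causal loci
    col1 = a + c          # e.g. number of loci in the sequence class
    n = a + b + c + d     # all loci
    upper = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Hypothetical counts:
# a = causal loci in class, b = causal loci outside class,
# c = non-causal loci in class, d = non-causal loci outside class
a, b, c, d = 3, 1, 1, 3
enrichment_or = odds_ratio(a, b, c, d)
enrichment_p = fisher_one_sided(a, b, c, d)
```

With genome-wide counts the same functions apply unchanged, since `math.comb` works on arbitrarily large integers, although a dedicated statistics library would be faster.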

In addition to the functional enrichment, we applied several secondary analyses to verify the reliability of the HFS-based SUSIE results. Firstly, we took causal SNPs fine-mapped by PolyFun (Weissbrod et al., 2020) as positive controls, and found that, compared with genomic region-matched control loci, causal loci were significantly enriched for causal SNPs (OR = 1.33–5.08, Fisher’s test p = 0.12–4.72 × 10⁻⁵², Supplementary file 1e). Secondly, we calculated the heritability tagged by causal loci and PolyFun causal SNPs in an independent test set (defined as the R² of linear regression; Method), and found that causal loci tagged 38–251% more heritability than causal SNPs (median = 151%; Supplementary file 1e). This was not an artifact of the larger number of causal loci, since the Akaike information criterion (AIC) was similar between causal loci and causal SNPs (paired t-test p = 0.36; Supplementary file 1e). Thirdly, for traits with sufficient causal loci coverage, we also applied Linkage Disequilibrium Score regression (LDSC) to independent GWAS summary statistics to evaluate heritability enrichment in causal loci. On average, causal loci showed 124-fold enrichment of heritability, significantly larger than genomic region-matched control loci (124- vs 101-fold; p = 0.0002, Method and Figure 2—figure supplement 1). Lastly, we applied simulation analysis and found that HFS + SUSIE showed similar advantages over SNP-based methods as in real data, with high accuracy and a low false discovery rate (FDR) (Supplementary materials).

We further applied a sliding-window analysis (step = 2048 bp, Method) to test whether the HFS-based results were robust to the choice of sequence interval. Of the causal loci (PIP >0.95) in the original analysis, 29.4% were still causal in the sliding-window analysis. For 31.1% and 29.3% of causal loci, the 5′ and 3′ overlapping locus, respectively, had PIP >0.95 in the sliding-window analysis, although the loci themselves were no longer causal. In addition, HFS + SUSIE was robust to the predefined number of causal loci (L = 2–10): the number of detected loci did not change. Lastly, removing insertions and deletions revealed 9% more significant associations (p < 5 × 10⁻⁸) but 4.7% fewer causal associations (PIP >0.95), and slightly increased the inflation factor (Wilcoxon p = 0.0001, Figure 2—figure supplement 1). Taken together, HFS-based SUSIE is a powerful and robust strategy for individual data-based genetic fine-mapping.

Biological interpretation based on HFS

Pinpointing the causal loci of complex traits provides an opportunity to analyze their biological mechanisms. Thus, based on the HFS-based fine-mapping results, we applied a linear regression model to analyze the underlying pathways, cell types, and tissues of each complex trait. For each locus, we annotated its relevance to a pathway using the combined SNP-to-gene (cS2G) strategy (Gazal et al., 2022), and regressed the PIP against this annotation, with a set of baseline annotations included as covariates, similar to the LDSC framework (Finucane et al., 2018) (Method). After p-value correction and recurrent pathway removal (Method), we detected a total of 727 pathway–trait associations (Figure 3A and Supplementary file 1g). The most significant associations were ‘megakaryocyte differentiation’ with platelet count (p = 2.26 × 10⁻³⁴), ‘Insulin-like growth factor receptor signaling pathway’ and ‘Endochondral ossification’ with height (p = 4.95 × 10⁻³³ and 1.17 × 10⁻²⁷), ‘PD-1 signaling’ with allergic disease (p = 5.55 × 10⁻²⁵), and ‘major histocompatibility complex pathway’ with asthma (p = 1.22 × 10⁻²³). In fact, asthma and allergic disease were predominantly associated with more than 80 immune-related pathways. These associations were all in line with existing knowledge of trait mechanisms, and extended the understanding of their genetic basis. For example, PD-1 has recently been suggested as a potential target for allergic diseases like atopic dermatitis (Galván Morales et al., 2021), but such an association has not been highlighted by previous genetic association studies.
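The enrichment regression can be sketched as an ordinary least-squares fit of per-locus PIP on a pathway annotation plus baseline covariates, reporting a t statistic for each term. This is only an illustration under simplifying assumptions: a single binary baseline annotation stands in for the full baseline set, the data are toy values, and the helper names `invert` and `ols_t_statistics` are hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_statistics(features, y):
    """OLS with an added intercept; returns t statistics, intercept first."""
    rows = [[1.0] + list(r) for r in features]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (n - k)
    return [b / math.sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]


# Toy data: columns are [pathway annotation, baseline annotation]
annotations = [[1, 0], [1, 1], [0, 0], [0, 1],
               [1, 0], [0, 1], [0, 0], [1, 1]]
pip = [0.9, 0.95, 0.1, 0.2, 0.8, 0.05, 0.15, 0.99]
t_stats = ols_t_statistics(annotations, pip)
```

In this toy example the pathway term (`t_stats[1]`) is strongly positive while the baseline term (`t_stats[2]`) is not, which is the pattern an enriched pathway would be expected to show.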

Figure 3

Biological enrichment analysis based on Haplotype Function Score (HFS) fine-mapping.

The x-axis indicates the t statistic of the analyzed term in a multivariate linear regression (Method). Cell: single-cell ATAC peaks for 222 cell types from Zhang et al., 2021a. Tissue: active chromatin regions of 222 tissues from EpiMap (Boix et al., 2021). For each trait, we show the most significant term plus one or two terms with high biological interpretability that also passed the significance threshold. Full enrichment results are shown in Supplementary file 1g and Supplementary file 1h.

For other traits, the most significant associations also replicated known mechanisms, such as ‘osteoblast differentiation’ and ‘Wnt ligand biogenesis and trafficking’ with BMD (p = 4.59 × 10⁻¹³ and 2.78 × 10⁻¹²); ‘circadian pathway’ with chronotype (p = 4.25 × 10⁻¹²); ‘calcium regulated exocytosis of neurotransmitter’ and ‘Arachidonic acid metabolism’ with intelligence (p = 5.52 × 10⁻⁷ and 2.78 × 10⁻⁶); ‘GPCR pathway’ and ‘adipogenesis’ with BMI (p = 4.97 × 10⁻¹⁰ and 2.02 × 10⁻⁷) and ‘physiological cardiac muscle hypertrophy’ with systolic blood pressure (p = 6.32 × 10⁻¹¹). We also highlighted less significant associations that provided novel insights, such as ‘synaptic vesicle docking’ and ‘neuron migration’ with chronotype (p = 4.00 × 10⁻⁷ and 4.55 × 10⁻⁷), ‘Prostaglandins synthesis’ with insomnia (p = 5.30 × 10⁻⁹), ‘behavioral response to cocaine’ with alcohol intake (p = 3.39 × 10⁻⁸) and ‘roof of mouth development’ and ‘glycoside metabolism’ with forced vital capacity (FVC) (p = 2.19 × 10⁻¹² and 5.73 × 10⁻¹¹).

For cell type and tissue analysis (Figure 3B and Supplementary file 1h), we applied the same linear model to evaluate whether causal loci were enriched in active chromatin regions of each cell type (Method). We found 153 biologically interpretable associations with complex traits. For example, fetal megakaryocyte (p = 5.67 × 10⁻²²) and child spleen (p = 2.15 × 10⁻¹³) were found to be the key cell type and tissue for platelet count. Systolic blood pressure was significantly associated with multiple heart and artery tissues and fetal cardiomyocyte (p < 1.63 × 10⁻⁵), whereas allergic disease was associated with multiple immune cells including natural killer, Treg, and B cells (p < 4.79 × 10⁻¹⁶). For brain-related traits, we found 21 significant associations, 14 of which were from the central nervous system. For example, adult hippocampus and cingulate gyrus were both linked to alcohol intake, smoking, and insomnia (p < 1.11 × 10⁻⁵), whereas chronotype was associated with embryonic brain germinal matrix (p < 8.68 × 10⁻⁶) and intelligence with embryonic neuron-derived stem cell (p < 6.89 × 10⁻⁷).

We also applied other modified strategies for this task but did not obtain satisfactory results. For example, using cS2G to link loci to gene lists specifically expressed in each cell type suffered from batch effects in the scRNA datasets, whereas a linear mixed model was less sensitive than the standard linear model (Supplementary Materials).

Taken together, our results suggest that HFS-based fine-mapping can pinpoint the causal pathways, cell types, and tissues underlying complex traits, and is valuable for the biological interpretation of genetic association studies.

Highlighted genes for complex traits

The enhanced power of fine-mapping and biological enrichment could reveal novel key genes for the study of trait mechanisms. Below, we integrated the fine-mapping results and their functional annotations in several case studies to find causal signals and trait-relevant genes in regions not resolved by previous genetic association studies.

In our study, platelet count had a large number of causal loci (Figure 2), which showed significant functional enrichment (Figure 3). To find key loci and genes underlying platelet count, we focused on causal loci that overlapped with active regions in ‘fetal megakaryocyte’ and ‘child spleen tissue’, and applied cS2G (Gazal et al., 2022) to link them to two key pathways (‘megakaryocyte differentiation’ and ‘platelet morphogenesis’, Method and Figure 4A). We chose these annotations based on their p-values in the biological enrichment analysis in Figure 3. A total of 25 loci were highlighted (Figure 4A), which were recurrently linked to well-known platelet-regulating genes like MEF2C, SH2B3, FLI1, RUNX1, THPO, and NFE2. Among them we noticed a less-studied gene, RBBP5, a target of the key transcription factor MEF2C during megakaryopoiesis (Kong et al., 2019). Specifically, in the 1q32.1 region, HFS + SUSIE identified two loci with PIP >0.9 (Figure 4B). SNP-based association also found significant association in this region, but SNP fine-mapping (Weissbrod et al., 2020) could not resolve this signal and only found seven signals with PIP between 0.1 and 0.5. This was unlikely to reflect statistical inflation, since the HFS-based association test p-value was actually higher than the SNP-based one (Figure 4—figure supplement 1). One of the causal loci, chr1:47401806–47405902 (PIP = 1), overlapped with spleen active chromatin and harbored a cCRE in megakaryocyte, and was linked to RBBP5 and three other genes. RBBP5 is known to be involved in megakaryocyte differentiation during megakaryopoiesis and to be regulated by MEF2C (Kong et al., 2019), but previous genetic association studies provided little evidence for its association with platelet count.

Figure 4

Haplotype Function Score (HFS) linked trait to causal genes.

(A) Target genes of causal loci identified by HFS + SUSIE for platelet count. Only genes that showed functional convergence are shown. (B) Regional plot for RBBP5. HFS: loci posterior inclusion probability (PIP) calculated by HFS + SUSIE. SNP: SNP PIP calculated by PolyFun. cCRE: candidate cis-regulatory elements. (C) Regional plot of the major histocompatibility complex (MHC) region for asthma. Thickened curves link highlighted causal loci to their target genes predicted by cS2G (Gazal et al., 2022).

The major histocompatibility complex (MHC) region has long been a challenge for genetic association studies due to its long-range LD, and is often excluded by fine-mapping tools. However, many disorders like schizophrenia (Sekar et al., 2016) and immune diseases (Nawijn et al., 2011) are robustly associated with the MHC region. In our HFS-based fine-mapping of asthma, we found that 15 loci within the MHC region had PIP >0.95, 11 of which overlapped with active chromatin regions in Treg or natural killer cells (Figure 4C and Supplementary file 1j). This result showed good discrimination between causal and non-causal loci: apart from these 15 likely causal loci, only six loci had PIP between 0.25 and 0.95. Since the MHC region harbors a large number of genes, these causal loci were linked to as many as 105 potential target genes, which hindered the discovery of true targets. We further filtered them based on their involvement in the ‘TNFR2-NFKB pathway’ and ‘innate lymphocyte [ILC] development’ pathways, since these pathways were most significantly associated with asthma (Figure 3), even after excluding the MHC region (p = 2.57 × 10⁻¹³ and 1.39 × 10⁻¹⁷). We found five genes (LTA, LTB, TNF, PSMB8, and PSMB9) that were predicted to be regulated by five causal loci overlapping with active chromatin regions (Figure 4C), which could be considered as potential key genes for further validation.

Similarly, we fine-mapped the MHC region for other allergic diseases (Figure 4—figure supplement 2 and Supplementary file 1j) and found potential key genes including the HLA family and AGER. We also highlighted other gene–trait associations not previously emphasized by GWAS, including GATA4 and NPPA (cardiac muscle hypertrophy) with SBP, ALOX5 (arachidonic acid metabolism) with intelligence and CRY1 (circadian pathway) with chronotype, as further discussed in Supplementary file 1k, l, m and supplementary information.

On the other hand, HFS performed worse than SNP-based fine-mapping in exonic regions. Taking height as an example, PolyFun detected 125 causal SNPs (PIP >0.95) in exonic regions, but only 16% (20) of the loci that harbored them also reached PIP >0.5 (11 reached PIP >0.95) in the HFS + SUSIE analysis. Among the 105 loci that missed such signals (HFS PIP <0.5), 12 had a nearby locus (within 10 kb) showing HFS PIP >0.95, which likely reflected false positives caused by LD. Thus, SNP-based analysis should be prioritized over HFS in coding regions.

HFS-based polygenic prediction

Lastly, we analyzed the potential of HFS for polygenic prediction. Compared with the state-of-the-art SNP-based polygenic risk score (PRS) algorithm LDAK-BOLT (Zhang et al., 2021b), HFS-based PRS (weighted by SUSIE posterior effect size) reached 47–90% of its R² in the independent European test set (meta-analyzed proportion = 75.6%, 95% confidence interval = 75.3–75.8%, Figure 5—figure supplement 1). The gap between the performance of HFS- and SNP-based PRS reflected the fact that HFS only captured (the majority of) functional genomic alterations and missed the information on amino acid sequence and post-translational modification. We thus proposed that integrating information from HFS and SNPs could provide better performance. Specifically, in the large European training set we trained the SNP PRS model with LDAK. Then, in a small tuning sample of the target ancestry, we calculated a per-block HFS prediction score of height (sum of HFS within each block, weighted by SUSIE posterior effect size), and used machine learning to integrate these scores with the LDAK PRS into a final polygenic prediction score, hereafter referred to as ‘HFS + LDAK’. To choose a suitable machine-learning tool for this goal, we applied LASSO, ridge regression, and elastic net in the British European test set and compared the results (Figure 5B). They gave comparable results, with differences in R² of only around 0.01, and all of them were substantially better than simple linear regression. We chose LASSO as the algorithm for the formal analysis.
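The integration step can be illustrated with a small, dependency-free LASSO fitted by coordinate descent. The sketch assumes centered features and trait (no intercept), a fixed penalty `lam`, and toy data; how the authors standardized inputs or tuned the penalty is not described here, and in practice an established library implementation would be used. All function names are hypothetical.

```python
def block_score(hfs_values, effect_sizes):
    """Per-block score: sum of HFS weighted by posterior effect size."""
    return sum(h * w for h, w in zip(hfs_values, effect_sizes))


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                      for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Toy design: column 0 plays the role of the SNP-based PRS,
# columns 1 and 2 play the role of two per-block HFS scores.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -2.05, 1.95, -1.95]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

Because the toy columns are orthogonal, the penalty simply shrinks the informative weight from 2 to 1.9 and sets the two uninformative weights to exactly zero, which is the sparsity property that makes LASSO attractive when many per-block scores carry no signal.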

Figure 5

Haplotype Function Score (HFS)-based polygenic prediction.

(A) Prediction R² of HFS-based polygenic risk score (PRS) using different threshold of posterior inclusion probability (PIP). allSNP: SNP-based PRS calculated by LDAK-BOLT (Zhang et al., 2021b). n: number of features included in the corresponding PRS. (B) Prediction R² of per-block HFS score in British European test set by different methods. EN: elastic net. (C) Prediction R² of different tools in non-British European (NBE), South Asian (SAS), East Asian (EAS), and African (AFR) groups in UK Biobank.

Using height as a representative trait, we first estimated the proportion of variance captured by top loci and found that the HFS of loci with PIP >0.4 (n = 5101) captured roughly 80% of the variance explained by all genome-wide loci (n = 1,200,024, corresponding to the sliding-window strategy; Figure 5A). We then calculated HFS + LDAK in the non-British European (NBE), South Asian (SAS), East Asian (EAS), and African (AFR) populations in UK Biobank and observed improvements of 17.5%, 16.1%, 17.2%, and 39.8% over LDAK alone (p = 3.21 × 10⁻¹⁶, 0.0001, 0.002, and 0.001, respectively; Figure 5C). As a comparison, we integrated LDAK with PolyFun-pred (Weissbrod et al., 2022) and SbayesRC (Zheng et al., 2022) using the PolyPred framework (Weissbrod et al., 2022), but did not observe significant improvement over LDAK alone (difference in R² < 0.01, p = 0.001–0.07, Figure 5C). Since PolyFun-pred + BOLT-LMM has been shown to significantly outperform BOLT-LMM alone (Weissbrod et al., 2022), we reasoned that the improvement of LDAK over BOLT-LMM might have attenuated the improvement brought about by PolyFun-pred, making it difficult to reach the significance threshold. Taken together, we concluded that HFS could bring a mild but significant improvement to classic SNP-based PRS in cross-ancestry polygenic prediction.

Discussion

In this study, we designed the HFS framework for genetic association analysis and demonstrated that it could improve on classic SNP-based analysis in terms of causal locus and gene identification, biological interpretation, and polygenic prediction. We suggest that HFS is a promising strategy for future genetic studies, although further progress in algorithms, computation, and data resources is still needed.

Compared with SNPs, HFS has several compelling features. First, LD between adjacent HFS is much lower than that between SNPs, which enhances the precision of statistical fine-mapping. False-positive variants caused by LD are expected to have little impact on functional genomics; the HFS of haplotypes carrying them would therefore be close to the reference value and would not substantially influence downstream analysis. In line with these advantages, we showed that HFS-based fine-mapping had high statistical power and that downstream enrichment analysis was capable of revealing biologically interpretable mechanisms. As a typical example, our finding that intelligence-associated loci are enriched in the arachidonic acid metabolism pathway is in line with the well-known role of polyunsaturated fatty acids in neurodevelopment (Helland et al., 2003), whereas previous GWAS provided little evidence for this association. Second, HFS integrates the effects of all variants within a locus, regardless of their population frequency. Thus, HFS can capture information from rare variants overlooked by classic association studies and improve polygenic prediction, as shown by our results. In fact, the HFS framework could be directly extended to whole-genome sequencing data to capture all mutations, including those as rare as singletons, taking one step further toward explaining the ‘missing heritability’.

Despite its potential, the current HFS framework has several drawbacks and requires substantial improvement. A key limitation is the computational cost. In this study, transforming UK Biobank SNP genotypes into haplotype sequences required hundreds of thousands of CPU core hours. This cost would grow considerably when analyzing whole-genome sequencing data or employing a sliding-window strategy. A potential solution is to develop an algorithm that bypasses the variant calling stage and directly generates per-locus DNA sequences from raw sequencing or SNP array data. For the sequence-to-HFS step, Sei (Chen et al., 2022) required about 1.8 GPU hours per one million sequences. Notably, most of Sei’s output is unused in the HFS framework, since Sei predicts over 20,000 functional genomic features, whereas the HFS represents only one of their integrated scores. Future DL models that predict functional genomics in a manner better suited to the HFS framework could considerably reduce computation costs. Lastly, it is currently infeasible to incorporate all genome-wide HFS into a single LASSO model. This limitation forced us to first aggregate HFS into per-locus scores, which inevitably sacrificed some accuracy.

Another hurdle arises in integrating HFS with other genomic features. Intrinsically, HFS captures only the variant effects mediated by functional genomics, whereas a genetic variant might also influence amino acid sequence, post-translational modifications (PTMs) (Park et al., 2021), and 3D chromosomal structures (Zhou, 2022). Therefore, HFS alone cannot replace SNPs without loss: in our results, the HFS-based prediction model captured approximately 70% of the variance explainable by the SNP-based prediction model. One potential solution is to extend the concept of HFS by applying DL to quantify the genetically determined values of PTMs, protein biochemical properties (Pejaver et al., 2020), and protein and chromosomal structures, potentially employing AlphaFold (Jumper et al., 2021)-derived features (Liu et al., 2022). Analyzing HFS in conjunction with these multi-modal function scores could provide a comprehensive depiction of the genetic architecture of complex traits. However, the computational cost is currently prohibitive. As a compromise, we simply performed a joint analysis of HFS with the SNP PRS in our prediction model analysis. This approach is far from optimal, as it led to only moderate improvement and did not enhance fine-mapping or biological enrichment analysis.

The challenge of using sequence-based DL models in HFS applications is further compounded by their difficulty in predicting variation between individuals. Recent studies (Huang et al., 2023; Sasse et al., 2023) indicate that DL models trained on the reference human genome show limited accuracy in predicting gene expression levels across different individuals. This limitation is likely due to the models' inability to account for long-range regulatory patterns, which are crucial for understanding the impact of variants on gene expression and vary across genes. In contrast, our study used sequence-determined functional genomic profiles in association studies, which mitigates this issue to an extent. For instance, although Sei cannot identify the specific gene regulated by a given input sequence, it can predict changes in the sequence’s functional activity. Future improvements in the ability of DL models to predict interindividual differences could be achieved by incorporating cross-individual data in the training process. An example of such data is the EN-TEx (Rozowsky et al., 2023) dataset, which aligns functional genomic peaks with the specific individuals and haplotypes they correspond to.

In summary, our results demonstrate that incorporating HFS to represent genetically determined functional genomic activities in genetic association studies offers robust improvements in both the biological interpretation and polygenic prediction of complex traits. Thus, the application of the HFS framework in future genetic association studies holds considerable promise.

Methods

Sample description

This study analyzed UK Biobank data under application ID 84436 and adhered to the ethics and privacy policies of UK Biobank. We included only participants with array-imputed genotype data in bgen format that passed UKB quality control, and removed related individuals. We randomly selected 350,587 self-identified British ancestry Caucasians as the training sample. The remaining participants were grouped according to their ancestry, with the non-British European, South Asian, East Asian, and African groups serving as test samples.

All phenotypes analyzed (Supplementary file 1b) were collected from the UKB table browser and came from self-report or physical measurement. Phenotypes were first adjusted for age, sex, the top 10 principal components, Townsend index, and genotype array quality metrics by linear regression. We then applied an inverse-normal transformation to the residuals. Binary phenotypes were adjusted in the same way, except that generalized linear regression was used.
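The transformation step can be illustrated with a minimal sketch of a rank-based inverse-normal transform applied to regression residuals. The Blom offset (c = 3/8) and the average-rank treatment of ties are assumptions made here for illustration, since the text does not specify them, and the function name is hypothetical; the covariate regression itself is omitted.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse-normal transform of a list of residuals.

    Assumptions (not stated in the text): Blom offset c = 3/8 and
    average ranks for tied values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    # Map each rank to a quantile of the standard normal distribution.
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


residuals = [0.3, -1.2, 2.5, 0.0, 0.9]
transformed = inverse_normal_transform(residuals)
```

The transform preserves the ordering of the residuals while forcing their distribution toward a standard normal, which is why no further covariates are needed downstream.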

Genotype data processing

We first segmented the hg38 genome into 4096 bp loci. To do so, we downloaded chromatin state annotations of 222 human tissues at different developmental stages (embryo, newborn, and adult) from the EpiMap (Boix et al., 2021) database. For each tissue, all chromosomal regions annotated as ‘transcription start site (TSS), transcription region (TX), enhancer, promoter’ in at least half of the samples were marked as active regions. We took the union of active regions across all tissues and removed regions annotated as genomic gaps (centromeres, ambiguous base pairs, etc.) in the hg38 genome. Then, for each active region shorter than 4096 bp, the locus was defined as the 4096 bp window centered on the active region. For each active region longer than 4096 bp, 4096 bp loci were delineated successively from the midpoint outward. Finally, non-overlapping 4096 bp blocks were used to cover the remaining genomic regions. This resulted in 617,378 genomic regions in total. In the sliding-window analysis, all these blocks were shifted 2048 bp toward the 5′ end, generating another 617,378 blocks. Using height as a representative trait, we repeated the fine-mapping analysis and applied the polygenic analysis on these combined blocks.
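The windowing rule for a single active region can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: coordinates are treated as 0-based half-open intervals, long regions are tiled with windows that abut at the midpoint (the text only says loci are delineated "from the midpoint outward"), and the function names are hypothetical. Handling of overlaps between regions, genomic gaps, and the blocks covering the remaining genome is omitted.

```python
LOCUS_LEN = 4096


def loci_for_active_region(start, end, locus_len=LOCUS_LEN):
    """Return 4096 bp loci (start, end) covering one active region."""
    mid = (start + end) // 2
    if end - start <= locus_len:
        # Short region: a single window centered on the region.
        left = max(0, mid - locus_len // 2)
        return [(left, left + locus_len)]
    loci = []
    # Long region: tile windows from the midpoint toward the 5' side...
    pos = mid
    while pos > start:
        loci.append((pos - locus_len, pos))
        pos -= locus_len
    # ...and from the midpoint toward the 3' side.
    pos = mid
    while pos < end:
        loci.append((pos, pos + locus_len))
        pos += locus_len
    # Note: no clamping at chromosome ends in this sketch.
    return sorted(loci)


def shift_toward_5prime(loci, offset=2048):
    """Sliding-window variant: shift every locus by `offset` bp."""
    return [(s - offset, e - offset) for s, e in loci if s - offset >= 0]


short_region = loci_for_active_region(10_000, 11_000)
long_region = loci_for_active_region(100_000, 110_000)
shifted = shift_toward_5prime(long_region)
```

Combining the original and shifted loci doubles the number of blocks, which is the sliding-window configuration used for height.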

For each locus, we obtained the IDs of the variants within it with bedtools (Quinlan and Hall, 2010), extracted their genotypes from the UKB .bgen files with bgenix, and then used PLINK (Purcell et al., 2007) to remove all variants with INFO <0.8, Hardy–Weinberg p < 10⁻⁶, allele count <10, or missing rate >10%, and to remove individuals missing more than 10% of the retained variants in the locus. The output vcf file was lifted over to hg38 with CrossMap (Zhao et al., 2014) and phased with SHAPEIT4 (Delaneau et al., 2019). The phased vcf was converted to .haps format with PLINK, which in turn gave rise to two files: a vcf file containing the information of each haplotype, and an n × 2 plain-text matrix recording the IDs of the two haplotypes of each individual.

HFS calculation

There have been several DL models that predict functional genomic profiles from DNA sequence (Avsec et al., 2021; Chen et al., 2022; Kelley, 2020; Yan et al., 2021; Zhou et al., 2018). Among them, we chose Sei (Chen et al., 2022) to calculate HFS for the following reasons: (1) the required input length (4096 bp) is moderate; (2) it represents 21,906 functional genomic tracks, which is more comprehensive than other models; (3) it integrates information from the entire sequence, not only the few bp at the center. For each haplotype at each locus, we generated the corresponding DNA sequence with the bcftools (Danecek et al., 2021) consensus option. At each locus, the start point of each sequence was matched to the start point of the reference sequence. When insertion variants made the sequence longer than 4096 bp, we discarded base pairs at the 3′ end. Likewise, when deletion variants made the sequence shorter, we added N to the 3′ end. We applied Sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone marks (dividing each track score by the sum of histone mark scores), as suggested by the Sei authors. We then used the projection matrix provided by Sei to calculate 40 sequence class scores, which can be regarded as weighted sums of these tracks and represent different aspects of functional genomic activity. We discarded the last score (heterochromatin 6 [centromere]), since its proportion is very low and it is functionally trivial, leaving 39 scores per haplotype.
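The length handling and the projection step can be sketched with toy data. This is a simplified illustration, not Sei's actual interface: the function names are hypothetical, the projection matrix is a small made-up example rather than the matrix shipped with Sei, and the score is computed as a plain weighted sum, as the text describes.

```python
def fit_to_length(seq, length=4096, pad="N"):
    """Keep the 5' start fixed; truncate or N-pad at the 3' end."""
    if len(seq) >= length:
        return seq[:length]
    return seq + pad * (length - len(seq))


def sequence_class_scores(track_scores, projection):
    """Weighted sums of track scores, one per sequence class.

    `projection` holds one row of weights per sequence class
    (toy values here; the real matrix is provided with Sei).
    """
    return [
        sum(w * t for w, t in zip(row, track_scores))
        for row in projection
    ]


# A haplotype carrying a 4 bp insertion is trimmed back to 4096 bp.
with_insertion = fit_to_length("ACGT" * 1025)
# A haplotype carrying a deletion is padded with N at the 3' end.
with_deletion = fit_to_length("ACG", length=5)
# Toy example: 3 tracks projected onto 2 sequence classes.
scores = sequence_class_scores([1.0, 2.0, 3.0],
                               [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
```

Fixing the 5′ start means that an indel shifts every downstream base relative to the reference, which is consistent with scoring the whole sequence rather than a single central position.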

For each individual, we took the mean of the two haplotypes for each sequence class score, corresponding to an additive model. For the HFS LD calculation, we extracted, for adjacent loci, the mean sequence class score corresponding to each locus's reference sequence class and calculated the R² between them. The sequence class score of the reference sequence class was defined as the HFS of the locus and was used for downstream trait association analysis.
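A minimal sketch of the additive coding and of the LD measure between two adjacent loci is given below. The data layout (a dictionary of per-haplotype scores plus a list of haplotype ID pairs, mirroring the n × 2 matrix described earlier) and the function names are assumptions for illustration; returning 0 for a locus with no variation is likewise a convention chosen here.

```python
def additive_hfs(hap_scores, hap_pairs):
    """Per-individual HFS: mean score of the two carried haplotypes."""
    return [(hap_scores[a] + hap_scores[b]) / 2.0 for a, b in hap_pairs]


def squared_correlation(x, y):
    """Squared Pearson correlation between two per-individual vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumed convention for a locus without variation
    return sxy * sxy / (sxx * syy)


# Three individuals carrying haplotypes 0/0, 0/1 and 1/1 at both loci.
pairs = [(0, 0), (0, 1), (1, 1)]
locus_a = additive_hfs({0: 1.0, 1: 2.0}, pairs)
locus_b = additive_hfs({0: 0.0, 1: 4.0}, pairs)
ld_r2 = squared_correlation(locus_a, locus_b)
```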

HFS–trait association

For each locus, we calculated the association between the trait-specific HFS and the adjusted, normalized trait value by linear regression without covariates, because all selected covariates had already been adjusted for at the normalization step. For uniformity, we set the significance threshold at p < 5 × 10⁻⁸, even though it was overly stringent for n = 590,959 loci. Among significant associations, we defined an independent association as the locus with the lowest p-value within a 200 kb region. As a positive control, we ran quantitative and binary GWAS with REGENIE (Mbatchou et al., 2021), using default settings and the same British training sample. The main difference is that in REGENIE we used raw trait values and supplied the same covariates. We calculated the genomic control inflation factor, λ_GC, from the median of the χ² statistics, separately for the HFS association test and the GWAS; for the GWAS, only SNPs in the HapMap3 (Altshuler et al., 2010) project were included. We compared λ_GC between HFS and SNP by Pearson correlation analysis and paired t-test.
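Two of the summary steps above can be sketched as follows. Both functions are illustrative and rest on assumptions: λ_GC is computed with the standard definition (observed median χ² divided by the expected median of a 1-df χ² distribution, about 0.455), and independent associations are selected by greedy distance-based pruning with a 200 kb exclusion distance on each side, which is one possible reading of "the lowest p-value within a 200 kb region". Function names and the tuple layout are hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(chi2_stats):
    """lambda_GC: observed median chi-square over its expected median."""
    # Median of a 1-df chi-square equals the squared 75th normal percentile.
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median


def independent_associations(hits, window=200_000, alpha=5e-8):
    """Greedy pruning of significant loci.

    hits: list of (chromosome, position, p_value).
    A locus is kept if no already-kept locus on the same chromosome
    lies within `window` bp of it.
    """
    kept = []
    for chrom, pos, p in sorted(hits, key=lambda h: h[2]):
        if p >= alpha:
            break  # remaining loci are not genome-wide significant
        if all(c != chrom or abs(pos - q) > window for c, q, _ in kept):
            kept.append((chrom, pos, p))
    return kept


hits = [("1", 1_000_000, 1e-10), ("1", 1_100_000, 1e-9),
        ("1", 2_000_000, 1e-8), ("2", 1_000_000, 1e-7)]
lead_loci = independent_associations(hits)
```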

Fine-mapping analysis

We divided the hg38 genome into 1361 independent blocks as defined by MacDonald et al. (2022), and applied SUSIE to the HFS of all loci within each block, separately for each trait (parameters: maximum number of causal signals = 10, coverage = 0.95). We subtracted the reference HFS value of each locus prior to analysis, such that a homozygous reference genotype corresponded to HFS = 0. To avoid the influence of Sei prediction noise, we rounded the HFS value to two decimal places. This is because even if a variant has no actual impact on functional genomics, Sei would still output a value that is close to, but not equal to, the reference sequence class score. Rounding sets such HFS values to zero and removes this random noise. Loci whose HFS had PIP >0.95 were defined as causal loci, and loci that had causal associations with multiple traits were defined as pleiotropic loci. As a positive control, we applied PolyFun (Weissbrod et al., 2020) and SbayesRC (Zheng et al., 2022) to the GWAS summary statistics generated by REGENIE on the same training set, and used the reported PIP to define causal SNPs.
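The pre-processing and the post-hoc definitions around SUSIE can be sketched as below; the fine-mapping itself is not reproduced. The sketch assumes that rounding is applied after subtracting the reference value (which is what makes near-reference haplotypes exactly zero), and the function names and the nested-dictionary layout of PIPs are hypothetical.

```python
def center_and_round(hfs_values, reference_hfs, ndigits=2):
    """Subtract the reference HFS, then round to suppress model noise."""
    return [round(v - reference_hfs, ndigits) for v in hfs_values]


def pleiotropic_loci(pip_by_trait, threshold=0.95):
    """Loci that are causal (PIP > threshold) for more than one trait.

    pip_by_trait: {trait: {locus_id: PIP}}
    """
    counts = {}
    for pips in pip_by_trait.values():
        for locus, pip in pips.items():
            if pip > threshold:
                counts[locus] = counts.get(locus, 0) + 1
    return sorted(locus for locus, n in counts.items() if n > 1)


# Haplotypes whose score differs from the reference only by noise become 0.
centered = center_and_round([1.234, 1.2301, 1.30], reference_hfs=1.23)
shared = pleiotropic_loci({
    "height": {"L1": 0.99, "L2": 0.50},
    "bmi": {"L1": 0.97, "L2": 0.96},
})
```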

To analyze the functional characteristics of causal loci, we first defined the sequence class of each locus as the class with the maximum sequence class score for the reference haplotype. We then tested whether each sequence class contained an excess of causal loci for each trait by Fisher’s test. For each causal locus, we also defined a ‘control’ locus as the nearest locus that matched the p-value of this causal locus, and tested whether causal loci carried more PolyFun causal SNPs than control loci by Fisher’s test. Furthermore, for traits whose causal loci covered >0.1% of genome-wide SNPs, we applied LDSC (Finucane et al., 2015) to quantify the heritability enrichment in causal and control loci, and compared their difference by the jackknife method. To avoid winner’s curse, we used external GWAS summary statistics for this analysis (Mikaelsdottir et al., 2021; Yengo et al., 2022). As an alternative way to quantify the heritability captured by causal loci, we ran multivariate linear regression in the independent British test set, with the HFS of causal loci as independent variables and the trait value as the dependent variable, and calculated the R² and AIC. We applied the same analysis to causal SNPs, and compared the AIC between the HFS and SNP multivariate regressions.
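The sequence-class enrichment test can be illustrated with a small Fisher exact test on a 2 × 2 table. The sketch below computes a one-sided (enrichment) p-value for simplicity, which is an assumption: the text does not state the sidedness used for this test, and elsewhere reports two-sided p-values by default. The function name and table layout are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    a: causal loci in the sequence class
    b: non-causal loci in the sequence class
    c: causal loci outside the class
    d: non-causal loci outside the class
    Returns P(X >= a) under the hypergeometric null.
    """
    in_class = a + b
    causal = a + c
    total = a + b + c + d
    denom = comb(total, causal)
    upper = min(in_class, causal)
    tail = sum(
        comb(in_class, k) * comb(total - in_class, causal - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Toy table: 3 of 4 loci in the class are causal versus 1 of 4 outside it.
p_value = fisher_enrichment_p(3, 1, 1, 3)
```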

Functional enrichment analysis

Similar to the idea of LDSC (Finucane et al., 2015), we first generated a series of baseline annotations for each locus, then tested whether locus PIP was associated with functional annotations after controlling for these baseline annotations. Specifically, we defined the following baseline annotations:

1. Number of haplotypes, range of the HFS distribution across all haplotypes (scaled by the reference HFS), and the 39 sequence class scores of the reference haplotype.
2. Genomic regions of conserved bases, high PhastCons score (Siepel et al., 2005) in mammals, primates, and vertebrates, exons, introns, untranslated regions at the 3′ and 5′ ends, and 200 bp flanking regions of TSS. We used the bedtools intersect -f 0.1 option to annotate each locus with these annotations.
3. Maximum B statistics (McVicker et al., 2009), minimum allele age, and ASMC_avg (Palamara et al., 2018) of all variants within the locus.

Type 2 and 3 annotations were directly obtained from LDSC (Finucane et al., 2015) baseline annotations. We did not include annotations related to functional genomics, since 39 sequence class scores were used to capture functional genomic characteristics. Conditioned on these baseline annotations, we analyzed the enrichment of PIP in the following functional annotations:

Biological pathways: We downloaded all pathways from MSigDB (Subramanian et al., 2005), C2: canonical pathways category (including Reactome (Fabregat et al., 2018), Pathway Interaction Database (PID) (Schaefer et al., 2009), BioCarta, and WikiPathways) and C5: Gene Ontology (Ashburner et al., 2000) (biological process) category. We retained only pathways with >5 and <500 genes. We generated a gene × pathway binary matrix and applied hierarchical clustering so that similar pathways were placed close to each other. We sequentially compared adjacent pathways and removed the smaller one if the fraction of overlap was >30%. A total of 3219 pathways were retained. We then linked each locus to these pathways using the cS2G (Gazal et al., 2022) strategy. Specifically, a locus L was annotated as 1 for pathway P only if L contained a SNP that was linked to P with cS2G score >0.5.
Tissue-specific chromatin activity: We downloaded chromHMM (Ernst and Kellis, 2012) chromatin state annotations for 833 samples from EpiMap (Boix et al., 2021), and grouped them according to developmental stage and second-level tissue type. For each group, all chromosomal regions annotated as ‘transcription start site (TSS), transcription region (TX), enhancer, promoter’ in at least half of the samples were marked as active regions. We used the bedtools intersect -f 0.1 option to annotate whether each locus was active in each tissue.
Cell type-specific open chromatin regions: We downloaded scATAC-seq peak data from Zhang et al., 2021a, and annotated each locus by bedtools intersect -f 0.1 option.
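The pathway redundancy filter described under "Biological pathways" can be sketched as follows. The sketch assumes the pathways have already been ordered so that similar gene sets are adjacent (the hierarchical clustering step is not reproduced), and it defines the fraction of overlap as the number of shared genes divided by the size of the smaller pathway, which is an assumption because the text does not give the denominator. The function name is hypothetical.

```python
def prune_adjacent_pathways(pathways, max_overlap=0.30,
                            min_genes=5, max_genes=500):
    """Filter pathways by size, then drop redundant neighbors.

    pathways: ordered list of (name, set_of_genes), with similar
    pathways already placed next to each other.
    """
    kept = []
    for name, genes in pathways:
        if not (min_genes < len(genes) < max_genes):
            continue  # size filter: keep >5 and <500 genes
        if kept:
            prev_name, prev_genes = kept[-1]
            shared = len(genes & prev_genes)
            overlap = shared / min(len(genes), len(prev_genes))
            if overlap > max_overlap:
                # Redundant pair: keep only the larger pathway.
                if len(genes) > len(prev_genes):
                    kept[-1] = (name, genes)
                continue
        kept.append((name, genes))
    return kept


toy_pathways = [
    ("A", set(range(1, 11))),    # 10 genes
    ("B", set(range(1, 7))),     # 6 genes, all contained in A
    ("C", set(range(20, 27))),   # 7 genes, unrelated
]
retained = [name for name, _ in prune_adjacent_pathways(toy_pathways)]
```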

We applied multivariate linear regression of PIP against the baseline annotations plus one of the functional annotations. A regression coefficient >0 and a Bonferroni-adjusted regression p-value <0.05 were used as the significance threshold. From the final results, we manually removed pathways and cell types that reached the significance threshold in more than half of the traits, since these likely reflected unrecognized confounders.

Polygenic prediction

We used the posterior effect sizes estimated by SUSIE under the sliding-window strategy (which doubles the number of loci) as weights, calculated the weighted sum of HFS as the PRS of each trait, and calculated R² in the independent British test sample by simple linear regression. As a positive control, we applied the LDAK-BOLT (Zhang et al., 2021b) algorithm to the SNP array data (about seven hundred thousand variants) with tenfold cross-validation and max iteration = 200 in the same training sample, and calculated the SNP-based PRS with the output SNP weights. Normalized trait values were analyzed without covariates. Array data were filtered with PLINK using the options --geno 0.1 --hwe 1e-6 --mac 100 --maf 0.01 --mind 0.1.
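The HFS-based PRS and its evaluation can be sketched in a few lines. The example uses made-up HFS values and weights, and the function names are hypothetical; in the study the weights are SUSIE posterior effect sizes and the evaluation is performed in the held-out British test sample.

```python
def weighted_score(hfs_rows, weights):
    """PRS per individual: weighted sum of HFS across loci."""
    return [sum(w * x for w, x in zip(weights, row)) for row in hfs_rows]


def simple_regression_r2(score, trait):
    """R² of the simple linear regression trait ~ score."""
    n = len(score)
    mean_s, mean_t = sum(score) / n, sum(trait) / n
    sxx = sum((s - mean_s) ** 2 for s in score)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(score, trait))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_s
    ss_res = sum((t - (intercept + slope * s)) ** 2
                 for s, t in zip(score, trait))
    ss_tot = sum((t - mean_t) ** 2 for t in trait)
    return 1.0 - ss_res / ss_tot


# Four individuals, two loci; weights stand in for posterior effect sizes.
hfs_rows = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
prs = weighted_score(hfs_rows, weights=[0.5, -0.2])
observed = [1.1, 2.2, 1.4, 0.9]  # made-up normalized trait values
r2 = simple_regression_r2(prs, observed)
```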

To train the refined model that predicts height, we first calculated the per-block HFS-based prediction score of height as the weighted sum of HFS within each block. Then, within each target ancestry group (non-British European (NBE), South Asian (SAS), East Asian (EAS), and African (AFR) participants in UK Biobank), we randomly selected half of the participants as the tuning sample and half as the test sample. In the tuning sample, we applied LASSO regression that included both the LDAK PRS and the genome-wide per-block HFS scores (1361 in total). The choice of LASSO regression was based on a comparison in the British European test set (Figure 5B), where LASSO, ridge, and elastic net gave similar results and LASSO performed slightly better. In the tuning sample of the target ancestry, LASSO estimated the weights used to combine the per-block HFS scores and the LDAK PRS. We calculated the final prediction score in the test sample using these weights, and evaluated its prediction by linear regression R². Since the outcome (height) had already been adjusted and standardized, no covariates were included in this step. Additionally, we applied PolyFun-pred (Weissbrod et al., 2022) and SbayesRC (Zheng et al., 2022) to the summary statistics of height (calculated by REGENIE in the same training sample), and integrated their effect sizes with the LDAK weights in the tuning sample using the PolyPred (Weissbrod et al., 2022) method. PRS for LDAK, LDAK + PolyFun, and LDAK + SbayesRC were calculated with the PLINK --score option, excluding variants with INFO <0.8, Hardy–Weinberg p < 10⁻⁶, allele count <2, or missing rate >10% in the target test set.
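The integration step can be illustrated with a small coordinate-descent LASSO. This is a didactic sketch, not the software used in the study (the text does not name an implementation, and an established package would normally be preferred): the function names are hypothetical, the objective is assumed to be (1/2n)·‖y − Xβ‖² + λ‖β‖₁ without an intercept, features are assumed to be centered, and the choice of λ (for example by cross-validation) is not shown.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the LASSO proximal operator)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic updates.

    X: list of rows; in the HFS + LDAK setting the columns would be
    the LDAK PRS followed by the per-block HFS scores.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the partial residual.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new_b = soft_threshold(rho, lam) / z
            delta = new_b - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for r, v in zip(resid, col)]
                beta[j] = new_b
    return beta


# Toy tuning sample: feature 0 is informative, feature 1 is nearly noise.
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2.0, -2.0, 0.1, -0.1]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

In this toy example the weak feature is shrunk exactly to zero, which is the property that lets LASSO select a sparse subset of blocks to combine with the SNP-based PRS.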

Simulation analysis

We simulated trait levels using HFS data from chromosome 1 in 50,000 samples randomly selected from the UKB EUR training data. We randomly selected 1% (500) of the loci, assigned effect sizes from a standard normal distribution, and calculated the aggregated genetic liability. We then simulated trait levels with h² = 0.1. We applied HFS + SUSIE as well as REGENIE + PolyFun to the simulated traits, and calculated the area under the curve (AUC) and the FDR at PIP >0.95 for HFS + SUSIE. We repeated this procedure 30 times.
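The trait simulation can be sketched as follows, under stated assumptions: HFS values are used as provided (whether they were standardized before effect sizes were assigned is not stated), environmental noise is Gaussian, and its variance is chosen so that the genetic liability explains a fraction h² of the trait variance. The function name and the toy HFS matrix are hypothetical.

```python
import random


def simulate_trait(hfs_rows, causal_fraction=0.01, h2=0.1, seed=0):
    """Simulate a trait from an individuals x loci HFS matrix.

    Returns the simulated trait values and the {locus index: effect}
    mapping of the causal loci.
    """
    rng = random.Random(seed)
    n, p = len(hfs_rows), len(hfs_rows[0])
    n_causal = max(1, round(p * causal_fraction))
    causal = rng.sample(range(p), n_causal)
    # Effect sizes drawn from a standard normal distribution.
    effects = {j: rng.gauss(0.0, 1.0) for j in causal}
    liability = [sum(b * row[j] for j, b in effects.items())
                 for row in hfs_rows]
    mean_g = sum(liability) / n
    var_g = sum((g - mean_g) ** 2 for g in liability) / n
    # Noise variance chosen so that var_g / (var_g + var_e) = h2.
    noise_sd = (var_g * (1.0 - h2) / h2) ** 0.5
    trait = [g + rng.gauss(0.0, noise_sd) for g in liability]
    return trait, effects


# Toy HFS matrix: 50 individuals x 200 loci with arbitrary values.
toy_hfs = [[float((i * (j + 1)) % 7) for j in range(200)] for i in range(50)]
trait, effects = simulate_trait(toy_hfs, causal_fraction=0.01, h2=0.1, seed=1)
```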

On average, HFS + SUSIE showed high accuracy in identifying causal loci (median AUC = 0.92), and the median FDR at PIP >0.95 was 0.059. In line with the real data analysis, HFS + SUSIE identified on average 1.12-fold more causal loci than PolyFun. Furthermore, HFS + SUSIE showed good discrimination between causal and non-causal loci: the number of loci with PIP >0.95 was larger than the number of loci with 0.5 < PIP < 0.95.

Alternative strategy on biological enrichment analysis

In addition to the standard linear regression applied in the main text, we also applied a linear mixed regression that took independent blocks as a random effect. For each regression, we included one biological term plus all baseline annotations. The regression coefficient and p-value of each biological term were estimated with the mgcv R package. After p-value correction, most of the significant terms were those that recurred in more than half of the traits, which were considered artifacts of hidden covariates. After removing these recurrent terms, fewer than five significant terms remained for each trait. We concluded that linear mixed regression was less sensitive than standard linear regression for identifying trait-specific biological associations.

We also tried another strategy for cell type-specific analysis. We first downloaded the C8 category from MSigDB, which contains gene lists specifically expressed in about 800 cell types, derived from multiple single-cell RNA sequencing studies. We then linked each locus to these gene lists by the cS2G method and applied linear regression, similar to the pathway analysis. We found that most traits were predominantly linked to nearly all cell types from a specific study, which indicated a study batch effect rather than biological function. For example, smoking was associated with all neuron subtypes, pericytes, and immune cells from one brain scRNA dataset, but did not show association with immune cells and pericytes from other scRNA studies. We reasoned that the curated cell type-specific gene lists contained batch effects that had not yet been corrected. Thus, in the main text, we reported the association between PIP and single-cell ATAC peaks from one study, which reduced the batch effect.

Highlighted genes for complex traits

For chronotype, we found that the circadian gene CRY1 was predicted to be the target of locus chr12:107093022–107097118, which had PIP = 0.56. This locus was active in the cingulate gyrus and belonged to the sequence class ‘enhancer-multi tissue’. CRY1 is known to participate in the circadian pathway but was not highlighted by previous GWAS. SNP-based fine-mapping likewise found no SNP with PIP >0.1 that was predicted to link to CRY1. We suggest that CRY1 is a promising novel target gene for understanding the mechanisms of chronotype.

For systolic blood pressure, we found locus chr8:11726583–11730679 (PIP = 0.999), which resides in the gene GATA4. This locus was active in both adult heart ventricle and fetal cardiomyocytes. GATA4 takes part in physiological myocardial hypertrophy. SNP fine-mapping yielded PIP <0.34 for all SNPs linked to GATA4. Previous GWAS identified its homolog GATA2 as a key gene in blood pressure, and our result supports GATA4 as another key gene.

For intelligence, we found locus chr10:45559452–45563548, which was active in the caudate nucleus and was associated with intelligence at PIP >0.5. It was predicted to regulate ALOX5, a key enzyme in arachidonic acid metabolism. Arachidonic acid supplementation is known to be beneficial for the development of intelligence in children, and arachidonic acid takes part in neurodevelopment. However, few genes related to arachidonic acid have been associated with intelligence.

Statistical analysis

All p-values were two-sided and adjusted by Bonferroni unless otherwise specified. For group comparison, we used Fisher’s test for count data and paired t-test for continuous data. For R² of PRS comparison, we applied r2redux (Momin et al., 2023) R package to estimate 95% confidence interval and its p-value for the difference of R².

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Data from the UK Biobank (project 84436; Bycroft et al., 2018) are available, pending application approval from: https://www.ukbiobank.ac.uk/. Modeling code is available at https://github.com/WeiCSong/HFS (copy archived at Song, 2024).

References

1. Aguet F
2. Barbeira AN
3. Bonazzola R
4. Brown A
5. Castel SE
6. Jo B
7. Kasela S
8. Kim-Hellmuth S
9. Liang Y
10. Oliva M
11. Flynn ED
12. Parsana P
13. Fresard L
14. Gamazon ER
15. Hamel AR
16. He Y
17. Hormozdiari F
18. Mohammadi P
19. Muñoz-Aguirre M
20. Park YS
21. Saha A
22. Strober BJ
23. Wen X
24. Wucher V
25. Ardlie KG
26. Battle A
27. Brown CD
28. Cox N
29. Das S
30. Dermitzakis ET
31. Engelhardt BE
32. Garrido-Martín D
33. Gay NR
34. Getz GA
35. Guigó R
36. Handsaker RE
37. Hoffman PJ
38. Im HK
39. Kashin S
40. Kwong A
41. Lappalainen T
42. Xiao L
43. MacArthur DG
44. Montgomery SB
45. Rouhana JM
46. Stephens M
47. Stranger BE
48. Todres E
49. Viñuela A
50. Wang G
51. Zou Y
52. Anand S
53. Gabriel S
54. Graubert A
55. Hadley K
56. Huang KH
57. Nguyen JL
58. Balliu DT
59. Conrad B
60. Cotter DF
61. Einson J
62. Eskin E
63. Eulalio TY
64. Ferraro NM
65. Gloudemans MJ
66. Hou L
67. Kellis M
68. Xin L
69. Mangul S
70. Nachun DC
71. Nobel AB
72. Park Y
73. Rao AS
74. Reverter F
75. Sabatti C
76. Skol AD
77. Teran NA
78. Wright F
79. Ferreira PG
80. Li G
81. Melé M
82. Yeger-Lotem E
83. Barcus ME
84. Bradbury D
85. Krubit T
86. McLean JA
87. Qi L
88. Robinson K
89. Smith AM
90. Sobin L
91. Tabor DE
92. Undale A
93. Bridge J
94. Brigham LE
95. Foster BA
96. Gillard BM
97. Hasz R
98. Hunter M
99. Johns C
100. Johnson M
101. Karasik E
102. Kopen G
103. Leinweber WF
104. McDonald A
105. Moser MT
106. Myer K
107. Ramsey KD
108. Roe B
109. Shad S
110. Thomas JA
111. Walters G
112. Washington M
113. Wheeler J
114. Jewell SD
115. Rohrer DC
116. Valley DR
117. Davis DA
118. Mash DC
119. Branton PA
120. Sobin L
121. Barker LK
122. Gardiner HM
123. Mosavel M
124. Siminoff LA
125. Flicek P
126. Haeussler M
127. Juettemann T
128. Kent WJ
129. Lee CM
130. Powell CC
131. Rosenbloom KR
132. Ruffier M
133. Sheppard D
134. Taylor K
135. Trevanion SJ
136. Zerbino DR
137. Abell NS
138. Akey J
139. Chen L
140. Demanelis K
141. Doherty JA
142. Feinberg AP
143. Hansen KD
144. Hickey PF
145. Hou L
146. Jasmine F
147. Jiang L
148. Kaul R
149. Kellis M
150. Kibriya MG
151. Li JB
152. Li Q
153. Lin S
154. Linder SE
155. Pierce BL
156. Rizzardi LF
157. Smith KS
158. Snyder M
159. Stamatoyannopoulos J
160. Tang H
161. Wang M
162. Branton PA
163. Carithers LJ
164. Guan P
165. Koester SE
166. Little AR
167. Moore HM
168. Nierras CR
169. Rao AK
170. Vaught JB
171. Volpi S
(2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues
Science 369:1318–1330.

https://doi.org/10.1126/science.aaz1776
1. Altshuler DM
2. Gibbs RA
3. Peltonen L
4. Altshuler DM
5. Gibbs RA
6. Peltonen L
7. Dermitzakis E
8. Schaffner SF
9. Yu F
10. Peltonen L
11. Dermitzakis E
12. Bonnen PE
13. Altshuler DM
14. Gibbs RA
15. de Bakker PIW
16. Deloukas P
17. Gabriel SB
18. Gwilliam R
19. Hunt S
20. Inouye M
21. Jia X
22. Palotie A
23. Parkin M
24. Whittaker P
25. Yu F
26. Chang K
27. Hawes A
28. Lewis LR
29. Ren Y
30. Wheeler D
31. Gibbs RA
32. Muzny DM
33. Barnes C
34. Darvishi K
35. Hurles M
36. Korn JM
37. Kristiansson K
38. Lee C
39. McCarrol SA
40. Nemesh J
41. Dermitzakis E
42. Keinan A
43. Montgomery SB
44. Pollack S
45. Price AL
46. Soranzo N
47. Bonnen PE
48. Gibbs RA
49. Gonzaga-Jauregui C
50. Keinan A
51. Price AL
52. Yu F
53. Anttila V
54. Brodeur W
55. Daly MJ
56. Leslie S
57. McVean G
58. Moutsianas L
59. Nguyen H
60. Schaffner SF
61. Zhang Q
62. Ghori MJR
63. McGinnis R
64. McLaren W
65. Pollack S
66. Price AL
67. Schaffner SF
68. Takeuchi F
69. Grossman SR
70. Shlyakhter I
71. Hostetter EB
72. Sabeti PC
73. Adebamowo CA
74. Foster MW
75. Gordon DR
76. Licinio J
77. Manca MC
78. Marshall PA
79. Matsuda I
80. Ngare D
81. Wang VO
82. Reddy D
83. Rotimi CN
84. Royal CD
85. Sharp RR
86. Zeng C
87. Brooks LD
88. McEwen JE
(2010) Integrating common and rare genetic variation in diverse human populations
Nature 467:52–58.

https://doi.org/10.1038/nature09298
1. Ashburner M
2. Ball CA
3. Blake JA
4. Botstein D
5. Butler H
6. Cherry JM
7. Davis AP
8. Dolinski K
9. Dwight SS
10. Eppig JT
11. Harris MA
12. Hill DP
13. Issel-Tarver L
14. Kasarskis A
15. Lewis S
16. Matese JC
17. Richardson JE
18. Ringwald M
19. Rubin GM
20. Sherlock G
(2000) Gene Ontology: tool for the unification of biology
Nature Genetics 25:25–29.

https://doi.org/10.1038/75556
1. Avsec Ž
2. Agarwal V
3. Visentin D
4. Ledsam JR
5. Grabska-Barwinska A
6. Taylor KR
7. Assael Y
8. Jumper J
9. Kohli P
10. Kelley DR
(2021) Effective gene expression prediction from sequence by integrating long-range interactions
Nature Methods 18:1196–1203.

https://doi.org/10.1038/s41592-021-01252-x
1. Baca SC
2. Singler C
3. Zacharia S
4. Seo JH
5. Morova T
6. Hach F
7. Ding Y
8. Schwarz T
9. Huang CCF
10. Anderson J
11. Fay AP
12. Kalita C
13. Groha S
14. Pomerantz MM
15. Wang V
16. Linder S
17. Sweeney CJ
18. Zwart W
19. Lack NA
20. Pasaniuc B
21. Takeda DY
22. Gusev A
23. Freedman ML
(2022) Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation
Nature Genetics 54:1364–1375.

https://doi.org/10.1038/s41588-022-01168-y
1. Barbeira AN
2. Dickinson SP
3. Bonazzola R
4. Zheng J
5. Wheeler HE
6. Torres JM
7. Torstenson ES
8. Shah KP
9. Garcia T
10. Edwards TL
11. Stahl EA
12. Huckins LM
13. Nicolae DL
14. Cox NJ
15. Im HK
(2018) Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics
Nature Communications 9:1825.

https://doi.org/10.1038/s41467-018-03621-1
1. Boix CA
2. James BT
3. Park YP
4. Meuleman W
5. Kellis M
(2021) Regulatory genomic circuitry of human disease loci by integrative epigenomics
Nature 590:300–307.

https://doi.org/10.1038/s41586-020-03145-z
1. Bycroft C
2. Freeman C
3. Petkova D
4. Band G
5. Elliott LT
6. Sharp K
7. Motyer A
8. Vukcevic D
9. Delaneau O
10. O’Connell J
11. Cortes A
12. Welsh S
13. Young A
14. Effingham M
15. McVean G
16. Leslie S
17. Allen N
18. Donnelly P
19. Marchini J
(2018) The UK Biobank resource with deep phenotyping and genomic data
Nature 562:203–209.

https://doi.org/10.1038/s41586-018-0579-z
1. Chen KM
2. Wong AK
3. Troyanskaya OG
4. Zhou J
(2022) A sequence-based global map of regulatory activity for deciphering human genetics
Nature Genetics 54:940–949.

https://doi.org/10.1038/s41588-022-01102-2
1. Danecek P
2. Bonfield JK
3. Liddle J
4. Marshall J
5. Ohan V
6. Pollard MO
7. Whitwham A
8. Keane T
9. McCarthy SA
10. Davies RM
11. Li H
(2021) Twelve years of SAMtools and BCFtools
GigaScience 10:giab008.

https://doi.org/10.1093/gigascience/giab008
(2019) Accurate, scalable and integrative haplotype estimation
Nature Communications 10:5436.

https://doi.org/10.1038/s41467-019-13225-y
1. Ernst J
2. Kellis M
(2012) ChromHMM: automating chromatin-state discovery and characterization
Nature Methods 9:215–216.

https://doi.org/10.1038/nmeth.1906
1. Fabregat A
2. Jupe S
3. Matthews L
4. Sidiropoulos K
5. Gillespie M
6. Garapati P
7. Haw R
8. Jassal B
9. Korninger F
10. May B
11. Milacic M
12. Roca CD
13. Rothfels K
14. Sevilla C
15. Shamovsky V
16. Shorser S
17. Varusai T
18. Viteri G
19. Weiser J
20. Wu G
21. Stein L
22. Hermjakob H
23. D’Eustachio P
(2018) The reactome pathway knowledgebase
Nucleic Acids Research 46:D649–D655.

https://doi.org/10.1093/nar/gkx1132
1. Finucane HK
2. Bulik-Sullivan B
3. Gusev A
4. Trynka G
5. Reshef Y
6. Loh PR
7. Anttila V
8. Xu H
9. Zang C
10. Farh K
11. Ripke S
12. Day FR
13. Purcell S
14. Stahl E
15. Lindstrom S
16. Perry JRB
17. Okada Y
18. Raychaudhuri S
19. Daly MJ
20. Patterson N
21. Neale BM
22. Price AL
(2015) Partitioning heritability by functional annotation using genome-wide association summary statistics
Nature Genetics 47:1228–1235.

https://doi.org/10.1038/ng.3404
1. Finucane HK
2. Reshef YA
3. Anttila V
4. Slowikowski K
5. Gusev A
6. Byrnes A
7. Gazal S
8. Loh PR
9. Lareau C
10. Shoresh N
11. Genovese G
12. Saunders A
13. Macosko E
14. Pollack S
15. Perry JRB
16. Buenrostro JD
17. Bernstein BE
18. Raychaudhuri S
19. McCarroll S
20. Neale BM
21. Price AL
(2018) Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types
Nature Genetics 50:621–629.

https://doi.org/10.1038/s41588-018-0081-4
(2021) New insights into the role of PD-1 and its ligands in allergic disease
International Journal of Molecular Sciences 22:11898.

https://doi.org/10.3390/ijms222111898
1. Gazal S
2. Weissbrod O
3. Hormozdiari F
4. Dey KK
5. Nasser J
6. Jagadeesh KA
7. Weiner DJ
8. Shi H
9. Fulco CP
10. O’Connor LJ
11. Pasaniuc B
12. Engreitz JM
13. Price AL
(2022) Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity
Nature Genetics 54:827–836.

https://doi.org/10.1038/s41588-022-01087-y
(2022) Transcriptome-wide and stratified genomic structural equation modeling identify neurobiological pathways shared across diverse cognitive traits
Nature Communications 13:6280.

https://doi.org/10.1038/s41467-022-33724-9
1. Helland IB
2. Smith L
3. Saarem K
4. Saugstad OD
5. Drevon CA
(2003) Maternal supplementation with very-long-chain n-3 fatty acids during pregnancy and lactation augments children’s IQ at 4 years of age
Pediatrics 111:e39–e44.

https://doi.org/10.1542/peds.111.1.e39
1. Hu X
2. Qiao D
3. Kim W
4. Moll M
5. Balte PP
6. Lange LA
7. Bartz TM
8. Kumar R
9. Li X
10. Yu B
11. Cade BE
12. Laurie CA
13. Sofer T
14. Ruczinski I
15. Nickerson DA
16. Muzny DM
17. Metcalf GA
18. Doddapaneni H
19. Gabriel S
20. Gupta N
21. Dugan-Perez S
22. Cupples LA
23. Loehr LR
24. Jain D
25. Rotter JI
26. Wilson JG
27. Psaty BM
28. Fornage M
29. Morrison AC
30. Vasan RS
31. Washko G
32. Rich SS
33. O’Connor GT
34. Bleecker E
35. Kaplan RC
36. Kalhan R
37. Redline S
38. Gharib SA
39. Meyers D
40. Ortega V
41. Dupuis J
42. London SJ
43. Lappalainen T
44. Oelsner EC
45. Silverman EK
46. Barr RG
47. Thornton TA
48. Wheeler HE
49. Cho MH
50. Im HK
51. Manichaikul A
(2022) Polygenic transcriptome risk scores for COPD and lung function improve cross-ethnic portability of prediction in the NHLBI TOPMed program
American Journal of Human Genetics 109:857–870.

https://doi.org/10.1016/j.ajhg.2022.03.007
1. Huang C
2. Shuai RW
3. Baokar P
4. Chung R
5. Rastogi R
6. Kathail P
7. Ioannidis NM
(2023) Personal transcriptome variation is poorly explained by current genomic deep learning models
Nature Genetics 55:2056–2059.

https://doi.org/10.1038/s41588-023-01574-w
1. Iotchkova V
2. Ritchie GRS
3. Geihs M
4. Morganella S
5. Min JL
6. Walter K
7. Timpson NJ
8. UK10K Consortium
9. Dunham I
10. Birney E
11. Soranzo N
(2019) GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals
Nature Genetics 51:343–353.

https://doi.org/10.1038/s41588-018-0322-6
1. Jumper J
2. Evans R
3. Pritzel A
4. Green T
5. Figurnov M
6. Ronneberger O
7. Tunyasuvunakool K
8. Bates R
9. Žídek A
10. Potapenko A
11. Bridgland A
12. Meyer C
13. Kohl SAA
14. Ballard AJ
15. Cowie A
16. Romera-Paredes B
17. Nikolov S
18. Jain R
19. Adler J
20. Back T
21. Petersen S
22. Reiman D
23. Clancy E
24. Zielinski M
25. Steinegger M
26. Pacholska M
27. Berghammer T
28. Bodenstein S
29. Silver D
30. Vinyals O
31. Senior AW
32. Kavukcuoglu K
33. Kohli P
34. Hassabis D
(2021) Highly accurate protein structure prediction with AlphaFold
Nature 596:583–589.

https://doi.org/10.1038/s41586-021-03819-2
1. Kelley DR
(2020) Cross-species regulatory sequence activity prediction
PLOS Computational Biology 16:e1008050.

https://doi.org/10.1371/journal.pcbi.1008050
1. Kong X
2. Ma L
3. Chen E
4. Shaw CA
5. Edelstein LC
(2019) Identification of the regulatory elements and target genes of megakaryopoietic transcription factor MEF2C
Thrombosis and Haemostasis 119:716–725.

https://doi.org/10.1055/s-0039-1678694
1. Li X
2. Li Z
3. Zhou H
4. Gaynor SM
5. Liu Y
6. Chen H
7. Sun R
8. Dey R
9. Arnett DK
10. Aslibekyan S
11. Ballantyne CM
12. Bielak LF
13. Blangero J
14. Boerwinkle E
15. Bowden DW
16. Broome JG
17. Conomos MP
18. Correa A
19. Cupples LA
20. Curran JE
21. Freedman BI
22. Guo X
23. Hindy G
24. Irvin MR
25. Kardia SLR
26. Kathiresan S
27. Khan AT
28. Kooperberg CL
29. Laurie CC
30. Liu XS
31. Mahaney MC
32. Manichaikul AW
33. Martin LW
34. Mathias RA
35. McGarvey ST
36. Mitchell BD
37. Montasser ME
38. Moore JE
39. Morrison AC
40. O’Connell JR
41. Palmer ND
42. Pampana A
43. Peralta JM
44. Peyser PA
45. Psaty BM
46. Redline S
47. Rice KM
48. Rich SS
49. Smith JA
50. Tiwari HK
51. Tsai MY
52. Vasan RS
53. Wang FF
54. Weeks DE
55. Weng Z
56. Wilson JG
57. Yanek LR
58. Abe N
59. Abecasis GR
60. Aguet F
61. Albert C
62. Almasy L
63. Alonso A
64. Ament S
65. Anderson P
66. Anugu P
67. Applebaum-Bowden D
68. Ardlie K
69. Arking D
70. Arnett DK
71. Ashley-Koch A
72. Aslibekyan S
73. Assimes T
74. Auer P
75. Avramopoulos D
76. Barnard J
77. Barnes K
78. Barr RG
79. Barron-Casella E
80. Barwick L
81. Beaty T
82. Beck G
83. Becker D
84. Becker L
85. Beer R
86. Beitelshees A
87. Benjamin E
88. Benos T
89. Bezerra M
90. Bielak LF
91. Bis J
92. Blackwell T
93. Blangero J
94. Boerwinkle E
95. Bowden DW
96. Bowler R
97. Brody J
98. Broeckel U
99. Broome JG
100. Bunting K
101. Burchard E
102. Bustamante C
103. Buth E
104. Cade B
105. Cardwell J
106. Carey V
107. Carty C
108. Casaburi R
109. Casella J
110. Castaldi P
111. Chaffin M
112. Chang C
113. Chang YC
114. Chasman D
115. Chavan S
116. Chen BJ
117. Chen WM
118. Chen YDI
119. Cho M
120. Choi SH
121. Chuang LM
122. Chung M
123. Chung RH
124. Clish C
125. Comhair S
126. Conomos MP
127. Cornell E
128. Correa A
129. Crandall C
130. Crapo J
131. Cupples LA
132. Curran JE
133. Curtis J
134. Custer B
135. Damcott C
136. Darbar D
137. Das S
138. David S
139. Davis C
140. Daya M
141. de Andrade M
142. de las Fuentes L
143. DeBaun M
144. Deka R
145. DeMeo D
146. Devine S
147. Duan Q
148. Duggirala R
149. Durda JP
150. Dutcher S
151. Eaton C
152. Ekunwe L
153. El Boueiz A
154. Ellinor P
155. Emery L
156. Erzurum S
157. Farber C
158. Fingerlin T
159. Flickinger M
160. Fornage M
161. Franceschini N
162. Frazar C
163. Fu M
164. Fullerton SM
165. Fulton L
166. Gabriel S
167. Gan W
168. Gao S
169. Gao Y
170. Gass M
171. Gelb B
172. Geng X
173. Geraci M
174. Germer S
175. Gerszten R
176. Ghosh A
177. Gibbs R
178. Gignoux C
179. Gladwin M
180. Glahn D
181. Gogarten S
182. Gong DW
183. Goring H
184. Graw S
185. Grine D
186. Gu CC
187. Guan Y
188. Guo X
189. Gupta N
190. Haessler J
191. Hall M
192. Harris D
193. Hawley NL
194. He J
195. Heckbert S
196. Hernandez R
197. Herrington D
198. Hersh C
199. Hidalgo B
200. Hixson J
201. Hobbs B
202. Hokanson J
203. Hong E
204. Hoth K
205. Hsiung C
206. Hung YJ
207. Huston H
208. Hwu CM
209. Irvin MR
210. Jackson R
211. Jain D
212. Jaquish C
213. Jhun MA
214. Johnsen J
215. Johnson A
216. Johnson C
217. Johnston R
218. Jones K
219. Kang HM
220. Kaplan R
221. Kardia SLR
222. Kathiresan S
223. Kelly S
224. Kenny E
225. Kessler M
226. Khan AT
227. Kim W
228. Kinney G
229. Konkle B
230. Kooperberg CL
231. Kramer H
232. Lange C
233. Lange E
234. Lange L
235. Laurie CC
236. Laurie C
237. LeBoff M
238. Lee J
239. Lee SS
240. Lee WJ
241. LeFaive J
242. Levine D
243. Levy D
244. Lewis J
245. Li X
246. Li Y
247. Lin H
248. Lin H
249. Lin KH
250. Lin X
251. Liu S
252. Liu Y
253. Liu Y
254. Loos RJF
255. Lubitz S
256. Lunetta K
257. Luo J
258. Mahaney MC
259. Make B
260. Manichaikul AW
261. Manson J
262. Margolin L
263. Martin LW
264. Mathai S
265. Mathias RA
266. May S
267. McArdle P
268. McDonald ML
269. McFarland S
270. McGarvey ST
271. McGoldrick D
272. McHugh C
273. Mei H
274. Mestroni L
275. Meyers DA
276. Mikulla J
277. Min N
278. Minear M
279. Minster RL
280. Mitchell BD
281. Moll M
282. Montasser ME
283. Montgomery C
284. Moscati A
285. Musani S
286. Mwasongwe S
287. Mychaleckyj JC
288. Nadkarni G
289. Naik R
290. Naseri T
291. Natarajan P
292. Nekhai S
293. Nelson SC
294. Neltner B
295. Nickerson D
296. North K
297. O’Connell JR
298. O’Connor T
299. Ochs-Balcom H
300. Paik D
301. Palmer ND
302. Pankow J
303. Papanicolaou G
304. Parsa A
305. Peralta JM
306. Perez M
307. Perry J
308. Peters U
309. Peyser PA
310. Phillips LS
311. Pollin T
312. Post W
313. Becker JP
314. Boorgula MP
315. Preuss M
316. Psaty BM
317. Qasba P
318. Qiao D
319. Qin Z
320. Rafaels N
321. Raffield L
322. Vasan RS
323. Rao DC
324. Rasmussen-Torvik L
325. Ratan A
326. Redline S
327. Reed R
328. Regan E
329. Reiner A
330. Reupena MS
331. Rice KM
332. Rich SS
333. Roden D
334. Roselli C
335. Rotter JI
336. Ruczinski I
337. Russell P
338. Ruuska S
339. Ryan K
340. Sabino EC
341. Saleheen D
342. Salimi S
343. Salzberg S
344. Sandow K
345. Sankaran VG
346. Scheller C
347. Schmidt E
348. Schwander K
349. Schwartz D
350. Sciurba F
351. Seidman C
352. Seidman J
353. Sheehan V
354. Sherman SL
355. Shetty A
356. Shetty A
357. Sheu WHH
358. Shoemaker MB
359. Silver B
360. Silverman E
361. Smith JA
362. Smith J
363. Smith N
364. Smith T
365. Smoller S
366. Snively B
367. Snyder M
368. Sofer T
369. Sotoodehnia N
370. Stilp AM
371. Storm G
372. Streeten E
373. Su JL
374. Sung YJ
375. Sylvia J
376. Szpiro A
377. Sztalryd C
378. Taliun D
379. Tang H
380. Taub M
381. Taylor KD
382. Taylor M
383. Taylor S
384. Telen M
385. Thornton TA
386. Threlkeld M
387. Tinker L
388. Tirschwell D
389. Tishkoff S
390. Tiwari HK
391. Tong C
392. Tracy R
393. Tsai MY
394. Vaidya D
395. Van Den Berg D
396. VandeHaar P
397. Vrieze S
398. Walker T
399. Wallace R
400. Walts A
401. Wang FF
402. Wang H
403. Watson K
404. Weeks DE
405. Weir B
406. Weiss S
407. Weng LC
408. Wessel J
409. Willer CJ
410. Williams K
411. Williams LK
412. Wilson C
413. Wilson JG
414. Wong Q
415. Wu J
416. Xu H
417. Yanek LR
418. Yang I
419. Yang R
420. Zaghloul N
421. Zekavat M
422. Zhang Y
423. Zhao SX
424. Zhao W
425. Zhi D
426. Zhou X
427. Zhu X
428. Zody M
429. Zoellner S
430. Abdalla M
431. Abecasis GR
432. Arnett DK
433. Aslibekyan S
434. Assimes T
435. Atkinson E
436. Ballantyne CM
437. Beitelshees A
438. Bielak LF
439. Bis J
440. Bodea C
441. Boerwinkle E
442. Bowden DW
443. Brody J
444. Cade B
445. Carlson J
446. Chang IS
447. Chen YDI
448. Chun S
449. Chung RH
450. Conomos MP
451. Correa A
452. Cupples LA
453. Damcott C
454. de Vries P
455. Do R
456. Elliott A
457. Fu M
458. Ganna A
459. Gong DW
460. Graham S
461. Haas M
462. Haring B
463. He J
464. Heckbert S
465. Himes B
466. Hixson J
467. Irvin MR
468. Jain D
469. Jarvik G
470. Jhun MA
471. Jiang J
472. Jun G
473. Kalyani R
474. Kardia SLR
475. Kathiresan S
476. Khera A
477. Klarin D
478. Kooperberg CL
479. Kral B
480. Lange L
481. Laurie CC
482. Laurie C
483. Lemaitre R
484. Li Z
485. Li X
486. Lin X
487. Mahaney MC
488. Manichaikul AW
489. Martin LW
490. Mathias RA
491. Mathur R
492. McGarvey ST
493. McHugh C
494. McLenithan J
495. Mikulla J
496. Mitchell BD
497. Montasser ME
498. Moran A
499. Morrison AC
500. Nakao T
501. Natarajan P
502. Nickerson D
503. North K
504. O’Connell JR
505. O’Donnell C
506. Palmer ND
507. Pampana A
508. Patel A
509. Peloso GM
510. Perry J
511. Peters U
512. Peyser PA
513. Pirruccello J
514. Pollin T
515. Preuss M
516. Psaty BM
517. Rao DC
518. Redline S
519. Reed R
520. Reiner A
521. Rich SS
522. Rosenthal S
523. Rotter JI
524. Schoenberg J
525. Selvaraj MS
526. Sheu WHH
527. Smith JA
528. Sofer T
529. Stilp AM
530. Sunyaev SR
531. Surakka I
532. Sztalryd C
533. Tang H
534. Taylor KD
535. Tsai MY
536. Uddin MM
537. Urbut S
538. Verbanck M
539. Von Holle A
540. Wang H
541. Wang FF
542. Wiggins K
543. Willer CJ
544. Wilson JG
545. Wolford B
546. Xu H
547. Yanek LR
548. Zaghloul N
549. Zekavat M
550. Zhang J
551. Neale BM
552. Sunyaev SR
553. Abecasis GR
554. Rotter JI
555. Willer CJ
556. Peloso GM
557. Natarajan P
558. Lin X
(2020) Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale
Nature Genetics 52:969–983.

https://doi.org/10.1038/s41588-020-0676-4
1. Liang Y
2. Pividori M
3. Manichaikul A
4. Palmer AA
5. Cox NJ
6. Wheeler HE
7. Im HK
(2022) Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries
Genome Biology 23:23.

https://doi.org/10.1186/s13059-021-02591-w
1. Liu Z
2. Pan W
3. Li W
4. Zhen X
5. Liang J
6. Cai W
7. Xu F
8. Yuan K
9. Lin GN
(2022) Evaluation of the effectiveness of derived features of AlphaFold2 on single-sequence protein binding site prediction
Biology 11:1454.

https://doi.org/10.3390/biology11101454
Preprint
(2022) An updated map of GRCh38 linkage disequilibrium blocks based on European ancestry data
bioRxiv.

https://doi.org/10.1101/2022.03.04.483057
1. Mbatchou J
2. Barnard L
3. Backman J
4. Marcketta A
5. Kosmicki JA
6. Ziyatdinov A
7. Benner C
8. O’Dushlaine C
9. Barber M
10. Boutkov B
11. Habegger L
12. Ferreira M
13. Baras A
14. Reid J
15. Abecasis G
16. Maxwell E
17. Marchini J
(2021) Computationally efficient whole-genome regression for quantitative and binary traits
Nature Genetics 53:1097–1103.

https://doi.org/10.1038/s41588-021-00870-7
1. McVicker G
2. Gordon D
3. Davis C
4. Green P
(2009) Widespread genomic signatures of natural selection in hominid evolution
PLOS Genetics 5:e1000471.

https://doi.org/10.1371/journal.pgen.1000471
(2021) Genetic variants associated with platelet count are predictive of human disease and physiological markers
Communications Biology 4:1132.

https://doi.org/10.1038/s42003-021-02642-9
1. Momin MM
2. Lee S
3. Wray NR
4. Lee SH
(2023) Significance tests for R² of out-of-sample prediction using polygenic scores
American Journal of Human Genetics 110:349–358.

https://doi.org/10.1016/j.ajhg.2023.01.004
1. Nawijn MC
2. Piavaux BJA
3. Jeurink PV
4. Gras R
5. Reinders MA
6. Stearns T
7. Foote S
8. Hylkema MN
9. Groot PC
10. Korstanje R
11. Oosterhout AJMV
(2011) Identification of the Mhc region as an asthma susceptibility locus in recombinant congenic mice
American Journal of Respiratory Cell and Molecular Biology 45:295–303.

https://doi.org/10.1165/rcmb.2009-0369OC
(2022) Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies
bioRxiv.

https://doi.org/10.1101/2022.09.06.506858
(2022) RAREsim: A simulation method for very rare genetic variants
American Journal of Human Genetics 109:680–691.

https://doi.org/10.1016/j.ajhg.2022.02.009
(2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability
Nature Genetics 50:1311–1317.

https://doi.org/10.1038/s41588-018-0177-x
1. Park CY
2. Zhou J
3. Wong AK
4. Chen KM
5. Theesfeld CL
6. Darnell RB
7. Troyanskaya OG
(2021) Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk
Nature Genetics 53:166–173.

https://doi.org/10.1038/s41588-020-00761-3
1. Pejaver V
2. Urresti J
3. Lugo-Martinez J
4. Pagel KA
5. Lin GN
6. Nam HJ
7. Mort M
8. Cooper DN
9. Sebat J
10. Iakoucheva LM
11. Mooney SD
12. Radivojac P
(2020) Inferring the molecular and phenotypic impact of amino acid variants with MutPred2
Nature Communications 11:5918.

https://doi.org/10.1038/s41467-020-19669-x
1. Purcell S
2. Neale B
3. Todd-Brown K
4. Thomas L
5. Ferreira MAR
6. Bender D
7. Maller J
8. Sklar P
9. de Bakker PIW
10. Daly MJ
11. Sham PC
(2007) PLINK: A tool set for whole-genome association and population-based linkage analyses
American Journal of Human Genetics 81:559–575.

https://doi.org/10.1086/519795
1. Quinlan AR
2. Hall IM
(2010) BEDTools: a flexible suite of utilities for comparing genomic features
Bioinformatics 26:841–842.

https://doi.org/10.1093/bioinformatics/btq033
1. Rozowsky J
2. Gao J
3. Borsari B
4. Yang YT
5. Galeev T
6. Gürsoy G
7. Epstein CB
8. Xiong K
9. Xu J
10. Li T
11. Liu J
12. Yu K
13. Berthel A
14. Chen Z
15. Navarro F
16. Sun MS
17. Wright J
18. Chang J
19. Cameron CJF
20. Shoresh N
21. Gaskell E
22. Drenkow J
23. Adrian J
24. Aganezov S
25. Aguet F
26. Balderrama-Gutierrez G
27. Banskota S
28. Corona GB
29. Chee S
30. Chhetri SB
31. Cortez Martins GC
32. Danyko C
33. Davis CA
34. Farid D
35. Farrell NP
36. Gabdank I
37. Gofin Y
38. Gorkin DU
39. Gu M
40. Hecht V
41. Hitz BC
42. Issner R
43. Jiang Y
44. Kirsche M
45. Kong X
46. Lam BR
47. Li S
48. Li B
49. Li X
50. Lin KZ
51. Luo R
52. Mackiewicz M
53. Meng R
54. Moore JE
55. Mudge J
56. Nelson N
57. Nusbaum C
58. Popov I
59. Pratt HE
60. Qiu Y
61. Ramakrishnan S
62. Raymond J
63. Salichos L
64. Scavelli A
65. Schreiber JM
66. Sedlazeck FJ
67. See LH
68. Sherman RM
69. Shi X
70. Shi M
71. Sloan CA
72. Strattan JS
73. Tan Z
74. Tanaka FY
75. Vlasova A
76. Wang J
77. Werner J
78. Williams B
79. Xu M
80. Yan C
81. Yu L
82. Zaleski C
83. Zhang J
84. Ardlie K
85. Cherry JM
86. Mendenhall EM
87. Noble WS
88. Weng Z
89. Levine ME
90. Dobin A
91. Wold B
92. Mortazavi A
93. Ren B
94. Gillis J
95. Myers RM
96. Snyder MP
97. Choudhary J
98. Milosavljevic A
99. Schatz MC
100. Bernstein BE
101. Guigó R
102. Gingeras TR
103. Gerstein M
(2023) The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
Cell 186:1493–1511.

https://doi.org/10.1016/j.cell.2023.02.018
Preprint
1. Sasse A
2. Ng B
3. Spiro AE
4. Tasaki S
5. Bennett DA
6. Gaiteri C
7. De Jager PL
8. Chikina M
9. Mostafavi S
(2023) Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings
bioRxiv.

https://doi.org/10.1101/2023.03.16.532969
1. Schaefer CF
2. Anthony K
3. Krupa S
4. Buchoff J
5. Day M
6. Hannay T
7. Buetow KH
(2009) PID: the Pathway Interaction Database
Nucleic Acids Research 37:D674–D679.

https://doi.org/10.1093/nar/gkn653
(2016) Schizophrenia risk from complex variation of complement component 4
Nature 530:177–183.

https://doi.org/10.1038/nature16549
1. Siepel A
2. Bejerano G
3. Pedersen JS
4. Hinrichs AS
5. Hou M
6. Rosenbloom K
7. Clawson H
8. Spieth J
9. Hillier LDW
10. Richards S
11. Weinstock GM
12. Wilson RK
13. Gibbs RA
14. Kent WJ
15. Miller W
16. Haussler D
(2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
Genome Research 15:1034–1050.

https://doi.org/10.1101/gr.3715005
Software
1. Song WC
(2024) HFS, version swh:1:rev:4412a29207ab609eaf122f2cf1f0fdc0acb25bf2
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:ee5a38b332b0c7467fb0eb09ec50a64afddaf241;origin=https://github.com/WeiCSong/HFS;visit=swh:1:snp:d5463d16439070084c1fc021f2737bfc2415f991;anchor=swh:1:rev:4412a29207ab609eaf122f2cf1f0fdc0acb25bf2
1. Subramanian A
2. Tamayo P
3. Mootha VK
4. Mukherjee S
5. Ebert BL
6. Gillette MA
7. Paulovich A
8. Pomeroy SL
9. Golub TR
10. Lander ES
11. Mesirov JP
(2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
PNAS 102:15545–15550.

https://doi.org/10.1073/pnas.0506580102
1. Visscher PM
2. Wray NR
3. Zhang Q
4. Sklar P
5. McCarthy MI
6. Brown MA
7. Yang J
(2017) 10 years of GWAS discovery: biology, function, and translation
American Journal of Human Genetics 101:5–22.

https://doi.org/10.1016/j.ajhg.2017.06.005
(2020) A simple new approach to variable selection in regression, with application to genetic fine mapping
Journal of the Royal Statistical Society Series B 82:1273–1300.

https://doi.org/10.1111/rssb.12388
(2019) A global overview of pleiotropy and genetic architecture in complex traits
Nature Genetics 51:1339–1348.

https://doi.org/10.1038/s41588-019-0481-0
1. Weissbrod O
2. Hormozdiari F
3. Benner C
4. Cui R
5. Ulirsch J
6. Gazal S
7. Schoech AP
8. van de Geijn B
9. Reshef Y
10. Márquez-Luna C
11. O’Connor L
12. Pirinen M
13. Finucane HK
14. Price AL
(2020) Functionally informed fine-mapping and polygenic localization of complex trait heritability
Nature Genetics 52:1355–1363.

https://doi.org/10.1038/s41588-020-00735-5
1. Weissbrod O
2. Kanai M
3. Shi H
4. Gazal S
5. Peyrot WJ
6. Khera AV
7. Okada Y
8. Biobank Japan Project
9. Martin AR
10. Finucane HK
11. Price AL
(2022) Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores
Nature Genetics 54:450–458.

https://doi.org/10.1038/s41588-022-01036-9
1. Yan J
2. Qiu Y
3. Ribeiro Dos Santos AM
4. Yin Y
5. Li YE
6. Vinckier N
7. Nariai N
8. Benaglio P
9. Raman A
10. Li X
11. Fan S
12. Chiou J
13. Chen F
14. Frazer KA
15. Gaulton KJ
16. Sander M
17. Taipale J
18. Ren B
(2021) Systematic analysis of binding of transcription factors to noncoding variants
Nature 591:147–151.

https://doi.org/10.1038/s41586-021-03211-0
1. Yengo L
2. Vedantam S
3. Marouli E
4. Sidorenko J
5. Bartell E
6. Sakaue S
7. Graff M
8. Eliasen AU
9. Jiang Y
10. Raghavan S
11. Miao J
12. Arias JD
13. Graham SE
14. Mukamel RE
15. Spracklen CN
16. Yin X
17. Chen SH
18. Ferreira T
19. Highland HH
20. Ji Y
21. Karaderi T
22. Lin K
23. Lüll K
24. Malden DE
25. Medina-Gomez C
26. Machado M
27. Moore A
28. Rüeger S
29. Sim X
30. Vrieze S
31. Ahluwalia TS
32. Akiyama M
33. Allison MA
34. Alvarez M
35. Andersen MK
36. Ani A
37. Appadurai V
38. Arbeeva L
39. Bhaskar S
40. Bielak LF
41. Bollepalli S
42. Bonnycastle LL
43. Bork-Jensen J
44. Bradfield JP
45. Bradford Y
46. Braund PS
47. Brody JA
48. Burgdorf KS
49. Cade BE
50. Cai H
51. Cai Q
52. Campbell A
53. Cañadas-Garre M
54. Catamo E
55. Chai JF
56. Chai X
57. Chang LC
58. Chang YC
59. Chen CH
60. Chesi A
61. Choi SH
62. Chung RH
63. Cocca M
64. Concas MP
65. Couture C
66. Cuellar-Partida G
67. Danning R
68. Daw EW
69. Degenhard F
70. Delgado GE
71. Delitala A
72. Demirkan A
73. Deng X
74. Devineni P
75. Dietl A
76. Dimitriou M
77. Dimitrov L
78. Dorajoo R
79. Ekici AB
80. Engmann JE
81. Fairhurst-Hunter Z
82. Farmaki AE
83. Faul JD
84. Fernandez-Lopez JC
85. Forer L
86. Francescatto M
87. Freitag-Wolf S
88. Fuchsberger C
89. Galesloot TE
90. Gao Y
91. Gao Z
92. Geller F
93. Giannakopoulou O
94. Giulianini F
95. Gjesing AP
96. Goel A
97. Gordon SD
98. Gorski M
99. Grove J
100. Guo X
101. Gustafsson S
102. Haessler J
103. Hansen TF
104. Havulinna AS
105. Haworth SJ
106. He J
107. Heard-Costa N
108. Hebbar P
109. Hindy G
110. Ho YLA
111. Hofer E
112. Holliday E
113. Horn K
114. Hornsby WE
115. Hottenga JJ
116. Huang H
117. Huang J
118. Huerta-Chagoya A
119. Huffman JE
120. Hung YJ
121. Huo S
122. Hwang MY
123. Iha H
124. Ikeda DD
125. Isono M
126. Jackson AU
127. Jäger S
128. Jansen IE
129. Johansson I
130. Jonas JB
131. Jonsson A
132. Jørgensen T
133. Kalafati IP
134. Kanai M
135. Kanoni S
136. Kårhus LL
137. Kasturiratne A
138. Katsuya T
139. Kawaguchi T
140. Kember RL
141. Kentistou KA
142. Kim HN
143. Kim YJ
144. Kleber ME
145. Knol MJ
146. Kurbasic A
147. Lauzon M
148. Le P
149. Lea R
150. Lee JY
151. Leonard HL
152. Li SA
153. Li X
154. Li X
155. Liang J
156. Lin H
157. Lin SY
158. Liu J
159. Liu X
160. Lo KS
161. Long J
162. Lores-Motta L
163. Luan J
164. Lyssenko V
165. Lyytikäinen LP
166. Mahajan A
167. Mamakou V
168. Mangino M
169. Manichaikul A
170. Marten J
171. Mattheisen M
172. Mavarani L
173. McDaid AF
174. Meidtner K
175. Melendez TL
176. Mercader JM
177. Milaneschi Y
178. Miller JE
179. Millwood IY
180. Mishra PP
181. Mitchell RE
182. Møllehave LT
183. Morgan A
184. Mucha S
185. Munz M
186. Nakatochi M
187. Nelson CP
188. Nethander M
189. Nho CW
190. Nielsen AA
191. Nolte IM
192. Nongmaithem SS
193. Noordam R
194. Ntalla I
195. Nutile T
196. Pandit A
197. Christofidou P
198. Pärna K
199. Pauper M
200. Petersen ERB
201. Petersen LV
202. Pitkänen N
203. Polašek O
204. Poveda A
205. Preuss MH
206. Pyarajan S
207. Raffield LM
208. Rakugi H
209. Ramirez J
210. Rasheed A
211. Raven D
212. Rayner NW
213. Riveros C
214. Rohde R
215. Ruggiero D
216. Ruotsalainen SE
217. Ryan KA
218. Sabater-Lleal M
219. Saxena R
220. Scholz M
221. Sendamarai A
222. Shen B
223. Shi J
224. Shin JH
225. Sidore C
226. Sitlani CM
227. Slieker RC
228. Smit RAJ
229. Smith AV
230. Smith JA
231. Smyth LJ
232. Southam L
233. Steinthorsdottir V
234. Sun L
235. Takeuchi F
236. Tallapragada DSP
237. Taylor KD
238. Tayo BO
239. Tcheandjieu C
240. Terzikhan N
241. Tesolin P
242. Teumer A
243. Theusch E
244. Thompson DJ
245. Thorleifsson G
246. Timmers P
247. Trompet S
248. Turman C
249. Vaccargiu S
250. van der Laan SW
251. van der Most PJ
252. van Klinken JB
253. van Setten J
254. Verma SS
255. Verweij N
256. Veturi Y
257. Wang CA
258. Wang C
259. Wang L
260. Wang Z
261. Warren HR
262. Bin Wei W
263. Wickremasinghe AR
264. Wielscher M
265. Wiggins KL
266. Winsvold BS
267. Wong A
268. Wu Y
269. Wuttke M
270. Xia R
271. Xie T
272. Yamamoto K
273. Yang J
274. Yao J
275. Young H
276. Yousri NA
277. Yu L
278. Zeng L
279. Zhang W
280. Zhang X
281. Zhao JH
282. Zhao W
283. Zhou W
284. Zimmermann ME
285. Zoledziewska M
286. Adair LS
287. Adams HHH
288. Aguilar-Salinas CA
289. Al-Mulla F
290. Arnett DK
291. Asselbergs FW
292. Åsvold BO
293. Attia J
294. Banas B
295. Bandinelli S
296. Bennett DA
297. Bergler T
298. Bharadwaj D
299. Biino G
300. Bisgaard H
301. Boerwinkle E
302. Böger CA
303. Bønnelykke K
304. Boomsma DI
305. Børglum AD
306. Borja JB
307. Bouchard C
308. Bowden DW
309. Brandslund I
310. Brumpton B
311. Buring JE
312. Caulfield MJ
313. Chambers JC
314. Chandak GR
315. Chanock SJ
316. Chaturvedi N
317. Chen YDI
318. Chen Z
319. Cheng CY
320. Christophersen IE
321. Ciullo M
322. Cole JW
323. Collins FS
324. Cooper RS
325. Cruz M
326. Cucca F
327. Cupples LA
328. Cutler MJ
329. Damrauer SM
330. Dantoft TM
331. de Borst GJ
332. de Groot L
333. De Jager PL
334. de Kleijn DPV
335. Janaka de Silva H
336. Dedoussis GV
337. den Hollander AI
338. Du S
339. Easton DF
340. Elders PJM
341. Eliassen AH
342. Ellinor PT
343. Elmståhl S
344. Erdmann J
345. Evans MK
346. Fatkin D
347. Feenstra B
348. Feitosa MF
349. Ferrucci L
350. Ford I
351. Fornage M
352. Franke A
353. Franks PW
354. Freedman BI
355. Gasparini P
356. Gieger C
357. Girotto G
358. Goddard ME
359. Golightly YM
360. Gonzalez-Villalpando C
361. Gordon-Larsen P
362. Grallert H
363. Grant SFA
364. Grarup N
365. Griffiths L
366. Gudnason V
367. Haiman C
368. Hakonarson H
369. Hansen T
370. Hartman CA
371. Hattersley AT
372. Hayward C
373. Heckbert SR
374. Heng CK
375. Hengstenberg C
376. Hewitt AW
377. Hishigaki H
378. Hoyng CB
379. Huang PL
380. Huang W
381. Hunt SC
382. Hveem K
383. Hyppönen E
384. Iacono WG
385. Ichihara S
386. Ikram MA
387. Isasi CR
388. Jackson RD
389. Jarvelin MR
390. Jin ZB
391. Jöckel KH
392. Joshi PK
393. Jousilahti P
394. Jukema JW
395. Kähönen M
396. Kamatani Y
397. Kang KD
398. Kaprio J
399. Kardia SLR
400. Karpe F
401. Kato N
402. Kee F
403. Kessler T
404. Khera AV
405. Khor CC
406. Kiemeney L
407. Kim BJ
408. Kim EK
409. Kim HL
410. Kirchhof P
411. Kivimaki M
412. Koh WP
413. Koistinen HA
414. Kolovou GD
415. Kooner JS
416. Kooperberg C
417. Köttgen A
418. Kovacs P
419. Kraaijeveld A
420. Kraft P
421. Krauss RM
422. Kumari M
423. Kutalik Z
424. Laakso M
425. Lange LA
426. Langenberg C
427. Launer LJ
428. Le Marchand L
429. Lee H
430. Lee NR
431. Lehtimäki T
432. Li H
433. Li L
434. Lieb W
435. Lin X
436. Lind L
437. Linneberg A
438. Liu CT
439. Liu J
440. Loeffler M
441. London B
442. Lubitz SA
443. Lye SJ
444. Mackey DA
445. Mägi R
446. Magnusson PKE
447. Marcus GM
448. Vidal PM
449. Martin NG
450. März W
451. Matsuda F
452. McGarrah RW
453. McGue M
454. McKnight AJ
455. Medland SE
456. Mellström D
457. Metspalu A
458. Mitchell BD
459. Mitchell P
460. Mook-Kanamori DO
461. Morris AD
462. Mucci LA
463. Munroe PB
464. Nalls MA
465. Nazarian S
466. Nelson AE
467. Neville MJ
468. Newton-Cheh C
469. Nielsen CS
470. Nöthen MM
471. Ohlsson C
472. Oldehinkel AJ
473. Orozco L
474. Pahkala K
475. Pajukanta P
476. Palmer CNA
477. Parra EJ
478. Pattaro C
479. Pedersen O
480. Pennell CE
481. Penninx B
482. Perusse L
483. Peters A
484. Peyser PA
485. Porteous DJ
486. Posthuma D
487. Power C
488. Pramstaller PP
489. Province MA
490. Qi Q
491. Qu J
492. Rader DJ
493. Raitakari OT
494. Ralhan S
495. Rallidis LS
496. Rao DC
497. Redline S
498. Reilly DF
499. Reiner AP
500. Rhee SY
501. Ridker PM
502. Rienstra M
503. Ripatti S
504. Ritchie MD
505. Roden DM
506. Rosendaal FR
507. Rotter JI
508. Rudan I
509. Rutters F
510. Sabanayagam C
511. Saleheen D
512. Salomaa V
513. Samani NJ
514. Sanghera DK
515. Sattar N
516. Schmidt B
517. Schmidt H
518. Schmidt R
519. Schulze MB
520. Schunkert H
521. Scott LJ
522. Scott RJ
523. Sever P
524. Shiroma EJ
525. Shoemaker MB
526. Shu XO
527. Simonsick EM
528. Sims M
529. Singh JR
530. Singleton AB
531. Sinner MF
532. Smith JG
533. Snieder H
534. Spector TD
535. Stampfer MJ
536. Stark KJ
537. Strachan DP
538. ’t Hart LM
539. Tabara Y
540. Tang H
541. Tardif JC
542. Thanaraj TA
543. Timpson NJ
544. Tönjes A
545. Tremblay A
546. Tuomi T
547. Tuomilehto J
548. Tusié-Luna MT
549. Uitterlinden AG
550. van Dam RM
551. van der Harst P
552. Van der Velde N
553. van Duijn CM
554. van Schoor NM
555. Vitart V
556. Völker U
557. Vollenweider P
558. Völzke H
559. Wacher-Rodarte NH
560. Walker M
561. Wang YX
562. Wareham NJ
563. Watanabe RM
564. Watkins H
565. Weir DR
566. Werge TM
567. Widen E
568. Wilkens LR
569. Willemsen G
570. Willett WC
571. Wilson JF
572. Wong TY
573. Woo JT
574. Wright AF
575. Wu JY
576. Xu H
577. Yajnik CS
578. Yokota M
579. Yuan JM
580. Zeggini E
581. Zemel BS
582. Zheng W
583. Zhu X
584. Zmuda JM
585. Zonderman AB
586. Zwart JA
587. Ng MCY
588. Rivadeneira F
589. Thorsteinsdottir U
590. Sun YV
591. Tai ES
592. Boehnke M
593. Deloukas P
594. Justice AE
595. Lindgren CM
596. Loos RJF
597. Mohlke KL
598. North KE
599. Stefansson K
600. Walters RG
601. Winkler TW
602. Young KL
603. Loh PR
604. Yang J
605. Esko T
606. Assimes TL
607. Auton A
608. Abecasis GR
609. Willer CJ
610. Locke AE
611. Berndt SI
612. Lettre G
613. Frayling TM
614. Okada Y
615. Wood AR
616. Visscher PM
617. Hirschhorn JN
(2022) A saturated map of common genetic variants associated with human height
Nature 610:704–712.

https://doi.org/10.1038/s41586-022-05275-y
1. Zeng J
2. Xue A
3. Jiang L
4. Lloyd-Jones LR
5. Wu Y
6. Wang H
7. Zheng Z
8. Yengo L
9. Kemper KE
10. Goddard ME
11. Wray NR
12. Visscher PM
13. Yang J
(2021) Widespread signatures of natural selection across human complex traits and functional genomic categories
Nature Communications 12:1164.

https://doi.org/10.1038/s41467-021-21446-3
1. Zhang K
2. Hocker JD
3. Miller M
4. Hou X
5. Chiou J
6. Poirion OB
7. Qiu Y
8. Li YE
9. Gaulton KJ
10. Wang A
11. Preissl S
12. Ren B
(2021a) A single-cell atlas of chromatin accessibility in the human genome
Cell 184:5985–6001.

https://doi.org/10.1016/j.cell.2021.10.024
(2021b) Improved genetic prediction of complex traits from individual-level data or summary statistics
Nature Communications 12:4192.

https://doi.org/10.1038/s41467-021-24485-y
1. Zhao H
2. Sun Z
3. Wang J
4. Huang H
5. Kocher JP
6. Wang L
(2014) CrossMap: a versatile tool for coordinate conversion between genome assemblies
Bioinformatics 30:1006–1007.

https://doi.org/10.1093/bioinformatics/btt730
Preprint
1. Zheng Z
2. Liu S
3. Sidorenko J
4. Yengo L
5. Turley P
6. Ani A
7. Wang R
8. Nolte IM
9. Snieder H
10. Yang J
11. Wray NR
12. Goddard ME
13. Visscher PM
14. Zeng J
15. Lifelines Cohort Study
(2022) Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries
bioRxiv.

https://doi.org/10.1101/2022.10.12.510418
1. Zhou J
2. Theesfeld CL
3. Yao K
4. Chen KM
5. Wong AK
6. Troyanskaya OG
(2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk
Nature Genetics 50:1171–1179.

https://doi.org/10.1038/s41588-018-0160-6
1. Zhou J
(2022) Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale
Nature Genetics 54:725–734.

https://doi.org/10.1038/s41588-022-01065-4
1. Zhou W
2. Bi W
3. Zhao Z
4. Dey KK
5. Jagadeesh KA
6. Karczewski KJ
7. Daly MJ
8. Neale BM
9. Lee S
(2022) SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests
Nature Genetics 54:1466–1469.

https://doi.org/10.1038/s41588-022-01178-w

Article and author information

Author details

Weichen Song
1. Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Bioengineering, Shanghai Jiao Tong University, Shanghai, China
2. Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
Contribution
Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
song628196@gmail.com

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3197-6236
Yongyong Shi
1. Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
2. Biomedical Sciences Institute of Qingdao University (Qingdao Branch of SJTU Bio-X Institutes), Qingdao University, Qingdao, China
Contribution
Conceptualization, Resources, Supervision, Investigation, Writing – review and editing

For correspondence
shiyongyong@gmail.com

Competing interests
No competing interests declared

Guan Ning Lin

Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Bioengineering, Shanghai Jiao Tong University, Shanghai, China

Contribution
Conceptualization, Supervision, Funding acquisition, Validation, Investigation, Project administration, Writing – review and editing

For correspondence
nickgnlin@sjtu.edu.cn

Competing interests
No competing interests declared

Funding

Ministry of Science and Technology (2030 Science and Technology Innovation Key Program 2022ZD020910001)

Guan Ning Lin

National Natural Science Foundation of China (81971292)

Guan Ning Lin

National Natural Science Foundation of China (82150610506)

Guan Ning Lin

Natural Science Foundation of Shanghai (21ZR1428600)

Guan Ning Lin

Medical-Engineering Cross Foundation of Shanghai Jiao Tong University (YG2022ZD026)

Guan Ning Lin

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by grants from the 2030 Science and Technology Innovation Key Program of the Ministry of Science and Technology of China (No. 2022ZD020910001), the National Natural Science Foundation of China (Nos. 81971292 and 82150610506), the Natural Science Foundation of Shanghai (No. 21ZR1428600), and the Medical-Engineering Cross Foundation of Shanghai Jiao Tong University (No. YG2022ZD026).

Version history

Preprint posted: August 9, 2023
Sent for peer review: September 28, 2023
Reviewed Preprint version 1: December 5, 2023
Reviewed Preprint version 2: March 21, 2024
Version of Record published: April 19, 2024
Version of Record updated: April 22, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.92574. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.