Fine-mapping cis-regulatory variants in diverse human populations
Figures
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-v1.tif/full/617,/0/default.jpg)
Outline and results of pooled ATAC-seq.
(A) Performing ATAC-seq in a pool of individuals selects DNA molecules with higher CA, thus enriching for more accessible alleles. In this example (ASW population), the G allele has a low pre-ATAC frequency but a high post-ATAC frequency, due to its increased CA. The ten population abbreviations refer to: CEU, Utah residents with North European ancestry; FIN, Finnish; TSI, Tuscan; IBS, Iberian; ASW, African-American from Southwest US; YRI, Yoruban; ESN, Esan; LWK, Luhya; GWD, Gambian; and CHB, Han Chinese. (B) The number of caQTLs (top), and the percent of all tested SNPs called as caQTLs (bottom). (C) Enrichment of caQTLs among dsQTLs (Degner et al., 2012), at a range of caQTL p-value cutoffs. (D) Quantitative effect sizes of caQTLs and dsQTLs are highly correlated (the scales of the two axes are not comparable, but this does not affect the correlation coefficient). (E–F) The degree of allelic concordance between our caQTLs and (E) dsQTLs (Degner et al., 2012) and (F) bQTLs aggregated for five TFs (Tehranchi et al., 2016). Full results available in Figure 1—source data 1.
-
Figure 1—source data 1
Detailed results of analyses shown in the Figures.
Each tab has a separate summary. caQTL summary: numbers of cell lines, reads, SNPs, and caQTLs. Shared caQTL summary: number of populations in which each shared caQTL reached p<0.005. Individual coefficients: the coefficients from our regression approach for inferring the contribution of each sample to each pool. Numbers of QTLs: total numbers of dsQTLs, bQTLs, and Hi-C GM12878 heterozygous SNPs used in our analyses. Fig 1C-F: numbers going into Figure 1C–F.
- https://doi.org/10.7554/eLife.39595.008
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-figsupp1-v1.tif/full/617,/0/default.jpg)
Comparison of post-ATAC reference allele frequencies between biological replicates of each population pool.
All replicates have 0.94 < r < 0.96.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-figsupp2-v1.tif/full/617,/0/default.jpg)
Pre-ATAC vs post-ATAC reference allele frequencies for nine populations, similar to ASW plot in Figure 1A.
Most SNPs fall close to the diagonal, as expected if most SNPs are not caQTLs. All populations have 0.90 < r < 0.94.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-figsupp3-v1.tif/full/617,/0/default.jpg)
Top row: caQTL p-values for SNPs on chr 1 in ASW and CEU, shown separately for each biological replicate.
Bottom row: median –log10(p-value) as a function of IDR, plotted using a moving window of IDR values (window width = 0.01). Dashed red lines indicate the p-value cutoff of 5 × 10−4, corresponding to IDR ≈ 0.01.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-figsupp4-v1.tif/full/617,/0/default.jpg)
QQ plots of expected (under the null) vs observed caQTL p-values.
All populations show a similar excess of significant p-values.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig1-figsupp5-v1.tif/full/617,/0/default.jpg)
caQTL enrichments among other molecular QTLs (Ding et al., 2014; Lappalainen et al., 2013; Waszak et al., 2015; Tewhey et al., 2016; Banovich et al., 2014).
https://doi.org/10.7554/eLife.39595.007
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig2-v1.tif/full/617,/0/default.jpg)
Fine-mapping shared caQTLs.
(A) Heatmap showing the overlap in caQTLs for every pair of populations (only for variants that were testable in all ten). To avoid issues related to arbitrary p-value cutoffs, we used the shift in p-value distribution, known as π1 (Storey et al., 2004), to assess overlap. (B) Mapping a trait in multiple populations differing in LD structure allows fine-mapping of causal variants, which will show the most consistent associations. (C) caQTLs shared across many populations (at p<5×10−4) are more highly enriched for experimentally-determined causal eQTL variants (Tewhey et al., 2016). Full results available in Figure 2—source data 1.
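The π1 statistic mentioned in panel A can be illustrated with a minimal sketch. It assumes the simplest form of the Storey estimator with a single fixed tuning value `lam`; the legend does not state which tuning value or smoothing procedure was used, so the function name, the default `lam=0.5`, and the example p-values are all hypothetical.

```python
def pi1_estimate(pvalues, lam=0.5):
    """Estimate pi1, the fraction of tests that are non-null.

    Under the null, p-values are uniform, so the number of p-values above
    `lam` estimates pi0 * m * (1 - lam). pi1 is then 1 - pi0.
    `lam` is an assumed tuning value, not one taken from the source.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


# Hypothetical example: half the SNPs are null (evenly spread p-values),
# half have very small p-values in the second population.
null_p = [(i + 0.5) / 500 for i in range(500)]
signal_p = [1e-6] * 500
shared_fraction = pi1_estimate(null_p + signal_p)
```

In a sharing analysis of this kind, the input would be the p-values in one population for SNPs called as caQTLs in another, so that no significance cutoff is needed in the second population.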
-
Figure 2—source data 1
All caQTLs testable (at least 20 reads and MAF >0.01) in all ten populations.
- https://doi.org/10.7554/eLife.39595.012
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig2-figsupp1-v1.tif/full/617,/0/default.jpg)
Sharing of caQTLs across populations, as in Figure 2A, but excluding comparisons with divergent allele frequencies.
One possible explanation for the increased sharing of caQTLs between closely related populations (Figure 2A) is that, since allele frequency can affect the power to detect QTLs, more similar allele frequencies could lead to greater levels of sharing. To test this possibility, for each SNP, we calculated the sharing as in Figure 2A after excluding any population that had a pre-ATAC allele frequency >5% away from the mean frequency across all 10 populations. Although this excluded 75% of pairwise comparisons, we still observed a similar pattern of sharing, suggesting that patterns of sharing are unlikely to be driven solely by allele frequency differences.
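The per-SNP exclusion step described above might look like the following sketch. It assumes that ">5% away" means an absolute frequency difference greater than 0.05; the function name and the example frequencies are hypothetical.

```python
def populations_near_mean(pre_atac_freqs, max_diff=0.05):
    """Return populations whose pre-ATAC allele frequency for one SNP lies
    within `max_diff` of the mean frequency across all populations given.

    `pre_atac_freqs` maps population label -> allele frequency in [0, 1].
    Treating "5% away" as an absolute difference of 0.05 is an assumption.
    """
    if not pre_atac_freqs:
        raise ValueError("need at least one population")
    mean_freq = sum(pre_atac_freqs.values()) / len(pre_atac_freqs)
    return sorted(
        pop for pop, freq in pre_atac_freqs.items()
        if abs(freq - mean_freq) <= max_diff
    )


# Hypothetical frequencies for a single SNP (five populations for brevity).
example = {"CEU": 0.30, "FIN": 0.31, "TSI": 0.32, "YRI": 0.31, "CHB": 0.46}
kept = populations_near_mean(example)
```

Only pairs of populations that both survive this filter for a given SNP would then contribute to the pairwise sharing estimate.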
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig2-figsupp2-v1.tif/full/617,/0/default.jpg)
Example of a shared caQTL (rs79979970) that is individually significant in only one population (CHB) out of eight tested, but reaches a shared caQTL p=5.6×10−7 because it has p<0.1 in an additional four populations.
In this case, CHB had the greatest power to detect an effect since it had a pre-ATAC allele frequency of 0.68 for the open allele, whereas the other seven all had frequencies > 0.95 and thus very little range for the open allele to increase in frequency post-ATAC.
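Supplementary file 2 defines shared caQTLs by a Fisher's combined p-value. As a minimal sketch of how several modest per-population p-values can combine into a much smaller one, the code below implements the textbook form of Fisher's method, assuming independent tests. The input p-values are hypothetical and are not the per-population values for rs79979970; any handling of effect direction in the actual analysis is not shown.

```python
import math


def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)
    term = 1.0   # (x/2)^0 / 0!
    series = 1.0
    for i in range(1, k):
        term *= half_x / i
        series += term
    return math.exp(-half_x) * series


# Hypothetical input: one strongly significant population plus four with p < 0.1.
combined = fisher_combined_p([1e-4, 0.04, 0.06, 0.08, 0.09])
```

The combined value is smaller than any single input here because five weakly concordant signals are unlikely to arise together by chance.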
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig3-v1.tif/full/617,/0/default.jpg)
Characterizing shared caQTLs.
(A) The fraction of the genome (left) and of shared caQTLs (right) in each of four classes, annotated based on chromatin signatures (Ernst and Kellis, 2012). TSS includes TSS flanking regions; full results in Supplementary file 2. (B) Searching for motifs enriched specifically among open alleles (using closed alleles from the same caQTLs as the background comparison set), we found 80 motifs enriched among open alleles (points below the diagonal). Repeating the analysis for closed alleles, we found no motifs enriched (above diagonal). Note that many motifs are partially overlapping, and thus not independent. Inset: fold-enrichment in open/closed alleles for five selected TFs. Full results in Figure 3—source data 1. (C) The number of caQTLs overlapping each position within the CTCF motif strongly mirrors the information content (i.e. the importance for binding) of that position, as expected if these caQTLs are causal variants affecting CA via CTCF binding. Full results available in Figure 3—source data 1.
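The per-position information content referred to in panel C can be computed from a motif's base frequencies. The sketch below assumes the common definition with a uniform background (2 bits minus the Shannon entropy) and no small-sample correction; the legend does not specify the exact formula used, and the example columns are hypothetical rather than taken from the CTCF motif.

```python
import math


def information_content(base_freqs):
    """Information content (bits) of one motif position.

    `base_freqs` holds the frequencies or counts of A, C, G, T at that
    position. Assumes a uniform background, so the maximum is 2 bits.
    """
    total = sum(base_freqs)
    if len(base_freqs) != 4 or total <= 0:
        raise ValueError("expected four non-negative values with a positive sum")
    probs = [f / total for f in base_freqs]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


# Hypothetical motif columns: invariant, two-way degenerate, uninformative.
ic_per_position = [
    information_content(col)
    for col in ([1, 0, 0, 0], [0.5, 0.5, 0, 0], [0.25, 0.25, 0.25, 0.25])
]
```

Positions with high information content tolerate few substitutions, which is why a causal variant model predicts more caQTLs at those positions.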
-
Figure 3—source data 1
Fig 3A: Numbers in Figure 3A, and chromatin states for every caQTL.
Fig 3B: Known TF binding motifs enriched in the more accessible caQTL alleles. Fig 3C: Numbers going into Figure 3C.
- https://doi.org/10.7554/eLife.39595.015
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig3-figsupp1-v1.tif/full/617,/0/default.jpg)
Effect of shared caQTLs on DNA shape.
P-values are Bonferroni-corrected for four tests. See Materials and methods for details.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig4-v1.tif/full/617,/0/default.jpg)
TF binding and chromatin accessibility.
(A) Using allele-specific 3D chromosomal interaction (Hi-C) data from an LCL (Rao et al., 2014), we found that open alleles of caQTLs tend to have more long-range interactions than do the closed alleles, establishing a role for CA in polymorphic chromosomal interactions. (B) Splitting bQTLs into two groups (Figure 4—figure supplement 2), we found that bQTLs were strongly associated with the extent of long-range interactions only when they also affect CA (left panel; ** indicates Bonferroni-corrected binomial p<0.008 for all six TFs); for bQTLs that do not affect CA, no allelic bias was observed (right panel; Bonferroni-corrected binomial p>0.08 for all six TFs). (C) caQTLs are strongly enriched for both local and distal eQTLs; however among those that do not affect long-range chromosomal interactions, only local eQTLs are enriched. (D) Model summary: our results suggest that bQTLs generally cannot affect long-range chromosomal interactions without an effect on CA, and caQTLs generally cannot affect distal transcription without an effect on long-range interactions. The model shown represents a plausible interpretation, but is not the only possible causal scenario. Full results available in Figure 4—source data 1.
-
Figure 4—source data 1
Fig 4A: Numbers going into Figure 4A, including additional distance cutoffs.
Fig 4C: Summary numbers going into Figure 4C, and for each caQTL also called as an eQTL, the distance between the caQTL and the target gene’s TSS.
- https://doi.org/10.7554/eLife.39595.021
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig4-figsupp1-v1.tif/full/617,/0/default.jpg)
Allelic bias of shared caQTLs for inter-chromosomal interactions.
To test the possibility that our result in Figure 4A could be due to a nonspecific bias in the Hi-C method—such as open chromatin alleles having higher efficiency of shearing, ligation, or some other step—we reasoned that any such bias should also be reflected in the pattern of allele-specific inter-chromosomal interactions (such inter-chromosomal interactions are typically considered to be ‘noise’, but should still be affected by any nonspecific biases in the method, making them an ideal control). Using the same Hi-C data (Rao et al., 2014), we found only three caQTLs with significant allelic bias in inter-chromosomal reads (two favoring open alleles and one favoring the closed allele at Bonferroni-corrected p<0.05). Moreover, plotting all shared caQTLs with allele-specific Hi-C data from GM12878 (Rao et al., 2014), shown in this figure, we observed no significant difference (4041 caQTLs favoring open alleles vs 3990 favoring closed alleles; binomial p=0.58).
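The final comparison above (4041 vs 3990 caQTLs) is a two-sided binomial test against an expected proportion of 0.5. A minimal sketch of such a test follows; it is an exact calculation using only the standard library and may differ in detail from the software used for the reported value.

```python
import math


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials
    under a null success probability of 0.5.

    Because the null is symmetric, the two-sided p-value is twice the
    upper-tail probability of the larger count, capped at 1.
    """
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    larger = max(k, n - k)
    log_half_n = n * math.log(0.5)

    def log_pmf(i):
        # log of C(n, i) * 0.5^n, computed via lgamma to avoid overflow
        return (math.lgamma(n + 1) - math.lgamma(i + 1)
                - math.lgamma(n - i + 1) + log_half_n)

    upper_tail = sum(math.exp(log_pmf(i)) for i in range(larger, n + 1))
    return min(1.0, 2.0 * upper_tail)


# Counts taken from the legend: open-favoring vs closed-favoring caQTLs.
p_inter = binomial_two_sided_p(4041, 4041 + 3990)
```

With these counts the result can be compared against the p=0.58 given in the legend.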
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig4-figsupp2-v1.tif/full/617,/0/default.jpg)
Venn diagram indicating three possible combinations of caQTL/bQTL overlaps, and how we used these to infer their downstream effects in Figure 4B.
https://doi.org/10.7554/eLife.39595.018
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig4-figsupp3-v1.tif/full/617,/0/default.jpg)
Causal probabilities of SNPs affecting disease risk (Farh et al., 2015) for two examples discussed in the main text.
A third example for multiple sclerosis could not be plotted because only SNPs with probabilities > 2.75% were reported.
![](https://iiif.elifesciences.org/lax/39595%2Felife-39595-fig4-figsupp4-v1.tif/full/617,/0/default.jpg)
Likelihood of caQTLs from LCLs acting as eQTLs in other tissues.
See Supplemental Note for details.
Tables
Ten candidate causal variants, shared as caQTLs across all 10 populations.
GWAS information is from the GRASP database (Eicher et al., 2015). See Supplementary file 3 for all caQTL/GWAS overlaps.
Chr | caQTL position | caQTL rsID | GWAS rsID | GWAS p-value | GWAS trait | Candidate gene |
---|---|---|---|---|---|---|
15 | 86018746 | rs7161880 | rs4281668 | 9.90E-09 | Aggressive prostate cancer | AKAP13 |
9 | 5097544 | rs3780372 | rs10974944 | 5.10E-32 | Myeloproliferative neoplasms | JAK2 |
12 | 111351439 | rs11065769 | rs10849917 | 3.21E-08 | Coronary artery disease | MYL2 |
2 | 68598955 | rs17035378 | rs17035378 | 8.00E-09 | Celiac disease | PLEK |
6 | 167433948 | rs6904946 | rs2301436 | 1.00E-12 | Crohn's disease | RNASET2 |
14 | 69273905 | rs194749 | rs194749 | 2.70E-10 | Inflammatory bowel disease | ZFP36L1 |
1 | 154359411 | rs9651053 | rs11265608 | 2.75E-08 | Juvenile idiopathic arthritis | IL6R |
4 | 1731653 | rs798764 | rs798766 | 3.90E-13 | Urinary bladder cancer | FGFR3 |
13 | 40334852 | rs9603612 | rs9532434 | 4.52E-08 | Juvenile idiopathic arthritis | COG6 |
11 | 118560857 | rs73001406 | rs11216930 | 1.40E-09 | Glioma | PHLDB1 |
Additional files
-
Supplementary file 1
All cell lines used, and all caQTLs at a nominal p<5×10−4.
- https://doi.org/10.7554/eLife.39595.023
-
Supplementary file 2
Shared caQTLs at a Fisher’s combined p<5×10−6.
- https://doi.org/10.7554/eLife.39595.024
-
Supplementary file 3
All caQTL/GWAS overlaps.
Rows where the GWAS rsID is the same as the caQTL rsID indicate that the GWAS variant was itself a caQTL; rows where they do not match indicate the two variants are in LD (r² > 0.8 in CEU). caQTLs with p<0.005 were included.
- https://doi.org/10.7554/eLife.39595.025
-
Transparent reporting form
- https://doi.org/10.7554/eLife.39595.026