Fine-mapping cis-regulatory variants in diverse human populations

4 figures, 1 table and 4 additional files


Figure 1 with 5 supplements
Outline and results of pooled ATAC-seq.

(A) Performing ATAC-seq in a pool of individuals selects DNA molecules with higher CA, thus enriching for more accessible alleles. In this example (ASW population), the G allele has a low pre-ATAC frequency but a high post-ATAC frequency, due to its increased CA. The ten population abbreviations refer to: CEU, Utah residents with North European ancestry; FIN, Finnish; TSI, Tuscan; IBS, Iberian; ASW, African-American from Southwest US; YRI, Yoruban; ESN, Esan; LWK, Luhya; GWD, Gambian; and CHB, Han Chinese. (B) The number of caQTLs (top), and the percent of all tested SNPs called as caQTLs (bottom). (C) Enrichment of caQTLs among dsQTLs (Degner et al., 2012), at a range of caQTL p-value cutoffs. (D) Quantitative effect sizes of caQTLs and dsQTLs are highly correlated (scales of each axis are not comparable, and do not affect the correlation coefficient). (E–F) The degree of allelic concordance between our caQTLs and (E) dsQTLs (Degner et al., 2012) and (F) bQTLs aggregated for five TFs (Tehranchi et al., 2016). Full results available in Figure 1—source data 1.
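The pre-ATAC versus post-ATAC comparison in panel A can be illustrated with a small sketch. It assumes, purely for illustration, that a shift in allele frequency is scored with an exact two-sided binomial test of post-ATAC read counts against the pre-ATAC frequency; the caption does not state which test was actually used, and the function name and read counts below are hypothetical.

```python
from math import comb

def allelic_shift_pvalue(allele_reads: int, total_reads: int, pre_freq: float) -> float:
    """Exact two-sided binomial p-value for post-ATAC allele counts.

    Assumption: under the null, each post-ATAC read carries the allele
    with probability equal to its pre-ATAC frequency.
    """
    if not 0.0 < pre_freq < 1.0:
        raise ValueError("pre_freq must lie strictly between 0 and 1")
    # Probability of every possible allele count under the null.
    pmf = [
        comb(total_reads, k) * pre_freq**k * (1.0 - pre_freq) ** (total_reads - k)
        for k in range(total_reads + 1)
    ]
    observed = pmf[allele_reads]
    # Sum the outcomes that are no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1.0 + 1e-9))
    return min(1.0, p_value)

# Hypothetical SNP: allele at 20% pre-ATAC, seen on 30 of 40 post-ATAC reads.
print(allelic_shift_pvalue(30, 40, 0.2))
```

The exact enumeration is only practical for modest read depths; a production analysis would more likely use a statistics library.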
Figure 1—source data 1

Detailed results of analyses shown in the Figures.

Each tab has a separate summary:
  caQTL summary: Numbers of cell lines, reads, SNPs, and caQTLs.
  Shared caQTL summary: Number of populations in which each shared caQTL reached p<0.005.
  Individual coefficients: The coefficients inferred from our regression approach for inferring the contribution of each sample to each pool.
  Numbers of QTLs: Total numbers of dsQTLs, bQTLs, and Hi-C GM12878 heterozygous SNPs used in our analyses.
  Fig 1C-F: Numbers going into Figure 1C–F.
Figure 1—figure supplement 1
Comparison of post-ATAC reference allele frequencies between biological replicates of each population pool.

All replicates have 0.94 < r < 0.96.
Figure 1—figure supplement 2
Pre-ATAC vs post-ATAC reference allele frequencies for nine populations, similar to ASW plot in Figure 1A.

Most SNPs fall close to the diagonal, as expected if most SNPs are not caQTLs. All populations have 0.90 < r < 0.94.
Figure 1—figure supplement 3
Top row: caQTL p-values for SNPs on chromosome 1 in ASW and CEU, shown separately for each biological replicate.

Bottom row: median –log10(p-value) as a function of IDR, plotted using a moving window of IDR values (window width = 0.01). Dashed red lines indicate the p-value cutoff of 5 × 10⁻⁴, corresponding to IDR ≈ 0.01.
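The moving-window summary in the bottom row can be sketched as follows. The window width of 0.01 comes from the caption; the step size, the function name, and the toy values are assumptions made only for this illustration.

```python
from math import log10
from statistics import median

def windowed_median_neglog10(idr_values, p_values, width=0.01, step=0.005):
    """Return (window centre, median -log10 p) pairs over sliding IDR windows."""
    pairs = sorted(zip(idr_values, p_values))
    if not pairs:
        return []
    results = []
    # Enough windows to cover the largest IDR value observed.
    n_windows = int(pairs[-1][0] / step) + 1
    for i in range(n_windows):
        start = i * step
        in_window = [-log10(p) for idr, p in pairs if start <= idr < start + width]
        if in_window:  # skip windows that contain no SNPs
            results.append((start + width / 2, median(in_window)))
    return results

# Toy input: three SNPs with hypothetical IDR values and p-values.
print(windowed_median_neglog10([0.001, 0.002, 0.003], [1e-4, 1e-3, 1e-2]))
```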
Figure 1—figure supplement 4
QQ plots of expected (under the null) vs observed caQTLs p-values.

All populations show a similar excess of significant p-values.
Figure 2 with 2 supplements
Fine-mapping shared caQTLs.

(A) Heatmap showing the overlap in caQTLs for every pair of populations (only for variants that were testable in all ten). To avoid issues related to arbitrary p-value cutoffs, we used the shift in p-value distribution, known as π1 (Storey et al., 2004), to assess overlap. (B) Mapping a trait in multiple populations differing in LD structure allows fine-mapping of causal variants, which will show the most consistent associations. (C) caQTLs shared across many populations (at p<5×10⁻⁴) are more highly enriched for experimentally-determined causal eQTL variants (Tewhey et al., 2016). Full results available in Figure 2—source data 1.
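The π1 statistic in panel A can be illustrated with a minimal sketch. It uses the simplest form of the Storey estimator, π0(λ) = #{p > λ} / (m(1 − λ)) with π1 = 1 − π0, at a single fixed λ; the choice λ = 0.5 and the function name are assumptions, and the published analysis may have used a smoothed estimate over many λ values.

```python
def pi1_estimate(p_values, lam=0.5):
    """Estimate the fraction of non-null tests from a list of p-values.

    Null p-values are uniform, so the density of p-values above `lam`
    estimates the null proportion pi0; pi1 is its complement.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    pi0 = sum(p > lam for p in p_values) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)

# Uniformly spread p-values carry no signal, so pi1 is 0.
uniform = [(i + 0.5) / 100 for i in range(100)]
print(pi1_estimate(uniform))
```

For a pair of populations, the input would be the p-values in one population for the SNPs called as caQTLs in the other.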
Figure 2—source data 1

All caQTLs testable (at least 20 reads and MAF >0.01) in all ten populations.
Figure 2—figure supplement 1
Sharing of caQTLs across populations, as in Figure 2A, but excluding comparisons with divergent allele frequencies.

One possible explanation for the increased sharing of caQTLs between closely related populations (Figure 2A) is that since the allele frequency can affect power to detect QTLs, more similar allele frequencies could lead to greater levels of sharing. To test this possibility, for each SNP, we calculated the sharing as in Figure 2A after excluding any population that had a pre-ATAC allele frequency >5% away from the mean frequency across all 10 populations. Although this excluded 75% of pairwise comparisons, we still observed a similar pattern of sharing, suggesting that patterns of sharing are unlikely to be driven solely by allele frequency differences.
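The per-SNP exclusion step described above can be sketched as a simple filter. The sketch assumes that ">5% away" means an absolute difference of more than 0.05 in allele frequency; the function name and the frequencies are hypothetical.

```python
def populations_near_mean(pre_freqs, max_dev=0.05):
    """Keep populations whose pre-ATAC frequency is within max_dev of the mean.

    pre_freqs maps population label -> pre-ATAC allele frequency for one SNP.
    """
    mean_freq = sum(pre_freqs.values()) / len(pre_freqs)
    return [pop for pop, freq in pre_freqs.items() if abs(freq - mean_freq) <= max_dev]

# Hypothetical SNP: CHB is far from the mean (0.37) and is excluded.
snp = {"CEU": 0.33, "FIN": 0.34, "TSI": 0.33, "CHB": 0.48}
print(populations_near_mean(snp))
```

Sharing would then be computed only among the populations returned for each SNP.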
Figure 2—figure supplement 2
Example of a shared caQTL (rs79979970) that is individually significant in only one population (CHB) out of eight tested, but reaches a shared caQTL p=5.6×10⁻⁷ because it has p<0.1 in an additional four populations.

In this case, CHB had the greatest power to detect an effect since it had a pre-ATAC allele frequency of 0.68 for the open allele, whereas the other seven all had frequencies > 0.95 and thus very little range for the open allele to increase in frequency post-ATAC.
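How several modest per-population p-values can yield a strong shared p-value is illustrated below with Fisher's combined probability test, which Supplementary file 2 names as the basis for shared caQTLs. This is a sketch of the standard method only: the input p-values are hypothetical, and any additional requirements of the actual analysis (such as consistent direction of effect) are not modeled.

```python
from math import exp, log

def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the survival function
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_stat = -sum(log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total

# Hypothetical SNP: one strong signal plus several weak ones.
print(fisher_combined_pvalue([1e-4, 0.08, 0.05, 0.09, 0.06, 0.5, 0.7, 0.4]))
```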
Figure 3 with 1 supplement
Characterizing shared caQTLs.

(A) The fraction of the genome (left) and of shared caQTLs (right) in each of four classes, annotated based on chromatin signatures (Ernst and Kellis, 2012). TSS includes TSS flanking regions; full results in Supplementary file 2. (B) Searching for motifs enriched specifically among open alleles (using closed alleles from the same caQTLs as the background comparison set), we found 80 motifs enriched among open alleles (points below the diagonal). Repeating the analysis for closed alleles, we found no motifs enriched (above diagonal). Note that many motifs are partially overlapping, and thus not independent. Inset: fold-enrichment in open/closed alleles for five selected TFs. Full results in Figure 3—source data 1. (C) The number of caQTLs overlapping each position within the CTCF motif strongly mirrors the information content (i.e. the importance for binding) of that position, as expected if these caQTLs are causal variants affecting CA via CTCF binding. Full results available in Figure 3—source data 1.
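The information content referred to in panel C can be computed per motif position as IC = 2 + Σ p_b log2 p_b over the four bases, in bits. The sketch below uses that standard definition without a small-sample correction; the example columns are hypothetical and are not taken from the CTCF motif.

```python
from math import log2

def information_content(column):
    """Information content in bits (0 to 2) of one DNA motif position.

    column holds the probabilities of A, C, G, T and should sum to 1.
    """
    return 2.0 + sum(p * log2(p) for p in column if p > 0)

# Hypothetical positions: invariant, moderately constrained, unconstrained.
for col in ([1.0, 0.0, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]):
    print(col, round(information_content(col), 3))
```

Positions with higher values tolerate fewer substitutions, which is why caQTL counts per position are compared against this quantity.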
Figure 3—source data 1

Fig 3A: Numbers in Figure 3A, and chromatin states for every caQTL.

Fig 3B: Known TF binding motifs enriched in the more accessible caQTL alleles.

Fig 3C: Numbers going into Figure 3C.
Figure 3—figure supplement 1
Effect of shared caQTLs on DNA shape.

P-values are Bonferroni-corrected for four tests. See Materials and methods for details.
Figure 4 with 4 supplements
TF binding and chromatin accessibility.

(A) Using allele-specific 3D chromosomal interaction (Hi-C) data from an LCL (Rao et al., 2014), we found that open alleles of caQTLs tend to have more long-range interactions than do the closed alleles, establishing a role for CA in polymorphic chromosomal interactions. (B) Splitting bQTLs into two groups (Figure 4—figure supplement 2), we found that bQTLs were strongly associated with the extent of long-range interactions only when they also affect CA (left panel; ** indicates Bonferroni-corrected binomial p<0.008 for all six TFs); for bQTLs that do not affect CA, no allelic bias was observed (right panel; Bonferroni-corrected binomial p>0.08 for all six TFs). (C) caQTLs are strongly enriched for both local and distal eQTLs; however among those that do not affect long-range chromosomal interactions, only local eQTLs are enriched. (D) Model summary: our results suggest that bQTLs generally cannot affect long-range chromosomal interactions without an effect on CA, and caQTLs generally cannot affect distal transcription without an effect on long-range interactions. The model shown represents a plausible interpretation, but is not the only possible causal scenario. Full results available in Figure 4—source data 1.
Figure 4—source data 1

Fig 4A: Numbers going into Figure 4A, including additional distance cutoffs.

Fig 4C: Summary numbers going into Figure 4C, and for each caQTL also called as an eQTL, the distance between the caQTL and the target gene’s TSS.
Figure 4—figure supplement 1
Allelic bias of shared caQTLs for inter-chromosomal interactions.

To test the possibility that our result in Figure 4A could be due to a nonspecific bias in the Hi-C method—such as open chromatin alleles having higher efficiency of shearing, ligation, or some other step—we reasoned that any such bias should also be reflected in the pattern of allele-specific inter-chromosomal interactions (such inter-chromosomal interactions are typically considered to be ‘noise’, but should still be affected by any nonspecific biases in the method, making them an ideal control). Using the same Hi-C data (Rao et al., 2014), we found only three caQTLs with significant allelic bias in inter-chromosomal reads (two favoring open alleles and one favoring the closed allele at Bonferroni-corrected p<0.05). Moreover, plotting all shared caQTLs with allele-specific Hi-C data from GM12878 (Rao et al., 2014), shown in this figure, we observed no significant difference (4041 caQTLs favoring open alleles vs 3990 favoring closed alleles; binomial p=0.58).
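The binomial comparison of open-favoring versus closed-favoring caQTLs can be sketched as an exact two-sided sign test against a 50:50 expectation. The counts are those given in the caption; the function name is hypothetical, and the caption does not specify whether an exact or approximate test was used.

```python
from math import comb

def two_sided_sign_test(n_open: int, n_closed: int) -> float:
    """Exact two-sided binomial test of n_open vs n_closed against p = 0.5."""
    n = n_open + n_closed
    k = max(n_open, n_closed)
    # Number of outcomes at least as extreme as observed in one tail.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    # The null distribution is symmetric, so double the tail (capped at 1).
    return min(1.0, 2 * tail / 2**n)

# Counts from the caption: 4041 favoring open alleles, 3990 favoring closed.
print(round(two_sided_sign_test(4041, 3990), 2))
```

With the caption's counts this should come out at roughly 0.58, in line with the reported value.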
Figure 4—figure supplement 2
Venn diagram indicating three possible combinations of caQTL/bQTL overlaps, and how we used these to infer their downstream effects in Figure 4B.
Figure 4—figure supplement 3
Causal probabilities of SNPs affecting disease risk (Farh et al., 2015) for two examples discussed in the main text.

A third example for multiple sclerosis could not be plotted because only SNPs with probabilities > 2.75% were reported.
Figure 4—figure supplement 4
Likelihood of caQTLs from LCLs acting as eQTLs in other tissues.

See Supplemental Note for details.


Table 1
Ten candidate causal variants, shared as caQTLs across all 10 populations.

GWAS information is from the GRASP database (Eicher et al., 2015). See Supplementary file 3 for all caQTL/GWAS overlaps.
Chr  caQTL position  caQTL rsID  GWAS rsID   GWAS p-value  GWAS trait                     Candidate gene
                                                           prostate cancer
9    5097544         rs3780372   rs10974944  5.10E-32      Myeloproliferative neoplasms   JAK2
12   111351439       rs11065769  rs10849917  3.21E-08      Coronary artery disease        MYL2
2    68598955        rs17035378  rs17035378  8.00E-09      Celiac disease                 PLEK
6    167433948       rs6904946   rs2301436   1.00E-12      Crohn's disease                RNASET2
14   69273905        rs194749    rs194749    2.70E-10      Inflammatory bowel disease     ZFP36L1
1    154359411       rs9651053   rs11265608  2.75E-08      Juvenile idiopathic arthritis  IL6R
                                                           bladder cancer
13   40334852        rs9603612   rs9532434   4.52E-08      Juvenile idiopathic arthritis  COG6

Additional files

Supplementary file 1

All cell lines used, and all caQTLs at a nominal p<5×10⁻⁴.

The numbers of caQTLs are greater than those shown in Figure 1B, since for Figure 1B and the associated text we removed those in LD (r² > 0.8 in YRI).
Supplementary file 2

Shared caQTLs at a Fisher’s combined p<5×10⁻⁶.
Supplementary file 3

All caQTL/GWAS overlaps.

Rows where the GWAS rsID is the same as the caQTL rsID indicate that the GWAS variant was itself a caQTL; rows where they do not match indicate the two variants are in LD (r² > 0.8 in CEU). caQTLs with p<0.005 were included.
Transparent reporting form

  1. Ashley Tehranchi
  2. Brian Hie
  3. Michael Dacre
  4. Irene Kaplow
  5. Kade Pettie
  6. Peter Combs
  7. Hunter B Fraser
Fine-mapping cis-regulatory variants in diverse human populations
eLife 8:e39595.