Regulatory dissection of the severe COVID-19 risk locus introgressed by Neanderthals

  1. Evelyn Jagoda
  2. Davide Marnetto
  3. Gayani Senevirathne
  4. Victoria Gonzalez
  5. Kaushal Baid
  6. Francesco Montinaro
  7. Daniel Richard
  8. Darryl Falzarano
  9. Emmanuelle V LeBlanc
  10. Che C Colpitts
  11. Arinjay Banerjee
  12. Luca Pagani
  13. Terence D Capellini  Is a corresponding author
  1. Department of Human Evolutionary Biology, Harvard University, United States
  2. Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
  3. Department of Veterinary Microbiology, University of Saskatchewan, Canada
  4. Vaccine and Infectious Disease Organization, University of Saskatchewan, Canada
  5. Department of Biology, University of Bari, Italy
  6. Department of Biomedical and Molecular Sciences, Queen’s University, Canada
  7. Department of Biology, University of Waterloo, Canada
  8. Department of Laboratory Medicine and Pathobiology, University of Toronto, Canada
  9. Department of Biology, University of Padova, Italy
  10. Broad Institute of MIT and Harvard, United States
6 figures, 1 table and 2 additional files

Figures

Overview of experimental workflow from whole genome scans for Neanderthal introgression to variant section for the MPRA and SARS-CoV-2 infection reporter assay experiments.
Figure 2 with 2 supplements
Computational intersections between MPRA emVars and functional genomics datasets across the severe COVID-19 risk locus.

(a) Gene locations across the locus along with boundaries of the four LD blocks (A–D), borders extended to encompass all SNPs in LD (r2 > 0.3) tested with MPRA. (b) Severe COVID-19 GWAS effect sizes from release 5 of the COVID-19 Host Initiative dataset (2021), with strongest genome-wide p-values in yellow spanning the A and B blocks. See key for other color definitions. Dots and diamonds across the panels indicate respectively SNPs identified directly by Sprime (dots) and SNPs in linkage disequilibrium (r2 >0.3) with them (diamonds). (c) eQTL effect sizes across the locus (blue for CCR1, green for CCR5) in whole blood from eQTLGen (Võsa et al., 2018) across the locus. Note the strong down- versus up-regulation of CCR1 for variants in the A versus B blocks, respectively. Grey SNPs are not eQTLs for any of the two genes or were absent from the eQTL study. Asterisks denote trans-eQTLs. (d) Chromatin-based functional annotations across the locus consisting of Hi-C contacts with CCR1 and CCR5 in Spleen, Thymus, or LCL (Jung et al., 2019) and candidate cis-regulatory elements from Moore et al., 2020. (e) -Log p-values for emVars identified using MPRA across the locus. Grey SNPs failed the test for activity in either the archaic or non-archaic form. Vertical lines connect the four putative causal emVars and the most cited tag SNP rs10490770 to functional genomics and genetics data. The four putatively causal variants are unique in having significant hits across all functional genomics and genetics tests.

Figure 2—figure supplement 1
Linkage Disequilibrium (LD) between Sprime identified introgressed variants within the segment (chr3:45,843,242–46,654,616) containing the COVID-19 associated haplotype.

This heatmap shows the pairwise r2 LD between each one of the 361 Sprime identified variants (rows) against every other one of the variants (columns). The color scale is a gradient between blue for r2=0 and red for r2=1. Black boxes indicate the 4 major LD blocks (min r2=0.34) with labels A-D denoting our naming scheme for these blocks. Labeled SNPs represent the boundaries between LD blocks.

Figure 2—figure supplement 2
On the top panel we show UCSC Browser tracks for a 61 k region (chr3:45,849,651–45,911,089, hg19) encompassing the leading SNPs on the 3p21.31 COVID-19-risk-associated locus.

From top to bottom we show ‘Severe COVID vars’' from ‘COVID GWAS v4’ track, H3K4Me1, H3K27Ac and DNase Clusters ‘ENCODE Regulation’ tracks, and four ‘ENCODE Transcription Factor Binding’ tracks displaying raw ChIP-seq signal for CEBPB. Highlighted in red and enlarged in the bottom-left panel is a regulatory region bound by CEBPB overlapping rs17713054. The red line indicates the position of rs17713054 within the region. On the bottom-right panel, the Neanderthal allele at rs17713054 generates a predicted CEBPB binding motif, possibly affecting the regulation of nearby genes.

MPRA results show reproducibility and accuracy.

(a) Log normalized counts for each tested sequence in replicate 1 compared with the replicate 2 of the MPRA. Pearson’s R and Spearman’s ρ are extremely high and significant across pairwise replicate comparisons of all four replicates (R>0.99 p-value <2.2 *10–16; ρ=0.98 p-value <2.2 *10–16). (b) Log normalized sequence counts for each tested in the plasmid DNA averaged across the four replicates compared with log normalized average sequence counts in the cDNA averaged across the four replicates. As with other MPRA studies (Tewhey et al., 2016; Uebbing et al., 2021), there is a significant correlation but the plasmid counts do not fully explain the cDNA counts (Pearson’s R=0.24 p-value <2.2 *10–16; Spearman’s ρ=0.92 p-value <2.2 *10–16), suggesting that some of the sequences have an effect on transcription. Sequences determined to be significantly active in the MPRA (methods) are colored in red, non-significant points are black. (c) Activity log fold change (LFC cDNA:pDNA) of positive control sequences in the source MPRA (Jagoda et al., 2021) and in this MPRA. The significant correlation (Pearson’s R=0.57 p-value = 0.006; Spearman’s ρ=0.51 p-value 0.016) suggests that the activity results in this MPRA are accurate. (d) Fraction of sequences tested showing significant activity (LFC cDNA:pDNA corrected p-value >0.01). 95% of positive control sequences tested and 0.14% of negative control sequences tested show activity once again suggesting accuracy in the MPRA results. 53% of experimental sequences show significant activity.

Properties of MPRA-identified active sequences and expression modulating variants.

(a) Log Fold Change of the cDNA count compared to the plasmid DNA for each sequence and the -log10 associated multiple hypothesis corrected p-value. Active sequences are those with a corrected p-value <0.01; this threshold is denoted with a blue dashed line. The larger plot has a y-axis limit of 20; the inset on the right shows the full spread of the data with the light red shaded box denoting the area shown in the larger figure to the left. (b,d) Enrichment of active sequences in K562 genomic features relative to non-active sequences (b) or sequences with emVars relative to sequences without emVars (d). Genomic features indicated with a number represent chromatin states in K562 cells as defined by Ernst and Kellis, 2017. DHS and H3K27ac derive from ENCODE (Butler and Hirano, 2014). Enrichments are reported as Fisher’s odds ratios with lines indicating confidence intervals. Significant enrichments (p<0.05) are colored in red. Missing chromatin states had no overlap with either active sequences (b) or those containing emVars (d). (c) Log Fold Change between active sequences with the allele on the introgressed haplotype compared with the sequence containing the other allele. Expression modulating variants (emVars) are those whose LFC for this measure is significant with a corrected p-value <0.01; this threshold is denoted with a blue dashed line.

Figure 5 with 1 supplement
Activity of four emVars in SARS-CoV-2 infected and uninfected A549-ACE2 cells using real-time PCR (qRTPCR).

(a, top) Diagram of construct design and experimental transfection setup. (a, bottom) Constructs tested using in vitro transfection in A549-ACE2 cells whose results are shown in (b) in color-coded fashion. (b) Box-whisker plot graph depicting the transcript level of GFP expression driven by the introgressed allele normalized by the non-introgressed allele for each emVar in mock infection (i.e., absense of SARS-CoV-2) (left, solid color boxes) vs SARS-CoV-2 infected (right, empty color boxes) cells. The ‘*’ depicts significant changes using Wilcoxon statistical test that had a p-value of <0.05 (n=3 per condition). (c) Bar plots showing the infection ratio for each construct.

Figure 5—figure supplement 1
Transcript levels of experimental and control plasmids in the presence and absence of SARS-CoV-2 infection.

Box plot showing the UpE transcript levels (40-Ct values) for the eight custom plasmids (A–H) and the control plasmid (I) for transfections with and without SARS-CoV-2, where increased 40-Ct values indicates higher expression of UpE (n=3 per condition).

Author response image 1

Tables

Table 1
Properties of emVars and prioritization.

This table shows a summary of all the functional data we obtained on the 20 significant emVars identified by the MPRA which we used for prioritization of which these 20 emVars show the most evidence for contributing to the severe COVID-19 phenotype. Specifically, we looked for (1) concordance between the eQTL data (Columns H,I) and the Hi-C Data (K) - specifically for emVars that are eQTLs for a COVID-19 relevant gene with which they also physically interact in an immune tissue; (2) a significant association between the allele and severe COVID-19 in the GWAS data (column J); (3) overlap between the emVar and an ENCODE annotated cCRE (column L) with support in at least one ‘class A’ immune tissue (column M). The 4 variants that met all these criteria are highlighted in bold. For visualization of this data see Figure 1.

snpchr:pos_hg19Sapiens only Allele/ Allele on Introgressed HapSprime (1) / Linked (0)LD BlockMPRA LFC (cDNA/pDNA)MPRA P-valueCCR1 eQTL DirectionCCR5 eQTL Direction-log10(GWASp)HI C Data (thymus, spleen, LCL)cCREscCREs Class A Immune Tissue
rs177128773:45848760G/C1A0.301.46E-03--NAxdELSnone
rs354548773:46059484T/C0A0.314.60E-05--23.6FYCO1,CCR5dELSCD14-positive monocyte female donor ENCDO265AAA
rs356572183:46094462A/G0A0.642.01E-04008.2CCR1,LZTFL1xx
rs713270243:46140073G/T1A–0.802.96E-03--24.7CCR1dELS,CTCF-boundCD14-positive monocyte female donor ENCDO265AAA, MM.1S, OCI-LY7, GM12878, PC-9, fibroblast of dermis, B cell adult
rs130838813:46184620C/G1B–0.538.20E-030-10.3xxx
rs349196163:46250008G/A1B–0.362.11E-030-9.5xPLS,CTCF-boundCD14-positive monocyte female donor ENCDO265AAA, MM.1S, GM12878
rs356176773:46258740C/T1B1.834.38E-040-NACCRL2,CCR5xx
rs349859473:46260008C/T0B–2.291.06E-05+-6.6CCRL2,CCR5,CCR2xx
rs15427553:46272440G/T0B–1.531.68E-260-10.8CCR2xx
rs130969053:46274766G/A1B–0.437.23E-040-9.3CCR5,CCR1,CCRL2,CCR2,CCR2dELSnone
rs105107503:46274886T/C0B–0.642.01E-04+-9.5CCR5,CCR1,CCRL2,CCR2dELSnone
rs7627893:46402627G/A0B0.461.34E-05+-1.6CXCR6,CCR1dELS,CTCF-boundCD14-positive monocyte female donor ENCDO265AAA
rs713270573:46402645A/C1B–0.339.19E-07+-7.9CXCR6,CCR1dELS,CTCF-boundCD14-positive monocyte female donor ENCDO265AAA
rs340419563:46402734G/A1B–0.551.34E-05+-7.8CXCR6,CCR1dELS,CTCF-boundCD14-positive monocyte female donor ENCDO265AAA
rs67778753:46425092G/T0B0.422.01E-04-+NAxdELSB cell adult, MM.1S, OCI-LY7
rs342459513:46469384T/C1B–1.061.88E-03+-2.7LZTFL1xx
rs730720023:46605033C/T0D0.559.64E-06000.1ALS2CL,LTFxx
rs556492123:46620020A/G1D0.661.46E-03000.3xxx
rs22930213:46621117A/G1D–0.925.35E-12000.3xxx
rs730737523:46626599C/T1D0.354.07E-03000.3xdELSnone

Additional files

Supplementary file 1

This is an excel file containing supplementary data in the form of 13 tables/sheets called (a-m).

(a) presents the African samples used as outgroup for our Sprime introgression scan. (b) presents the full results from our Sprime scan. (c) presents Neanderthal matching Sprime segments. (d) presents results from our U and Q95 scan. (e) presents cis-eQTLs from our U and Q95 Scan. (f) presents cis-eQTLs from our Sprime scan. (g) presents trans-eQTLs from our Sprime scan. (h) presents expression of genes responsive to cis-eQTLs within the introgressed region in RNA-seq studies of infection by SARS-CoV-2 and related viruses. (i) presents expression of genes responsive to trans-eQTLs within the introgressed region in RNA-seq studies of infection by SARS-CoV-2 and related viruses. (j) presents data on all MPRA tested variants. (l) presents data from our ChIP-seq analyses. (l) presents our analyses on ChIP-seq data and predicted transcription factor binding sites (TFBS). (m) presents our construct sequences used in transfection studies.

https://cdn.elifesciences.org/articles/71235/elife-71235-supp1-v1.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/71235/elife-71235-transrepform1-v1.docx

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Evelyn Jagoda
  2. Davide Marnetto
  3. Gayani Senevirathne
  4. Victoria Gonzalez
  5. Kaushal Baid
  6. Francesco Montinaro
  7. Daniel Richard
  8. Darryl Falzarano
  9. Emmanuelle V LeBlanc
  10. Che C Colpitts
  11. Arinjay Banerjee
  12. Luca Pagani
  13. Terence D Capellini
(2023)
Regulatory dissection of the severe COVID-19 risk locus introgressed by Neanderthals
eLife 12:e71235.
https://doi.org/10.7554/eLife.71235