Abstract
Genome-wide association studies (GWAS) have identified hundreds of genetic signals associated with autoimmune disease. The majority of these signals are located in non-coding regions and likely impact cis-regulatory elements (cRE). Because cRE function is dynamic across cell types and states, profiling the epigenetic status of cRE across physiological processes is necessary to characterize the molecular mechanisms by which autoimmune variants contribute to disease risk. We localized risk variants from 15 autoimmune GWAS to cRE active during TCR-CD28 costimulation of naïve human CD4+ T cells. To characterize how dynamic changes in gene expression correlate with cRE activity, we measured transcript levels, chromatin accessibility, and promoter-cRE contacts across three phases of naive CD4+ T cell activation using RNA-seq, ATAC-seq, and HiC. We identified ∼1,200 protein-coding genes physically connected to accessible disease-associated variants at 423 GWAS signals, at least one-third of which are dynamically regulated by activation. From these maps, we functionally validated a novel stretch of evolutionarily conserved intergenic enhancers whose activity is required for activation-induced IL2 gene expression in human and mouse, and is influenced by autoimmune-associated genetic variation. The set of genes implicated by this approach are enriched for genes controlling CD4+ T cell function and genes involved in human inborn errors of immunity, and we pharmacologically validated eight implicated genes as novel regulators of T cell activation. These studies directly show how autoimmune variants and the genes they regulate influence processes involved in CD4+ T cell proliferation and activation.
Introduction
GWAS has linked hundreds of regions of the human genome to autoimmune disease susceptibility. The majority of GWAS variants are located in non-coding regions of the genome, and likely contribute to disease risk by modulating cis-regulatory element (cRE) activity to influence gene expression1. Identifying causal variants and effector genes molecularly responsible for increased disease risk is critical for identifying targets for downstream molecular study and therapeutic intervention. An understanding of how non-coding variants function is often limited by incomplete knowledge of the mechanism of action, i.e., whether a variant is located in a cRE, in which cell types a cRE is active, and which genes are regulated by which cRE.
CD4+ T cells are key regulators of innate and adaptive immune responses that combat infection by orchestrating the activity of other immune cells. In their quiescent, naïve state, ‘helper’ T cells traffic between the blood and secondary lymphoid tissues as part of an immunosurveillance program maintained by transcription factors such as KLF2, TOB, and FOXO2–4. Upon encountering specific antigen, a cascade of signals activated through the TCR and costimulatory receptors results in large-scale changes in gene expression driven by factors like NFKB, NFAT, IRF4, and STATs, leading to proliferation and differentiation into specialized Th1, Th2, and Th17 subsets capable of participating in protective immunity5, 6. CD4+ T cells are also key players in the induction and pathogenesis of autoimmunity. The cis-regulatory architecture of CD4+ T cells is enriched for autoimmune disease GWAS variants7–14, and T cells from autoimmune patients harbor distinct epigenetic and transcriptomic signatures linking dysregulated gene expression with disease pathogenesis. Autoimmune variants may contribute to the breakdown of immune self-tolerance by shifting the balance between autoreactive conventional vs. regulatory CD4+ T cells, by altering cytokine production, or by promoting auto-antigen production15, 16.
Chromatin conformation assays allow for identification of putative target genes of autoimmune variant-containing cRE in close spatial proximity to gene promoters. Previous work using promoter-capture HiC and HiC in combination with other epigenetic marks have implicated sets of autoimmune variants and effector genes that may participate in T cell activation17–19. In addition to chromatin conformation approaches, multiple complementary approaches have been developed to link disease associated SNPs to their downstream effector genes. Expression quantitative trait mapping (eQTL) and chromatin co-accessibility data can be used to implicate effector genes through statistical association of genotypes with readouts of expression or other molecular markers20. However, these approaches do not account for the relevant 3D structure of the genome in the nucleus, and are highly susceptible to trans-effects and other confounding factors.
In this study, we measured the impact of TCR-CD28 activation on the autoimmune-associated cis-regulatory architecture of CD4+ helper T cells, and by comparing our data to those of several orthogonal GWAS effector gene nomination studies, identify hundreds of effector genes not implicated previously. We find that 3D chromatin-based approaches exhibit 2- to 10-fold higher predictive sensitivity than eQTL and ABC statistical approaches when benchmarked against a ‘gold standard set’ of genes underlying inborn errors in immunity and tolerance. Our maps of autoimmune SNP-gene contacts also predicted a stretch of evolutionarily conserved, intergenic enhancers that we show are required for normal expression of the canonical T cell activation gene IL2 in both human and mouse, whose activity is influenced by autoimmune risk variants. The set of variant-connected effector genes defined by 3D physical proximity to autoimmune-associated cRE is enriched for genes that regulate T cell activation, as validated pharmacologically in this study and by CRISPR-based screens in orthogonal studies21–23.
Results
Gene expression dynamics as a function of naïve CD4+ T cell activation
To explore the universe of genes and cREs that are affected by and may contribute to CD4+ activation, we characterized the dynamics of gene expression, chromatin accessibility, and 3D chromatin conformation in human CD4+CD62LhiCD44lo naïve T cells purified directly ex vivo and in response to in vitro activation through the T cell receptor (TCR) and CD28 for 8 or 24 hours using RNA-seq (N=3 donors), ATAC-seq (N=3 donors), and HiC (N=2 donors, Figure 1A). As sequencing depth and restriction enzyme cutting frequency affects the resolution and reliability of HiC experiments24, we constructed libraries with two four-cutter restriction enzymes and sequenced to a total of approximately 4 billion unique-valid reads per timepoint. We verified the reproducibility of replicate ATAC-seq and RNA-seq libraries with principal component analysis and HiC libraries using distance-controlled, stratum-adjusted correlation coefficient (SCC). Replicate samples were highly correlated and clustered by activation stage (Figure 1 - Figure Supplement 1A-C). As expected, quiescent naïve CD4+ T cells expressed high levels of SELL, TCF7, CCR7, and IL7R, and rapidly upregulated CD69, CD44, HLADR, and IL2RA upon stimulation (Figure 1B, Supplementary File 1). Genome-scale gene set variance analysis based on MsigDB hallmark pathways showed that genes involved in KRAS and Hedgehog signaling are actively enriched in quiescent naive CD4+ T cells, while stimulated cell transcriptomes are enriched for genes involved in cell cycle, metabolism, and TNF-, IL-2/STAT5-, IFNg-, Notch-, and MTORC1-mediated signaling pathways (Supplementary File 2).
To define global gene expression dynamics during the course of CD4+ T cell activation, we performed pairwise comparisons and k-means clustering, identifying 4390 differentially expressed genes after 8 hours of stimulation (3289 upregulated and 1101 downregulated) and 3611 differentially expressed genes between 8 hours and 24 hours (3015 upregulated and 596 downregulated, Figure 1 - Figure Supplement 1D) that could be further separated into five clusters based on their distinct trajectories (Figure 1C, Figure 1 - Figure Supplement 1E-G, Supplementary File 1). Genes upregulated early upon activation (cluster 1; n=1621 genes) are enriched for pathways involved in the unfolded protein response, cytokine signaling, and translation (e.g., IL2, IFNG, TNF, IL3, IL8, IL2RA, ICOS, CD40LG, FASLG, MYC, FOS, JUNB, REL, NFKB1, STAT5A, Supplementary File 3). Genes downregulated late (cluster 2; n=1600 genes) are moderately enriched for receptor tyrosine kinase signaling, cytokine signaling, and extracellular matrix organization. Genes monotonically increasing (cluster 3; n=2676 genes) are highly enriched for pathways involved in infectious disease and RNA stability, translation and metabolism, and moderately enriched for pathways involved in the unfolded protein response, cellular responses to stress, regulation of apoptosis, and DNA repair (e.g., TBX21, BHLHE40, IL12RB2, STAT1, CCND2, CDK4, PRMT1, ICAM1, EZH2, Supplementary File 3). Genes downregulated early (cluster 4; n=1628 genes) are enriched for pathways involved in inositol phosphate biosynthesis, neutrophil degranulation, and metabolism of nucleotides (e.g., KLF2, IL7R, RORA, IL10RA), and genes upregulated late (cluster 5; n=2154 genes) are highly enriched for pathways involved in cell cycle, DNA unwinding, DNA repair, chromosomal maintenance, beta-oxidation of octanoyl-CoA, and cellular response to stress (e.g., CDK2, E2F1, CDK1, CCNE1, CCNA2, PCNA, WEE1, CDC6, ORC1, MCM2). These patterns are consistent with known changes in the cellular processes that operate during T cell activation, confirming that in vitro stimulation of CD4+ T cells recapitulates gene expression programs known to be engaged during a T cell immune response.
Dynamic changes in chromosomal architecture and genome accessibility during naïve CD4+ T cell activation
To understand the cis-regulatory dynamics underlying the observed activation-induced changes in gene expression, we examined CD4+ T cell nuclear chromosome conformation and chromatin accessibility as a function of stimulation state using HiC and ATAC-seq. The human genome consists of ∼3 meters of DNA that is incorporated into chromatin and compacted into the ∼500 cubic micron nucleus of a cell in a hierarchically ordered manner. This degree of compaction results in only ∼1% of genomic DNA being accessible to the machinery that regulates gene transcription25, therefore a map of open chromatin regions (OCR) in a cell represents its potential gene regulatory landscape. Open chromatin mapping of human CD4+ T cells at all states identified a consensus cis-regulatory landscape of 181,093 reproducible OCR (Supplementary File 4). Of these, 14% (25,291) exhibited differential accessibility following 8 hours of stimulation (FDR<0.05). Most differentially accessible regions (DAR) became more open (18,887), but some DAR (6,629) showed reduced accessibility (Figure 2A, Supplementary File 5). The change in accessibility over the next 16 hours of stimulation showed the opposite dynamic, with 6,629 regions exhibiting reduced accessibility, and only 4,417 DAR becoming more open (total of 11,046 DAR, Figure 2A, Supplementary File 5). These OCR represent the set of putative cRE with dynamic activity during T cell activation.
The vast majority of putative cRE are located in intergenic or intronic regions of the genome far from gene promoters, meaning that the specific impact of a given cRE on gene expression cannot be properly interpreted from a one-dimensional map of genomic or epigenomic features. To predict which cRE may regulate which genes in CD4+ T cells across different states of activation, we created three-dimensional maps of cRE-gene proximity in the context of genome structure. The highest order of 3D nuclear genome structure is represented by A/B compartments, which are large chromosomal domains that self-associate into transcriptionally active (A) vs. inactive (B) regions26. In agreement with prior studies, we find that OCR located in active A compartments exhibit higher average accessibility than those OCR located in less active B compartments (Figure 2 – Figure Supplement 1A, Supplementary File 6), a trend observed across all cell states. Likewise, genes located in A compartments show higher average expression than those located in B compartments (Figure 2 – Figure Supplement 1B). A quantitative comparison across cell states showed that 94% of the CD4+ T cell genome remained stably compartmentalized into A (42%) and B (52%), indicating that activation does not cause a major shift in the large-scale organization of the genome within the nucleus (Figure 2B).
Within each A or B compartment, the genome is further organized into topologically associating domains (TADs)27. These structures are defined by the fact that genomic regions within them have the potential to interact with each other in 3D, but have low potential to interact with regions outside the TAD. The location of TAD boundaries can influence gene expression by limiting the access of cRE to specific, topologically associated genes. While ∼80% of TAD boundaries remained stable across all states, 20% of TAD boundaries (8925) changed as a function of T cell activation (Figure 2C, Figure 2 – Figure Supplement 1C-D, Supplemental File 7). TAD boundary dynamics were categorized and 2198 boundaries exhibited a change in strength, 2030 boundaries shifted position, and 4697 boundaries exhibited more complex changes such as loss of a boundary resulting in merger of two neighboring TAD, addition of a boundary splitting one TAD into two, or a combination of any of these changes. Genes nearby dynamic TAD boundaries are enriched for pathways involved in RNA metabolism, cellular response to stress, and the activity of PTEN, p53, JAK/STAT, Runx and Hedgehog (Supplementary File 6). We also detected chromatin stripes, which are TAD-like structures that consist of a genomic anchor region that is brought into contact with a larger loop domain via an active extrusion mechanism. Chromatin stripes are contained within and/or overlap TAD regions, and are enriched for active enhancers and super-enhancers28, 29. We identified 1526 chromatin stripes in quiescent naïve CD4+ T cells, 1676 stripes in 8 hour stimulated cells, and 2028 stripes at 24 hours post-stimulation (Figure 2 - Figure Supplement 2A, Supplementary File 8). Consistent with prior studies in other cell types, chromatin stripes were preferentially located in A compartments, and genes and OCR within stripe regions showed increased expression and chromatin accessibility (Figure 2 - Figure Supplement 2B-D).
Within active topological structures, transcriptional enhancers can regulate the expression of distant genes by looping to physically interact with gene promoters30, 31. To identify potential regulatory cRE-gene interactions, we identified high confidence loop contacts across all cell states using Fit-HiC (merged 1kb, 2kb, and 4kb resolutions, Figure 2D). This approach detected 933,755 loop contacts in quiescent naïve CD4+ T cells, 900,267 loop contacts in 8-hour stimulated cells, and 551,802 contacts in 24-hour stimulated cells (2,099,627 total unique loops). Approximately 23% of these loops involved a gene promoter at one end and an OCR at the other, and these promoter-interacting OCR were enriched for enhancer signatures based on flanking histone marks from CD4+ T cells in the epigenome roadmap database32 (Figure 2 - Figure Supplement 1E). T cell activation resulted in significant reorganization of the open chromatin-promoter interactome, as 907 promoter-OCR exhibited increased contact and 1333 showed decreased contact following 8 hours of stimulation (Figure 2E). Continued stimulation over the next 16 hours was associated with an increase in the contact frequency of 41 promoter-OCR pairs, while only 4 pairs exhibited decreased contact (Figure 2E). Activation-induced changes in chromatin architecture and gene expression were highly correlated, as genomic regions exhibiting increased promoter connectivity became more accessible at early stages of stimulation, which was associated with increased gene transcription from connected promoters (Figure 2F, Supplementary File 9). The accessibility of promoter-connected OCR and the expression of their associated genes decreased globally from 8 to 24 hours of stimulation (Figure 2F). We compared these loop calls to a prior chromatin capture analysis in CD4+ T cells by Burren et al.18 and found that roughly 40% of stable loops in both stimulated and unstimulated cells were identical in both studies, despite differing in approach (HiC vs. PCHiC), analysis (HiC vs. CHiCAGO), sample (naïve CD4+ vs. total CD4+), timepoint (8 vs. 4 hour), and donor individuals (Figure 2 - Figure Supplement 1F). As expected, unstimulated samples were more similar than activated samples.
We next focused on the 5 sets of genes with the dynamic expression patterns defined in Figure 1D, and identified 57,609 OCR that contact dynamic gene promoters in at least one stage. Most dynamic genes contacted between 1 and ∼35 OCR, with a median of 10 OCR per gene, but a handful of dynamic genes were observed in contact with over 100 distinct open chromatin regions (Figure 2 - Figure Supplement 1G). Similarly, most OCR were connected to a single dynamic gene, but many contacted more than one gene (median 2 genes per OCR), suggesting that most dynamic genes have a complex regulatory architecture. Increased gene expression upon activation correlated with an increase in the accessibility and promoter contact frequency of distant cRE (Figure 2 - Figure Supplement 1H), as exemplified by GEM and IRF4 (Figure 2 - Figure Supplement 3A,B). Conversely, the 3D regulatory architecture of genes like KLF2 and DPEP2, which were downregulated following stimulation, exhibited decreased contact and accessibility (Figure 2 - Figure Supplement 3C,D).
Transcription factor footprints enriched in dynamic open chromatin identify regulators of T cell activation
To explore what factors drive dynamic changes in the regulatory architecture of the CD4+ T cell genome during activation, we conducted a quantitative footprinting analysis at 1,173,159 transcription factor (TF) motifs located in the consensus CD4+ T cell open chromatin landscape using the average accessibility of the region surrounding each motif as a measure of regulatory activity (Supplementary File 10). Activation of naïve CD4+ T cells resulted in increased chromatin accessibility around bZIP motifs at tens of thousands of genomic regions by 8 hours post-stimulation (Figure 3A and C), which was associated with increased expression of Fos and Jun family members constituting the AP-1 complex, as well as the bZIP factors BATF, BACH1, and BACH2 (Figure 3A). The activity of NFE2 and SMAD was increased without increased expression (Figure 3A and C), likely due to post-translational regulation of these factors by phosphorylation33. Conversely, the motifs for a number of TF exhibited significantly reduced accessibility early after stimulation, including those for EGR2 and FOXP3 that are known to negatively regulate T cell activation34, 35 (Figure 3A). By 24 hours post-activation, bZIP activity remained largely unchanged compared to 8 hours (Figure 3B), but a number of factors showed decreased activity. These include several members of the Sp family, the Myc cofactor MAZ that also cooperates with CTCF to regulate chromatin looping36, KLF2, which controls genes involved in naïve CD4+ T cell quiescence and homing2, 37, NRF1, a factor implicated in age-associated T cell hypofunction38, and EGR2 and 3, which are known to oppose T cell activation and promote tolerance34, 39, 40 (Figure 3B and C).
To explore how transcription factor activity may operate via the CD4+ T cell open chromatin landscape to regulate distinct programs of dynamic gene expression during TCR/CD28-induced activation, we focused on TF motifs enriched among those OCR specifically contacting promoters of dynamic genes identified by our clustering analysis (Figure 3 – Figure Supplement 1A). The set of OCR contacting dynamic gene promoters were enriched for the motifs of 89 expressed (TPM>5) transcription factors, as compared to motifs present in the total open chromatin landscape (Figure 3D). The majority of this TF activity was enriched in OCR connected to genes highly upregulated at 24 hours post-activation (clusters 3 and 5, Figure 3D), with the exception of CREB3, ELF1, ESRRA, GABPA, RELA, XPB1, ZNF384, and the transcriptional repressor IKZF1 known as a strong negative regulator of T cell activation and differentiation41–44. Conversely, motifs for IKZF1, ZNF384, GABPA, ESRRA, and ELF1 were highly enriched in the set of OCR contacting genes down-regulated early after activation (cluster 4, Figure 3D). Motifs for KLF2 and the metabolic gene regulator SREBF1 were likewise enriched in OCR connected to down-regulated genes. Open chromatin regions interacting with genes in cluster 2 are negatively enriched for this set of TF except for CTCF (Figure 3D).
Finally, we integrated TF footprint, promoter connectome, and gene co-expression data to construct TF-gene regulatory networks likely operating at each timepoint. The connections between regulatory nodes are based on physical promoter-TF footprint interactions with confidence weighted by gene co-expression (Figure 3E, Supplementary File 11). Highly co-expressed genes at the core of the unstimulated CD4+ T cell regulatory network encode transcription factors such as KLF2, ETS1, IKZF1, and TCF745–47 that are known to be involved in T cell gene silencing, quiescence, homeostasis, and homing. Genes connected to the top factors in this network were enriched for pathways involved in immune signaling, DNA replication and repair, protein secretion, and programmed cell death (Figure 3 – Figure Supplement 1B). Costimulation through the TCR and CD28 induced a set of core network genes active at both time points with known roles in T cell activation and differentiation (NFKB1, JUNB, MYC, IRF4, STAT5, STAT1, LEF1, ATF4). Also part of this set is PLAGL2, an oncogene in the Wnt pathway that regulates hypoxia-induced genes48 with no prior defined role in T cell activation. Additional nodes specifically implicated at 8 hours post-activation are HIF1A, the major sensor of cellular hypoxia49, and XBP1, a major transcriptional mediator of the unfolded ER protein response with defined roles in T cell activation, differentiation, and exhaustion50, 51. Factors specifically implicated at 24 hours post-activation include E2F1, a transcriptional regulator of both cell cycle and apoptosis in T cells52, 53, BHLHE40, a factor known to control multiple aspects of T cell metabolism and differentiation54, and the Myc cofactor MAZ that has not been previously studied in the context of T cell function. Genes connected to factors in the activated T cell networks were enriched for pathways involved in cytokine signaling, the interferon response, transcription, cell cycle, DNA replication and repair, and programmed cell death (Figure 3 – Figure Supplement 1B). Together, these data indicate that concurrent but separable stimulation-induced gene programs are the result of the activity of distinct sets of DNA binding factors mobilized by antigen and costimulatory receptor signaling in naïve CD4+ T cells.
Identification of autoimmune variants associated with CD4+ T cell cRE and predicted effector genes
Following our established variant-to-gene (V2G) mapping approach to implicate functional SNPs and their effector genes using the combination of GWAS and chromatin conformation capture data55–59, we intersected promoter-interacting OCR with autoimmune SNPs from the 95% credible set derived from 15 autoimmune diseases (Figure 4A, Supplementary Files 12 and 13). Constraining the GWAS SNPs in this way reduced the credible set size from an average of 14 variants per sentinel to 3 variants per sentinel. To determine whether open chromatin in physical contact with dynamically regulated genes in CD4+ T cells is enriched for autoimmune disease heritability, we performed a partitioned LD score regression analysis. This landscape was enriched for variants associated with susceptibility to inflammatory bowel disease (IBD), ulcerative colitis (UC), type I diabetes (T1D), lupus (SLE), celiac disease (CEL), allergy (ALG), eczema (ECZ), and rheumatoid arthritis (RA), but not for variants associated with psoriasis (PSO) or juvenile idiopathic arthritis (JIA) (Figure 4B, Supplementary File 14). The OCR connected to genes upregulated early and/or progressively upon activation (clusters 1 and 3) were most strongly enriched for ALG, CEL, IBD/UC, RA and T1D heritability, while SLE and ECZ heritability was most enriched in OCR connected to genes upregulated later post-activation (clusters 3 and 5, Figure 4B). SLE was also the only disease (besides PSO and JIA) that was not enriched in open chromatin connected to down-regulated genes (cluster 4, Figure 4B).
The promoter-connected open chromatin landscape for all CD4+ T cell states in this study contains 2606 putatively causal variants linked to ∼half of the sentinel signals (423) for the 15 diseases analyzed, and are in contact with a total of 1836 genes (Supplementary File 13). A total of 1151 autoimmune variants localized to the promoters of 400 genes (−1500/+500bp from the TSS, Supplementary File 15). These variants were on average 103 kb from the TSS of their connected gene (Figure 4C), and each variant contacted an average of 5 genes (Figure 4D). The majority of linked SNPs interact with genes in addition to the nearest gene, and ∼half of linked SNPs ‘skip’ the nearest gene to target only distant genes (Figure 4E). Approximately 60% of connected genes were implicated across all timepoints (Figure 4F, Supplementary File 13), while ∼40% (753) are dynamically regulated (clusters 1-5) in response to TCR/CD28 co-stimulation. Examples of SNP-genes pairs that exhibit dynamic accessibility, chromosome contact, and expression in response to T cell activation are the SIK1, PARK7, DUSP5, CLEC2D, TRIP10, GPR108 and IL2 loci (Figure 4G, Supplementary File 13). The TRIP10 and GPR108 promoters were each captured in contact with a high confidence variant rs1077667 (PP>0.99), which is located in an intron of TNFSF14 and is associated with multiple sclerosis (Figure 4H). The accessibility of this SNP and it’s contact with TRIP10 and GPR108 increased following activation (Figure 4H). Conversely, the allergy-associated SNP rs7380290 is accessible and contacts the SIK1 promoter in resting cells, but shows reduced accessibility and promoter connectivity upon activation (Figure 4I). TRIP10 encodes a cytoskeletal binding protein involved in endocytosis that suppresses glucose uptake in response to insulin signaling60, GPR108 encodes an orphan G-protein coupled receptor, and SIK1 encodes a salt-inducible kinase with roles in cancer, epilepsy, and myeloid signaling61. None of these genes have been previously studied in the context of T cell activation.
Comparative predictive power against orthogonal variant-to-gene mapping approaches
In an attempt to gauge the predictive power of our approach relative to other V2G approaches, we compared our chromatin capture-based autoimmune GWAS effector gene predictions to the predictions of four other chromosome capture-based studies in human CD4+ T cells17–19, 62, four single-cell eQTL studies in human CD4+ T cells13, 62–64, and a set of 449 genes that when mutated in humans cause inborn errors of immunity65. As part of the comparison, we harmonized the chromatin-based datasets using our chromatin loop calling approach and integrated with the same 95% credible set of autoimmune variants to create comparable physical contact maps. A total of 6936 unique genes were implicated by all studies. Our HiC-ATAC-seq approach in naïve and activated human CD4+ implicated 1947 effector genes, 400 of which through proxies in open promoters and 1836 through contacts with distal variants, of which 752 were dynamically regulated throughout activation. The HiC dataset from Gate et al. implicated 110 genes, and the capture-HiC datasets from Burren et al., Yang et al., and Javierre et al. implicated 1408, 2572, and 2368 genes, respectively (Supplementary File 16). A total of 369 autoimmune GWAS effector genes were predicted in common by our HiC approach and the pcHiC-based approaches. The eQTL studies by Soskic et al., Gate et al., Ye et al., and Schmiedel et al. (DICE) implicated 171, 15, 15, and 221 genes, and the co-accessibility data from Gate et al. implicated 45 genes. Concordance between our gene predictions and the pcHiC predictions ranged from 15-24% (Figure 5A), while concordance between our predictions and the eQTL predictions was low (0.6%∼5%, Figure 5A), and concordance among eQTL studies was also low (1.6%-13%, Figure 5A). However, when we co-localized the Soskic variant-gene pairs with autoimmune GWAS SNPs, 41 of these eQTL were also predicted by our study, representing a concordance 10-fold higher than obtained by a random sampling of genes within 500 kb of autoimmune variants (Figure 5B). A total of 1519 of the genes implicated in our study were not predicted by eQTL approaches, while 644 genes from our CD4 T cell V2G data were not implicated by the other chromatin-based datasets, and altogether 562 putative autoimmune effector genes were uniquely predicted by our study. Potential sources of variation between the results of the studies are outlined in the Discussion. A total of 17 genes were implicated by all CD4+ T cell-based approaches that nominated at least 150 genes (APIP, BCL2L11, ERAP1, ERAP2, CD5, FIGNL1, IL18R1, METTL21B, MRPL51, PPIF, PTGER4, PXK, RMI2, RNASET2, SERPINB1, TAPBPL, VAMP1).
To measure the predictive power of the sets of genes implicated by each approach above, a precision-recall analysis was performed against the ‘gold standard’ human inborn errors in immunity gene set, based on the hypothesis that genes under the control of autoimmune-associated variants would cause monogenic immune disease phenotypes if subjected to deleterious coding mutations. The set of genes from our study with disease-associated SNPs located in their open promoters overlapped with only 5% of the genes from the gold standard set (Supplementary File 16, Figure 5C). However, leveraging our HiC data to include distal autoimmune variants captured interacting with open gene promoters increased the sensitivity of the approach 3-fold to 16% (71 overlapping genes, Figure 5C), confirming the relevance of long-range promoter-cRE/SNP contacts. This level of sensitivity is comparable to that of other chromosome capture-based datasets for predicting HIEI genes in CD4+ T cells (Figure 5C). Conversely, the eQTL-based approaches were 3- to ∼12-fold less sensitive than the chromatin-constrained approaches in predicting HIEI genes, with most recalling 0-2.5% and the DICE eQTL dataset recalling 6% (Figure 5C). Restricting our 3D chromatin V2G to only those genes dynamically regulated during T cell activation reduced predictive power from 16% to 10%, but the fraction of these genes in the HIEI gold standard set (precision) increased from ∼4% to 6.2% (Figure 5C), indicating that focusing the V2G set on those SNP-gene pairs that are dynamically regulated by T cell activation reduces potential false positive predictions. The precision of our dataset was superior to that of the other chromosome capture-based datasets except for the Burren pcHiC dataset (Figure 5C). The eQTL approaches were comparable to the 3D chromatin-based datasets in precision (∼3-6%, Figure 5C).
We also analyzed the relative predictive power of a recently proposed ‘activity-by-contact’ (ABC) model66 that uses a multi-tissue averaged HiC dataset instead of cell type-matched 3D chromatin-based data to link variants to genes. This model constrains the input data by removing bidirectional, antisense, and small RNAs, and eliminate genes ubiquitously expressed across tissues based on the assumption that these genes are driven primarily by elements adjacent to their promoters. Also, while most human genes have multiple alternative transcriptional start sites (median = 3), ABC only annotates only one promoter per gene. To apply the activity-by-contact gene nomination model to our CD4+ T cell chromatin-based V2G data, we used our ATAC-seq data and publicly available CD4+ T cell H3K27ac ChIP-seq data as input, and integrated this with GWAS and the averaged HiC dataset from the original ABC study66, 67. The ABC model nominated 650 genes compared to 1836 genes using our cell type-matched HiC data and analysis pipeline. Only 357 genes were nominated by both approaches, while 1479 genes nominated by our approach were not implicated by the ABC model, and 293 genes not implicated by our approach were newly implicated by ABC. To measure the predictive power of the ABC approach, we re-ran the precision-recall re-analysis with all datasets subjected to the ABC gene-promoter filter (Fig. 5D). We found that applying the restricted ABC model promoter annotation to all datasets did not have a large effect on recall, however, the precision of several of the datasets were affected. For example, using the restricted promoter/gene set reduced the precision of our V2G approach and artificially inflated the precision of the ‘nearest gene to SNP’ metric. The precision-recall analysis also shows that the ABC score-based approach is only half as powerful at predicting HIEI genes as the chromatin-based V2G approaches (Fig. 5D). This indicates that informing GWAS data with cell type- and state-specific 3D chromatin-based data brings more target gene predictive power than application of the multi-tissue-averaged HiC used by the ABC model. Together, these analyses indicate that chromosome capture-based V2G is a more sensitive approach for identifying ‘true’ effector genes than eQTL or ABC approaches, but comes with additional predicted genes that represent either false positives or true effector genes for which monogenic LOF/GOF mutations have not yet been characterized in humans.
Functional validation of novel autoimmune V2G-predicted enhancers at the IL2 locus
Our 3D chromatin-based analysis specifically predicts dynamic, disease-associated regulatory elements in intergenic space at the IL2 locus. The IL2 gene encodes a cytokine with crucial, pleotropic roles in the immune system, and dysregulation of IL-2 and IL-2 receptor signaling leads to immunodeficiency and autoimmune disorders in mice and humans68–71. Activation-induced transcription of IL2 involves an upstream regulatory region (URR) ∼375 bp from the TSS that has served as a paradigm of tissue-specific, inducible gene transcription for nearly four decades72–74. However, the presence of evolutionarily conserved non-coding sequences (CNS) in the ∼150 kb of intergenic space 46, 51, 80, 83, 85, 96, 122, and 128 kb upstream of the TSS suggest that additional regulatory elements may have evolved to control IL275 (Figure 6A). The −51 kb CNS contains a SNP linked to T1D, IBD, PSO, CEL and allergy (rs72669153), while the −85 kb CNS contains a SNP linked to RA (rs6232750) and the −128 kb CNS contains two SNPs linked to T1D, JIA, and SLE (rs1512973 and rs12504008, Figure 6A). In TCR/CD28-stimulated naïve CD4+ T cells, these CNS are remodeled to become highly accessible (Figure 6A), and they loop to interact physically with the IL2 URR (Figure 6B) at both time points. ChIP-seq analyses in human T cells76 show that the URR and all distal CNS except −85 are occupied by TF such as Jun/Fos (AP-1), NFAT, and NFkB that are known regulators of IL2 (Figure 6 – Figure Supplement 1), and the −85, −122, and −128 CNS are occupied by additional TF not previously thought to be direct regulators of IL2, such as MYC, BCL6, and STAT5 (Figure 6 – Figure Supplement 1). Recombinant reporter assays in primary activated human CD4+ T cells showed that the −46, −51, −83, and −128 CNS/OCR can enhance transcription from the URR (Figure 6C). To determine whether the native elements contribute to the expression of IL2, we targeted each CNS/OCR individually in primary human CD4+ T cells or Jurkat T cells using CRISPR/CAS9 (Figure 6D) and measured IL-2 secretion following TCR/CD28 stimulation (Figure 6E). Deletion of the −46, −51, −83, −85, −122, and −128 kb elements in primary human CD4+ T cells each resulted in a ∼50% reduction in IL-2 production, while deletion of the −80 kb element had little effect. A very similar pattern of impact was observed when these elements were deleted individually in Jurkat T cells (Figure 6E). The URR has a stronger contribution to IL-2 production than any individual intergenic element, as deletion of the URR almost completely abrogated activation-induced IL-2 production by both primary CD4+ or Jurkat T cells (Figure 6E). To determine whether these intergenic enhancers exist in synergistic epistasis77 necessary for IL2 transactivation, we generated Jurkat T cell clones in which the stretch of all 7 elements located 46 kb to 128 kb upstream of IL2 was deleted using CRISPR/CAS9 (Figure 6F). Despite the URR and 46 kb of upstream sequence being intact in these clones, loss of the 81.3 kb stretch of intergenic enhancers renders these cells incapable of expressing IL2 at both the mRNA and protein level in response to stimulation (Figure 6G). These results show that the URR is not sufficient for activation-induced expression of IL2, and that IL2 has a previously unappreciated, complex, and autoimmune disease-associated regulatory architecture that was accurately predicted by our 3D epigenomic V2G approach. Importantly, we find that the distal IL2 cRE are highly accessible in quiescent memory T cell subsets (Th1, Th2, Th17, Th1-17, Th22) isolated directly ex vivo from human blood, whereas naïve CD4+ T cells and non-IL-2-producing Treg showed little accessibility at these elements (Figure 6 – Figure Supplement 1D). This suggests that stable remodeling of distal IL2 cRE can persist in vivo after TCR signals cease, and that this epigenetic imprinting contributes to the immediate, activation-induced production of IL-2 exhibited by memory, but not naïve or regulatory, CD4+ T cells78–80.
Distal IL2 enhancers are evolutionarily conserved and impact in vivo T cell-mediated immunity in mice
The intergenic IL2 enhancers defined above are conserved at the sequence level between human and mouse (Figure 7A). To test whether enhancer function is likewise evolutionarily conserved, we used zygotic CRISPR/CAS9 targeting to generate mice with a 583 bp deletion of the ortholog of the −128 kb human enhancer CNS/OCR situated 83 kb upstream from mouse Il2 (Il2-83-cRE-ko, C57BL6 background). Deletion of this genomic region did not discernably affect T lymphocyte development in Il2-83-cRE-ko mice. However, peripheral CD4+ T cells from Il2-83-cRE-ko mice produced substantially less IL-2 at both the protein and mRNA level, and exhibited reduced proliferation, in response to in vitro stimulation (Figure 7B), confirming that the enhancer function of the orthologous −128 and −83 kb elements is conserved between human and mouse. The in vitro induction of Foxp3+ regulatory T cells (Treg) from conventional CD4+ T cell precursors in response to antigenic stimulation in the present of TGF-beta, a differentiation process highly dependent upon IL-2, was reduced 4-fold in Il2-83-cRE-ko mice compared to wild-type mice (Figure 7C), but could be rescued by addition of exogenous IL-2 to the culture (Figure 7C).
To test whether the distal −83 kb Il2 enhancer contributes to physiologic immune responses in vivo, we immunized wild-type and Il2-83-cRE-ko mice with the model antigen chicken ovalbumin. Il2-83-cRE-ko animals showed a nominal increase in the differentiation of CD4+ T cells into CXCR5+PD-1hi follicular helper T cells (Tfh) in the spleen compared to wild-type animals (Figure 7D), a process known to be antagonized by IL-2, and generated significantly elevated levels of ovalbumin-specific IgG antibody following immunization (Figure 7E). To determine whether the 83 kb Il2 enhancer contributes to auto-inflammatory disease susceptibility in vivo, we used an adoptive transfer model of IBD in which the development of colitis is determined by the balance between conventional and regulatory CD4+ T cell activities81, 82. Upon transfer into lymphocyte-deficient Rag1-ko recipients, wild-type CD4+ T cells respond to microbiota in the gut by inducing inflammatory colitis that leads to weight loss (Figure 7F, WT Tconv), while co-transfer of wild-type Treg effectively controls inflammation (Figure 7F, WT Tconv + WT Treg). Transfer of Il2-83-cRE-ko Tconv led to significantly less colitis (Figure 7F, Il2-83-cRE-ko Tconv), and was associated with a nominal decrease in Il2-83-cRE-ko Tconv in the spleen, and a significant decrease in the accumulation of Il2-83-cRE-ko Tconv in the mesenteric lymph nodes (MLN) that drain the intestines (Figure 7F inset). Treg require IL-2 from Tconv for their function and homeostasis83, and despite their reduced numbers, Il2-83-cRE-ko Tconv were able to support the homeostasis and function of co-transferred wild-type Treg (Figure 7F, Il2-83-cRE-ko Tconv + WT Treg). That the −83 kb ortholog of the human −128 kb IL2 enhancer contributes to physiologic immune responses and inflammatory disease susceptibility in vivo.
Impact of autoimmune risk-associated genetic variation on cis-regulatory element activity
The chromatin conformation approach used here employs GWAS variants as ‘signposts’ to identify disease-relevant regulatory elements and connect them to their target genes, but does not per se determine the effect of disease-associated genetic variation on enhancer activity or gene expression. We experimentally measured variant effects at the IL2 locus, where the −128 enhancer defined above contains two SNPs linked to T1D, JIA, and SLE (rs1512973 and rs12504008). Using a recombinant reporter assay in primary activated CD4+ T cells, we confirmed that disease-associated genetic variation influences intergenic IL2 enhancer activity at the −128 kb element, in that the risk allele contributes significantly less transcriptional activity than the reference allele (Figure 8A).
Autoimmune variants are likely to influence disease risk by altering the activity of cis-regulatory elements in T cells. At genome scale, we identified over 1000 cRE likely impacted by autoimmune disease-associated genetic variation, in that they contain autoimmune risk SNPs that are predicted to decrease or increase transcription factor binding affinity. Overall, 1370 autoimmune risk variants in open chromatin were predicted to influence the activity of 495 DNA binding factors (Supplementary File 17), including PLAG1, PRDM1, BACH2, MYC, TBX21, BHLHE40, LEF1, TCF7, BCL6, IRF, p53, STAT (Figure 8 – Figure Supplement 1A), and hundreds of NFkB, EGR, KLF, FKH/FOX, and FOS-JUN sites (Figure 8B). For example, the T1D SNP rs3024505 (PP = 0.200) connected to the promoters of IL9 and FAIM3 (Figure 8 – Figure Supplement 1B) and the celiac SNP rs13010713 (PP = 0.154) contacting the ITGA4 promoter (Figure 8 – Figure Supplement 1C) are predicted to disrupt binding sites for MZF1 (Figure 8 – Figure Supplement 1D) and SOX4 (Figure 8 – Figure Supplement 1E). Similarly, the MS SNP rs1077667 contacting the promoters of GPR108 and TRIP10 (Figure 4H) is predicted to reduce affinity for TP53, TP63, and OCT2/POU2F2 (Figure 8 – Figure Supplement 1F).
Functional validation of chromosome capture-based V2G-implicated effector genes
To determine whether genes identified via their physical interaction with autoimmune variants in CD4+ T cell chromatin contact maps tend to be directly involved in T cell activation and function, we compared the set of autoimmune genes implicated by chromatin contacts in this study to sets of genes identified in CRISPR-based screens that control aspects of CD4+ T cell activation like proliferation and expression of the inflammatory genes IFNG, CTLA4, IL2, IL2RA, and TNF21–23. The set of all V2G-implicated genes was highly enriched for genes shown to regulate IL-2, IL-2 receptor, CTLA-4, and proliferation (Figure 8C, Figure 8 – Figure Supplement 2A, Supplementary File 18). For example, 202 genes shown to regulate IL-2 production and 166 genes shown to regulate proliferation were also implicated in our autoimmune V2G set (Figure 8 – Figure Supplement 2A, Supplementary File 16). Genes implicated by V2G in activated CD4+ T cells were moderately enriched for genes known to control the production of IFN-G, but at the individual disease level, only genes connected to CRO- and ATD-associated variants were enriched for IFNG regulatory genes (Figure 8C, Figure 8 – Figure Supplement 2A). Genes contacting CRO, PSO, RA, SLE, T1D and VIT variants were moderately enriched for TNF regulatory genes, but the set of all V2G genes was not enriched for TNF genes. We also queried the orthologs of our V2G-implicated genes from the international mouse phenotype consortium (IMPC) database and identified 97 genes that when knocked out give an immune phenotype and 126 V2G genes that result in a hematopoietic phenotype (Figure 8D, Figure 8 – Figure Supplement 2B, Supplementary File 18). This frequency of observed immune/hematopoietic phenotypes represents a significant (adjP<0.05) ∼30% enrichment over expected. This gene set was also enriched for mortality, homeostasis/metabolism, growth/body size, skeleton, and embryonic phenotypes (Figure 8D, Supplementary File 18).
An important application of this V2G approach is the identification of novel regulators of T cell activation and their potential as drug targets, as nearly 20% of implicated genes have at least one chemical modulator currently available (Figure 8E, Supplementary File 19). As shown above, this approach validates genes that are well-studied regulators of T cell function, however, a significant portion of implicated genes are not well-studied and are not currently known to regulate T cell activation (Figure 8F). We observed a trend that genes expressed more highly in immune tissues (GTEX) have in general been better investigated, and identified several less-studied genes that could be novel targets, including the kinases GRK6, PTK6, SIK1, and MAP3K11, the G protein-coupled receptors OXER1, GPR183, GPR18, and KISS1R, the acetylcholine receptor CHRNB1, and the de novo purine pathway enzyme GART. To determine whether these V2G-implicated genes are novel regulators of T cell activation, we used commercially available pharmacologic modulators in dose-response assays of activation-induced T cell proliferation. Stimulation of T cells in the presence of ligands for CHRNB1, KISS1R and OXER1 did not significantly affect T cell proliferation, however, small molecules targeting GRK6, PTK6, MAP3K11, GPR183, GART, and SIK1 inhibited T cell activation in the nanomolar to micromolar range (Figure 8G). Together, these data show that maps of dynamic, 3D variant-gene chromatin contacts in stimulated CD4+ T cells are able to identify genes with bona fide roles in T cell activation.
Discussion
By measuring dynamic changes in chromosome folding, chromatin accessibility, and gene expression in naïve CD4+ T cells as a function of TCR-CD28 costimulation, we identified the putative cis-regulatory landscape of autoimmune disease-associated genetic variation and physically connected these elements to their putative effector genes. This and prior chromosome capture-based studies show that most cRE and their variants interact with, and therefore have the potential to regulate, more than one gene (median 5 in this study), supporting a scenario in which multiple effector genes are operative at a GWAS locus. We validated a stretch of novel distal elements predicted by our V2G approach as bona fide enhancers for the canonical immune gene IL2, and showed that autoimmune-associated genetic variation at one of these elements influences it’s activity. We also conducted pharmacologic targeting experiments and compared our results with CRISPR-based studies to validate sets of bona fide effector genes with a confluence of multiple orthogonal lines of evidence supporting their role in CD4+ T cell activation.
We also observe that the vast majority (87-95%) of GWAS effector genes predicted by chromosome capture-based approaches are not the genes nearest to the sentinel SNPs queried in this study, while roughly one-third of eGenes predicted by eQTL approaches are the nearest to a sentinel. This and the low concordance between 3D chromatin vs. eQTL eGene predictions is consistent with the view that eQTL-based and GWAS-based approaches inherently implicate different types of genes. While eQTLs cluster strongly near transcription start sites of genes with simple regulatory structures and low enrichment for functional annotations, GWAS variants are generally far from the TSS of genes with complex regulatory landscapes84. Moreover, eGene discovery by eQTL studies using scRNA-seq approaches are significantly limited by the low gene detection power inherent to these methods, particularly in rare cell types.
Our systematic comparison of 3D chromatin-based target gene nomination studies in human CD4+ T cells revealed significant variability between datasets, with the highest concordance exhibited by the Javierre and Burren datasets (37%) and the next highest exhibited by our V2G and the Javierre dataset (24%). Although we harmonized the comparisons as much as possible by subjecting each dataset to the same HiC loop calling and GWAS integration steps, there are several potential sources for the observed discrepancy between the studies. The modes of stimulation are largely comparable, but timepoints and donors varied, and ours was the only study that sorted naïve CD4+ T cells prior to stimulation. The higher concordance among promoter-capture datasets compared to our HiC dataset is likely due to their lower resolution compared to our HiC and their greater sequencing depth focused at promoters compared to HiC. The lower resolution of HindIII-based capture-HiC results in loops called to the wrong promoters24, and will miss distal SNP interactions at any promoters excluded from the capture set. While HiC is unbiased in this regard, high resolution HiC will fail to call some SNP-promoter loops because the sequencing depth is spread across the whole genome instead of focused at promoters. However, despite variation between studies, the results show the clear value of tissue-matched chromatin conformation maps compared to tissue-averaged HiC for understanding the complex genetic and epigenetic mechanisms that regulate gene expression and for predicting autoimmune effector genes.
Why do the V2G approaches not capture a larger proportion of genes from the human inborn errors in immunity ‘truth set’? Low recall/sensitivity could result from the fact that a substantial portion of the mutations in the full HIEI gene set result in immunodeficiency but not autoimmune phenotypes. However, restricting the HIEI truth set to only those disease gene mutations that result in an autoimmune phenotype (143 genes) only increased recall from 16% to 17.5%, suggesting that this is not a major factor reducing sensitivity. Also, our method uses GWAS signals as an input, and unlike GWAS for height, BMI, blood traits, etc., most autoimmune GWAS signals are likely not saturated, which limits discoverability. Another reason for reduced sensitivity is the likelihood that many of the genes in the HIEI truth set operate in cells other than CD4+ T cells, and consistent with this, when the Javierre pcHiC V2G is extended to all immune cell types tested, recall increases from 17% to 27%. This observation emphasizes how limited inclusion cell types and states in a study can significantly limit the power to detect autoimmune effector genes. Why are so few of our 3D chromatin-based V2G genes present in the HIEI truth set? Low precision could be due to false positives in the V2G gene set; i.e., contacts between disease-associated cRE and genes that are not in fact involved in disease susceptibility. This does not argue per se against the biological relevance of the cRE or the gene, only that the linkage to disease susceptibility is not relevant. Alternatively, the absence of chromatin-GWAS-implicated genes in the truth set could be a false negative from the point of view of the truth set; i.e., monogenic, disease-causing mutations in these genes may exist in the human population, but have not yet been discovered. For example, in the year 2000 approximately 100 monogenic mutations causing human inborn errors of immunity were known, but this increased by ∼11 disease mutations per year until 300 human inborn errors of immunity had been identified by 2017. The advent of next-generation exome/genome sequencing and the COVID-19 pandemic resulted in an increase in the rate of HIEI discovery between 2018-2021 to ∼40 per year to the current recognized HIEI of ∼450. If we assume a conservative continued rate of HIEI discovery of 25 disease genes per year, the next 10 years will show us an additional ∼250 disease genes that are not currently contained in the truth set.
Many of the genes identified in this 3D epigenomic V2G screen have known roles in T cell activation and function. An example is IL2, and we used the resulting maps to identify and validate a stretch of previously unknown distal enhancers whose activity is required for IL2 expression and is influenced by autoimmune genetic variation. Another example is the phosphatase DUSP5 that regulates MAPK signaling during T cell activation85, 86. However, roles for many of the genes implicated here in T cell activation are not known. For example, one of the top implicated genes, PARK7, is a deglycase studied in the context of Parkinson’s disease, but has a recently been shown to modulate regulatory T cell function87. The orphan G protein-coupled receptor GPR108 is another top gene uniquely implicated by our chromatin-based V2G that has not been studied in T cells, but was identified in a recent CRISPR screen for genes affecting IL-2 levels23. Also co-implicated by our study and recent CRISPR screens are the cannabidiol receptor GPR1888 and the purine biosynthetic enzyme GART89. The GPR18 agonist arachidonyl glycine inhibited CD4+ T cell activation above 10 uM, while the GPR18 antagonist O-1918 slightly enhanced T cell activation. This GPR was implicated by both chromatin- and eQTL-based approaches. Antagonism of GART, an enzyme we previous identified as a V2G effector gene in COVID19 severity58, with the FDA-approved drug lometrexol inhibited T cell activation in the 10 nM range. Antagonism of GRK6, a member of the G-coupled receptor kinase family associated with insulin secretion and T2D susceptibility90, and PTK6, an oncogenic kinase studied in the context of cancer91, led to inhibition of T cell activation in the nM to uM range. These targets were implicated by the other chromatin-based approaches but not by eQTL. Inhibition of MAP3K11, a kinase that facilitates signaling through JNK1 and IkappaB kinase in cancer92, 93 inhibited stimulation-induced CD4+ T cell proliferation in the 100 nM range, as did 25-hydroxycholesterol, a ligand of the G protein-coupled receptor GPR183 that was implicated by both chromatin- and eQTL-based approaches. SIK1 is a member of the salt-inducible kinase family uniquely implicated in our study that negatively regulates TORC activity61, and a small molecule SIK1 inhibitor potently antagonized stimulation-induced CD4+ T cell activation in the pM range.
Our integration of high-resolution Hi-C, ATAC-seq, RNA-seq and GWAS data in a single immune cell type across multiple activation states identified hundreds of autoimmune variant-gene pairs at ∼half of all GWAS loci studied, and application of this technique to additional immune cell types will likely identify effector genes at many of the remaining loci. This study highlights the value of chromosome conformation data as a powerful biological constraint for focusing variant-to-gene mapping efforts94, and shows that dynamic changes in the spatial conformation of the genome that accompany cell state transitions alter gene expression by exposing promoters to a varying array of cis-regulatory elements, transcription factors, and genetic variants.
Materials and methods
T cell isolation and in vitro stimulation
Human primary CD4+ T cells were purified from the apheresis products obtained from healthy, screened human donors through University of Pennsylvania Human Immunology Core (HIC). Naïve CD4+ T cells were purified using EasySepTM human naïve CD4+ T cell isolation kit II (STEM cells Technologies, cat#17555) by immunomagnetic negative selection as per manufacturer’s protocol. Isolated untouched, highly purified (93-98%) naïve human CD4 T cells were activated using anti-CD3+anti-CD28 Dynabeads (1:1, Thermofisher scientific, cat # 11161D) for 8-24 hours. Cells were then used to prepare sequencing libraries. The human leukemic T cell line Jurkat was obtained from ATCC, cloned by limiting dilution, and clones with high activation-induced secretion of IL-2 were selected for further study. Single-cell mouse lymphocyte suspensions were prepared from spleen and lymph nodes isolated from 6-week-old female wild-type or Il2-83cRE-ko mice (C57BL6 background). Mouse CD4+ T cells were purified by negative selection using a CD4+ T cell isolation kit (Miltenyi Biotec, cat. # 130-104-454). For CD8 depletion, CD8+ T cells were positively stained using CD8a (Ly-2) microbeads (Miltenyi Biotec, cat. # 130-117-044) and the negative flow-through fraction was collected. Tregs were purified from the mouse lymphocytes using a CD4+CD25+ regulatory T cell isolation kit (Miltenyi Biotec, cat. # 130-091-041). The purity of the isolated cells was checked by flow cytometry, showing approximately 95% cell purity. For iTreg induction, CD4+ CD25-T cells (1x 106) were activated in a 24-well plate pre-coated with anti-CD3 (1 µg/mL) and the cells were cultured for 72 hours in RPMI culture medium with soluble anti-CD3 (0.5 µg/mL), IL-2 (25 units/mL), TGF-beta (3 ng/mL), anti-IFN-gamma (5 µg/mL), and anti-IL-4 (5 µg/mL). iTreg induction cultures were also set up without adding IL-2. iTreg cultures were harvested after 72 hours, and cells were stained for Foxp3 expression.
Il2-83-cRE ko Mice
The CRISPR/CAS method was employed to delete the intergenic −83 CNS sequence between Il2 and Il21. CRISPR guide RNAs were designed (two guide RNAs upstream of −83CNS) and one guide RNA (downstream of −83CNS) using guide RNA design tools (http://crispr.mit.edu.guides). The pX335 plasmid (Addgene #42335) was used to generate a DNA template for sgRNA in vitro transcription. To in vitro transcribe the sgRNA, a T7 sequence was incorporated at the 5’ end of the guide RNA sequence, and a part of the sgRNA scaffold sequence from px335 was added at the 3’ end of the guide RNA. Oligonucleotide sequences containing the guide RNA (underlined) along with these added sequences were synthesized by IDT. The oligonucleotide sequences were as follows:
T7_guideRNA#1_scaffold#1: TTAATACGACTCACTATAGGTTTTCCACGGATCTGCTCGGGTTTTAGAGCTAGAAATAGC T7_guideRNA#1_scaffold#2: TTAATACGACTCACTATAGGTGCTTTCTAGGTGAAGCCCCGTTTTAGAGCTAGAAATAGC T7_guideRNA#1_scaffold#3: TTAATACGACTCACTATAGGTCATTTGAGCCTAACTACTCGTTTTAGAGCTAGAAATAGC Along with the above-cited sequences, a reverse primer sequence (AGCACCGACTCGGTGCCACT) from PX335 was used to amplify a PCR-generated template for sgRNA in vitro transcription. The PCR product (∼ 117 bp) was verified on a 2% agarose gel and then gel-purified using the QIAQuick gel extraction kit. The T7 sgRNA PCR product (500 ng) was used as a template for in vitro transcription with the T7 High Yield RNA synthesis kit (NEB, cat # E2040S). The reaction was incubated at 37°C for 4 hours, and then the sgRNA was purified using the MEGAclear kit (Life Technologies, cat # AM1908) according to the kit protocol. The sgRNA was eluted with elution buffer preheated to 95°C. The eluted sgRNA was centrifuged at 13,000 rpm for 20 minutes at 4°C, and the suspension was transferred into a new RNase-free tube. The sgRNA quality was checked by a bioanalyzer using an RNA nano Chip. The sgRNA was diluted to 500 ng/μl in injection buffer (sterile 10 mM Tris/0.1 mM EDTA, pH 7.5) and stored in a −80°C freezer until use. An injection mixture (final volume 30 μl) was prepared in injection buffer by mixing 500 ng/μl of each of the sgRNAs (left and right) with Cas9 mRNA (1 μg/μl, TriLink, cat # L-6125). Fertilized eggs collected from B6/129 mice were microinjected at the CHOP transgenic core and transferred into pseudo-pregnant B6 females. The pups were genotyped using primers flanking the targeted sequence (Forward primer: TTAGGACCTCACCCATCACAA and reverse primer: CATGCCCAGCTACTCTGACAT). The PCR product was cloned into the Promega PGEM T Easy TA cloning vector, and plasmid DNA was Sanger sequenced to determine the size of the deletion. The targeting resulted in mice with a 500 bp and 583 bp deletion at the targeted −83 CNS site and these mutant mice showed same phenotype. The Il2-83-cRE mutant heterozygous male mice were backcrossed with B6 females for 10 successive generations.
ELISA
Cell culture supernatants collected at various time intervals from in vitro stimulated T cells of mice or humans were analyzed for IL-2 by ELISA using kits purchased from ThermoFisher Scientific (mouse IL-2 ELISA kit: cat # 88-7024-88) and human IL-2 ELISA kit, cat # 88-7025-76). IL-2 ELISA was performed following the protocol provided by the vendor.
In vivo ovalbumin immunization
Eight week-old female WT and Il2-CNS-83 KO mice were injected intraperitoneally with 50 µg of ovalbumin (Sigma cat # A5503-5G) mixed with IFA (Sigma cat # F5506). Blood was collected from the immunized mice after 10 days, and serum was separated. The level of ovalbumin-specific IgG in the serum was determined by ELISA using an ovalbumin-specific IgG ELISA kit purchased from MyBiosource (cat # MBS763696). The immunized mice were sacrificed on day 10 of immunization, and spleen and lymph nodes were collected. Splenocytes and lymph node cells were stained with CD4 BV785, CD25 BV650, CD44 Percp Cy5.5, CD62L APC eFL780, CXCR5 BV421, PD-1 APC, Bcl-6 PE antibodies. TFH frequency was determined by flow cytometry analysis.
In vivo inflammatory colitis model
Conventional CD4+CD25-ve T cells and CD4+CD25+ Tregs were purified from spleens and lymph nodes of male wild-type or Il2-CNS-83 ko mice (6-8 weeks of age). T cells were transferred into Rag1-ko male mice (5 per group) by retro-orbital injection of 1 million wild-type or Il2-CNS-83 ko Tconv cells alone or along with (0.25 x 106) wild-type Tregs. Experimental Rag1-ko recipients were weighed 3 times per week and scored for IBD-induced clinical symptoms. Mice were sacrificed 72 days post-transfer, and spleen, mesenteric lymph nodes, and colon were collected. Single-cell suspensions were prepared, cells were stained for CD4, CD8, CD25, CD44, and Foxp3, and analyzed by flow cytometry. The absolute T cell count was estimated using cell count and cell frequency derived from the flow cytometry analysis.
RNA-seq library generation and sequencing
RNA was isolated from ∼ 1 million of each cell stage using Trizol Re-agent (Invitrogen), purified using the Directzol RNA Miniprep Kit (Zymo Research), and depleted of contaminating genomic DNA using DNAse I. Purified RNA was checked for quality on a Bioanlayzer 2100 using the Nano RNA Chip and samples with RIN>7 were used for RNA-seq library preparation. RNA samples were depleted of rRNA using QIAseq Fastselect RNA removal kit (Qiagen). Samples were then processed for the preparation of libraries using the SMARTer Stranded Total RNA Sample Prep Kit (Takara Bio USA) according to the manufacturer’s instructions. Briefly, the purified first-strand cDNA is amplified into RNA-seq libraries using SeqAmp DNA Polymerase and the Forward and the Reverse PCR Primers from the Illumina Indexing Primer Set HT for Illumina. Quality and quantity of the libraries was assessed using the Agilent 2100 Bioanalyzer system and Qubit fluorometer (Life Technologies). Sequencing was performed on the NovaSeq 6000 platform at the CHOP Center for Spatial and Functional Genomics.
ATAC-seq library generation and sequencing
A total of 50,000 to 100,000 sorted cells were centrifuged at 550 g for 5 min at 4 °C. The cell pellet was washed with cold PBS and resuspended in 50 μL cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40/IGEPAL CA-630) and immediately centrifuged at 550 g for 10 min at 4 °C. Nuclei were resuspended in the Nextera transposition reaction mix (25 μL 2x TD Buffer, 2.5 μL Nextera Tn5 transposase (Illumina Cat #FC-121-1030), and 22.5 μL nuclease free H2O) on ice, then incubated for 45 min at 37 °C. The tagmented DNA was then purified using the Qiagen MinElute kit eluted with 10.5 μL Elution Buffer (EB). Ten microliters of purified tagmented DNA was PCR amplified using Nextera primers for 12 cycles to generate each library. PCR reaction was subsequently cleaned up using 1.5x AMPureXP beads (Agencourt), and concentrations were measured by Qubit. Libraries were paired-end sequenced on the Illumina HiSeq 4000 platform (100 bp read length).
Hi-C library preparation
Hi-C library preparation on FACS-sorted CD4+ T cells was performed using the Arima-HiC kit (Arima Genomics Inc), according to the manufacturer’s protocols. Briefly, cells were crosslinked using formaldehyde. Crosslinked cells were then subject to the Arima-HiC protocol, which utilizes multiple restriction enzymes to digest chromatin. Arima-HiC sequencing libraries were prepared by first shearing purified proximally-ligated DNA and then size-selecting 200-600 bp DNA fragments using AmpureXP beads (Beckman Coulter). The size-selected fragments were then enriched using Enrichment Beads (provided in the Arima-HiC kit), and then converted into Illumina-compatible sequencing libraries with the Swift Accel-NGS 2SPlus DNA Library Kit (Swift, 21024) and Swift 2S Indexing Kit (Swift, 26148). The purified, PCR-amplified DNA underwent standard QC (qPCR, Bioanalyzer, and KAPA Library Quantification [Roche, KK4824]) and was sequenced with unique single indexes on the Illumina NovaSeq 6000 Sequencing System using 200 bp reads.
ATAC-seq data analysis
ATAC-seq peaks from libraries unstimulated and stimulated Naïve CD4+ T cells were called using the ENCODE ATAC-seq pipeline (https://www.encodeproject.org/atac-seq/). Briefly, pair-end reads from three biological replicates for each cell type were aligned to hg19 genome using bowtie295, and duplicate reads were removed from the alignment. Narrow peaks were called independently for each replicate using macs296 (-p 0.01 --nomodel --shift −75 --extsize 150 -B -- SPMR --keep-dup all --call-summits) and ENCODE blacklist regions (ENCSR636HFF) were removed from peaks in individual replicates. Reproducible peaks, peaks called in at least two replicates, were used to generate a consensus peakset. Signal peaks were normalized using csaw97 in 10kb bins background regions and low abundance peaks (CPM>1) were excluded from the analysis. Tests for differential accessibility was conducted with the glmQLFit approach implemented in edgeR98 using the normalization factors calculated by csaw. OCRs with FDR < 0.05 and abs(log2FC > 1) between stages were considered differentially accessible.
Hi-C data analysis
Paired-end reads from two replicates were pre-processed using the HICUP pipeline v0.7.499, with bowtie as aligner and hg19 as the reference genome. The alignment .bam file were parsed to .pairs format using pairtools v0.3.0 (https://github.com/open2c/pairtools) and pairix v0.3.7 (https://github.com/4dn-dcic/pairix), and eventually converted to pre-binned Hi-C matrix in .cool format by cooler v0.8.10100 with multiple resolutions (500bp, 1kbp, 2kbp, 2.5kbp, 4kbp, 5kbp, 10kbp, 25kbp, 40kbp, 50kbp, 100kbp, 250kbp, 500kbp, 1Mbp and 2.5Mbp) and normalized with ICE method101. Replicate similarity was determined by HiCRep v1.12.2100 at 10K resolution. For each sample, eigenvectors were determined from an ICE balanced Hi-C matrix with 40kb resolution using cooltools v0.3.2102 and first principal components were used to determine A/B compartments with GC% of genome region as reference track to determine the sign. Differential TAD comparison was performed using TADcompare with the default settings for each chromosome (v1.4.0)103. Finally, for each cell type, significant intra-chromosomal interaction loops were determined under multiple resolutions (1kb, 2kb and 4kb) using the Hi-C loop caller Fit-Hi-C2 v2.0.7104 (FDR<1e-6) on merged replicates matrix. The consensus chromatin loops within resolution were identified by combining all three stages. These sets of loops were used as consensus for quantitative differential analysis explained below. The final consensus interaction loops for visualization were collected by merging loops from all the resolutions with preference to keep the highest resolution. Quantitative loop differential analysis across cell types was performed on fast lasso normalized interaction frequency (IF) implemented in multiCompareHiC v1.8.0105 for each chromosome at resolution 1kb, 2kb and 4kb independently. The contacts with zero interaction frequency (IF) among more than 80% of the samples and average IF less than 5 were excluded from differential analysis. The QLF test based on a generalized linear model was performed in cell type-pairwise comparisons, and p-values were corrected with FDR. The final differential loops were identified by overlapping differential IF contacts with consensus interaction loops.
Bulk RNA-seq data analysis
Bulk RNA-seq libraries were sequenced on an Illumina Novaseq 6000 instrument. The pair-end fastq files were mapped to the genome assembly hg19 by STAR (v2.6.0c) independently for each replicate. The GencodeV19 annotation was used for gene feature annotation and the raw read count for gene feature was calculated by htseq-count (v0.6.1)106 with parameter settings -f bam -r pos -s yes -t exon -m union. The gene features localized on chrM or annotated as rRNAs, small coding RNA, or pseudo genes were removed from the final sample-by-gene read count matrix. Gene set variation analysis was performed using GSVA 1.42.0107 with the MSigDB hallmark geneset108, with resulting scores analyzed using limma (limma_3.50.3)109. Low abundance peaks (CPM>1) were excluded from the analysis. Testing for differential expression was conducted with the glmQLFit approach implemented in edgeR98. Genes with FDR<0.05 and abs(log2FC>1) between stages were considered differentially expressed. Differential genes were then clustered using k-means clustering. The number of clusters was determined using the elbow method on the weighted sum of squares, where was set to k=5. Score for how similar each gene followed the clusters expression pattern was determined by calculating pearson correlation coefficients between each gene in the cluster and the cluster centroid.
Transcription factor footprinting and motif analysis
Transcription factor footprints were called using Regulatory Analysis Toolbox HINT-ATAC (v0.13.0) with pooled ATAC-seq data for each stage and consensus peak calls110. The rgt-hint footprinting was run with parameters –atac-seq, --paired-end, and organism=hg19 set. The output footprint coordinates were subsequently matched using rgt-motifanalysis matching with parameters --organism hg19 and –pseudocount 0.8 set. The JASPAR2020 position weight matrix database was used to match footprints111. Differential analysis of TF binding across conditions was performed using rgt-hint differential with parameters –organism hg19, --bc, --nc 24 using the motif matched transcription factor footprints. An activity score is then calculated based on the accessibility surrounding the footprint.
Partitioned heritability LD score regression enrichment analysis
Partitioned heritability LD score regression112 (v1.0.0) was used to identify heritability enrichment with GWAS summary statistics and open chromatin regions annotated to genes. The baseline analysis was performed using LDSCORE data (https://data.broadinstitute.org/alkesgroup/LDSCORE) with LD scores, regression weights, and allele frequencies from the 1000G phase 1 data. The summary statistics were obtained from studies as described in Supplementary File 14 and harmonized with the munge_sumstats.py script. Annotations for Partitioned LD score regression were generated using the coordinates of open chromatin regions that contact gene promoters through Hi-C loops for each cell type. Finally, partitioned LD scores were compared to baseline LD scores to measure enrichment fold change and enrichment p-values, which were adjusted with FDR across all comparisons.
Variant-to-gene mapping using HiC-derived promoter contacts
95% credible sets were determined as previously described113. Briefly, P values from GWAS summary statistics were converted to Bayes factors. Posterior probabilities were then calculated for each variant. The variants were ordered from highest to lowest posterior probabilities added to the credible set until the cumulative sum of posterior probabilities reached 95%. Variants in the 95% credible set were then intersected with the CD4+ T cell promoter interacting region OCRs from the three timepoints using the R package GenomicRanges (v1.46.1)114.
Genomic reference and visualizations
All analyses were performed using the hg19 reference genome using gencodeV19 as the gene reference. Genomic tracks were visualized with pyGenomeTracks v3.5115. HiC matrices depict the log1p(balanced count) from the cooler count matrix. ATAC-seq tracks were generated from bigwig files that were normalized using deeptools116.
ABC model predictions
We used the ABC model (PMID: 31784727) using the our Tcell ATAC-seq data (unstimulated or 24 hours stimulated) generated from Naive CD4+ T cells merged acrossed replicates. For H3K27ac we retrieved paired fastq files from H3K27ac MINT-ChIP data from ENCODE for resting (unstimated) and activated (36hr stimulated) CD4 T cells derived from thymus. The deduplicated bam files were used as signal files for the ABC model, the consensus peak set were used as the input set of enhancers. The accession numbers for unstimulated cells are: ENCFF732NOS, ENCFF658TLX, ENCFF605OGN, ENCFF012XYW, ENCFF407QZP, ENCFF556QWM, ENCFF888IHV, ENCFF260FUF, ENCFF550EVW, ENCFF747HGZ, ENCFF067XWQ, ENCFF939YBH, ENCFF749LGW, ENCFF788LWV, ENCFF610FRB, ENCFF538GQX, ENCFF355PCA, ENCFF040OMT, ENCFF878SHZ, ENCFF210WNQ, ENCFF849ZJH, ENCFF925ARM, ENCFF112YNU, ENCFF808JFV, ENCFF448KHE, ENCFF591BFV, ENCFF366UTF, ENCFF172EWR. The accession numbers for stimulated cells are: ENCFF442DFF, ENCFF870ZYC, ENCFF189UPS, ENCFF725KSA, ENCFF618YAV, ENCFF765AJK, ENCFF854QUS, ENCFF180TKV, ENCFF761RCK, ENCFF449EAE, ENCFF915VPK, ENCFF575XFH, ENCFF227AGX, ENCFF810XFH, ENCFF367OMH, ENCFF114BBZ, ENCFF855XPP, ENCFF132AKN, ENCFF663MHJ, ENCFF244TPB, ENCFF230RIJ, ENCFF648FPC, ENCFF797CIK, ENCFF096JPW, ENCFF704KMT, ENCFF385POD, ENCFF395RIG, ENCFF043RRJ, ENCFF027RNR, ENCFF333RGD, ENCFF048PMV, ENCFF887XSA, ENCFF188CNA, ENCFF614IZY, ENCFF984TTZ, ENCFF977RDC, ENCFF393EWO, ENCFF781VQT, ENCFF638UCZ, ENCFF873QZH, ENCFF553JPM, ENCFF589HRL, ENCFF721ZCN, ENCFF496HYV, ENCFF455WBD, ENCFF679UTJ, ENCFF200MYN, ENCFF459ARH.
The retrieved files were trimmed/preprocessed using fastp with the default settings. Following this, matched R1 and R2 files were concatenated to a single R1/R2 file pair. The reads were aligned to hg19 and duplicates were removed as described for the ATAC-seq data analysis. The deduplicated bam files were used as input to the ABC model. We ran the ABC model with the default settings for hg19 and provided reference files for the blacklist, promoter annotation, chromosome sizes reference, and set of ubiquitiously expressed genes. The HiC data used was the KR normalized averaged HiC dataset that was derived from a set of cell lines (GM12878, NHEK, HMEC, RPE1, THP1, IMR90, HU-VEC, HCT116, K562, KBM7; average_hic.v2.191020.tar.gz) at 5kb resolution then scaled by powerlaw distribution as described in the original manuscript. The default threshold of 0.2 was used to link enhancers to genes. The resulting elements were intersected with the credible set list used in this study for precision-recall analyses.
Precision-recall analyses against HIEI genes
To benchmark our autoimmune effector gene predictions against prior eQTL and chromatin capture studies, we curated a list of 449 expert curated genes from Human Errors in Inborn Immunity 2022, where mutations have been reported to cause immune phenotypes in humans65. Most of these genes were identified through rare loss of function mutations in the coding region. In addition to this we also took a subset of 145 genes reported to specifically result in autoimmune phenotypes. These two sets of genes were treated as “gold standard” sets of genes to compare different approaches to predict GWAS effector genes. We curated the results of several other chromatin conformation or eQTL based methods of assigning variants to genes to compare with our study. For the chromatin confirmation studies, we obtained loop calls from the associated publications and intersected with the list variants in the 95% credible set from 15 GWAS used to predict effector genes using GenomicRanges (v1.46.1)114. For eQTL studies, we obtained the summary stats files and identified eGenes linked to any member of the credible set reported as an eQTL in the transcriptome wide association analysis (TWAS; p-value < 1e-4). Precision was calculated as the ratio of the number predicted genes in the truth set to the total number of predicted genes, while recall was calculated as the ratio of the number predicted genes in the truth set to the total number of genes in truth set.
Colocalized eQTL comparisons
In addition to comparisons with the eGenes identified by TWAS in the Soskic et al. study, we also compared overlap of gene nominations in our study with the eQTLs that colocalized with an autoimmune GWAS13. Enrichment of sentinel-gene assignments was conducted similarly as described previously56. Briefly, a null distribution was constructed by randomly selecting genes within 1 Mb of the sentinel compared to the set of colocalized cis-eQTL13 found in through 10,000 iterations. The observed overlap reports the set of gene identified by both our HiC based approach with the set of colocalized eQTLs. We report the empirical P value of the observed value relative to null distribution.
International mouse phenotyping consortium comparisons
The set of HiC implicated genes were compared to the mouse international phenotyping consortium set of genes with reported phenotypes117. We converted the list of V2G implicated genes to mouse homologs using homologene. We tested for enrichment for each phenotype using a one-sided proportion test implicated in R prop.test with type set to “upper”.
Identification of pharmacological agents
We queried the Drug-Gene Interaction Database with the set of V2G implicated genes for chemicals using rDGIdb (v1.20.0)118. To identify the number of papers for each gene with at least one drug annotated to target it, we queried pubmed titles and abstracts using the R package RISmed (v2.3.0) with each gene’s name and either “autoimmune” and the list of autoimmune diseases (Supplementary File 17). A score to approximate expression specificity was computed using the sum GTEX median expression values (v8) for whole blood or spleen divided by other tissues119.
Lentiviral-based CRISPR/CAS9 targeting in Jurkat cells
LentiCRISPRv2-mCherry vectors encoding gRNA-CAS9 and the fluorescent reporter mCherry were used for Jurkat targeting. CRISPR guide RNAs (sgRNA) targeting human IL-2-21 intergenic −46, −51, −80, −83, −85, −122, −128 CNS regions were designed using http://crispr.tefor.net and cloned into lentiCRISPRv2-mCherry. Empty vector without gRNA insert was used as a control. Below is the list of CRISPR gRNA for causing deletion of human IL2-21 intergenic regions:
HEK 293T cells were grown in RPMI1640 complete medium (RPMI1640 + 1X P/S, 1x L-Glu, 10% FBS), 37C, 7%CO2. 293T cells were transfected with 10 ug of lenti-CRISPR-V2-CRE construct along with packaging plasmid 6 ug of PsPAX2 (Addgene, Cat #12260) and 3.5 ug of PmD2.G (Addgene, Cat #12259) using Lipofectamine 2000 transfection reagent (Invitrogen cat # 11668019). After 6 hours, the transfection medium was replaced with complete culture medium. Transfected cells were incubated at 37C for 48-72 hours in a cell culture incubator. and then the Lentiviral supernatants were harvested and spun at 300g for 5 minutes to remove cellular debris. Lentiviral supernatants were concentrated using Lenti-XTM Concentrator (Takara Bio, Cat # 631232) and then centrifuged at 1500g for 30 minutes at 4°C and supernatant was discarded. The lentiviral pellet was resuspended at a ratio of 1:20 of the original volume using RPMI media and concentrated virus supernatant aliquots were prepared and stored until use at −80°C. To achieve high transduction efficiency, the viral supernatant was titrated in Jurkat cells through transduction using various dilutions of the viral supernatants and transduction efficiency was determined by mcherry expression analyzed through flow cytometry. Jurkat cells were seeded in a 24 well plate at 0.5 x106/well in culture media, viral supernatant with 8 ug/mL of polybrene was added to each well. Spinfection was performed for 90 min. at 2500 rpm, and transduced cells were equilibrated at 37C for ∼6 hrs, followed by incubation at 37C 5%CO2 for ∼72 hours culture. Transduced Jurkat Cells were then harvested and stimulated using PMA (15 ng/mL) + Ionomycin (1 uM) + human anti-CD28 (2 ug/ml), BioXcell cat # BE0248 in 96 well culture plates (TPP, cat # 92097) in triplicates. Cell culture supernatants were collected at the end of culture and analyzed for IL-2 by ELISA using a kit (cat # 88-7025-76) purchased from Thermofisher Scientific.
gRNA-CAS9 RNP-based targeting in primary human CD4+ T cells
Primary Human CD4 T cells derived from 5 normal healthy donors were obtained from Human Immunology core (University of Pennsylvania). Alt-R S.p. HiFi Cas9 Nuclease V3 cat # 1081061 CAS9 and following list of Alt-R® CRISPR-Cas9 sgRNA targeting IL-2-21 CRE were purchased from Integrated DNA Technologies, USA.
Primary Human CD4 T (5-10e6) were incubated with sgRNA and CAS9 protein complex and electroporation was done using P3 Primary Cell 4D-NucleofectorTM X Kit L (Lonza, cat # V4XP-3024) and Lonza 4D nucleofector system. B2M gene was CRISPR targeted as a positive control. As per manufacturer’s protocol cells were electroporated pulse code Fl-115 in 100ul cuvette format. After nucleofection, cells were allowed to rest in the complete media for ∼2 days. Cells were then harvested, washed with PBS and aliquots of cells were used for further experimentation such as flow staining and cell activation. Primary human CD4 T Cells (0.1e6/well) were seeded in triplicates (for each experimental condition) in 96 well plate format in RPMI complete medium and stimulated using ImmunoCult™ Human CD3/CD28 T Cell Activator (STEM cells Technologies, cat # 10971). Cell culture supernatants were collected at 4h of stimulation and stored at −80C until assayed. IL-2 ELISA was performed using Thermofisher Scientific IL-2 Human Uncoated ELISA Kit with Plates, cat # 88-7025-76.
Recombinant reporter assays
The IL2 URR was cloned into Xho I and Hind III sites of PGL4 Luc vector (Promega) and then ∼500 bp individual sequence representing IL2-IL21 intergenic CNS at −46, −51, −80, −83, −85, - 128 were cloned upstream to IL-2 URR at the Xho I site of the pGL4 vector. Primary human CD4 T cells obtained from 5 normal healthy donors were transfected with PGL4-cRE-URR constructs. Briefly, primary human CD4 T cells were activated with anti-CD3+ anti-CD28 Dynabeads (1:1) (Thermofisher scientific, cat # 11161D) + IL-2 overnight and then 1 million aliquots of cells (triplicates) were electroporated using Nucleofector 2b, human T cell Nuclefactor kit (Lonza VPA # 1002, program # T-020) with 2ug of PGL4 firefly vector constructs along with 0.2 ug of PGL4 Renilla vector; cells were allowed to rest ON in RPMI + IL-2 and then re-stimulated with plate-bound anti-CD3+ anti-CD28 (2 ug/ml each) for 5 hrs. Dual luciferase assay was performed with cell lysate prepared using Promega dual luciferase assay kit. Cell lysate was prepared in PLB as per manufacturer’s protocols and then firefly and renilla luciferase activities were analyzed by spectramax ID5 (Molecular Devices). Firefly luciferase activity was normalized against the internal control renilla luciferase activity.
Pharmacologic validation of novel T cell activation regulatory genes
The drugs lometrexol (GART antagonist), 25-hydroxycholesterol (GPR183 ligand), epinephrine & norepinephrine (CHRNB agonists), and arachidonoyl glycine (GPR18 agonist) were purchased from Sigma. The GPR18 antagonist O-1918 was purchased from ChemCruz. CEP1347 (MAP3K11 antagonist), kisspeptin 234 (KISSIR ligand) and the PTK6 antagonist tilfrinib were purchased from Tocris. The GRK6 antagonist GRK-IN-1 was purchased from DC Chemicals. The SIK1 antagonist HG-91-9-1 was purchased from Selleckchem. The OXER1 agonist 5-Oxo-ete was purchased from Cayman chemicals. The drugs were dissolved in DMSO or ethanol as suggested by the vendor and working stocking concentrations were prepared in RPMI1640 medium or PBS. The drug effect on T cell proliferation was assayed using murine lymphocytes cultured under TCR and CD28 activation conditions. Spleen and lymph nodes were collected from 6-8 weeks old female C57BL/7 mice and single cell lymphocyte cell suspensions were prepared in RPMI 1640 complete medium. 20 million lymphocytes were labeled with CFSE and re-suspended in RPMI 1640 medium. Cells (0.5 million labeled cells/well) were loaded into 48 well culture plate and activated with mouse anti-CD3 and anti-CD28 agonistic antibodies (1μg mL each). Drugs at the indicated concentrations were added in the culture medium and both untreated and drug treated cell cultures were incubated at 37°C for 72 hours in a cell culture incubator. Cells were harvested after 3 days of culture, washed with PBS and then stained with live-dead aqua dye. After washing with FACS buffer (PBS containing 2% FBS), cells were stained with fluorochrome conjugated antibodies CD4-APC, CD8-PB, CD44-Percp Cy5.5 and CD25-BV650. Stained cells were analyzed on a Beckman Coulter Cytoflex S flow cytometer. The division profile of CD4+ CFSE+ T cells were gated on live populations. The flow data was analyzed using Flowjo10 software and the number of divided CD4+ T cells were determined as described previously120.
Supplemental Files
Supplementary File 1: Expression and clustering of differentially expressed genes.
Supplementary File 2: Pathway enrichment of differentially expressed RNA-seq genes
Supplementary File 3: Pathway enrichment of differentially expressed RNA-seq genes by cluster
Supplementary File 4: Accessible chromatin regions
Supplementary File 5: Differential accessible open chromatin regions
Supplementary File 6: A/B compartment calls
Supplementary File 7: Differential TAD boundaries
Supplementary File 8: Stripe calls
Supplementary File 9: Differential contact frequency
Supplementary File 10: Summary of TF footprinting
Supplementary File 11: TF target gene pathway enrichment
Supplementary File 12: List of all GWAS studies included
Supplementary File 13: Variant to gene mapping across all timepoints
Supplementary File 14: Partitioned LD score regression output
Supplementary File 15: Autoimmune variants in open gene promoters
Supplementary File 16: Comparison of V2G approaches
Supplementary File 17: Motifs predicted to disrupt transcription factor binding sites.
Supplementary File 18: V2G genes implicated by orthogonal data
Supplementary File 19: V2G target gene drug repurposing results
References
- 1.Non-coding genetic variants in human diseaseHum Mol Genet 24:R102–10
- 2.KLF2 Transcription-Factor Deficiency in T Cells Results in Unrestrained Cytokine Production and Upregulation of Bystander Chemokine ReceptorsImmunity 31:122–130
- 3.Tob is a negative regulator of activation that is expressed in anergic and quiescent T cellsNat Immunol 2:1174–82
- 4.Regulation of quiescence in lymphocytesTrends Immunol 24:380–386
- 5.Differentiation of Effector CD4 T Cell PopulationsAnnu Rev Immunol 28:445–489
- 6.Helper T cell differentiationCell Mol Immunol 16:634–643
- 7.Genetic and epigenetic fine mapping of causal autoimmune disease variantsNature 518:337–343
- 8.Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune CellsCell 167:1398–1414
- 9.Chromatin activity at GWAS loci identifies T cell states driving complex immune diseasesNat Genet :1–8https://doi.org/10.1038/s41588-019-0493-9
- 10.Fine-mapping, trans-ancestral and genomic analyses identify causal variants, cells, genes and drug targets for type 1 diabetesNat Genet 53:962–971
- 11.Global discovery of lupus genetic risk variant allelic enhancer activityNat Commun 12
- 12.Promoter-interacting expression quantitative trait loci are enriched for functional genetic variantsNat Genet 53:110–119
- 13.Immune disease risk variants regulate gene expression dynamics during CD4+ T cell activationNat Genet :1–10https://doi.org/10.1038/s41588-022-01066-3
- 14.Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cellsNat Genet :1–10https://doi.org/10.1038/s41588-022-01056-5
- 15.Paths to understanding the genetic basis of autoimmune diseaseNature 435:584–589
- 16.Mechanisms of impaired regulation by CD4(+)CD25(+)FOXP3(+) regulatory T cells in human autoimmune diseasesNat Rev Immunol 10:849–859
- 17.Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene PromotersCell 167:1369–1384
- 18.Chromosome contacts in activated T cells identify autoimmune disease candidate genesGenome Biol 18
- 19.Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritisNat Commun 11
- 20.Quantitative trait locus (xQTL) approaches identify risk genes and drug targets from human non-coding genomesHum Mol Genet 31:R105–R113
- 21.Genome-wide CRISPR Screens in Primary Human T Cells Reveal Key Regulators of Immune FunctionCell 175:1958–1971
- 22.CRISPR activation and interference screens decode stimulation responses in primary human T cellsScience 375
- 23.Systematic discovery and perturbation of regulatory genes in human T cells reveals the architecture of immune networksNat Genet 54:1133–1144
- 24.Restriction enzyme selection dictates detection range sensitivity in chromatin conformation capture-based variant-to-gene mapping approachesHum Genet https://doi.org/10.1007/s00439-021-02326-8
- 25.The accessible chromatin landscape of the human genomeNature 489
- 26.Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human GenomeScience 326:289–293
- 27.Topological domains in mammalian genomes identified by analysis of chromatin interactionsNature 485
- 28.Two major mechanisms of chromosome organizationCurr Opin Cell Biol 58:142–152
- 29.Stripenn detects architectural stripes from chromatin conformation data using computer visionNat Commun 13
- 30.Architectural and Functional Commonalities between Enhancers and PromotersCell 162
- 31.Molecular architecture of enhancer–promoter interactionCurr Opin Cell Biol 74:62–70
- 32.Integrative analysis of 111 reference human epigenomesNature 518
- 33.Phosphorylation status determines the opposing functions of Smad2/Smad3 as STAT3 cofactors in TH17 differentiationNat Commun 6
- 34.Egr-2 and Egr-3 are negative regulators of T cell activationNat Immunol 6:472–480
- 35.Scurfin (FOXP3) Acts as a Repressor of Transcription and Regulates T Cell ActivationJ Biol Chem 276:37672–37679
- 36.The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organizationProc National Acad Sci 118
- 37.Transcription factor KLF2 regulates the migration of naive T cells by restricting chemokine receptor expression patternsNat Immunol 9:292–300
- 38.Epigenomics of human CD8 T cell differentiation and agingSci Immunol 2
- 39.Early Growth Response Gene-2, a Zinc-Finger Transcription Factor, Is Required for Full Induction of Clonal Anergy in CD4+ T CellsJ Immunol 173:7331–7338
- 40.The EGR2 targets LAG-3 and 4-1BB describe and regulate dysfunctional antigen-specific CD8+ T cells in the tumor microenvironmentJ Exp Med 214:381–400
- 41.Ikaros Sets Thresholds for T Cell Activation and Regulates Chromosome PropagationImmunity 10:333–343
- 42.Interleukin 2 gene transcription is regulated by Ikaros-induced changes in histone acetylation in anergic T cellsBlood 109:2878–86
- 43.Ikaros Enforces the Costimulatory Requirement for IL2 Gene Expression and Is Required for Anergy Induction in CD4+ T LymphocytesJ Immunol 179:7305–7315
- 44.Ikaros Silences T-bet Expression and Interferon-γ Production during T Helper 2 DifferentiationJ Biol Chem 285:2545–2553
- 45.A critical role for TCF-1 in T-lineage specification and differentiationNature 476:63–68
- 46.Lineage-Determining Transcription Factor TCF-1 Initiates the Epigenetic Identity of T CellsImmunity 48:243–257
- 47.TCF-1 promotes chromatin interactions across topologically associating domains in T cell progenitorsNat Immunol :1–11https://doi.org/10.1038/s41590-022-01232-z
- 48.Involvement of PLAGL2 in activation of iron deficient- and hypoxia-induced gene expression in mouse cell linesOncogene 20:4718–4727
- 49.Hypoxia-inducible factors: master regulators of hypoxic tumor immune escapeJ Hematol Oncol 15
- 50.A multiply redundant genetic switch “locks in” the transcriptional signature of regulatory T cellsNat Immunol 13:972–980
- 51.Genome-wide analyses reveal the IRE1a-XBP1 pathway promotes T helper cell differentiation by resolving secretory stress and accelerating proliferationGenome Med 10
- 52.Role of cell cycle regulator E2F1 in regulating CD8 T cell responses during acute and chronic viral infectionVirology 324:567–576
- 53.E2F1 and E2F2 Are Differentially Required for Homeostasis-Driven and Antigen-Induced T Cell Proliferation In VivoJ Immunol 175:647–655
- 54.Transcription Factor Bhlhe40 in Immunity and AutoimmunityTrends Immunol 41:1023–1036
- 55.Genome-scale Capture C promoter interactions implicate effector genes at GWAS loci for bone mineral densityNat Commun 10
- 56.Mapping effector genes at lupus GWAS loci using promoter Capture-C in follicular helper T cellsNat Commun 11
- 57.Cis-regulatory architecture of human ESC-derived hypothalamic neuron differentiation aids in variant-to-gene mapping of relevant complex traitsNat Commun 12
- 58.Implicating effector genes at COVID-19 GWAS loci using promoter-focused Capture-C in disease-relevant immune cell typesGenome Biol 23
- 59.3D chromatin maps of the human pancreas reveal lineage-specific regulatory architecture of T2D riskCell Metab 34:1394–1409
- 60.The Cdc42-interacting Protein-4 (CIP4) Gene Knock-out Mouse Reveals Delayed and Decreased Endocytosis*J Biol Chem 285:4348–4354
- 61.Nuts and bolts of the salt-inducible kinases (SIKs)Biochem J 478:1377–1397
- 62.Genetic determinants of co-accessible chromatin regions in activated T cells across humansNat Genet 9
- 63.Single-cell eQTL analysis of activated T cell subsets reveals activation and cell type–dependent effects of disease-risk variantsSci Immunol 7
- 64.Intersection of population variation and autoimmunity genetics in human T cell activationScience 345
- 65.Human Inborn Errors of Immunity: 2022 Update on the Classification from the International Union of Immunological Societies Expert CommitteeJ. Clin. Immunol 42:1473–1507
- 66.Genome-wide enhancer maps link risk variants to disease genesNature :1–6https://doi.org/10.1038/s41586-021-03446-x
- 67.Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbationsNat. Genet 51:1664–1669
- 68.Multiple autoimmune-associated variants confer decreased IL-2R signaling in CD4+ CD25(hi) T cells of type 1 diabetic and multiple sclerosis patientsPlos One 8
- 69.Biology and regulation of IL-2: from molecular mechanisms to human therapyNat Rev Immunol 18
- 70.Revisiting IL-2: Biology and therapeutic prospectsSci Immunol 3
- 71.Duplication of the IL2RA locus causes excessive IL-2 signaling and may predispose to very early onset colitisMucosal Immunol :1–11https://doi.org/10.1038/s41385-021-00423-5
- 72.Regulation of human interleukin-2 gene: Functional DNA sequences in the 5′ flanking region for the gene expression in activated T lymphocytesCell 46:401–407
- 73.A 275 basepair fragment at the 5’ end of the interleukin 2 gene enhances expression from a heterologous promoter in response to signals from the T cell antigen receptorJ Exp Medicine 165
- 74.Regulatory anatomy of the murine interleukin-2 geneNucleic Acids Res 18:4523–4533
- 75.Long-Range Transcriptional Control of the Il2 Gene by an Intergenic EnhancerMol Cell Biol 35:3880–3891
- 76.The ENCODE (ENCyclopedia Of DNA Elements) ProjectScience 306:636–640
- 77.Nested epistasis enhancer networks for robust genome regulationScience 377:1077–1085
- 78.Rapid response to Con A by CD4+CD45R− rat memory lymphocytes as compared to CD4+CD45R+ lymphocytesCell Immunol 119:317–326
- 79.Direct demonstration of cytokine synthesis heterogeneity among human memory/effector T cells by flow cytometryBlood 86:1408–1419
- 80.IL-2 Secretion by CD4+ T Cells In Vivo Is Rapid, Transient, and Influenced by TCR-Specific CompetitionJ Immunol 172:6136–6143
- 81.Regulatory interactions between CD45RBhigh and CD45RBlow CD4+ T cells are important for the balance between protective and pathogenic cell-mediated immunityJ Exp Medicine 179:589–600
- 82.Foxp3 Protein Stability Is Regulated by Cyclin-dependent Kinase 2J Biol Chem 288:24494–24502
- 83.CCR7 provides localized access to IL-2 and defines homeostatically distinct regulatory T cell subsetsJ Exp Medicine 211:121–136
- 84.Systematic differences in discovery of genetic effects on gene expression and complex traitsNat. Genet :1–10https://doi.org/10.1038/s41588-023-01529-1
- 85.T-cell development and function are modulated by dual specificity phosphatase DUSP5J Biological Chem 283:17362–9
- 86.MAP4K Family Kinases and DUSP Family Phosphatases in T-Cell Signaling and Systemic Lupus ErythematosusCells 8
- 87.PARK7/DJ-1 promotes pyruvate dehydrogenase activity and maintains Treg homeostasis during ageingNat Metabolism 4:589–607
- 88.N-arachidonoyl glycine, an abundant endogenous lipid, potently drives directed cellular migration through GPR18, the putative abnormal cannabidiol receptorBmc Neurosci 11
- 89.Mouse cDNAs encoding a trifunctional protein of de novo purine synthesis and a related single-domain glycinamide ribonucleotide synthetaseGene 137:195–202
- 90.G protein–coupled receptor kinase 6 (GRK6) regulates insulin processing and secretion via effects on proinsulin conversion to insulinJ Biol Chem 298
- 91.Targeting protein tyrosine kinase 6 in cancerBiochimica Et Biophysica Acta Bba -Rev Cancer 1874
- 92.MAP kinase genes and colon and rectal cancerCarcinogenesis 33:2398–2408
- 93.MAP3K11 is a tumor suppressor targeted by the oncomiR miR-125b in early B cellsCell Death Differ 23:242–252
- 94.Biological constraints on GWAS SNPs at suggestive significance thresholds reveal additional BMI lociElife 10
- 95.Fast gapped-read alignment with Bowtie 2Nat Methods 9:357–359
- 96.Model-based Analysis of ChIP-Seq (MACS)Genome Biol 9
- 97.K. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windowsNucleic Acids Res 44:e45–e45
- 98.edgeR: a Bioconductor package for differential expression analysis of digital gene expression dataBioinform Oxf Engl 26:139–40
- 99.HiCUP: pipeline for mapping and processing Hi-C dataF1000research 4
- 100.Cooler: scalable storage for Hi-C data and other genomically labeled arraysBioinform Oxf Engl 36:311–316
- 101.Iterative correction of Hi-C data reveals hallmarks of chromosome organizationNat Methods 9
- 102.Molecular basis of CTCF binding polarity in genome foldingNat Commun 11
- 103.TADCompare: An R Package for Differential and Temporal Analysis of Topologically Associated DomainsFrontiers Genetics 11
- 104.Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2Nat Protoc 15:991–1012
- 105.HiCcompare: an R-package for joint normalization and comparison of HI-C datasetsBmc Bioinformatics 19
- 106.HTSeq--a Python framework to work with high-throughput sequencing dataBioinform Oxf Engl 31:166–9
- 107.GSVA: gene set variation analysis for microarray and RNA-Seq dataBmc Bioinformatics 14:7–7
- 108.The Molecular Signatures Database Hallmark Gene Set CollectionCell Syst 1:417–425
- 109.limma powers differential expression analyses for RNA-sequencing and microarray studiesNucleic Acids Res 43:e47–e47
- 110.Identification of transcription factor binding sites using ATAC-seqGenome Biol 20
- 111.JASPAR 2020: update of the open-access database of transcription factor binding profilesNucleic Acids Res 48:D87–D92
- 112.Partitioning heritability by functional annotation using genome-wide association summary statisticsNat Genet 47:1228–35
- 113.3D promoter architecture re-organization during iPSC-derived neuronal cell differentiation implicates target genes for neurodevelopmental disordersProg Neurobiol 102000https://doi.org/10.1016/j.pneurobio.2021.102000
- 114.Software for Computing and Annotating Genomic RangesPlos Comput Biol 9
- 115.pyGenomeTracks: reproducible plots for multivariate genomic datasetsBioinform Oxf Engl 37:422–423
- 116.deepTools: a flexible platform for exploring deep-sequencing dataNucleic Acids Res 42:W187–W191
- 117.The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservationConservation Genetics Print 19:995–1005
- 118.DGIdb 2.0: mining clinically relevant drug–gene interactionsNucleic Acids Res 44:D1036–D1044
- 119.The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humansScience 348:648–660
- 120.Following the fate of individual T cells throughout activation and clonal expansion. Signals from T cell receptor and CD28 differentially regulate the induction and duration of a proliferative responseJ Clin Invest 100:3173–3183
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:
Copyright
© 2024, Pahl et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 1,195
- downloads
- 94
- citation
- 1
Views, downloads and citations are aggregated across all versions of this paper published by eLife.