CHD-associated enhancers shape human cardiomyocyte lineage commitment
Enhancers orchestrate gene expression programs that drive multicellular development and lineage commitment. Thus, genetic variants at enhancers are thought to contribute to developmental diseases by altering cell fate commitment. However, while many variant-containing enhancers have been identified, studies to endogenously test the impact of these enhancers on lineage commitment have been lacking. We perform a single-cell CRISPRi screen to assess the endogenous roles of 25 enhancers and putative cardiac target genes implicated in genetic studies of congenital heart defects (CHDs). We identify 16 enhancers whose repression leads to deficient differentiation of human cardiomyocytes (CMs). A focused CRISPRi validation screen shows that repression of TBX5 enhancers delays the transcriptional switch from mid- to late-stage CM states. Endogenous genetic deletions of two TBX5 enhancers phenocopy epigenetic perturbations. Together, these results identify critical enhancers of cardiac development and suggest that misregulation of these enhancers could contribute to cardiac defects in human patients.
The work presented is a valuable assessment of a broad set of regulatory elements that coordinate cardiac differentiation. The approach is broadly applicable, and the results point to important mechanisms of gene regulation and differentiation, with implications for future studies in non-coding variation.https://doi.org/10.7554/eLife.86206.sa0
Congenital heart defects (CHDs) encompass a broad range of cardiac malformations that impact 1% of all births (Hoffman and Kaplan, 2002). However, the genetic basis for 54% of familial and 80% of sporadic CHD cases remain unknown (Blue et al., 2017). Genome-wide association studies and whole genome sequencing of CHD patients have identified thousands of variants that associate with CHD (Agopian et al., 2017; Cordell et al., 2013; Richter et al., 2020). Most efforts have focused on functionally characterizing coding variants. For example, traditional linkage analysis in patients with Holt-Oram syndrome implicated the gene for the TBX5 transcription factor, which harbors variants predicted to disrupt protein structure and/or function in affected patients (Basson et al., 1997).
TBX5 is a key regulator of cardiac development, which requires precise temporal expression: early expression of TBX5 is necessary for CM maturation and later expression is required for specification of the cardiac conduction system (Bruneau et al., 2001; Steimle and Moskowitz, 2017). However, protein-coding variants account for only a small fraction of CHD risk. In contrast, the vast majority of de novo variants are non-coding and potentially modify the activity of transcriptional enhancers (Richter et al., 2020). Only a handful have been functionally examined, including a non-coding CHD-associated variant which has been shown to impair proper TBX5 expression (Smemo et al., 2012). Thus, there remains a critical gap in knowledge between CHD-associated non-coding regions and their contribution to CHD.
Recent studies have explored high-throughput genomic approaches to assess the roles of enhancers and regulatory variants. For example, Richter et al., 2020 performed whole genome sequencing in CHD patients to identify thousands of non-coding variants. Following extensive computational filtering to narrow the list of putative causal mutations, the authors used massively parallel reporter assays to functionally identify five CHD-associated variants that impact enhancer activity. However, reporter assays do not assess how variants function in the endogenous genomic context. CRISPR screens have been developed to identify regulatory elements that contribute to screenable phenotypes following epigenetic silencing or genetic deletion (Fulco et al., 2016). While powerful, this strategy has focused on screenable phenotypes in homogeneous cell populations. Thus, traditional CRISPR screens would lose information about how enhancers regulate specification of distinct cell types during development.
To address this deficiency, single-cell approaches have been applied in isogenic systems to delineate how perturbations affect lineage commitment. For example, Kathiriya et al., 2021 generated an allelic series of TBX5 locus modifications in iPSCs followed by CM differentiation and single-cell sequencing to show that TBX5 exhibits a dosage-sensitive effect altering the trajectory of CM specification. Recent studies have extended this approach using single-cell CRISPR screens to systematically test the roles of key genes in endoderm and neuronal specification (Genga et al., 2019; Tian et al., 2019). Similar approaches have also been applied for high-throughput examination of enhancer functions in stable, homogeneous cell lines like K562 cells (Gasperini et al., 2019; Xie et al., 2017). A combination of these approaches will likely be required to systematically study the endogenous functions of CHD-associated enhancers during CM lineage commitment.
Here, we apply a single-cell CRISPRi screen for 25 CHD-associated enhancers in a model of human embryonic stem cell (hESC) to CM differentiation. We identify a subset of 16 enhancers which, when perturbed, results in deficient CM specification. In focused validation studies, we show that CRISPRi repression or genetic deletion of TBX5 enhancers leads to dysregulation of the cardiac gene expression program, enrichment of early CM cell states, and a depletion of mature CM cell states.
Single-cell screens of CHD-associated enhancers during CM differentiation
Whole genome sequencing has identified >50,000 de novo variants in 749 CHD patients (Richter et al., 2020). Some of these variants may contribute to CHD by modulating the activity of transcriptional enhancers that are active in cardiac development. To identify these enhancers for functional study, we overlapped variants with published epigenetic datasets of active enhancers including open chromatin (ATAC-Seq) and histone acetylation (H3K27ac) (Liu et al., 2017; Zhang et al., 2019). We prioritized a diverse set of CHDs from atrial and septal defects to tetralogy of Fallot (Supplementary file 1). We also filtered for enhancers within 100 kb of genes with known roles in heart development. Overall, we prioritized 25 enhancers, including those in proximity to known cardiac regulators, with many exhibiting high predicted sequence conservation (Siepel et al., 2005; van den Hoogenhof et al., 2018; Figure 1A). Epigenetic evidence indicates that many of these enhancers and their putative downstream genes become active at early time points of cardiac differentiation (Tompkins et al., 2016; Figure 1B–C). Overall, these data suggest that the prioritized CHD-associated enhancers may have roles in early cardiac lineage commitment, and we hypothesize that perturbations of these enhancers may impact cell fate.
To test this hypothesis, we used CRISPRi-mediated single-cell perturbation screens to assess the functions of CHD-associated enhancers during early cardiac lineage commitment (Figure 1—figure supplement 1A–B). We modeled this process by differentiating hESCs toward CMs through WNT modulation. Notably, to preserve cellular heterogeneity, we performed CM differentiation without metabolic selection (Tohyama et al., 2013). To capture the early events of lineage commitment, we examined cells at day 8 of differentiation (Figure 1D). Overall, we targeted 397 sgRNAs spanning 25 enhancers, 19 promoters of their putative target genes, and non-genome targeting (NT) controls (Supplementary file 1). After single-cell RNA sequencing (RNA-seq) and stringent filtering to remove low-quality cells and doublets, we retained 80,343 high-quality cells for downstream analysis. On average, we detected one sgRNA in each cell (Figure 1—figure supplement 1C–E and Supplementary file 2).
To define the cell populations during early differentiation, we clustered cells to identify seven distinct cell populations across all three germ layers (Figure 1E and Supplementary file 3). We focused on the three mesoderm-derived clusters due to their relevance to cardiac development. These clusters included TNNT2+ CMs, FN1+ mesodermal cells (Cheng et al., 2013), and COL3A1+ cardiac fibroblasts (van Nieuwenhoven et al., 2013; Figure 1F and Figure 1—figure supplement 1F). Verifying the efficiency of CRISPRi-mediated repression, we observed robust depletion of control genes for positive control sgRNAs (pMalat1 <2.2e-308, Mann-Whitney U) (Figure 1G and Figure 1—figure supplement 1G).
CRISPRi of CHD-associated enhancers delays CM differentiation
To test the ability of our approach to detect changes in cell state from CRISPRi perturbations, we included positive control sgRNAs targeting the promoter of well-known cardiac genes as an internal quality control measure. We first examined sgRNAs targeting the promoter of ZIC2, a known regulator of both neuronal and cardiac lineages necessary for the successful differentiation of both (Luo et al., 2015; Xu et al., 2020). Compared to cells with non-target sgRNAs, the cells containing sgRNAs for ZIC2 promoter were significantly depleted for neuronal and cardiac populations (pneuronal = 4.7e-61; pcardiomyocytes = 1.9e-5, hypergeometric test). Consistent with a stalled differentiation state, these cells exhibit enrichment for an early mesodermal state (Figure 1—figure supplement 1H). Second, based on published knockout studies showing that TBX5 repression prevents proper CM specification (Kathiriya et al., 2021), we also included sgRNAs targeting the promoter of TBX5. We observed that loss of TBX5 expression led to a depletion in CMs (pTBX5 promoter = 2.0e-5; pTBX5 enhancer 6 = 0.025, hypergeometric test) and a corresponding increase in early mesoderm cells (pTBX5 promoter = 3.4e-11; pTBX5 Enhancer 6 = 4.1e-4, hypergeometric test) (Figure 1H). Interestingly, perturbation of a CHD-associated enhancer of TBX5 phenocopies perturbation of the promoter in terms of cell state changes. Bulk qPCR analysis confirms that CRISPRi of TBX5 enhancers results in loss of TBX5 expression (Figure 1—figure supplement 1I). Taken together, this analysis demonstrates that CRISPRi repression at the TBX5 promoter functions as expected during CM differentiation and suggests further that repression of TBX5 expression through enhancer perturbation similarly depletes CM lineage commitment through loss of downstream target expression.
Next, to gain higher resolution insights on how enhancer perturbation influences lineage commitment, we focused on cells in the cardiac lineage (CM, mesoderm). We identified a trajectory of differentiating CMs composed of four distinct subpopulations including SOX4+ progenitors (Paul et al., 2014), FN1+ early-stage CMs, ACTA2+ mid-stage CMs (Potta et al., 2010), and NPPA+ atrial-like late-stage CMs (Figure 2A and Figure 2—figure supplement 1A). Consistent with the established process of CM commitment, pseudotime analysis orders these cell states with increasing expression of cardiac maturation genes such as TNNT2 (Figure 2B–C). Further validating this trajectory, we observe consistency with published bulk RNA-seq datasets of CM differentiation (Tompkins et al., 2016; Figure 2D). Thus, our single-cell dataset recapitulates the expected patterns of CM cell fate commitment.
A previous study established that loss of cardiogenic TFs deviates cells from a WT trajectory (Kathiriya et al., 2021). To examine the effect of enhancers, we compared the cell state distribution in perturbed cells. Surprisingly, we observed that perturbation of 16 enhancers led to a depletion of late-stage CMs. In contrast, only two enhancers were depleted in earlier CM states. This depletion of late-stage CMs coincides with an enrichment in progenitor, early-CM, and mid-CM states (Figure 2E and Figure 2—figure supplement 1B–C). Since connecting enhancers with their cognate promoters can be challenging, we only considered enhancer perturbations that produced phenotypes similar to paired promoter perturbation. Using this stringent approach, we detected 14 out of 25 enhancer perturbations that caused changes in cell state distribution, although this likely represents an underestimate of the true hit rate. To account for the low expression of many cardiac genes which makes differential gene expression analysis underpowered, we performed targeted sequencing to measure the expression of key cardiac gene targets (Figure 2—figure supplement 1D). This analysis showed further consistency between promoter-targeting and enhancer-targeting sgRNAs (Figure 2—figure supplement 1E).
Next, we illustrate several examples. First, we targeted two evolutionarily conserved enhancers near the CM structural protein TNNT2 that exhibit active enhancer chromatin during CM differentiation (Figure 2F). We observed that CRISPRi repression of either the TNNT2 promoter or one of the two targeted nearby enhancers resulted in a depletion of late-CM cells (pTNNT2 P1 = 0.0005; pTNNT2 E2 = 0.03; pTNNT2 E3 = 0.001, hypergeometric test) (Figure 2G). These perturbations also yielded similar changes in pseudotime cell state along with reduced expression of TNNT2 and other cardiac genes within the CM populations (pTNNT2 P1 = 0.0004; pTNNT2 E2 = 0.07; pTNNT2 E3 = 1.2e-6, Mann-Whitney U) (Figure 2H–I and Supplementary file 4). Second, we also targeted a CHD-associated enhancer with a variant characterized by reporter assay analysis (Gilsbach et al., 2018; Richter et al., 2020; Figure 2J). Repression of this enhancer also impacted lineage specification, depleting cells in the late-CM state (p=2.5e-5, hypergeometric test) (p=3.6e-5,Mann-Whitney U) (Figure 2K–L). Third, we examined how perturbation of six distinct TBX5 enhancers affected CM trajectory. CRISPRi-mediated repression of any of the six TBX5 enhancers led to a depletion of late-CM (penhancers1–6 between 3.7e-2 and 1.9e-8, hypergeometric test) (penhancers1–6 between 2.5e-2 and 6.8e-9, Mann-Whitney U) (Figure 2E and M–N). Cells with sgRNAs targeting TBX5 enhancers showed reduced expression TBX5 downstream targets, including NPPA and NPPB (Figure 2O and Supplementary file 4; Ang et al., 2016; Houweling et al., 2005; Luna-Zurita et al., 2016). Consistently, we also observed changes in cell state upon repression of TBX5 promoters. In sum, we observe that perturbation of CHD-associated enhancers, particularly for TBX5, results in deficient CM differentiation. These results suggest that these enhancers have roles in CM lineage commitment.
A focused validation screen demonstrates that TBX5 enhancers modulate CM cell fate
To validate the results of the large screen above, we performed a smaller single-cell CRISPRi screen focused on TBX5 regulatory elements. Three of the enhancers (Enh4, Enh5, and Enh6) were previously identified through sequence conservation and exhibited reporter activity (Smemo et al., 2012). The remaining three enhancers (Enh1, Enh2, Enh3) were identified in this study based on epigenetic hallmarks of active enhancers (Figure 3A). After transduction of control and targeting sgRNAs, CM differentiation, and sequencing, we identified three primary clusters comprising all germ layers (Figure 3B, Figure 3—figure supplement 1A, and Supplementary file 3), with 3102 cells corresponding to CMs based on TNNT2 expression (Figure 3C).
Consistent with our initial screen, we observed that cells with TBX5 enhancer repression were significantly depleted in the late-CM state (penhancers1–6 between 1.3e-5 and 1.3e-19, hypergeometric test) (Figure 3D–E and Figure 3—figure supplement 1B). We also observed activation of ACTA2 and repression of NPPA in cells with perturbed TBX5 enhancers (pACTA2=3.1e-7; pNPPA = 8.2e-24, Mann-Whitney U) (Figure 3F, Figure 3—figure supplement 1C, and Supplementary file 4), consistent with a previous study that knocked out TBX5 (Kathiriya et al., 2021).
Next, we performed focused analysis on CM populations to identify three clusters of cell subpopulations: CM progenitors (FN1+), mid-CM (ACTA2+), and late CM (NPPA+) (Figure 4A and Figure 4—figure supplement 1A). Pseudotime analysis orders these cell populations by increasing expression of known cardiac maturation genes including NPPA and TNNT2 (Figure 4B–D). Overall, our focused screen contains 122 cells balanced across TBX5 enhancer perturbations that attain CM states. Since this number is relatively small, we pooled these cells for downstream analyses. Consistent with our larger screen, we observe a significant depletion of late CMs when TBX5 enhancers are repressed, with an increase in mid-CMs (Figure 4E–F and Figure 4—figure supplement 1B) and a significant decrease in pseudotime rank (plate CM = 3.8e-18; pmid-CM = 7.8e-13, hypergeometric test) (p=2.2e-5, Mann-Whitney U) (Figure 4G). This coincided with downregulation of known targets of TBX5 (Figure 4—figure supplement 1C and Supplementary file 4). Cell label transfer of initial screen CM annotations onto focus screen CMs showed consistent clustering and retention of late-CM depletion under TBX5 enhancer knockdown conditions (Figure 4—figure supplement 1D–E). Overall, results from our focused validation screen are consistent with the observation that TBX5 enhancer repression causes deficient CM lineage commitment.
TBX5 enhancer repression alters CM molecular signatures
While TBX5 enhancer perturbation results in a reduction of late-CM cells, we observe that a small number of cells still reach the state. We next asked whether these cells exhibit altered molecular signatures. First, focusing on the late-CM state, we defined 80 genes that are specifically expressed in NT control cells. These genes coincide with expected CM markers, such as NPPA and TTN, which continuously increase in expression throughout CM differentiation. Interestingly, we find that this set of late-CM genes is depleted in the subset of cells with perturbed TBX5 enhancers (denoted as sgTBX5enh) that reach the late-CM state (p=1.7e-3, Z-test) (Figure 4H–I and Supplementary file 5). Several notable cardiac genes including NPPA and NPPB exhibit pronounced repression in perturbed cells (Figure 4J–K). These observations suggest that, although some sgTBX5Enh cells can reach a late-CM cell state, these cells have deficient activation of late-CM genes. We next asked if these cells also harbored aberrant expression of mid-CM cell signatures.
We identified a set of 18 genes that define the mid-CM state, which includes the structural proteins MYL4 and MYL6. Pseudotime analysis shows that NT control cells repress mid-CM genes as they enter the late-CM stage (Figure 4L). In contrast, we observe that sgTBX5Enh cells retain higher expression levels of mid-CM genes in the late-CM state (p=2.6e-4, Z-test) (Figure 4M and Supplementary file 5). For example, perturbed cells have a 48% increase in HAS2 expression compared to NT control cells (pprogenitor = 0.064; pmid-CM = 0.0006; plate CM = 0.1, Mann-Whitney U) (Figure 4N–O).
In summary, we observe that perturbed cells express late-CM genes at lower levels and mid-CM genes at higher levels than NT control cells in the late-CM state. This suggests that although a subset of sgTBX5Enh cells can reach the late-CM cell state, these cells are in a delayed differentiation state that reflects mid-CM gene signatures. Taken together with previous observations, misregulation of TBX5 through enhancer modulation leads to deficient induction of CM transcriptional signatures that are associated with CM fate commitment.
TBX5 enhancer knockouts recapitulate cardiac phenotypes
Next, to confirm the results from the CRISPRi experiments above, we used a CRISPR/Cas9 strategy to genetically delete enhancers. We focused on the two TBX5 enhancers (Enh3 and Enh5) that harbor CHD-associated variants and exhibit the strongest cellular phenotypes in the CRISPRi screens. To compare enhancer deletion to gene deletion, we also knocked out TBX5 exon 3 (Kathiriya et al., 2021). We screened, isolated, and confirmed clones with biallelic deletion of either Enh3 (TBX5Enh3-/-), Enh5 (TBX5Enh5-/-), or Exon 3 (TBX5Exon-/-)(Figure 5A–B). Consistent with the CRISPRi analysis, we find that all three knockouts can differentiate to CMs (Figure 5—figure supplement 1A). We observe that deletion of TBX5 enhancers phenocopies TBX5 gene deletion, with all knockout clones exhibiting reduced RNA expression by qPCR and protein expression by immunocytochemistry (Figure 5C–E).
Figure 5—source data 1
Figure 5—source data 2
Figure 5—source data 3
Figure 5—source data 4
Figure 5—source data 5
Figure 5—source data 6
Figure 5—source data 7
Figure 5—source data 8
Figure 5—source data 9
Figure 5—source data 10
Next, to test if genetic deletion of enhancers contributes to deficient CM specification, we repeated our scRNA-seq analysis at day 8 of CM differentiation (Figure 5—figure supplement 1A and Supplementary file 3). Mirroring the CRISPRi results, analysis of all cell populations shows a strong depletion of CM states in perturbed cells (Figure 5F and Figure 5—figure supplement 1B). TBX5Enh5-/- cells exhibited the strongest depletion (~9% CMs, p>2.2e-308, hypergeometric test) relative to WT control cells (~57% CMs), followed by TBX5Enh3-/- cells (~24% CMs, p=5.2e-263, hypergeometric test), and TBX5Exon-/- cells (~33% CMs, p=5.9e–166, hypergeometric test). Importantly, we observed consistent results over multiple biological replicates.
To examine if enhancer deletion also contributes to deficient CM differentiation, we focused our analysis on a cluster of TNNT2+ cells. Sub-clustering revealed three CM states: FN1+ early CMs, ACTA2+ mid-CMs, and NPPA+ late CMs (Figure 5G). While cells derived from each genetic background were represented across all three states (Figure 5H and Figure 5—figure supplement 1C), we observed notable differences. Notably, perturbed cells were on average 10.6-fold less likely to reach the mature late-CM state (~27% WTLate_CM,~5% Enh3-/-Late_CM,~11% Enh5-/-Late_CM,~1% Exon-/-Late_CM) and this observation was statistically significant (pEnh3=1.8e-24, pEnh5=1.6e-7, pExon = 5.8e-83) (Figure 5I). Next, we repeated our previous analysis to confirm that cells with TBX5 enhancer or exon deletion exhibit elevated expression of mid-CM genes (Figure 5J). Importantly, deletion of either TBX5 enhancer or exon 3 resulted in the repression of key genes of cardiac development including the TBX5 target gene NPPA (Figure 5K, Figure 5—figure supplement 1D, and Supplementary file 4; Houweling et al., 2005). To confirm this result using an orthogonal assay, we performed RNA fluorescence in situ hybridization coupled to a flow cytometry readout (FlowFISH) (Fulco et al., 2019; Reilly et al., 2021). Consistent with an increase in earlier stage CMs, we find that TBX5 gene and enhancer knockouts have increased RNA and protein expression of early CM markers such as FN1 (Figure 5L–N and Figure 5—figure supplement 1E). Also, consistent with depletion in later stage CMs, we observe that TBX5 gene and enhancer deletions have decreased RNA and protein expression of the late-CM marker NPPA (Figure 5C–D).
Finally, since our analyses also generated clones with heterozygous TBX5 enhancer deletion (Figure 5B), we performed further analysis of one clone: TBX5Enh5+/- (Figure 6A–B). Confirming that these genetic modifications alter TBX5 expression, we observed that clones with enhancer deletions exhibit significantly decreased RNA expression during CM differentiation (Figure 6C). After CM differentiation and scRNA-seq (Figure 6D and Figure 6—figure supplement 1A), TBX5 Enh5 heterozygous knockouts also exhibited significant depletion of CM states (~24% CMs, p=5.2e-125, hypergeometric test) relative to WT cells (~45% CMs) (Figure 6E and Figure 6—figure supplement 1B). Sub-clustering of a TNNT2+ cluster revealed four CM states: FN1+ early CMs, ACTA2+ mid-CMs, HAND1+ late CMs, and IRX2+ ventricular CMs (Figure 6F and Figure 6—figure supplement 1C). Consistently, focused pseudotime analysis of CMs also showed significant depletion of Enh5+/- cells in the mature ventricular CM state (~39% WTVen_CM,~6% Enh5+/-Ven_CM, p=1.5e-75) (Figure 6G–H). Importantly, heterozygous deletion of TBX5 Enh5 results in a weaker differentiation defect than homozygous deletion of TBX5 Enh3 (which has a weaker phenotype compared to homozygous deletion of TBX5 Enh5) (Figure 6G–H). These observations are consistent with those observed for the TBX5 gene deletions previously published (Kathiriya et al., 2021). Next, we repeated our previous analysis to confirm that cells with TBX5 enhancer knockout exhibit delayed activation of late-CM genes (Figure 6I) and delayed repression of mid-CM genes (Figure 6J). Our analysis also revealed a depletion of cardiac related genes in perturbed late-CM cells compared to controls (Figure 6K and Supplementary file 6). Cell label transfer onto all previous datasets showed expected clustering consistent deficiencies to achieve later CM states (Figure 6—figure supplement 1D–G).
In sum, these observations with TBX5 enhancer deletion cells are consistent with our CRISPRi studies, and support the conclusion that loss of TBX5 enhancer function results in deficient transcriptional specification of CMs.
This study identifies and functionally characterizes 16 enhancers that are required for normal CM differentiation. Perturbations of these enhancers yield diverse cellular phenotypes ranging from global reduction of mesoderm populations to deficient CM differentiation. These cell specification defects are consistent with the putative role of these enhancers in regulating known cardiac regulators such as TBX5 (Kathiriya et al., 2021). The diversity of cellular responses has several implications. First, our observations highlight the important role of enhancers in orchestrating spatiotemporal gene regulation in development (Plank and Dean, 2014). Second, that enhancer perturbation can lead to such diverse cellular consequences further highlights the potential role of genetic variants in modifying enhancer function and contributing to developmental defects such as CHD (Richter et al., 2020; Smemo et al., 2012). We speculate that focused sequencing of genetic variants at these enhancers will reveal new genetic contributors of CHD. Third, the diversity of cellular responses observed through enhancer perturbation suggests that the cellular phenotypes of CHD could also be very diverse, and that subtle changes in cell fate may lead to complex CHD phenotypes. Indeed, CHD is a diverse developmental disease with multiple causal genes and distinct disease subtypes (van der Linde et al., 2011).
TBX5 presents a potential framework for studies to understand the genetic basis of human developmental disorders by linking human genetic linkage studies (strongest evidence of variant function) to variants of unknown significance (least support). For example, coding variants at TBX5 cause cardiac defects such as Holt-Oram syndrome (Basson et al., 1997; Bruneau et al., 1999; Holt and Oram, 1960; Li et al., 1997). Consistently, gene knockout studies in human iPSCs have shown that dosage-sensitive impairment of TBX5 alters CM differentiation with CHD-relevant phenotypes (Kathiriya et al., 2021). Even though multiple enhancers control TBX5 expression (Smemo et al., 2012), our studies show that perturbation of TBX5 enhancers exhibit similar cell state phenotypes. Enhancers in close proximity to genes with known roles in CHD represent key targets for future investigation.
Interestingly, our results indicate that knockdown of TNNT2 results in transcriptional phenotypes. While TNNT2 is not known to function as a transcriptional regulator, recent results from similar screens have identified unexpected gene expression phenotypes for genes not typically associated with regulatory function. For example, Replogle et al., 2022 showed that knockdown of chromosome segregation genes (with unclear roles in transcription) resulted in gene expression phenotypes. Similarly, genes with roles in glycolysis, vesicular trafficking, and DNA replication also unexpectedly caused transcriptional changes. These observations make possible the systematic assessment of genotype-phenotype relationships by using the transcriptome as a readout. Thus, TNNT2 knockdown may impact transcriptional states through similarly unexpected effects. One possibility is that, since TNNT2 is a critical component of the sarcomeric apparatus, its loss could initiate a feedback response resulting in delayed CM specification. In support of this possibility, we observe that CRISPRi perturbations of two TNNT2 enhancers and the promoter consistently yield changes in transcriptional and cell state phenotypes (Figure 2G and I).
This study represents a proof of concept for future efforts to systematically test the function of enhancers in heterogeneous developmental systems. We adopted a tiered approach to balance the advantages and disadvantages of CRISPR and CRISPRi approaches (Diao et al., 2017; Fulco et al., 2016; Gasperini et al., 2017; Thakore et al., 2015; Zhou et al., 2014). That is, while CRISPRi-based epigenetic silencing can efficiently act across many cells in a population, it offers incomplete inhibition. In contrast, the generation of genetic knockouts using CRISPR is relatively inefficient, but yields clones with complete enhancer deletion. Thus, we started the study with a higher throughput single-cell CRISPRi screens as a filter for potential hits. To ensure robustness, we validated these hits with a smaller scale CRISPRi validation screen. Finally, to ensure consistency with orthogonal perturbation paradigms, we performed clonal enhancer deletion studies. This tiered approach allowed us to balance speed with accuracy. Tiering strategies such as this and others (Klann et al., 2021) will aid future efforts. Importantly, we found that consistent cellular phenotypes can be obtained with both CRISPRi or CRISPR strategies. Expectedly, perturbations of promoters generally yield stronger cellular responses than enhancers. Similarly, genetic knockouts elicit more dramatic changes in cell state than CRISPRi with dCas9-KRAB. Genetic variants likely yield even weaker phenotypes. In this way, we speculate that epigenetic perturbation more closely matches variants than genetic deletion.
Despite the overall success of our screening approach and subsequent validation experiments, we note several potential limitations. First, scRNA-seq provides deep cellular coverage at the expense of sequencing depth per cell, such that detection of lowly expressed genes is challenging. Thus, conclusive determination of sgRNA functionality in individual cells remains problematic. Second, enhancers are known to function in a time-dependent manner, yet sgRNA perturbations were introduced at a fixed time point. While we expect time-dependent perturbations to yield insights on enhancer-mediated gene regulation, one technical challenge is that the temporal delay between perturbation induction and functional repression needs to be resolved and likely improved before application in dynamic developmental systems. Third, CM differentiation in two-dimensional culture systems does not adequately recapitulate the complex morphological and inter-cell communication events that occur during human development. Standard cardiac differentiation strategies are very efficient but use harsh metabolic selection to enrich for CMs. In our study, we intentionally avoided metabolic selection to preserve non-cell autonomous interactions, since selection may inadvertently mask the phenotypes of genetic perturbations. By eliminating the metabolic selection step during hESC differentiation, we produced heterogeneous cell states including mesodermal (CMs and progenitors), ectodermal (neuronal), and endodermal states. This strategy was previously used by Tohyama et al., 2013. In support of the relevance of this system, Zic2 is required for early specification of CMs. In our system, CRISPRi knockdown of Zic2 results in loss of CM and neuronal specification. These cells with Zic2 sgRNAs are almost entirely diverted to an early mesodermal state (Figure 1—figure supplement 1H). These results indicate that our biological system captures expected phenotypes and suggest that the results from other perturbations are also potentially relevant to CHD. Future utilization of organoid systems is likely to better mimic the complex morphogenetic changes that occur in vivo.
In summary, we have applied single-cell screening technology to functionally characterize enhancers in a heterogeneous developmental system. We expect that future applications of these approaches will help to comprehensively identify enhancers with roles in cell fate specification that could also contribute to developmental defects such as CHD.
Materials and methods
Materials availability statementRequest a detailed protocol
Data and codes required to reproduce the findings of this manuscript are freely available and are referenced below. Further information and requests for resources and reagents should be directed to and will be fulfilled by Gary Hon (Gary.Hon@utsouthwestern.edu).
Generation of PiggyBac dCas9-KRAB plasmidRequest a detailed protocol
The NheI/SalI dCas9-KRAB fragment from Lenti-dCas9-KRAB-blast (Addgene ID: 89567) was ligated into PiggyBac vector 5’-PTK-3’ (Cadiñanos and Bradley, 2007) replacing the NheI/SalI insert. A bovine growth hormone polyadenylation signal was PCR amplified (Kapa HiFi DNA polymerase) (5’-GCC TCC CCG CAT CGA TAC CGC TGT GCC TTC TAG TTG CCA G, 3’-GTA ACA AAA CTT TTA ACT AGC CAT AGA GCC CAC CGC ATC C) and assembled (NEB HiFi Assembly) replacing the remaining SalI/SpeI fragment. A 647 bp ubiquitous chromatin opening element derived from the human HNRPA2B1-CBX3 locus was PCR amplified (5’-TAT AGA TAT CAA CTA GAA TGG GGA GGT GGT CCC TGC AG, 3’-AAC TTT ATC CAT CTT TGC AGG GCC CTC CGC GCC TAC AG) and assembled into the NheI site.
Target selectionRequest a detailed protocol
Fourteen enhancers were selected using publicly available ATAC-seq and H3K27ac ChIP-seq datasets collected at various time points throughout CM differentiation. ATAC-seq peaks across all time points were merged and extended to a total size of 500 bp. We then applied FeatureCounts for H3K27ac ChIP-seq signal at these 500 bp ATAC-seq boundaries. H3K27ac counts were normalized to RPKM then to input followed by log2 transformation. We selected for ATAC-seq regions in which the max ChIP enrichment across all time points was greater than log2(1.5), identifying 43,536 enhancers across CM differentiation. We then selected enhancers which were within 1 kb of a CHD-associated variant which left 2300 CHD-associated enhancers. Using gene ontology, we identified enhancers which were within 100 kb of a heart development-associated gene. After manually filtering for genes based on prior literature, we were left with 14 enhancers. We also included the 18 nearby promoters for heart development genes. Our library also consisted of six TBX5 enhancers, three we identified in house using the same criteria above with the expectation of CHD variant overlap, and three from literature (Smemo et al., 2012). From an MPRA study, we also selected five enhancers in which CHD-associated variants have been shown to perturb activity (Richter et al., 2020).
Generation of sgRNA libraryRequest a detailed protocol
We used CROPseq-Guide-Puro plasmid for sgRNA expression (Addgene ID: 86708). For large-screen sgRNA library construction, a single-strand sgRNA oligo library containing 397 sgRNAs (Supplementary file 1) was synthesized by IDT. The library was amplified by NEBNext High-Fidelity 2× PCR master mix (New England Biolabs) to make it double-stranded and then was inserted into the BsmBI digested CROPseq-Guide-Puro plasmid through Gibson Assembly (New England Biolabs). The circulized product was purified and electroporated into Endura ElectroCompetent cells (Lucigen) following the manufacturer’s protocol. The cells were then cultured in LB medium with 100 μg/ml ampicillin at 30°C overnight and the plasmid extracted using the ZymoPURE plasmid maxiprep kit (Zymo Research). We amplified the spacer sequences of the sgRNA library and verified the complexity of the library by Illumina sequencing. See our previous publication for a full protocol including primer sequences (Xie et al., 2019). Our focused screen consisted of 49 sgRNA which were packaged using a golden gate strategy previously published. In brief, single-strand sgRNA oligos were ordered from IDT and annealed using T4 ligase. Annealed oligos for sgRNA targeting the same region were then pooled for a golden gate reaction using BsmBI and T7 ligase to insert the sgRNA into the CROPseq backbone. Individual sgRNA were packaged in a similar approach.
Cell lineRequest a detailed protocol
H9 ESCs were maintained feeder-free in mTeSR Plus (Stemcell Technologies 100-0276) according to the suggested manufacturer’s instructions on Matrigel (Corning 354277) coated plates, passaged every 3–4 days with DPBS/0.5 mM EDTA and plated in 2 µM thiazovivin (Sigma SML1045) supplemented mTeSR Plus overnight. H9 ESCs and HEK293Ts were authenticated with ATCC using STR (short tandem repeat) profiling. Both lines were verified as mycoplasma free through PCR detection (Bulldog Bio 25234).
Generation of dCas9-KRAB-hESC lineRequest a detailed protocol
H9 ESCs were washed with DPBS and dissociated to single cells with TrypLE Select. Transfection of 1×106 cells with 5 µg PiggyBac transposon plasmid and 1 µg HA-mPB (Cadiñanos and Bradley, 2007) transposase plasmid in 100 µl was performed using P3 Primary Cell 4D-Nucleofector X Kit (V4XP-3024) with program CB-150 according to the suggested manufacturer’s instructions. At confluency cells were passaged and maintained in the presence of 5 µg/ml blasticidin S until emerging colonies were ~1 mm in diameter. Individual colonies were isolated, expanded, and tested for stable expression.
CM differentiationRequest a detailed protocol
At 80–90% confluency, media was replaced with 3 µM CHIR99021 supplemented CDM3 (RPMI 1640; 0.5 mg/ml human albumin, ScienCell OsrHSA; 211 µg/ml L-ascorbic acid 2-phosphate) for 48 hr followed by 2 µM Wnt-C59 supplemented CDM3 for 48 hr with subsequent media changes every 2 days with CDM3 alone. On day 8 post-differentiation, samples were washed with DPBS and dissociated to single cells with Accutase and resuspended in CDM3.
Virus packagingRequest a detailed protocol
Lentiviral plasmid was packaged as previously described (Xie et al., 2019). Briefly, the lentivirus packaging plasmids PMD2.G and psPAX2 (Addgene ID: 12259 and 12260) were co-transfected with the carrier plasmid to HEK293T cells with linear polyethylenimine (Polysciences). Supernatant was collected 72 hr after transfection and filtered with a 0.45 μm filter. The virus was further purified by using Lenti-X lentivirus concentrator (Clontech).
sgRNA transductionRequest a detailed protocol
Single-cell dissociated H9 ESCs (1×106 cells) were incubated in 2 ml medium supplemented with thiazovivin and lentiviral supernatant (MOI ~ 0.3) in a single well of an ultra-low attachment plate (Corning 3471) for 3 hr, diluted in 2 ml medium and then plated onto three wells of a Matrigel-coated six-well plate. At 72 hr post-transduction cells were fed with 5 µg/ml blasticidin S and 1 µg/ml puromycin supplemented medium for 7 days.
Knockout line generationRequest a detailed protocol
Transfection of 10 µg Cas9/sgRNA plasmids (left and right flanking at 1:1) was performed as described in stable line generation. At 48 hr post-transfection, single GFP-positive cells were sorted, expanded, and isolated for knockout validation. For validation, genomic DNA was extracted with QuickExtract (Lucigen) and PCR was performed to verify deleted regions.
ImmunocytochemistryRequest a detailed protocol
Cells were differentiated or passaged on Matrigel coated #1.5 12 mm cover glass. Cells were washed with DPBS, fixed with 4% PFA in DPBS for 20 min at room temperature, and washed with DPBS. Samples were blocked for 1 hr with PBS supplemented with 5% normal donkey serum and 0.3% Triton X-100. Primary antibodies in PBS supplemented with 1% BSA and 0.3% Triton X-100 were incubated overnight at 4°C. After 3×5 min washes with PBS, secondary antibodies at 1:400 were incubated at room temperature for 2 hr, washed, mounted with Prolong Glass (Invitrogen P36981), and imaged with an EVOS FL Auto Imaging System (Thermo Fisher).
Western blottingRequest a detailed protocol
Whole-cell protein lysate was collected using RIPA buffer in the presence of protease inhibitors and PMSF. After protein concentration was determined, the lysate was denatured in 2× Laemmli Sample Buffer, separated by electrophoresis, and transferred to nitrocellulose membrane. Membranes were blocked with 5% non-fat dry milk in TRIS buffer containing 0.1% Tween-20 (TBST) for 1 hr followed by incubation with primary antibody overnight at 4°C. The membranes were then washed with TBST followed by application of appropriate HRP-conjugated secondary antibody for 1 hr at room temperature. After washing with TBST, membranes were incubated with Pierce SuperSignal West Femto and images with a Bio-Rad ChemiDoc Imager.
FlowFISHRequest a detailed protocol
TBX5 exon 3, enhancer 3, enhancer 5, and WT H9 hESC were differentiated into CMs over 8 days followed by PrimeFlow following the commercial protocol. Each sample was multiplexed for FN1 (Alexa 647) and RPL13A (Alexa 488) probes followed by flow cytometry. Data was analyzed using the FlowCal python package (Castillo-Hair et al., 2016). Gating was performed using the density function along the forward scatter and side scatter axes. Briefly, the function takes in the percentage of events to retain as an input. Then, the plot is divided into grids and events in those grids are counted. A histogram is plotted, and the curve is smoothened using Gaussian blur. For this analysis, 45% of events were retained and further analysis was done using gated data. To normalize FN1, FN1 probe intensity was divided with RPL13A probe intensity at a single-cell level for all the samples.
Single-cell RNA-seqRequest a detailed protocol
For the focus screen, dissociated CMs were resuspended in 0.08% BSA in PBS and passed through a 70 µm filter before diluting to 1000 cells/µL. Two 10× lanes were run at an expected recovery of 10,000 cells. For the initial screen, dissociated CMs were divided into 10 samples consisting of 5*105 cells and each sample stained with 1 of 10 cell hashing bodies. Cells were processed as described in cell hashing publication (Stoeckius et al., 2018). After processing, cells were resuspended in 0.04% BSA in PBS and diluted to 4800 cells/µL. Eight 10× lanes were run at expected recovery of 45,000 cells. For sequencing exp (Figure 5 and Figure 6), divided knockout and control dissociated CMs into 12 samples of 1*106 cells and stained each using a unique CellPlex oligo. Samples were processed as in 10× Genomics Chromium 3’ V3.1 with Cell Multiplexing kit and pooled together evenly for a final concentration of 1500 cells/µL in 0.04% BSA in PBS. For sequencing exp (Figure 5), two 10× lanes were run at an expected recovery of 30,000 cells. Single-cell RNA-seq libraries were prepared using 10× Genomics Chromium 3′ V3.1 kit, following the standard protocol. For sequencing exp (Figure 6), two 10× lanes were run at an expected recovery of 60,000 cells. Single-cell RNA-seq libraries were prepared using 10× Genomics Chromium 3′ V3.1 HT kit, following the standard protocol. To construct the sgRNA enrichment libraries, 50 ng of cDNA product was used to perform an enrichment PCR using SI-PCR primer and sgRNA enrichment primers. The PCR product was purified using a 1.6× SPRI beads cleanup followed by a final index PCR using SI-PCR primer and Nextera primers. A final 1.6× SPRI beads cleanup was then performed with a final expected library size of 500 bp. To construct cell hashing libraries, 20 µL of supernatant fraction cDNA was used for hashtag amplification PCR using SI-PCR primer and Nextera Hashtag primers. For a more detailed protocol, see cell hashing publication (Stoeckius et al., 2018). To construct CellPlex libraries, followed 10× Genomics Chromium 3’ V3.1 and V3.1 HT kit with Cell Multiplexing. To construct targeted gene expression libraries for initial screen, amplified transcriptome libraries following the 10× Genomics library re-amplification protocol. Following re-amplification, libraries were pooled and targeted gene expression libraries constructed as in the 10× Genomics protocol. List of probes and primers used in Supplementary file 1. The genomic datasets presented here span 13 independent biological samples: the initial screen in Figures 1 and 2 represents one biological sample, the focused screen in Figures 3 and 4 represents another independent biological sample, the TBX5 enhancer deletion studies in Figure 5 represent eight independent biological samples (four genotypes with two independently cultured biological replicates each), and Figure 6 contains three more biological samples (three genotypes with one biological replicate each).
SequencingRequest a detailed protocol
Libraries were sequenced through a combination of Illumina NovaseqS4, NextSeq 500/550, and NextSeq 2000. We used paired-end sequencing using the following settings: R1-151bp R2-151bp idx1-10bp idx2-10bp on Novaseq; R1-28bp R2-54bp idx1-10bp and R1-28bp R2-56bp idx1-8bp on NextSeq 500/550; and R1-28bp R2-90bp idx1-10bp idx2-10bp on NextSeq 2000. All sequencing data is available on GEO: GSE190475.
MappingRequest a detailed protocol
For the initial screen and focus screens, scRNA-seq libraries were demultiplexed and mapped to the human reference genome (hg38) using the Cellranger software (ver 3.1.0, 10× Genomics). Focus screen experiments were mapped using Cellranger count with the following flags; --expect-cells=10000, and --chemistry=SC3Pv3. Initial screen experiments were mapped using Cellranger count with the following flags; --feature-ref (cell hashing library), --expect-cells=45000, and --chemistry=SC3Pv3. Cell hashing antibody assignments were included as a feature barcode. scRNA-seq mapped libraries for each experiment were combined using Cellranger aggr and normalized for sequencing depth using the flag --normalization=mapped. Sequencing exp (Figure 5) was mapped to the human reference genome (hg38) using the Cellranger software (ver 6.0.0, 10× Genomics) with Cellranger multi using the following parameters; --expect-cells=30,000 and --chemistry SC3Pv3. Sequencing exp (Figure 6) was mapped to the human reference genome (hg38) using Cellranger software (ver 6.1.2, 10× Genomics) with Cellranger multi using the following parameters: --expect-cells=60,000 and --chemistry SC3Pv3HT. Cellplex oligo assignments were included as a feature barcode. Cellplex demultiplexed libraries were aggregated using Cellranger aggr and normalized for sequencing depth using the flag --normalization=mapped. sgRNA libraries were mapped using FBA (Duan et al., 2021). FBA extract was run using a mismatch of 1 with --r1_coords 0,16 and --r2_coords 19,38. Following sgRNA sequence extraction, FBA count was run. Initial screen targeted gene expression libraries were mapped to the human reference genome (hg38) using Cellranger (ver 6.1.2, 10× Genomics) count with the following flags; --expect-cells=45,000, and --chemistry=SC3Pv3. Initial screen targeted gene expression libraries were aggregated using Cellranger aggr and normalized for sequencing depth using the flag --normalization=mapped.
Single-cell processingRequest a detailed protocol
We assigned sgRNA to cells using the saturation curve method described in Drop-seq (Macosko et al., 2015). For a given sgRNA, we calculated the cumulative distribution of the UMIs from all cells. By identifying the inflection point of the curve, we adjusted the UMIs from cells after the inflection point to be zero. The sgRNAs with the adjusted UMI count greater than 0 are considered as true sgRNAs in this cell. To process cell hashing antibodies, we applied a saturation curve in a similar method as with the sgRNA processing.
We filtered for only singlets using Scrublet in the focused screen. In our initial screen, cells in which only one antibody barcody was detected were retained. Cells with higher than 20% mitochondrial content were removed, as well as genes with fewer than 1 counts. The count matrix was natural log transformed and saved as a separate layer for differential gene expression calling before scaling counts to unit variance and zero mean. An extra filter was applied to the initial screen in which cells where we could not detect an sgRNA were removed. Post filtering we were left with 80,147 cells in the initial screen. Clustering was performed through Scanpy implementations (Wolf et al., 2018). Briefly, PCA was performed followed by neighborhood calculation using default parameters. We then clustered cells using the Louvain algorithm at 0.2 resolution for initial and focus screens, 0.15 resolution for sequencing exp (Figure 5), and 0.1 resolution for sequencing exp (Figure 6). PAGA was performed to map cluster connectivity and served as the embedding for UMAP visualization, which was applied with default settings. Clusters which held fewer than 1% of all cells were removed. This resulted in 22,214 single cells in the focus screen, 31,367 cells in sequencing exp (Figure 5), and 24,571 in sequencing exp (Figure 6).
We applied the rank_gene_groups function of scanpy to identify cluster defining genes using the Wilcoxon rank-sum method. In our initial screen, we identified the following populations: two neuronal (+SOX2), CMs (+TNNT2), mesoderm (+FN1), epithelial (+EPCAM), cardiac fibroblast (+COL3A1), and endoderm (+TTR). In our focus screen, we grouped similar clusters together and defined the following cell types: neuronal (+SOX2), CMs (+TNNT2), endoderm (+TTR), and endothelial (+PECAM1). Similar clusters were observed in sequencing exp (Figure 5 and Figure 6). For all experiments, we identified the enriched genes defining each Louvain cluster by applying the Scanpy rank_gene_groups function using the Wilcoxon rank-sums method.
Single-cell trajectory clusteringRequest a detailed protocol
To identify CM trajectory, clusters were filtered to retain cell populations relevant to cardiac differentiation. We isolated Louvain clusters corresponding to CMs in the focus screen and initial screen. The mesoderm cluster was included in the initial screen due to its involvement in CM differentiation. We then clustered these cells using PCA and PHATE with k=10/a=20 for the focus screen, k=30/a=100 in the initial screen, k=10/a=100 in sequencing exp (Figure 5), and k=10/a=50 in sequencing experiment (Figure 6). Neighborhood calculation and Louvain clustering at resolution 0.83 for the focus screen, 0.3 for the initial screen, 0.6 in sequencing experiment (Figure 5), and 0.7 in sequencing experiment (Figure 6) were applied to generate new clusters. We selected the clusters corresponding to CM and smooth muscle markers resulting in 1832 focus screen, 17,716 initial screen, 4717 sequencing experiment (Figure 5), and 4871 sequencing experiment (Figure 6) CM cells. The above clustering process was then repeated using k=10/a=20 for the focus screen, k=60/a=100 for the initial screen, k=10/a=100 in sequencing exp (Figure 5), and k=20/a=100 for sequencing experiment (Figure 6). We applied the Louvain algorithm to the initial screen, focus screen, and sequencing experiment (Figure 5) at resolution 0.3. For sequencing exp (Figure 6), an initial broad Louvain clustering was identified using resolution 1.2. This was used to identify the root mesoderm-like cluster for pseudotime calling. Following, a final resolution of 0.4 was applied. Louvain clustering identified SOX4+ progenitor, FN1+ early CMs, ACTA2+ mid-CMs, and NPPA+ atrial-like late CMs in our initial screen. Our focus screen identified three clusters comprising progenitor, mid- and late CMs. Sequencing exp (Figure 5) consisted of three clusters comprising early, mid-, and late CMs. Sequencing exp (Figure 6) consisted of four clusters comprising early, mid-, late, and IRX2+ ventricular CMs. To verify the integrity of the PHATE projected trajectory, we ran diffusion pseudotime as implemented in Scanpy dpt. Clusters reminiscent of mesoderm-like cells were selected as the roots for all projections.
We applied the ingest module of Scanpy to transfer CM cluster labels across datasets as validation. We assigned the initial screen as reference and transferred labels onto focus screen CMs. Sequencing exp (Figure 6) was also assigned as reference and labels transferred onto other CM datasets. Perturbation CM distribution analysis was repeated across all label transfer re-assignments.
sgRNA filteringRequest a detailed protocol
We designed multiple sgRNA for each candidate enhancer and promoter to account for potential off-targets. To identify sgRNA with off-target effects or that were non-functional, we applied a filtration approach using the identified CM trajectory in the large experiment. First we filtered our five non-target sgRNA to create a background. This was done by using a hypergeometric test for each CM cluster to compare the distribution of cells with each individual NT to the background of cells with any NT. We performed the hypergeometric test five times, in each iteration removing one of the NT. Through this we identified one NT which, if included, produced bias in the background distribution. After removing this NT, we were left with four similarly distributed controls which served to define our control cells for downstream analysis.
Second, for each target, we filtered to remove sgRNA that deviated in distribution to the majority. For each possible combination of sgRNA cells for each target, we performed a hypergeometric test for each CM cluster comparing the distribution of these sgRNA cells to the filtered NT cells background. For each of four CM clusters, we had two lists of p-values, one for depletion of sgRNA cells and one for enrichment. This resulted in a total of eight p-value lists. We then ordered each resulting p-value list for all sgRNA combinations in descending order of significance. A cutoff was drawn corresponding to the number of times each sgRNA is found in a given combination. We then asked whether any given sgRNA for this target was found at a significant rate to the left of this cutoff by using a hypergeometric test. If an sgRNA was found to be uniquely biased in this direction, it was removed. This was repeated for all eight p-value lists. If more than half of the sgRNA for a given target would be removed by this approach, all sgRNA were kept. This was repeated for sgRNA targeting every enhancer and promoter. All analysis, including validation experiment, used this filtered sgRNA set.
Bulk validation of CM trajectoryRequest a detailed protocol
Previously published RNA-seq data from an hESC to CM differentiation time course was mapped to the hg38 reference genome by STAR. For each sample, counts were normalized to RPKM. Replicate counts for each sample were then averaged together. We identified the top 100 uniquely enriched genes for each of our CM subpopulations in the initial screen from the Scanpy rank_gene_groups function using the Wilcoxon rank-sums method. For each enriched cluster gene set, fold change over day 0 was calculated for every time point. The fold changes of each gene from a gene set were then averaged by time point.
Comparison of sgRNA distributionRequest a detailed protocol
Clustering bias was performed in a similar manner for every dataset. We compared the distribution of sgRNA for a given target to NT sgRNA across cell clusters using a hypergeometric test. Depletion of sgRNA was calculated using the hypergeometric cumulative distribution function and enrichment by using the survival function.
To compare the distribution of sgRNA across CM trajectory pseudotimes, we applied a Mann-Whitney U test which identified whether the median pseudotime for a given target was significantly shifted from NT control.
Differential expression of cell type defining gene setsRequest a detailed protocol
We applied a Wilcoxon rank-sum test using Scanpy integration to compare gene expression of late CMs and mid-CMs from our focused screen. This approach identified 18 genes significantly enriched in mid-CMs and 80 genes enriched in late CMs. To assess the expression of these gene sets in NT control vs sgTBX5 ENH cells within the late-CM population, we calculated the average expression of this gene set within all our sgTBX5 ENH cells in late CM. As significantly more NT cells (558) achieved a late-CM fate in comparison to sgTBX5 ENH cells (122), we obtained a control expression distribution by sampling 122 NT cells through 1000 iterations and recording the average expression of genes sets each time. We applied a Z-test comparing perturbed to control expression and converted to p-value using scipy.stats.norm.cdf for enrichment and scipy.stats.norm.sf for depletion of gene set expression in perturbed cells. The same Z-test methodology was applied to our knockout screen in late CMs comparing focused screen differential gene sets between knockout and WT cells.
Differential expressionRequest a detailed protocol
Differential expression of individual genes was calculated by applying a Mann-Whitney U test through scipy.stats.mannwhitneyu to the expression distribution of a given gene in perturbed to NT control cells.
Sequencing data have been deposited in GEO under accession code: GSE190475. We have deposited Jupyter notebooks to GithHub (https://github.com/darmen04/Repression-of-CHD-associated-enhancers-delays-human-cardiomyocyte-lineage-commitment copy archived at Armendariz, 2023).
NCBI Gene Expression OmnibusID GSE190475. CHD-associated enhancers direct human cardiomyocyte lineage commitment.
Genome-wide association studies and meta-analyses for congenital heart defectsCirculation. Cardiovascular Genetics 10:e001449.https://doi.org/10.1161/CIRCGENETICS.116.001449
SoftwareCHD-associated-enhancers-shape-human-cardiomyocyte-lineage-commitment, version swh:1:rev:862bb8fc07bd255c53adc60c5b9b9e959e5676aeSoftware Heritage.https://archive.softwareheritage.org/swh:1:dir:941c483d9a12d93ce378d71205493396362e4d70;origin=https://github.com/darmen04/CHD-associated-enhancers-shape-human-cardiomyocyte-lineage-commitment;visit=swh:1:snp:e5970ef4339d88004cdb21843ab091db0514e5c1;anchor=swh:1:rev:862bb8fc07bd255c53adc60c5b9b9e959e5676ae
Advances in the genetics of congenital heart disease: A clinician’s guideJournal of the American College of Cardiology 69:859–870.https://doi.org/10.1016/j.jacc.2016.11.060
Chamber-specific cardiac expression of tbx5 and heart defects in holt-oram syndromeDevelopmental Biology 211:100–108.https://doi.org/10.1006/dbio.1999.9298
Generation of an inducible and optimized piggybac transposon systemNucleic Acids Research 35:e87.https://doi.org/10.1093/nar/gkm446
Fibronectin mediates mesendodermal cell fate decisionsDevelopment 140:2587–2596.https://doi.org/10.1242/dev.089052
Fba: feature barcoding analysis for single cell RNA-seqBioinformatics 37:4266–4268.https://doi.org/10.1093/bioinformatics/btab375
Crispr/Cas9-Mediated scanning for regulatory elements required for HPRT1 expression via thousands of large, programmed genomic deletionsAmerican Journal of Human Genetics 101:192–205.https://doi.org/10.1016/j.ajhg.2017.06.010
The incidence of congenital heart diseaseJournal of the American College of Cardiology 39:1890–1900.https://doi.org/10.1016/s0735-1097(02)01886-7
Familial heart disease with skeletal malformationsBritish Heart Journal 22:236–242.https://doi.org/10.1136/hrt.22.2.236
Cardiac outflow tract development relies on the complex function of sox4 and sox11 in multiple cell typesCellular and Molecular Life Sciences 71:2931–2945.https://doi.org/10.1007/s00018-013-1523-x
Isolation and functional characterization of alpha-smooth muscle actin expressing cardiomyocytes from embryonic stem cellsCellular Physiology and Biochemistry 25:595–604.https://doi.org/10.1159/000315078
Regulatory variation in a TBX5 enhancer leads to isolated congenital heart diseaseHuman Molecular Genetics 21:3255–3263.https://doi.org/10.1093/hmg/dds165
TBX5: A key regulator of heart developmentCurrent Topics in Developmental Biology 122:195–221.https://doi.org/10.1016/bs.ctdb.2016.08.008
Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysisJournal of the American College of Cardiology 58:2241–2247.https://doi.org/10.1016/j.jacc.2011.08.025
A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potentialGenes & Development 28:2699–2711.https://doi.org/10.1101/gad.248526.114
[Editors' note: we include below the reviews that the authors received from another journal, along with the authors’ responses.]
Summary of changes
1. We thank the reviewers for their careful review of our manuscript and their helpful comments. We have made several key updates to the manuscript, including:
2. We generated and validated H9 hESCs with TBX5 exon deletion, and we performed comparative studies with TBX5 enhancer knockouts during cardiomyocyte differentiation.
3. We performed new experiments to independently phenotype TBX5 enhancer KOs in comparison to TBX5 exon KOs and WT cells. New experiments included: single-cell RNA-Seq, qPCR, immunocytochemistry, and FlowFISH (PrimeFlow).
4. We performed target amplification studies of putative enhancer target genes to verify gene expression changes.
5. To validate the integrity of cell cluster labeling, we performed cell label transfer analysis.
6. We improved the single-cell analysis to give more molecular insights on enhancer perturbations, including downstream target genes.
Below, we provide point-by-point responses to the comments. All the responses have been incorporated into the revised manuscript.
It is a shame that the authors present broad cell state as a readout, rather than more in depth reporting of gene expression data.
We agree that more in-depth reporting of gene expression phenotypes would be helpful. To address this issue, we overcome several technical challenges.
– First, some cardiac genes are lowly expressed, making differential gene expression analysis underpowered. Hence, we have performed targeted sequencing to measure the expression of key cardiac gene targets (Figure 2—figure supplement 1D).
For example, this analysis shows that CRISPRi of TGFB1 promoter and enhancer results in 28% and 38% knockdown, respectively (Figure 2—figure supplement 1E).
– A second technical challenge is that CRISPRi repression results in incomplete silencing in the single-cell screens. For example, our positive control of targeting MALAT1 achieves 28% silencing (Figure 1G). To address this concern, we have generated additional enhancer knockouts, focusing on TBX5 regulatory elements. RNA sequencing of these full knockouts revealed gene expression signatures of fully penetrant enhancer perturbations. For example, knockout of the TBX5 exon resulted in repression of key cardiac development genes, which is phenocopied in TBX5 enhancer knockouts. These include expected targets of TBX5. See 3 Manhattan plots in Figure 5K, Figure 5—figure supplement 1D.
– A third technical challenge is that, by design, our cardiac differentiation strategy is only modestly efficient, and each perturbed element only has a modest number of cells attaining the mesodermal state. To address this issue, we combined cells with perturbed enhancers of the same gene. This allows us to pinpoint genes that were differentially expressed. See Manhattan plots in Figure 2O and Figure 4—figure supplement 1C.
(Initial Screen) Differentially expressed genes in CM cells with an sgRNA targeting a TBX5 enhancer. Please note the loss of NPPA and NPPB expression.
(Focused Screen) Differentially expressed genes in CM cells with an sgRNA targeting a TBX5 enhancer. Please note the loss of NPPA and NPPB expression.
Why would a TNNT2 enhancer inhibition alter cell state rather than just contractile function?
While TNNT2 is not known to function as a transcriptional regulator, recent results from similar screens have identified unexpected gene expression phenotypes for genes not typically associated with regulatory function. For example, Replogle et al. showed that knockdown of chromosome segregation genes (with unclear roles in transcription) resulted in gene expression phenotypes (Replogle et al. 2022). Similarly, genes with roles in glycolysis, vesicular trafficking, and DNA replication also unexpectedly caused transcriptional changes. These observations make possible the systematic assessment of genotype-phenotype relationships by using the transcriptome as a readout. Thus, TNNT2 knockdown may impact transcriptional states through similarly unexpected effects. One possibility is that, since TNNT2 is a critical component of the sarcomeric apparatus, its loss could initiate a mechanosensory feedback response resulting in delayed CM specification. In support of this possibility, we observe that CRISPRi perturbations of two TNNT2 enhancers and the promoter consistently yield changes in transcriptional and cell state phenotypes (Figure 2G) (Figure 2I).
How do the data for the TBX5 enhancers compare with the gene knockout data?
To adequately address this reviewer’s concern, we have generated and extensively validated TBX5 exon knockouts in H9 hESCs. Overall, we observe that:
– Knockout of TBX5 enhancers partially phenocopies TBX5 gene knockout. We find that TBX5 gene and enhancer knockouts exhibit reduced TBX5 protein expression by immunocytochemistry (Figure 5C-E).
– Consistently, the TBX5 downstream target NPPA also exhibits reduced protein expression by immunocytochemistry (Figure 5D).
–To examine changes in cell state, we performed new single-cell RNA-Seq experiments in all exon and enhancer 3 / 5 knockouts. We observe that TBX5 gene and enhancer knockouts also result in deficient differentiation of CMs (Figure 5I-J and Figure 5—figure supplement 1D).
–Consistent with an increase in earlier stage CMs, we find that TBX5 gene and enhancer knockouts have increased RNA expression of FN1 Figure 5K-N shows data from the single-cell RNA-Seq experiments.
– qPCR and western blot experiments also confirm this result (Figure 5L and 5N)
–To test if we can observe increased FN1 expression using an orthogonal assay, we performed FlowFISH experiments, which uses RNA in situ hybridization to examine expression in individual cells. Consistent with an increase in earlier stage CMs, we find that TBX5 gene and enhancer knockouts have increased RNA expression of FN1 (Figure 5M).
What are the type of CHDs represented in this set of loci?
CHD variants proximal to our target loci encompass a diverse set of CHDs including atrial and septal defects and Tetralogy of Fallot. We have added a table with patient phenotype information (Supplementary file 1).
The analysis falls short of bringing truly new insight into CHDs, cardiac differentiation, or the specific function of enhancers in these processes. As the iPS cell system is 2D, and apparently in the authors' hands very (surprisingly: neurons?) heterogeneous in its output, the relevance and importance of the findings as presented is limited.
Yes, we agree that standard cardiac differentiation strategies are less heterogeneous, but they use highly stringent metabolic selection to enrich for cardiomyocytes. We sought to avoid this selection because it could mask subtle phenotypes resulting from promoter and enhancer perturbations. For example, mice lacking TBX5 do not produce hearts (Bruneau et al. 2001), but ESCs lacking TBX5 can still differentiate into cardiomyocytes (Zhao et al. 2021). Therefore, we eliminated the metabolic selection step during hESC differentiation, resulting in the production of heterogeneous cell states including both mesodermal (cardiomyocytes and progenitors), ectodermal (neuronal), and endodermal states. This strategy was previously used by Tohyama et al. (Tohyama et al. 2013).
In support of the relevance of this system, Zic2 is required for early specification of cardiomyocytes. In our system, CRISPRi knockdown of Zic2 results in loss of cardiomyocyte and neuronal specification. These cells with Zic2 sgRNAs are almost entirely diverted to the mesodermal lineage, and yet do not efficiently differentiate into cardiomyocytes (Figure 1—figure supplement 1H). These results suggest that the system is relevant and that the results from other perturbations are also potentially relevant to CHD.
The authors state that loss of TBX5 alters CM lineage. In fact the reduced dosage of TBX5 did not alter this, and complete loss of TBX5 only altered later differentiation endpoints. So the authors' partial (slightly over 50%) reduction in TBX5 is in fact consistent with the allelic deletion series.
The conclusion that "these enhancers have roles in CM lineage commitment." Is not consistent with the data, which show (incompletely) modestly reduced numbers of late CMs. This would be a decreased differentiation efficiency.
We agree with the reviewer. Work from Kathiriya et al. identified that “TBX5in/+ cells followed a path similar to WT but to a transcriptionally distinct endpoint”. As correctly pointed out by the reviewer, our TBX5 enhancer knockdown results are quite consistent with the allelic series of the prior study. Both TBX5 hypomorphs and enhancer knockouts display similar populations with inefficient cardiomyocyte differentiation. We have modified the text to reflect this observation.
We agree with the reviewer on the subject of accurate language. We have modified the text to reflect a decrease in differentiation efficiency. An example is shown below:
Before: “We showed that loss of cardiac enhancer function results in delayed CM specification…
After: “We observe that perturbation of CHD-associated enhancers, particularly for TBX5, results in deficient CM differentiation.”
1. The efficiency of CRISPRi knockdown was only validated for the promoters MALAT1 and OCT4, however there is no evidence of enhancer loss-of-function provided. Considering all findings from the study rely on the robustness of the initial large-scale CRISPRi screen, it is critical the authors provide clear evidence on the efficacy of this system in repressing target enhancer activity. While the functional validation of knockdown efficiency for all sgRNAs is unreasonable, at a minimum the loss of enhancer activity should be confirmed via qPCR or CAGE-seq. We recognise this is a very challenging QC assay as it relates to validating knockdown of enhancer elements (vs genes) – but it is essential to have this to ensure fidelity of downstream data interpretation and assays.
We agree that we did not provide direct evidence of enhancer loss of function. We have performed two experiments to mitigate this.
First, we have performed TBX5 qPCR of D8 CMs following transduction with either non-target sgRNA or sgRNA targeting TBX5 enhancer 3 or 5. We observe marked reduction of TBX5 when either enhancer is repressed (Figure 1—figure supplement 1I).
Second, we performed target transcript amplification of 25 cardiac genes proximal to our targeted enhancers. This allowed for the robust detection of lowly expressed genes such as TGFB1 where we can detect repression in either TGFB1 promoter or enhancer knockdown cells (Figure 2—figure supplement 1E).
2. The 'NC control' referred to throughout the manuscript is not defined. This is critical, as this control is used as a reference point in all analyses included in this study. Are these cells lentivirus treated cells without gRNA or are these non-treated cells? The inclusion of an appropriate control is crucial, especially considering studies have shown that the introduction of lentiviral vectors in cells induces a robust innate immune response and technical factors such as these could influence the differentiation efficiency.
We agree that non-treated cells are an inappropriate control. In our studies, we use negative control sgRNAs that are lentivirally transduced together with targeting sgRNAs. Thus, negative controls sgRNAs are an internal experimental control in the pooled CRISPR screen.
We apologize for the confusion in our notation. In the text, we used “NC” to refer to a set of “negative control” sgRNAs, which do not target the genome: either targeting GFP, the CAG Promoter, or empty sgRNA. We now explicitly define this in the text. To further reduce confusion, we use “NT” to refer to these “non-target” sgRNA controls.
3. The annotation of cell types in all scRNA-seq datasets are poorly justified. Rather than relying on individual gene markers for manual annotation, we suggest the use of label transfer or other robust methods for unbiased cell type annotation. This is a major concern considering the underlying claims made in this study rely heavily on the quality of this initial cell type labeling step. This is relevant to all single cell analyses performed in the study.
We thank the reviewer for the suggestion. To address this concern, we first defined CM subpopulations for the initial screen. This was done using a combination of marker genes and pseudotime ordering of clusters (Figure 2A) (Figure 6F). We then applied the ingest function within the scanpy python package to label CMs in the focused screen using the initial screen as reference (Figure 4—figure supplement 1D). Cell label transfer produces similar clustering to our manual annotations and also supports the observation of deficient CM differentiation in hits (Figure 4—figure supplement 1E).
Similar analysis using the sequencing exp (Figure 6) datasets as a reference also yield consistent results (Figure 6—figure supplement 1H-K).
4. The consistent reliance on the proportional distribution of cells in scRNA-seq data (Figures 1H, 2G, 2J, 2L, 3E, 4F, 5E, 5H) is not sufficient to justify the claim that loss of target enhancers disrupts the proper differentiation of CMs as the authors claim. In some cases, the differences in proportions are so small (e.g. Figure 1H) that their biological significance is difficult to interpret. These data are correlative and require validation via independent methods such as qPCR, FACs, imaging or other independent endpoints to support claims about effects of enhancer KD on differentiation.
To address the reviewer’s concern, we have performed several independent assays to assess CM phenotypes using several genetic knockouts: TBX5 exon3 -/-, TBX5 enhancer 3 -/-, and TBX5 enhancer 5 -/-.
– qPCR: We performed qPCR for NPPA (late CM marker), and FN1 (early CM marker). In all knockouts, we observed elevated levels of FN1 and depletion of NPPA (Figure 5C and 5L).
– FlowFISH: To test if we can observe increased FN1 expression using an orthogonal assay, we performed FlowFISH experiments, which uses RNA in situ hybridization to examine expression in individual cells. We performed FlowFISH for FN1 in all exon and enhancer knockouts. Briefly, cells are stained with fluorescent probes targeting FN1 and RPL13A control transcripts. After staining, FN1 expression is quantified at the single cell level through flow cytometry, relative to the RPL13A control. We observe elevated levels of FN1 transcript through PrimeFlow (Figure 5M).
– Western Blot: As an alternative validation of increased FN1 expression, we examined protein expression of FN1 across our TBX5 genetic knockouts using western blots. Western blot confirmed elevated FN1 across the knockouts (Figure 5N).
– ICC: As a further validation, we examined protein expression of TBX5 and NPPA across our TBX5 genetic knockouts using immunocytochemistry. ICC phenocopied scRNA-seq data, showing decreased TBX5 and NPPA across the knockouts (Figure 5D-E).
Overall, these results support the observations from single-cell analysis that TBX5 gene and enhancer perturbations cause deficient CM specification.
5. Cellular phenotyping of TBX5-enhancer CRISPR knockout cells via flow cytometry, protein expression and calcium transient measurements displayed no significant difference to control (Figure S5), deriving >90% TNNT2+ cardiomyocytes on day 23 of differentiation. These results are discordant with the findings from the scRNA-seq analysis of day 8 TBX5-enhancer knockout cells where the authors claim effects of TBX5 enhancers on cardiomyocyte specification. We suggest the authors perform secondary validation of perturbed cells at day 8 (e.g. using FACs and qPCR) to assess the effects of TBX5-enhancer knockdown on the cardiac lineage.
Following the reviewer’s suggestion, we performed a panel of secondary validation experiments, including qPCR, FlowFISH, and immunocytochemistry, on TBX5 exon and enhancer genetic knockouts: TBX5 exon3 -/-, TBX5 enhancer 3 -/-, and TBX5 enhancer 5 -/-. These analyses were conducted at day 8 of differentiation to match our scRNA-seq analysis, and are also mentioned above in Response 4. Consistently, these data support the molecular phenotypes observed in the single-cell data and are also in agreement with the conclusions from the study by Kathiriya et al.
We agree with the reviewer that comparing cellular phenotypes at day 23 of differentiation (when these assays can be readily performed) to single-cell RNA-Seq data at day 8 is not ideal. To avoid confusion, therefore, we have removed day 23 data and focused on day 8.
6. Figure 5H: these data point to differences between ENH3-/- and ENH5-/- groups suggesting that these enhancers may be acting differently to influence transcriptional programs during cardiac differentiation. However, these differences are not addressed in the text and the authors claim that they have the same role. These difference are perhaps the most interesting in the manuscript and could be further developed because this may provide insight into the regulatory function of different enhancers in controlling cardiac lineage specification. Can the authors develop this finding further as a key area of novelty and significance in the study?
Response 6: During revision, we determined that the TBX5 ENH5 clone described in the first submission was a functional heterozygous deletion (rather than a homozygous deletion) (Figure 6B). We have since rederived and confirmed TBX5 ENH5 -/- hESC clones. After repeating the CM differentiation and single-cell sequencing, we observe minimal differences between ENH3 and ENH5 KOs in regards to differentiation deficiency (Figure 5I). We have revised the section regarding Figure 5 accordingly. In addition, we have added a new paragraph describing the result of the heterozygous ENH5 +/- (derived in the first version of this manuscript). We conclude that ENH5 +/- cells do not display differentiation deficiencies to the degree of full enhancer knockouts. These observations are in-line with those observed for the TBX5 gene deletions previously published.
1. Figure 2E: The order in which the targets are displayed is confusing and makes it difficult to interpret. It is suggested the authors display the enhancer targets in a separate panel to the promoter targets.
We thank the reviewer for the suggestion and have updated the figure (Figure 2E) jnto three panels to display promoters, enhancer, and controls.
2. It is unclear what additional novel insights are gained from the secondary CRISPRi screen of 6 TBX5 enhancers- the data appear to confirm the same conclusions (e.g. Figures 3E-F and 4F-G) already drawn from the inclusion of these targets in the large-scale screen (Figure 2L-M). Can the authors clarify this point?
The main purpose of the secondary CRISPRi screen is to confirm the observations from the initial screen given our concern for potential false positive hits. Not only did we validate the primary screen, but we also found that the secondary CRISPRi screen exhibited greater repression as portrayed by a general depletion of TBX5 enhancer knockdown cells within the CM cluster (Figure 4F). Overall, the data from the secondary screen phenocopied what was observed in the primary large-scale screen and allowed for differential expression analysis since CRISPRi repression was stronger (Figure 4H-O).
3. Throughout the manuscript the authors use jargon and vague language that undermines the narrative of the study. For example: "delays CM fate commitment by altering transcriptional signatures", "results in delayed transcriptional specification of CMs." The authors should carefully review the manuscript to ensure accuracy of results and interpretation.
We thank the reviewer for their comment. We have opted for more precise language which has been corrected in the text. An example of modified language is below:
Before: Taken together with previous observations, misregulation of TBX5 through enhancer modulation delays CM fate commitment by altering transcriptional signatures.
After: Taken together with previous observations, misregulation of TBX5 through enhancer modulation leads to deficient induction of CM transcriptional signatures that are associated with CM fate commitment.https://doi.org/10.7554/eLife.86206.sa2
Article and author information
- Gary C Hon
- Nikhil V Munshi
- Gary C Hon
- Gary C Hon
- Gary C Hon
- Nikhil V Munshi
- Nikhil V Munshi
Burroughs Wellcome Fund (1019804)
- Gary C Hon
Burroughs Wellcome Fund (1009838)
- Nikhil V Munshi
Welch Foundation (I-2103-2022033)
- Gary C Hon
Department of Defense (PR172060)
- Nikhil V Munshi
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We acknowledge the BioHPC computational infrastructure at UT Southwestern for providing HPC and storage resources that have contributed to the research results reported within this paper. GCH is supported by CPRIT (RP190451), NIH (DP2GM128203, UM1HG011996, 1R35GM145235), the Burroughs Wellcome Fund (1019804), the Welch Foundation (I-2103-20220331), and the Green Center for Reproductive Biology. NVM was supported by the NIH (HL136604, HL151650, and UM1HG011996), the Burroughs Wellcome Fund (1009838), and the Department of Defense (PR172060).
- Didier YR Stainier, Max Planck Institute for Heart and Lung Research, Germany
- Benoit Bruneau, University of California, San Francisco, United States
- Preprint posted: April 10, 2022 (view preprint)
- Received: January 15, 2023
- Accepted: March 10, 2023
- Accepted Manuscript published: April 25, 2023 (version 1)
- Version of Record published: May 3, 2023 (version 2)
© 2023, Armendariz, Goetsch et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
- Page views
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
- Chromosomes and Gene Expression
- Genetics and Genomics
Regulatory networks underlying innate immunity continually face selective pressures to adapt to new and evolving pathogens. Transposable elements (TEs) can affect immune gene expression as a source of inducible regulatory elements, but the significance of these elements in facilitating evolutionary diversification of innate immunity remains largely unexplored. Here, we investigated the mouse epigenomic response to type II interferon (IFN) signaling and discovered that elements from a subfamily of B2 SINE (B2_Mm2) contain STAT1 binding sites and function as IFN-inducible enhancers. CRISPR deletion experiments in mouse cells demonstrated that a B2_Mm2 element has been co-opted as an enhancer driving IFN-inducible expression of Dicer1. The rodent-specific B2 SINE family is highly abundant in the mouse genome and elements have been previously characterized to exhibit promoter, insulator, and non-coding RNA activity. Our work establishes a new role for B2 elements as inducible enhancer elements that influence mouse immunity, and exemplifies how lineage-specific TEs can facilitate evolutionary turnover and divergence of innate immune regulatory networks.
- Genetics and Genomics
Upstream open reading frames (uORFs) are potent cis-acting regulators of mRNA translation and nonsense-mediated decay (NMD). While both AUG- and non-AUG initiated uORFs are ubiquitous in ribosome profiling studies, few uORFs have been experimentally tested. Consequently, the relative influences of sequence, structural, and positional features on uORF activity have not been determined. We quantified thousands of yeast uORFs using massively parallel reporter assays in wildtype and ∆upf1 yeast. While nearly all AUG uORFs were robust repressors, most non-AUG uORFs had relatively weak impacts on expression. Machine learning regression modeling revealed that both uORF sequences and locations within transcript leaders predict their effect on gene expression. Indeed, alternative transcription start sites highly influenced uORF activity. These results define the scope of natural uORF activity, identify features associated with translational repression and NMD, and suggest that the locations of uORFs in transcript leaders are nearly as predictive as uORF sequences.