Abstract
Non-B DNA structures can act as dynamic functional genomic elements regulating gene expression. Among them, G4s and R-loops are two of the best studied. The interplay between R-loops and G4s is emerging as a regulator of DNA repair, replication and transcription. A comprehensive picture of native co-localized G4s and R-loops in living cells is currently lacking. Here, we describe the development of HepG4-seq and an optimized HBD-seq method, which robustly capture native G4s and R-loops, respectively, in living cells. We successfully employed these methods to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). We discovered that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes. We further demonstrated the helicase Dhx9 as a direct and major regulator that modulates the formation and resolution of co-localized G4s and R-loops. Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of genes associated with co-localized G4s and R-loops. Taken together, our work established that endogenous co-localized G4s and R-loops persist prevalently in the regulatory regions of active genes and are involved in the transcriptional regulation of their linked genes, opening the door for exploring broader roles of co-localized G4s and R-loops in development and disease.
Introduction
Genomic DNA can form various types of non-B secondary structures, including G-quadruplexes (G4s), R-loops, Z-DNA, i-motifs, cruciforms and others (Matos-Rodrigues et al., 2023). Among them, G4s and R-loops are two of the best studied. G4s are built by stacked guanine tetrads connected via Hoogsteen hydrogen bonds and can be formed by intra- or inter-molecular folding of the tetramers (Panyutin et al., 1989; Sen and Gilbert, 1988; Sundquist and Klug, 1989; Williamson et al., 1989). R-loops are three-stranded structures containing a DNA-RNA hybrid and a displaced single-stranded DNA (Aguilera and Garcia-Muse, 2012; Xu and Clayton, 1996). Both G4s and R-loops are involved in key biological processes, including transcription, replication, genomic instability, class switch recombination in B cells, DNA damage and repair, and telomere maintenance (Garcia-Muse and Aguilera, 2019; Robert Hänsel-Hertsch, 2017; Varshney et al., 2020; Yang et al., 2023).
R-loops appear to have a strong sequence preference with high G/C ratios (Ginno et al., 2013; Ginno et al., 2012). Reports about the interplay between R-loops and G4s are emerging. Specific G4 ligands stabilized G4s and simultaneously increased R-loop levels within minutes in human cancer cells, which finally induced DNA damage (De Magis et al., 2019). Reactive oxygen species (ROS) have been reported to induce G4 and R-loop formation at transcriptionally active sites, and their inter-regulation is essential for DNA repair (Tan et al., 2020). In a reconstituted eukaryotic DNA replication system, the interplay of R-loops and G4s was shown to impact replication fork progression by inducing fork stalling (Kumar et al., 2021). Single-molecule fluorescence studies showed the existence of a positive feedback mechanism of G4 and R-loop formation during transcription, where the transcription-induced R-loop precedes and facilitates G4 formation in the non-template strand, and in turn the G4 promotes R-loop formation in the following rounds of transcription (Lee et al., 2020; Lim and Hohng, 2020). Wulfridge et al. reported that sites bound by the architectural protein CCCTC-binding factor (CTCF) are enriched for R-loops and G4s, which facilitate CTCF binding to promote chromatin looping interactions (Wulfridge et al., 2023).
Detection of G4s and R-loops has been largely based on the use of a single-chain variable fragment (scFv) BG4 for G4s and a monoclonal antibody S9.6 for R-loops. In recent years, these two affinity probes have been coupled with deep sequencing to detect G4s and R-loops genome-wide (Galli et al., 2022; Ginno et al., 2012; Hänsel-Hertsch et al., 2016; Jiang et al., 2023; Lyu et al., 2021). Using BG4- and S9.6-based CUT&Tag, G4s and R-loops showed a high degree of co-occurrence in mESCs (Lyu et al., 2021). However, given that a group of helicases, RNA-binding factors, endonucleases and DNA topoisomerases cooperate to actively dissolve G4s and R-loops, restoring B-form DNA duplexes (Robert Hänsel-Hertsch, 2017; Varshney et al., 2020; Yang et al., 2023), a steady-state equilibrium is generally set at low levels in living cells under physiological conditions (Miglietta et al., 2020), and thus addition of high-affinity antibodies may pull the equilibrium towards folded states. Additionally, the specificity of the S9.6 antibody for R-loops has recently been questioned with respect to accurate quantification and mapping of R-loops (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013).
To understand the co-localized G4s and R-loops in living cells under physiological conditions, we sought to develop an in vivo strategy for G4 profiling based on G4-hemin complex-induced proximal labeling and for R-loop profiling based on the N-terminal hybrid-binding domain (HBD) of RNase H1. Recent studies showed that G4s could tightly form a complex with the cellular cofactor hemin both in vitro and in living cells, where hemin binds by end-stacking on the terminal G-quartets of G4s without affecting the folding of G4s (Gray et al., 2019; Stadlbauer et al., 2021). The G4-hemin complex has been shown to act as a peroxidase to catalyze oxidation reactions in the presence of hydrogen peroxide (H2O2) (Cheng et al., 2009; Lat et al., 2020; Yang et al., 2011). The H2O2-activated G4-hemin complex oxidizes biotin tyramide to phenoxyl radicals that covalently conjugate biotin to the G4 itself and its proximal DNA within 10 nm (equivalent to approximately 31 bp) in vitro and in vivo (Einarson and Sen, 2017; Lat et al., 2020). Here, we have utilized the G4-hemin-mediated proximal biotinylation reaction to develop a new method, HepG4-seq (for high-throughput sequencing of hemin-induced proximal labelled G4s), to map the genomic native G4s under physiological conditions. The HBD domain of RNase H1 mediates the specific recognition of DNA/RNA hybrids in a sequence-independent manner, which is a gold standard for R-loop recognition in the cell (Nowotny et al., 2008). Methods based on the catalytically inactive RNase H1 or its HBD domain have been successfully used to identify genome-wide native R-loops (Chen et al., 2017; Wang et al., 2021). We have adapted the “GST-His6-2xHBD”-mediated CUT&Tag protocol (Wang et al., 2021) to develop the HBD-seq protocol in this study.
We have combined the HepG4-seq and HBD-seq to profile the genome-wide native co-localized G4s and R-loops with high signal-to-noise ratios in HEK293 cells and mouse embryonic stem cells (mESCs). We observed that the co-localized G4s and R-loops exhibit cell type-dependent distributions and are largely localized at active promoters and enhancers of transcriptionally active genes. We further showed that ∼70% of the co-localized G4s and R-loops in mESCs were directly bound by the helicase Dhx9 and that depletion of Dhx9 significantly altered the levels of ∼6200 co-localized G4s and R-loops bound by Dhx9. Furthermore, depletion of Dhx9 was shown to impair the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops -associated genes.
Results
Mapping of the native DNA G4 through the G4-hemin-mediated proximal biotinylation
The DNA G4-hemin complex could act as a mimic peroxidase to oxidize the biotin tyramide to phenoxyl radicals that can covalently conjugate biotin to G4 itself and its proximal DNA within ∼30 bp in the presence of H2O2 (Cheng et al., 2009; Einarson and Sen, 2017; Lat et al., 2020). However, the efficiency of peroxidase-mediated biotinylation on DNA is limited using the substrate biotin tyramide (Zhou et al., 2019). Recently, biotin aniline (Bio-An) has been shown to have higher labeling efficiency on DNA than biotin tyramide when catalyzed by the engineered peroxidase APEX2 (Zhou et al., 2019). The free heme concentration in normal human erythrocytes is 21 ± 2 µM (Aich et al., 2015). To explore the G4-hemin-mediated biotinylation in living cells, we treated HEK293 cells with 25 µM hemin and 500 µM Bio-An for 2 hours prior to activation with 1 mM H2O2 for 1 minute, and then quenched the labeling reaction and performed immunofluorescence staining using Alexa Fluor 647-conjugated recombinant streptavidin (Strep-647) that specifically recognizes biotin. As shown in Fig. 1A, cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To understand whether addition of hemin disturbs the formation of G4s, we performed BG4 CUT&Tag-seq using the recombinant BG4 and Tn5 on HEK293 cells treated with and without 25 µM hemin (Supplementary Fig. 1A, B). The heatmap and profile plot analysis showed similar BG4 CUT&Tag signals between the hemin-treated and control samples (Supplementary Fig. 1B). There were only 174 BG4 CUT&Tag peaks with significantly differential signals between the hemin-treated and control samples (Supplementary Fig. 1C). These data suggest that the hemin treatment condition we used does not significantly affect G4 folding.
Therefore, hemin-induced proximal biotinylation of G4s could be utilized to mark the native G4s in living cells.
The recombinant streptavidin monomer (mSA) combines the streptavidin and rhizavidin sequences to achieve specific monovalent detection of biotin or biotinylated molecules with a high affinity (Kd = 2.8 nM) (Lim et al., 2013). The Moon-tag system consists of a 15-amino acid peptide GP41-tag and a 123-amino acid anti-GP41-tag nanobody with an affinity of ∼30 nM in vitro (Boersma et al., 2019). We expressed and purified the recombinant mSA fused with the anti-GP41 nanobody (mSA-scFv), and the recombinant Tn5 fused with the GP41-tag and protein G (GP41-pG-Tn5) from E. coli (Supplementary Fig. 1A). To map the hemin-induced biotinylated G4s, we developed a new method, HepG4-seq (Fig. 1B), where mSA-scFv recognizes the biotinylated G4s and recruits the transposase Tn5 to achieve “Cleavage Under Targets and Tagmentation (CUT&Tag)”. Deep sequencing analysis of the biotinylated G4 fragments identified 6,799 consensus peaks from two independent biological repeats in HEK293 cells, where the signals were dramatically diminished in HEK293 cells without treatment with hemin and Bio-An, suggesting the specificity of HepG4-seq (Fig. 1C, Supplementary Table 1). Several representative HepG4-seq-identified G4 peaks are shown in Fig. 1D. Genomic distribution analysis showed that G4s are mainly localized in promoters (38.7%) and gene bodies (47.1%) (Fig. 1E).
We also evaluated the HepG4-seq-identified peaks using pqsfinder, a prediction tool for potential G4-forming sequences (PQS) that has been shown to have 96% accuracy on ∼400 known and experimentally observed G4 structures (Hon et al., 2017). The peaks identified by HepG4-seq overlap quite well with the center of pqsfinder maxScores that report the PQS quality (Fig. 1C-D). The motif enrichment analysis by HOMER (Heinz et al., 2010) revealed a high prevalence of G-rich sequences in HepG4-seq peaks (Fig. 1F). Together, these results further validate the specificity of HepG4-seq in capturing G4s.
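To make the notion of a potential G4-forming sequence concrete, the following minimal sketch scans a DNA string for the textbook motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This is a simplified regular-expression heuristic, not the pqsfinder algorithm used in this study, and it produces no quality score; the function name and the loop-length bounds are illustrative assumptions.

```python
import re

# Textbook PQS heuristic (an assumption for illustration, NOT pqsfinder):
# four G-runs of length >= 3 separated by loops of 1-7 nt.
PQS_PLUS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# A C-rich match on the given strand implies a G-rich PQS on the opposite strand.
PQS_MINUS = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def find_pqs(seq):
    """Return sorted (start, end, strand) tuples, 0-based half-open coordinates."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in PQS_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in PQS_MINUS.finditer(seq)]
    return sorted(hits)


# Four human telomeric repeats contain one canonical PQS on the given strand.
telomere = "TTAGGG" * 4
print(find_pqs(telomere))
```

A scoring tool such as pqsfinder additionally tolerates bulges and mismatches, so a sketch like this would be expected to miss non-canonical G4s.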
Induction of DNA G4s by inhibiting G4 resolving helicase
The RecQ-like helicases Bloom syndrome protein (BLM) and Werner syndrome ATP-dependent helicase (WRN) are the first recognized and the best characterized DNA G4-resolving mammalian helicases (Fry and Loeb, 1999; Mendoza et al., 2016; Mohaghegh et al., 2001). The small molecules ML216 and NSC617145 are selective and cell permeable inhibitors of BLM and WRN, respectively, by inhibiting their ATPase activity (Aggarwal et al., 2013a; Aggarwal et al., 2013b; Nguyen et al., 2013). To investigate the effect of BLM or WRN inhibition on native G4s, we treated HEK293 cells with ML216 or NSC617145 for 16 hours and then labeled cellular G4s by hemin-G4-induced biotinylation in living cells. Immunofluorescence staining of the treated cells using Strep-647 showed that the treatment of ML216 or NSC617145 remarkably elevated signals of native G4s (Fig. 1G). Furthermore, we performed HepG4-seq on HEK293 cells treated with ML216 or NSC617145. Notably, HepG4-seq identified 77,003 peaks from ML216- or NSC617145-treated HEK293 cells, and ∼ 70,000 new G4 peaks were induced by inhibition of BLM or WRN (Supplementary Table 1). The signals of G4s detected by HepG4-seq were significantly increased after inhibiting BLM or WRN (Fig. 1H-I). Representative G4s peaks are shown in Fig. 1J. Taken together, these data suggest that HepG4-seq is able to efficiently detect dynamic native G4s.
Mapping of native co-localized G4s and R-loops in HEK293 cells
The HBD domain of RNase H1 has been demonstrated to be a DNA/RNA hybrid recognition sensor and has been applied to identify genome-wide native DNA/RNA hybrids using the recombinant GST-His6-2xHBD coupled with Tn5-based CUT&Tag (Nowotny et al., 2008; Wang et al., 2021). Given that GST-fusion proteins are prone to form variable high molecular-weight aggregates and these aggregates often undermine the reliability of the fusion proteins (Deceglie et al., 2014; Ki and Pack, 2020), we produced a recombinant protein containing two copies of the HBD fused with EGFP and a V5-tag (HBD-V5) (Supplementary Fig. 1A) and used the anti-V5 tag antibody instead of the anti-His tag antibody for the CUT&Tag-seq. We call this modified protocol HBD-seq for mapping the native R-loops in cells (Fig. 2A). We performed HBD-seq on HEK293 cells and identified 42,488 consensus native R-loop peaks with a high signal-to-noise ratio, while the HBD-seq signals were dramatically diminished in HEK293 cells treated with the RNases prior to HBD-seq (Fig. 2B), suggesting the specificity of HBD-seq in detecting native R-loops.
We then analyzed the regions co-occupied by both HepG4-seq-identified G4s and HBD-seq-identified R-loops, and revealed 5030 native co-localized peaks in HEK293 cells, ranging in size from 100 bp to ∼1.5 kb (Fig. 2C-D, Supplementary Fig. 1D). 73.8% of these co-localized peaks are localized at promoters, 5’UTR, exon1 and intron 1 (Fig. 2E). When we performed a metagene analysis of these co-localized peaks, a distinct peak was detected around the transcription start site (TSS) (Fig. 2F). Representative co-localized peaks are shown in the Fig. 2G. The motifs enrichment analysis by HOMER (Heinz et al., 2010) showed that G-rich sequences are highly enriched in the co-localized peaks (Fig. 2H).
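The idea of a region co-occupied by a G4 peak and an R-loop peak can be illustrated with a simple interval intersection. The sketch below is only an illustration under stated assumptions: the text does not specify the overlap criterion or the tool used, so the 1-bp minimum overlap, the half-open coordinate convention, and the function name are hypothetical choices, and the example coordinates are made up.

```python
def intersect_peaks(g4_peaks, rloop_peaks, min_overlap=1):
    """Return regions shared by the two peak sets.

    Peaks are (chrom, start, end) tuples in 0-based half-open coordinates.
    min_overlap is an assumed threshold; the study's criterion is not stated.
    """
    rloops_by_chrom = {}
    for chrom, start, end in rloop_peaks:
        rloops_by_chrom.setdefault(chrom, []).append((start, end))

    shared = []
    for chrom, g_start, g_end in g4_peaks:
        for r_start, r_end in rloops_by_chrom.get(chrom, []):
            lo, hi = max(g_start, r_start), min(g_end, r_end)
            if hi - lo >= min_overlap:
                shared.append((chrom, lo, hi))
    return sorted(shared)


# Toy coordinates for illustration only.
g4 = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
rloop = [("chr1", 250, 600), ("chr2", 400, 500)]
print(intersect_peaks(g4, rloop))
```

The nested loop is quadratic per chromosome, which is acceptable for a sketch; dedicated interval tools use sorted sweeps or interval trees for genome-scale peak sets.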
The co-localized G4s and R-loops-mediated transcriptional regulation in HEK293 cells
The predominant distribution of co-localized peaks around TSS implies that they may participate in transcriptional regulation of their associated genes. RNA-seq analysis revealed that the RNA levels of co-localized G4s and R-loops-associated genes are significantly higher (Fig. 3A). Different from G4s and R-loops, the co-localized G4s and R-loops are mainly localized within 1 kb of the TSS of transcriptionally active genes (∼60% peaks with FPKM >=5) (Fig. 3B). To investigate the transcriptional regulation of co-localized G4s and R-loops in living cells, we performed RNA-seq on HEK293 cells treated with and without ML216 or NSC617145 and then analyzed the differential gene expression using DESeq2 (Love et al., 2014). As a result, hundreds of genes were linked to co-localized G4s and R-loops with increased G4 signals (at least 1.5-fold change) and at the same time exhibited significant changes in expression levels upon inhibition of BLM or WRN in HEK293 cells (Fig. 3C), suggesting that co-localized G4s and R-loops could regulate the transcription of their associated genes. Distribution analysis showed that these differential co-localized G4s and R-loops are mainly localized in the promoter-TSS (Fig. 3D). Among the differential genes, 125 genes were co-regulated by both ML216 and NSC617145 (Fig. 3E), suggesting that BLM and WRN could co-regulate the transcription of genes by resolving G4s. Gene ontology (GO) analysis showed that co-localized G4s & R-loops-regulated genes in HEK293 cells are mainly involved in cell cycle regulation, DNA/mRNA metabolic regulation, DNA damage response, chromatin binding, kinase binding and cell-substrate junction, among others (Fig. 3F).
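The selection logic described above (a linked co-localized peak gaining at least 1.5-fold G4 signal, together with a significant expression change, followed by the overlap between the two inhibitor treatments) can be sketched as a set operation. The adjusted p-value cutoff of 0.05, the function name, and the gene identifiers are hypothetical; only the 1.5-fold threshold comes from the text.

```python
def regulated_genes(g4_fold_change, expr_padj, fc_cutoff=1.5, padj_cutoff=0.05):
    """Genes whose linked peak gains G4 signal AND whose expression changes.

    g4_fold_change: gene -> G4 signal fold change (treated / control)
    expr_padj:      gene -> adjusted p-value from differential expression
    padj_cutoff is an assumption; the text only says "significant".
    """
    return {
        gene
        for gene, fc in g4_fold_change.items()
        if fc >= fc_cutoff and expr_padj.get(gene, 1.0) < padj_cutoff
    }


# Hypothetical toy inputs for two inhibitor treatments.
ml216 = regulated_genes(
    {"geneA": 2.0, "geneB": 1.2, "geneC": 1.8},
    {"geneA": 0.01, "geneB": 0.001, "geneC": 0.20},
)
nsc617145 = regulated_genes(
    {"geneA": 1.6, "geneC": 3.0, "geneD": 2.5},
    {"geneA": 0.03, "geneC": 0.01, "geneD": 0.04},
)

# Genes affected under both treatments correspond to the "co-regulated" set.
co_regulated = ml216 & nsc617145
print(sorted(co_regulated))
```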
Mapping of native co-localized G4s and R-loops in mESCs
Mouse embryonic stem cells (mESCs) are pluripotent stem cells that can differentiate into various types of cells of the three germ lineages (Murry and Keller, 2008; Young, 2011). To understand the regulatory roles of co-localized G4s and R-loops in mESCs, we performed HepG4-seq and HBD-seq on mESCs and uncovered 68,482 native overlapping peaks in mESCs, ranging in size from 100 bp to ∼2 kb (Fig. 4A-D, Supplementary Fig. 2A, Supplementary Table 2). Notably, unlike in HEK293 cells, a large number of native G4s (95,128) were identified by HepG4-seq in mESCs (Fig. 4A, Supplementary Table 2), which overlap well with the PQS predicted by pqsfinder (Fig. 4B), suggesting that native G4s exhibit a clear cell type-specific distribution.
For the genomic distribution of co-localized G4s and R-loops, unlike in HEK293 cells, only 34.2% of peaks are localized in promoters, exon1 and intron1, while 25.9% of peaks are in intergenic regions (Fig. 4E). The distinct numbers and localization features of co-localized G4s and R-loops in HEK293 cells and mESCs indicate a cell type-specific distribution. The metagene analysis of overlapping peaks exhibited a distinct peak around the TSS (Fig. 4F), suggesting a potential role in transcriptional regulation. The motif enrichment analysis found that G-rich sequences are highly enriched in these overlapping peaks, similar to those in HEK293 cells (Fig. 4G). Representative peaks were found in several key regulatory genes of mESCs (Fig. 4H).
Characterization of native co-localized G4s and R-loops in mESCs
Similar to HEK293 cells, the RNA levels of co-localized G4s and R-loops-associated genes were seen to be significantly higher in mESCs (Fig. 5A). However, unlike in HEK293 cells (Fig. 3B), the overlapping peaks in mESCs are mainly localized in the proximal promoters (1kb from TSS, 9063 peaks with FPKM >=5) and the region 5-50 kb from the TSS of transcriptionally active genes (32690 peaks with FPKM >=5) (Fig. 5B), suggesting that co-localized G4s and R-loops are possibly distributed in active promoters or enhancers. To test this idea, we analyzed the co-localization of G4, R-loop, multiple chromatin markers and RNA polymerase II with the phosphorylated serine 5 at its CTD domain (RNAP) that marks the transcriptionally initiated RNA Polymerase II (Hsin and Manley, 2012). As a result, co-localized G4s and R-loops were observed to well overlap with active chromatin markers (H3K4me3, H3K27ac, H3K36me3, H3K4me1) and RNAP but not the repressed chromatin marker H3K27me3 (Fig. 5C).
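The distance-based grouping used above (peaks within 1 kb of a TSS versus peaks 5-50 kb away) can be illustrated by binning each peak by its distance to the nearest TSS. The sketch below assumes a single chromosome, uses the peak midpoint, and ignores strand; the bin edges other than 1 kb and 5-50 kb, the function names, and the coordinates are illustrative assumptions.

```python
import bisect


def distance_to_nearest_tss(position, sorted_tss):
    """Absolute distance from a position to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome (strand is ignored in this sketch).
    """
    i = bisect.bisect_left(sorted_tss, position)
    neighbours = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in neighbours)


def distance_bin(distance):
    """Assign a distance (bp) to one of four illustrative bins."""
    if distance <= 1_000:
        return "<=1kb"
    if distance <= 5_000:
        return "1-5kb"
    if distance <= 50_000:
        return "5-50kb"
    return ">50kb"


# Hypothetical TSS positions and peak midpoints on one chromosome.
tss_positions = [10_000, 100_000]
for midpoint in (10_500, 40_000, 200_000):
    d = distance_to_nearest_tss(midpoint, tss_positions)
    print(midpoint, d, distance_bin(d))
```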
Extensive studies define promoters into active, bivalent and repressed states based on patterns of H3K4me3 and H3K27me3 (Fig. 5D); enhancers are defined as active, poised and unmarked states based on patterns of H3K27ac and H3K4me1 (Fig. 5D) (Atlasi and Stunnenberg, 2017; Bernstein et al., 2006; Bibikova et al., 2008; Calo and Wysocka, 2013; Heintzman et al., 2007). Notably, 20,741 and 19,726 co-localized G4s and R-loops are found in promoters and enhancers, respectively; 18,496 peaks are seen in active promoters; 15,787 peaks are seen in active enhancers (Fig. 5D, Supplementary Table 2). The co-localized G4s and R-loops in active promoters show high and almost equal signals of G4s and R-loops, and are enriched in H3K4me3, H3K27ac and RNAP (Fig. 5E). The co-localization of G4s, R-loops and RNAP at active promoters suggests that co-localized G4s and R-loops are likely linked to promoter-associated nascent RNAs (Core et al., 2008; Li and Fu, 2019; Preker et al., 2008; Seila et al., 2008). Interestingly, a medium level of H3K4me1 is present in a bimodal pattern beside co-localized G4s and R-loops at active promoters (Fig. 5E). The co-localized G4s and R-loops in bivalent promoters exhibit low signals of both G4s and R-loops, and overlap with a high level of H3K27me3, a medium level of H3K4me1, and a low level of H3K4me3 (Fig. 5E). The co-localized G4s and R-loops in active enhancers exhibit sharp peaks and overlap well with H3K27ac, H3K4me1, H3K4me3 and RNAP; the co-localized G4s and R-loops in poised enhancers are enriched in the active histone marks H3K4me1 and H3K4me3 and the repressive mark H3K27me3, while the signal of RNAP is very low (Fig. 5F). The unmarked enhancers-associated co-localized G4s and R-loops show low levels of all marks tested.
Given that enhancer RNAs (eRNAs) have been widely identified as non-coding RNAs in enhancers and are functionally important for enhancer activity (Andersson et al., 2014; Kim et al., 2010; Sigova et al., 2015), the co-occupancy of eRNAs is likely involved in the formation of co-localized G4s and R-loops in enhancers, which sheds light on a new regulatory mechanism of eRNA action.
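The state definitions referred to above can be written as simple decision rules on the presence or absence of histone marks. The sketch below encodes one common convention as an assumption: the exact thresholds and rules applied in Fig. 5D are not given in the text, so the boolean inputs, the rule order, and the "unclassified" fallback are hypothetical.

```python
def promoter_state(h3k4me3, h3k27me3):
    """Classify a promoter from two marks (assumed convention).

    active:    H3K4me3 only
    bivalent:  H3K4me3 and H3K27me3
    repressed: H3K27me3 only
    """
    if h3k4me3 and h3k27me3:
        return "bivalent"
    if h3k4me3:
        return "active"
    if h3k27me3:
        return "repressed"
    return "unclassified"  # fallback added for this sketch


def enhancer_state(h3k27ac, h3k4me1):
    """Classify an enhancer from two marks (assumed convention).

    active:   H3K27ac present
    poised:   H3K4me1 without H3K27ac
    unmarked: neither mark
    """
    if h3k27ac:
        return "active"
    if h3k4me1:
        return "poised"
    return "unmarked"


print(promoter_state(h3k4me3=True, h3k27me3=True))
print(enhancer_state(h3k27ac=False, h3k4me1=True))
```

In practice each boolean would come from overlapping a region with called peaks or from thresholding signal, and that choice affects how many regions fall into each class.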
Modulation of co-localized G4s and R-loops by the helicase Dhx9
Dhx9 (also known as RNA Helicase A) is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation by unwinding secondary structures in the nascent RNA strand (Chakraborty and Grosse, 2011; Chakraborty et al., 2018; Cristini et al., 2018; Matsui et al., 2020; Tang et al., 2022; Yuan et al., 2021), and has been reported to play roles in DNA replication, transcription, translation, RNA processing and transport and maintenance of genomic stability (Aktas et al., 2017; Aratani et al., 2001; Chellini et al., 2022; Jain et al., 2013; Tang et al., 2022). Thus, Dhx9 is a promising regulator of co-localized G4s and R-loops.
To investigate the role of Dhx9 in modulating co-localized G4s and R-loops, we generated Dhx9 knockout mESCs (dhx9KO) by CRISPR/Cas9-mediated gene editing (Ran et al., 2013). The depletion of Dhx9 in the dhx9KO mESC clone was confirmed by western blot assay (Fig. 6A) and immunofluorescence staining (Fig. 6B). We next examined the G4 and R-loop levels in the dhx9KO mESCs by performing HepG4-seq and HBD-seq. Notably, compared to the wild-type mESCs, a large number of G4s or R-loops within co-localized G4s and R-loops exhibited significantly up-regulated or down-regulated signals in the dhx9KO mESCs (Fig. 6C, Supplementary Fig. 3A, Supplementary Table 2), suggesting that Dhx9 could unwind or promote the formation of co-localized G4s and R-loops in mESCs. Interestingly, only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that Dhx9 cannot simultaneously unwind or promote G4s and R-loops within co-localized G4 and R-loop regions and that multiple helicases or regulators are required for modulating these regions.
Consistent with the enrichment of co-localized G4s and R-loops in active promoters and enhancers (Fig. 5D-F), the differential co-localized G4s and R-loops induced by loss of Dhx9 preferentially localize in the active and bivalent promoters and all three types of enhancers (Fig. 6E). To explore the effect of Dhx9 on the transcription of co-localized G4s and R-loops-associated genes, we performed RNA-seq on wild-type and dhx9KO mESCs. Differential gene expression analysis revealed 1647 significantly up-regulated genes and 1916 significantly down-regulated genes in the absence of Dhx9 (Supplementary Fig. 3C-D). Importantly, loss of Dhx9 resulted in hundreds of G4-, R-loop-, and G4&R-loop-associated genes with significantly differential expression (Fig. 6F), suggesting that Dhx9 could regulate transcription by modulating G4s, R-loops and co-localized G4s and R-loops. Representative Dhx9-regulated loci are shown in Fig. 6G. GO analysis showed that co-localized G4s and R-loops-associated genes that showed differential expression after knocking out Dhx9 are mainly involved in negative regulation of cell differentiation, head development, positive regulation of cell motility, DNA-binding transcription activator activity, pattern specification process and embryonic organ morphogenesis, among others, suggesting that Dhx9 may regulate the cell fate of mESCs by modulating co-localized G4s and R-loops (Fig. 6H). Coinciding with the GO analysis, Dhx9 knockout in the mouse causes embryonic lethality and Dhx9 knockdown leads to large structural changes in chromatin and eventually cell death (He et al., 2008; Zhang et al., 2004). Heterozygous loss-of-function variants of DHX9 are associated with neurodevelopmental disorders in humans (Yamada et al., 2023).
Characterization of co-localized G4s and R-loops directly bound by Dhx9
Tens of helicases or regulators have been reported to directly resolve or stabilize G4s or R-loops (Mendoza et al., 2016; Varshney et al., 2020; Yang et al., 2023). Interestingly, loss of Dhx9 caused 30 of these helicases/regulators to be significantly differentially expressed (Fig. 7A) and Dhx9 physically interacts with at least ten of them based on the STRING protein-protein interaction network database (Szklarczyk et al., 2023) (Supplementary Fig. 3E). These data suggest that Dhx9 could also indirectly modulate G4s and R-loops by affecting other helicases or regulators. Thus, to identify the co-localized G4s and R-loops that are direct targets of Dhx9, we performed CUT&Tag-seq using the Dhx9 antibody (Kaya-Okur et al., 2019) and revealed 54,982 Dhx9 binding peaks in wild-type mESCs (Fig. 7B). Notably, 65.5% of Dhx9 binding peaks overlapped with 69.9% of co-localized G4s and R-loops in mESCs (Fig. 7C-D), suggesting that Dhx9 is a direct and major regulator of co-localized G4s and R-loops in mESCs. Motif analysis showed that G-rich sequences are highly enriched in the Dhx9 binding peaks overlapping with co-localized G4s and R-loops in mESCs (Fig. 7E), further demonstrating that Dhx9 directly binds to co-localized G4s and R-loops that harbor G-rich sequences as shown in Fig. 4G.
We next compared the Dhx9-bound co-localized G4s and R-loops in wild-type and dhx9KO mESCs, and identified 1,382 significantly up-regulated peaks (823 with increased G4s and 559 with increased R-loops) and 4,789 significantly down-regulated peaks (2,278 with decreased G4s and 2,511 with decreased R-loops) (Fig. 7F, Supplementary Fig. 3F), accounting for ∼50-75% of differential co-localized G4s and R-loops in the absence of Dhx9 (Fig. 6C). Analysis of the genomic distribution of Dhx9-bound differential co-localized G4s and R-loops found that these peaks are mainly localized in the active and bivalent promoters and all three types of enhancers (Fig. 7G). The Dhx9-bound differential co-localized G4s and R-loops were linked to 852 genes with significantly differential expression (Supplementary Fig. 3G, Supplementary Table 2), which are enriched in GO terms related to pattern specification, cell junction organization, brain development, negative regulation of cell differentiation, mesenchyme/mesoderm development and embryonic morphogenesis, among others (Fig. 7H). Several key regulators of mouse embryonic stem cell and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, were shown to be transcriptionally regulated by Dhx9 through direct modulation of their associated G4s and R-loops (Fig. 7I). These data suggest that Dhx9 significantly contributes to transcriptional regulation of co-localized G4s and R-loops-associated genes.
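As a quick consistency check on the counts reported in this section, the short calculation below recomputes the differential peak totals and the overlap percentage from numbers stated in the text. The count of 47,857 Dhx9-bound co-localized peaks is the value cited in the Discussion (Supplementary Table 2); the variable names are illustrative only, and the approximate number of overlapping Dhx9 peaks is derived from the rounded 65.5% figure rather than reported directly.

```python
# Counts taken from the text of this section and the Discussion.
dhx9_binding_peaks = 54_982
colocalized_peaks = 68_482          # co-localized G4s and R-loops in mESCs
dhx9_bound_colocalized = 47_857     # cited in the Discussion

# Share of co-localized peaks bound by Dhx9 (reported as 69.9%).
pct_colocalized_bound = 100 * dhx9_bound_colocalized / colocalized_peaks

# Approximate number of Dhx9 peaks overlapping co-localized regions,
# derived from the rounded 65.5% figure (an estimate, not a reported count).
approx_dhx9_overlapping = round(0.655 * dhx9_binding_peaks)

# Differential Dhx9-bound peaks in dhx9KO versus wild-type mESCs.
up_regulated = 823 + 559            # increased G4s + increased R-loops
down_regulated = 2_278 + 2_511      # decreased G4s + decreased R-loops
total_differential = up_regulated + down_regulated

print(f"{pct_colocalized_bound:.1f}% of co-localized peaks are Dhx9-bound")
print(f"~{approx_dhx9_overlapping} Dhx9 peaks overlap co-localized regions")
print(up_regulated, down_regulated, total_differential)
```

The total of 6,171 differential peaks agrees with the ∼6200 figure quoted in the Introduction.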
Dhx9 regulates the cell fate of mESCs
To understand the role of Dhx9 in regulating the cell fate of mESCs, we first examined the RNA levels of several key genes that maintain the pluripotency of mESCs by quantitative RT-PCR (qRT-PCR) and found that the RNA levels of Lin28a and Oct4 were significantly decreased and the RNA level of Nanog was significantly increased when Dhx9 was knocked out (Fig. 8A). Western blot assay showed that dhx9KO mESCs produce a clearly lower level of Lin28a protein than wild-type mESCs, consistent with its RNA level, whereas in contrast to the RNA level, the protein level of Nanog was significantly decreased in dhx9KO mESCs, suggesting that Dhx9 directly or indirectly modulates the translation of Nanog (Fig. 8B). In line with the western blot assay, immunofluorescence staining of WT and dhx9KO mESCs showed that the loss of Dhx9 leads to a reduced protein level of Nanog, but not Oct4, while dhx9KO mESCs exhibited normal morphology (Fig. 8C). mESCs can be maintained in a proliferative state for prolonged periods, which is known as “self-renewal” (Liang and Zhang, 2013; Murry and Keller, 2008). Nanog and Lin28a have been reported to promote embryonic stem cell self-renewal (Chambers et al., 2003; Mitsui et al., 2003; Xu et al., 2009). Coinciding with reduced levels of Nanog and Lin28a proteins, dhx9KO mESCs were shown to modestly arrest at S phase of the cell cycle (Fig. 8D) and exhibited significantly attenuated proliferation capacity (Fig. 8E), suggesting that Dhx9 regulates the self-renewal of mESCs.
mESCs are pluripotent stem cells which are able to differentiate into three germ lineages (Murry and Keller, 2008; Young, 2011). In the absence of the differentiation inhibitor LIF, mESCs cultured in suspension spontaneously form three-dimensional aggregates called embryoid bodies (EBs), which recapitulate many aspects of early embryogenesis, including the induction of three early germ lineages (Simunovic and Brivanlou, 2017). To understand the role of Dhx9 in regulating the pluripotency of mESCs, we performed the EB assay using the wild-type and dhx9KO mESCs. As the EB differentiation progressed, loss of Dhx9 resulted in apparently smaller and fewer EBs than wild-type cells (Fig. 8F). At the same time, we collected EBs at different days of EB differentiation and examined the RNA levels of well-known markers of three germ lineages by qRT-PCR. As shown in Fig. 8G, all marker genes tested displayed significantly differential expression, suggesting that Dhx9 regulates the pluripotency of mESCs, which is in line with the GO enrichment results of Dhx9-regulated co-localized G4s and R-loops-associated genes in Fig. 7H. Taken together, Dhx9 regulates the self-renewal and differentiation capacities of mESCs.
Discussion
In this study, we developed the new method “HepG4-seq” and optimized the RNase H1 HBD domain-based HBD-seq to robustly map endogenous G4s and R-loops, respectively, in living cells with high specificity. Using HepG4-seq and HBD-seq, we systematically characterized the native co-localized G4s and R-loops in HEK293 cells and mESCs, and revealed that co-localized G4s and R-loops are dynamically altered in a cell type-dependent way and largely localized at active promoters and enhancers of transcriptionally active genes. Small molecule-induced inhibition of the helicases BLM or WRN resulted in significant accumulation of G4s within co-localized G4s and R-loops and at the same time led to genes with significantly differential expression in HEK293 cells that are enriched in processes related to cell cycle, DNA metabolism, DNA damage response and chromatin binding, among others. Furthermore, we characterized the helicase Dhx9 as a key regulator of co-localized G4s and R-loops which efficiently unwinds or promotes co-localized G4s and R-loops, and illustrated that depletion of Dhx9 significantly altered the transcription of co-localized G4s and R-loops-associated genes that are enriched in embryonic development, cell differentiation and germ lineage development, among others. Consistently, loss of Dhx9 impaired the self-renewal and pluripotency of mESCs.
In this study, we used a low dosage of hemin, similar to the physiological concentration in normal human erythrocytes, to activate the peroxidase activity of endogenous G4s without significantly altering the levels of native G4s (Supplementary Fig. 1B-C), and then robustly biotinylated the G4s themselves by G4-hemin complex-mediated proximity labeling within just one minute in living cells (Cheng et al., 2009; Einarson and Sen, 2017; Lat et al., 2020; Li et al., 2016; Stadlbauer et al., 2021; Yang et al., 2011). Owing to its high affinity and specificity for biotin, the recombinant streptavidin monomer (Lim et al., 2013) recognizes the biotinylated G4s with high sensitivity and thereby yields robust CUT&Tag signals with the help of the Moon-tag system (Boersma et al., 2019). Therefore, our HepG4-seq strategy is able to robustly and specifically capture native G4s. In HEK293 cells, HepG4-seq uncovered 6,799 consensus G4 peaks under wild-type conditions and 77,003 G4 peaks in the presence of BLM/WRN inhibitors, suggesting that HepG4-seq is capable of detecting endogenous G4s with high sensitivity.
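As a quick check on the scale of this difference, the two peak counts quoted above can be compared directly. This is only illustrative arithmetic on the reported numbers; the variable names are ours.

```python
# Consensus G4 peak counts reported for HEK293 cells in the text.
untreated_peaks = 6_799
inhibitor_peaks = 77_003

# Ratio of peaks detected with BLM/WRN inhibitors to peaks without them.
fold_increase = inhibitor_peaks / untreated_peaks
print(f"{fold_increase:.1f}-fold more G4 peaks with BLM/WRN inhibitors")
```

The ratio is a little over 11-fold.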
Notably, we also discovered that the landscape of native co-localized G4s and R-loops is altered in a cell type-dependent manner and that approximately 10-fold more peaks were observed in mESCs than in HEK293 cells, suggesting that co-localized G4s and R-loops have the potential to regulate the complex pluripotency network in mESCs. CTCF is a key regulator of genome organization and gene expression (Ong and Corces, 2014). Recently, Wulfridge et al. reported that CTCF-bound regions are enriched for both R-loops and G4s and that G4s associated with R-loops promote CTCF binding (Wulfridge et al., 2023). Interestingly, the most significantly enriched motif in the mESC co-localized G4s and R-loops (Fig. 4G) closely matches the CTCF ChIP-seq motif, suggesting that co-localized G4s and R-loops may modulate CTCF binding.
Furthermore, while 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs (Supplementary Table 2), only 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist. It is worth noting that depletion of Dhx9 significantly altered the transcription of 30 known G4 and/or R-loop helicases/regulators (Fig. 7A) and that half of these helicases/regulators form a physical interaction network (Supplementary Fig. 3E). The multi-faceted pathways that modulate the formation and resolution of co-localized G4s and R-loops, and the central nodes of this complex regulatory network, remain to be elucidated.
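The redundancy argument rests on the small share of Dhx9-bound sites that respond to its loss. A short calculation on the two counts reported above makes this share explicit; the variable names are ours and nothing here goes beyond the quoted numbers.

```python
# Counts reported in the text for wild-type vs dhx9KO mESCs.
dhx9_bound = 47_857        # co-localized G4s and R-loops bound by Dhx9
differential = 4_060       # of those, sites changed in the absence of Dhx9

fraction_changed = differential / dhx9_bound
print(f"{fraction_changed:.1%} of Dhx9-bound sites change on Dhx9 loss")
```

Roughly 8.5% of the Dhx9-bound sites change, so more than 90% show no significant difference, which is what motivates the suggestion that other helicases or regulators compensate.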
Taken together, our study provides new insights into the regulatory roles of co-localized G4s and R-loops in development and disease.
Data availability
The HepG4-seq, HBD-seq, BG4-seq, Dhx9 CUT&Tag and RNA-seq data have been deposited in the Gene Expression Omnibus (accession codes GSE254764 and GSE254763). The ChIP-seq data for histone marks and RNAP are openly available in the GNomEx database (accession number 44R) (Wamstad et al., 2012).
Acknowledgements
Z.X. is supported by the National Key Research and Development Program of China, Stem Cell and Translational Research (2018YFA0109200) and the National Natural Science Foundation of China (General Program No. 31970600).
References
- Targeting an Achilles’ heel of cancer with a WRN helicase inhibitor. Cell Cycle 12:3329–3335
- Werner syndrome helicase has a critical role in DNA damage responses in the absence of a functional fanconi anemia pathway. Cancer Res 73:5497–5507
- R loops: from transcription byproducts to threats to genome stability. Mol Cell 46:115–124
- The free heme concentration in healthy human erythrocytes. Blood Cells Mol Dis 55:402–409
- DHX9 suppresses RNA processing defects originating from the Alu invasion of the human genome. Nature 544:115–119
- An atlas of active enhancers across human cell types and tissues. Nature 507:455–461
- Dual roles of RNA helicase A in CREB-dependent transcription. Mol Cell Biol 21:4460–4469
- The interplay of epigenetic marks during stem cell differentiation and development. Nat Rev Genet 18:643–658
- A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125:315–326
- Unraveling epigenetic regulation in embryonic stem cells. Cell Stem Cell 2:123–134
- Multi-Color Single-Molecule Imaging Uncovers Extensive Heterogeneity in mRNA Decoding. Cell 178:458–472
- Modification of enhancer chromatin: what, how, and why? Mol Cell 49:825–837
- Human DHX9 helicase preferentially unwinds RNA-containing displacement loops (R-loops) and G-quadruplexes. DNA Repair (Amst) 10:654–665
- DHX9 helicase promotes R-loop formation in cells with impaired RNA splicing. Nat Commun 9
- Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 113:643–655
- The DNA/RNA helicase DHX9 contributes to the transcriptional program of the androgen receptor in prostate cancer. J Exp Clin Cancer Res 41
- R-ChIP Using Inactive RNase H Reveals Dynamic Coupling of R-loops with Transcriptional Pausing at Gene Promoters. Mol Cell 68:745–757
- General peroxidase activity of G-quadruplex-hemin complexes and its application in ligand screening. Biochemistry 48:7817–7823
- Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322:1845–1848
- RNA/DNA Hybrid Interactome Identifies DXH9 as a Molecular Player in Transcriptional Termination and R-Loop-Associated DNA Damage. Cell Reports 23:1891–1905
- DNA damage and genome instability by G-quadruplex ligands are mediated by R loops in human cancer cells. Proceedings of the National Academy of Sciences of the United States of America 116:816–825
- Expression and Purification of Large Active GST Fusion Enzymes. Protein Downstream Processing: Design, Development and Application of High and Low-Resolution Methods. Totowa, NJ: Humana Press:169–180
- Self-biotinylation of DNA G-quadruplexes via intrinsic peroxidase activity. Nucleic Acids Research 45:9813–9822
- Human werner syndrome DNA helicase unwinds tetrahelical structures of the fragile X syndrome repeat sequence d(CGG)n. J Biol Chem 274:12797–12802
- DNA G-Quadruplex Recognition In Vitro and in Live Cells by a Structure-Specific Nanobody. J Am Chem Soc 144:23096–23103
- R Loops: From Physiological to Pathological Roles. Cell 179:604–618
- GC skew at the 5’ and 3’ ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res 23:1590–1600
- R-Loop Formation Is a Distinctive Characteristic of Unmethylated Human CpG Island Promoters. Molecular Cell 45:814–825
- G-quadruplexes Sequester Free Heme in Living Cells. Cell Chemical Biology 26:1681–1691
- G-quadruplex structures mark human regulatory chromatin. Nature Genetics 48:1267–1272
- The Affinity of the S9.6 Antibody for Double-Stranded RNAs Impacts the Accurate Mapping of R-Loops in Fission Yeast. J Mol Biol 430:272–284
- Comparisons of RNAi approaches for validation of human RNA helicase A as an essential factor in hepatitis C virus replication. J Virol Methods 154:216–219
- Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nature Genetics 39:311–318
- Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38:576–589
- pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics (Oxford, England) 33:3373–3379
- The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev 26:2119–2137
- DHX9 helicase is involved in preventing genomic instability induced by alternatively structured DNA in human cells. Nucleic Acids Res 41:10345–10357
- Genome-wide map of R-loops reveals its interplay with transcription and genome integrity during germ cell meiosis. J Adv Res 51:45–57
- CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications 10
- Fusion tags to enhance heterologous protein expression. Applied Microbiology and Biotechnology 104:2411–2425
- Widespread transcription at neuronal activity-regulated enhancers. Nature 465:182–187
- The monoclonal S9.6 antibody exhibits highly variable binding affinities towards different R-loop sequences. PLoS One 12
- The interplay of RNA:DNA hybrid structure and G-quadruplexes determines the outcome of R-loop-replisome collisions. eLife 10
- High specificity and tight spatial restriction of self-biotinylation by DNA and RNA G-Quadruplexes complexed in vitro and in vivo with Heme. Nucleic Acids Res 48:5254–5267
- R-loop induced G-quadruplex in non-template promotes transcription by successive R-loop formation. Nat Commun 11
- Insight into G-quadruplex-hemin DNAzyme/RNAzyme: adjacent adenine as the intramolecular species for remarkable enhancement of enzymatic activity. Nucleic Acids Research 44:7373–7384
- Chromatin-associated RNAs as facilitators of functional genomic interactions. Nat Rev Genet 20:503–519
- Embryonic stem cell and induced pluripotent stem cell: an epigenetic perspective. Cell Res 23:49–69
- Single-molecule fluorescence studies on cotranscriptional G-quadruplex formation coupled with R-loop formation. Nucleic Acids Res 48:9195–9203
- Stable, high-affinity streptavidin monomer for protein labeling and monovalent biotin detection. Biotechnol Bioeng 110:57–67
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15
- Genome-wide mapping of G-quadruplex structures with CUT&Tag. Nucleic Acids Res
- Detection of alternative DNA structures and its implications for human disease. Mol Cell 83:3622–3641
- USP42 enhances homologous recombination repair by promoting R-loop resolution with a DNA-RNA helicase DHX9. Oncogenesis 9
- G-quadruplexes and helicases. Nucleic Acids Res 44:1989–2006
- G-quadruplex-R-loop interactions and the mechanism of anticancer G-quadruplex binders. Nucleic Acids Res 48:11942–11957
- The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell 113:631–642
- The Bloom’s and Werner’s syndrome proteins are DNA structure-specific helicases. Nucleic Acids Res 29:2843–2849
- Differentiation of embryonic stem cells to clinically relevant populations: lessons from embryonic development. Cell 132:661–680
- A small molecule inhibitor of the BLM helicase modulates chromosome stability in human cells. Chem Biol 20:55–62
- Specific recognition of RNA/DNA hybrid and enhancement of human RNase H1 activity by HBD. EMBO J 27:1172–1181
- CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet 15:234–246
- Magnesium-dependent supercoiling-induced transition in (dG)n(dC)n stretches and formation of a new G-structure by (dG)n strand. Nucleic Acids Res 17:8257–8271
- The sub-nanomolar binding of DNA-RNA hybrids by the single-chain Fv fragment of antibody S9.6. J Mol Recognit 26:376–381
- RNA exosome depletion reveals transcription upstream of active human promoters. Science 322:1851–1854
- Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8:2281–2308
- DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential. Nat Rev Mol Cell Biol 18
- Divergent transcription from active promoters. Science 322:1849–1851
- Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature 334:364–366
- Transcription factor trapping by RNA in gene regulatory elements. Science 350:978–981
- Embryoids, organoids and gastruloids: new approaches to understanding embryogenesis. Development 144:976–985
- Insights into G-Quadruplex–Hemin Dynamics Using Atomistic Simulations: Implications for Reactivity and Folding. Journal of Chemical Theory and Computation 17:1883–1899
- Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops. Nature 342:825–829
- The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 51:D638–D646
- Resolution of ROS-induced G-quadruplexes and R-loops at transcriptionally active sites is dependent on BLM helicase. FEBS Lett 594:1359–1367
- Polymerase eta Recruits DHX9 Helicase to Promote Replication across Guanine Quadruplex Structures. J Am Chem Soc 144:14016–14020
- The regulation and functions of DNA and RNA G-quadruplexes. Nat Rev Mol Cell Biol 21:459–474
- Dynamic and Coordinated Epigenetic Regulation of Developmental Transitions in the Cardiac Lineage. Cell 151:206–220
- Genomic profiling of native R loops with a DNA-RNA hybrid recognition sensor. Sci Adv 7
- Monovalent cation-induced structure of telomeric DNA: the G-quartet model. Cell 59:871–880
- G-quadruplexes associated with R-loops promote CTCF binding. Mol Cell
- RNA-DNA hybrid formation at the human mitochondrial heavy-strand origin ceases at replication start sites: an implication for RNA-DNA hybrids serving as primers. EMBO J 15:3135–3143
- Lin28 modulates cell growth and associates with a subset of cell cycle regulator mRNAs in mouse embryonic stem cells. RNA 15:357–361
- Heterozygous loss-of-function DHX9 variants are associated with neurodevelopmental disorders: Human genetic and experimental evidences. Eur J Med Genet 66
- Helicases in R-loop Formation and Resolution. J Biol Chem 299
- Characterization of G-quadruplex/hemin peroxidase: substrate specificity and inactivation kinetics. Chemistry 17:14475–14484
- Control of the embryonic stem cell state. Cell 144:940–954
- TDRD3 promotes DHX9 chromatin recruitment and R-loop resolution. Nucleic Acids Res 49:8573–8591
- DNA-dependent protein kinase (DNA-PK) phosphorylates nuclear DNA helicase II/RNA helicase A and hnRNP proteins in an RNA-dependent manner. Nucleic Acids Res 32:1–10
- Expanding APEX2 Substrates for Proximity-Dependent Labeling of Nucleic Acids and Proteins in Living Cells. Angewandte Chemie 58:11763–11767
Copyright
© 2024, Liu et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.