Human immunodeficiency virus-1 induces and targets host genomic R-loops for viral genome integration

Kiwon Park; Dohoon Lee; Jiseok Jeong; Sungwon Lee; Sun Kim; Kwangseog Ahn

doi:10.7554/eLife.97348.1

eLife assessment

Based on largely indirect evidence, this study proposes that genomic integration of HIV targets DNA/RNA hybrids called R-loops. The evidence is indirect because the authors do not use relevant model systems to show integration and because they artificially induce R-loops in the critical experiments. There are two interrelated findings: 1) VSVg-pseudotyped HIV-1 induces R-loops in various cell types, and 2) VSVg-pseudotyped HIV-1 targets R-loops for integration in an artificial HeLa cell model in which R-loops are exogenously induced. The induction of R-loops by a pseudotyped HIV-1 is a potentially valuable finding. Critically, however, because of the caveats above, the evidence is inadequate to support the primary claims in the title, abstract, and manuscript. Furthermore, if these claims were true, the authors do not provide context for how they could be reconciled with well-established structural data showing that HIV-1 integrase catalyzes the integration of viral DNA into dsDNA as a substrate.

https://doi.org/10.7554/eLife.97348.1.sa3

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

inadequate: Methods, data and analyses do not support the primary claims

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Although HIV-1 integration sites are considered to favor active transcription units in the human genome, high-resolution analysis of individual HIV-1 integration sites has shown that the virus can integrate into a variety of host genomic locations, including non-genic regions. Such invisible infections, in which HIV-1 integrates into non-genic regions, challenge the traditional understanding of HIV-1 integration site selection and are all the more problematic because these proviruses are selectively preserved in the host genome during prolonged antiretroviral therapy. Here, we show that HIV-1 targets R-loops, genomic structures made up of DNA–RNA hybrids, for integration. HIV-1 initiates the formation of R-loops in both genic and non-genic regions of the host genome and preferentially integrates into R-loop-rich regions. Using a cell model that can independently control transcriptional activity and R-loop formation, we demonstrated that the formation of R-loops directs HIV-1 integration site targeting. We also found that HIV-1 integrase proteins physically bind to host genomic R-loops. These findings provide fundamental insights into the mechanisms of retroviral integration and into new strategies for antiretroviral therapy against latent HIV-1 infection.

Introduction

Retroviruses cause permanent infection in the host by integrating their reverse-transcribed viral genome into the host genome. Retroviral integration considerably impacts a wide range of biological phenomena, including the persistence of fatal human diseases and the shaping of metazoan evolution (1). Human immunodeficiency virus (HIV)-1 is a representative retrovirus that underlies the global burden of acquired immune deficiency syndrome (AIDS) (2). The chromosomal landscape of HIV-1 integration plays a critical role in proviral gene expression, persistence of integrated proviruses, and prognosis of antiretroviral therapy (3–5). Integration into the host genome is not random and displays distinct preferences for gene-dense regions, where active transcription occurs (6), through interactions with host factors such as transcription activators, epigenetic marker-binding proteins, and super enhancers (7–13). However, transcription activity is not the sole determinant of the HIV-1 integration site landscape (10). For instance, the most favored region of HIV-1 integration is an intergenic locus, and despite the lower probability of integration, HIV-1 proviruses are observed in non-genic regions in the genomes of infected individuals (4, 6). This suggests that an as yet undiscovered mechanism or determinant shapes the genomic environment suitable for HIV-1 integration.

An R-loop is a three-stranded nucleic acid structure that comprises a DNA–RNA hybrid and a displaced strand of DNA, and has long been considered a transcription byproduct (14, 15). R-loops in cellular genomes are enriched in actively transcribed genes because they occur naturally during transcription (14, 16), but R-loop formation is not limited to gene body regions and is widespread in the genome (14). As a result of in trans R-loop formation, R-loops are also abundant in non-genic regions, such as intergenic regions and repetitive sequences, including transposable elements, centromeres, or telomeres (14, 17–19), independently of the transcription activity of the genes harboring the R-loops. Although R-loops have been identified as critical intermediates and regulators in a number of biological processes (14, 15, 20), the dynamics and role of cellular R-loops in pathological contexts remain unclear.

R-loops are important contributors to shaping the genomic environment and spatial organization of the cellular genome, and could play a novel role in host-pathogen interactions. In the cellular genome, R-loops relieve superhelical stress and are often associated with open chromatin marks and active enhancers (21, 22), which are also distributed over HIV-1 integration sites (6, 9, 10). In the case of transcription-induced R-loop formation, a guanine-quadruplex (G4) structure can be generated in the non-template DNA strand of the R-loop (23). A recent study has shown that G4 DNA can influence both productive and latent HIV-1 integration (24). In addition, R-loops are prevalent non-canonical (non-B-form) DNA structures (25) with a conformation intermediate between B-form DNA and A-form RNA (26), which has recently been shown to be a conformational characteristic of the target DNA during retroviral integration (26, 27). The accumulated evidence implies that host genomic R-loops are an undiscovered host factor in the HIV-1 integration site selection mechanism, one that dynamically interacts with the host genomic environment.

Here, we show a notable role for R-loops in the interaction between HIV-1 and its host, specifically in HIV-1 integration. HIV-1 infection induces host cellular R-loop formation, and the R-loop-rich regions of the host genome are preferred for HIV-1 integration. HIV-1 integrase proteins showed considerable binding affinity for nucleic acid substrates comprising R-loop structures. Our results suggest that R-loops are an important component of the host genomic environment for HIV-1 integration site determination.

Results

Host genomic R-loops accumulate upon HIV-1 infection

To investigate the relationship between HIV-1 infection and host cellular R-loops, we first analyzed R-loop dynamics in different cell types infected with HIV-1 at early post-infection time points using DNA–RNA immunoprecipitation followed by cDNA conversion coupled to high-throughput sequencing (DRIPc-seq) with a DNA–RNA hybrid-specific antibody, anti-S9.6 (28). HeLa cells, primary CD4⁺ T cells isolated from two individual donors, and the CD4⁺/CD8⁻ T cell lymphoma Jurkat cell line were infected with VSV-G-pseudotyped HIV-1-EGFP and harvested at 0, 3, 6, and 12 h post infection (hpi) for DRIPc-seq library construction (Fig. 1A and S1A-C Fig.). Our DRIPc-seq analysis yielded locus-specific R-loop signals at the reference R-loop-positive loci (RPL13A and CALM3) and an R-loop-negative locus (SNRPN) (28) that were both strand-specific and highly sensitive to pre-immunoprecipitation in vitro RNase H treatment in HeLa cells, CD4⁺ T cells, and Jurkat T cells (Table S1-3). Notably, the number of DRIPc-seq peaks mapped to the human reference genome increased remarkably at early time points after HIV-1 infection (3 and 6 hpi for HeLa cells and 6 and 12 hpi for CD4⁺ and Jurkat T cells; Fig. 1B). Most of the peaks mapped in cells harvested at 0 hpi were also found in all other samples, but a significant number of unique peaks were observed after infection (Fig. 1C).

HIV-1 infection induces genomic R-loop accumulation in cells at early post-infection.
(A) Summary of experimental design for DRIPc-seq in HeLa cells, primary CD4⁺ T cells and Jurkat cells infected with HIV-1. (B) Bar graphs indicating DRIPc-seq peak counts for HIV-1-infected HeLa cells, primary CD4⁺ T cells and Jurkat cells harvested at the indicated hours post infection (hpi). Pre-immunoprecipitated samples were untreated (−) or treated (+) with RNase H, as indicated. Each bar corresponds to pooled datasets from two biologically independent experiments. (C) All genomic loci overlapping a DRIPc-seq peak from HIV-1 infected HeLa cells, primary CD4⁺ T cells and Jurkat cells in at least one sample are stacked vertically; the position of each peak in a stack is constant horizontally across samples. Each hpi occupies a vertical bar, as indicated. Each bar corresponds to pooled datasets from two biologically independent experiments. Common peaks for all samples are represented in black, and in dark gray for those common for at least two samples. The lack of a DRIP signal over a given peak in any sample is shown in light gray. The sample-unique peaks are colored blue, yellow, green, and red at 0, 3, 6, and 12 hpi, respectively. (D) Dot blot analysis of the R-loop in gDNA extracts from HIV-1 infected HeLa cells with MOI of 0.6 harvested at the indicated hpi. gDNAs were probed with anti-S9.6. gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-RNA/DNA signal). Fold-induction was normalized to the value of harvested cells at 0 hpi by quantifying the dot intensity of the blots and calculating the ratios of the S9.6 signal to the total amount of gDNA (anti-ssDNA signal). (E) Representative images of the immunofluorescence assay of S9.6 nuclear signals in HIV-1 infected HeLa cells with MOI of 0.6 harvested at 6 hpi. The cells were pre-extracted of cytoplasm and co-stained with anti-S9.6 (red), anti-nucleolin antibodies (green), and DAPI (blue). 
The cells were incubated with or without RNase H in vitro before staining with anti-S9.6 antibodies, as indicated. Quantification of S9.6 signal intensity per nucleus after nucleolar signal subtraction for the immunofluorescence assay. The mean value for each data point is indicated by the red line. Statistical significance was assessed using one-way ANOVA (n > 53).

In addition to our DRIPc-seq data analysis, we used different biochemical approaches to examine R-loop accumulation after HIV-1 infection in HeLa cells. First, R-loop accumulation in HIV-1-infected cells was observed using DNA–RNA hybrid dot blots with the anti-S9.6 antibodies (Fig. 1D). The dot intensity increased significantly upon HIV-1 infection at 6 hpi and the enhanced R-loop signals on dot blots of HIV-1-infected cells were highly sensitive to in vitro treatment with RNase H (Fig. 1D). This result was highly consistent with our DRIPc-seq data analysis results in HIV-1-infected HeLa cells.
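The fold-induction normalization described for the dot blot (Fig. 1D) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the dictionary layout, and the intensity values are hypothetical and do not come from the study; the only step taken from the text is that the S9.6 signal is divided by the anti-ssDNA (total gDNA) signal and then normalized to the 0 hpi sample.

```python
def fold_induction(s96_signal, ssdna_signal, baseline="0 hpi"):
    """Return the S9.6/ssDNA ratio of each sample relative to the baseline sample.

    s96_signal and ssdna_signal map a sample label to a quantified dot intensity.
    All labels and values used with this function are hypothetical.
    """
    # Step 1: normalize the R-loop (S9.6) signal to total gDNA (anti-ssDNA signal).
    ratios = {label: s96_signal[label] / ssdna_signal[label] for label in s96_signal}
    # Step 2: express every ratio relative to the baseline (0 hpi) ratio.
    base = ratios[baseline]
    return {label: ratio / base for label, ratio in ratios.items()}


# Hypothetical dot intensities (arbitrary units), for illustration only.
s96 = {"0 hpi": 100.0, "6 hpi": 300.0}
ssdna = {"0 hpi": 50.0, "6 hpi": 75.0}
result = fold_induction(s96, ssdna)
```

With these made-up intensities, the 6 hpi sample has twice the normalized S9.6 signal of the 0 hpi sample, which by construction has a fold-induction of 1.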

Subsequently, we observed HIV-1-induced R-loops using an immunofluorescence assay by probing HIV-1-infected or non-infected control cells with S9.6 antibody at 6 hpi (Fig. 1E, left). The nuclear fluorescence signal associated with the R-loops after subtracting the nucleolar signal was significantly enhanced in cells infected with HIV-1 (Fig. 1E, right). We validated and quantified HIV-1 infection-induced R-loop formation on the host genome in a genome site-specific manner by using DRIP followed by real-time polymerase chain reaction (DRIP-qPCR). In this experiment, the S9.6 signal was determined for three HIV-1-induced R-loop-positive regions (P1, P2, and P3) and two R-loop-negative regions (N1 and N2), which were defined by DRIPc-seq data analysis (S2A-E Fig.). We detected significantly increased R-loop signals that were highly sensitive to RNase H treatment of pre-immunoprecipitates in the P1, P2, and P3 regions of HIV-1-infected cells at 6 hpi compared to those in cells harvested at 0 hpi (S3A Fig.). However, the HIV-1-induced R-loop-negative regions, N1 and N2, did not show significant R-loop accumulation (S3A Fig.).

Importantly, the R-loop signal was enriched in HIV-1-infected cells even when reverse transcription or integration of HIV-1 was blocked by the enzyme inhibitors Nevirapine (NVP) or Raltegravir (RAL), respectively (S3B and S3C Fig.). This result indicates that the enriched R-loop signals in cells originate from the host genome rather than from DNA–RNA hybrid formation during the viral life cycle or a transcriptional burst from integrated HIV-1 proviruses. In addition, we confirmed that nearly 100% of DRIPc-seq reads from HIV-1-infected HeLa, CD4⁺ and Jurkat T cells aligned to the host cellular genome, and not to that of HIV-1 (S3D Fig.). Together, these data demonstrate that HIV-1 infection induces host genomic R-loop formation at early time points post infection.

R-loops that accumulate after HIV-1 infection are widely distributed in both genic and non-genic regions

To investigate the distribution of cellular genomic R-loops during HIV-1 infection, we conducted a genome-wide analysis of our DRIPc-seq data. The unique DRIPc-seq peaks observed after HIV-1 infection were not only numerous but also relatively longer (Fig. 2A).

HIV-1-induced R-loops are enriched at both transcriptionally active and silent regions.
(A) Distribution of DRIPc-seq peak lengths for HIV-1-infected HeLa cells, primary CD4⁺ T cells and Jurkat cells harvested at the indicated time points (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi). (B) Stacked bar graphs indicating the proportion of DRIPc-seq peaks mapped for HIV-1-infected HeLa cells, primary CD4⁺ T cells and Jurkat cells harvested at the indicated hpi over different genomic features. (C) Correlation between gene expression and DRIPc-seq signals of HIV-1-infected HeLa cells with MOI of 0.6 harvested at the indicated hpi. Statistical significance was assessed using Pearson’s r and p-values.

This suggests that R-loops induced by HIV-1 infection occupy larger genomic regions than R-loops present without HIV-1 infection. We observed a significant accumulation of R-loops over diverse genomic compartments at the time points at which HIV-1 infection induced R-loop formation (Fig. 2B). The presence of R-loops is often correlated with high transcriptional activity, and we found that a significantly high proportion of the DRIPc-seq peaks enriched upon HIV-1 infection mapped to gene body regions (Fig. 2B). However, we also observed enrichment in the proportion of HIV-1 infection-induced DRIPc-seq peaks mapped to intergenic or repeat regions, including short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), and long terminal repeat (LTR) retrotransposons, where transcription is typically repressed (Fig. 2B). Although the expression of repetitive elements is mostly repressed during normal cellular activities, HIV-1 infection could activate endogenous retroviral promoters (29, 30). To investigate the possibility that R-loop induction in gene-silent regions is associated with transcriptome changes during HIV-1 infection, we performed RNA sequencing (RNA-seq) on HIV-1-infected HeLa cells at 0, 3, 6, and 12 hpi. Consistent with previous reports, we observed an increase in the expression levels of repetitive elements at later time points post-infection (S4A Fig.; 12 hpi). In contrast, the expression levels of SINEs, LINEs, and LTRs were even lower at both 3 and 6 hpi than at 0 hpi, whereas HIV-1-induced R-loops had significantly accumulated relative to 0 hpi (S4A Fig.). We further examined the expression profile of genes containing R-loops in HeLa cells. The expression profile of genes harboring HIV-1-induced R-loops in their gene bodies showed very weak correlations with the DRIPc-seq peak signals at 3 hpi (Pearson’s r = 0.21, P = 1.08 × 10⁻⁸⁴; Fig. 2C) and at 6 hpi (Pearson’s r = −0.34, P = 2.40 × 10⁻²²⁸; Fig. 2C), which implies that the unique R-loop peaks that arise upon HIV-1 infection are not associated with a transcriptional burst. In agreement with our DRIPc-seq and global RNA-seq data analysis, the expression levels of the genes harboring HIV-1 infection-induced R-loops, which were quantified by DRIP-qPCR (S3A Fig.), were not significantly affected by HIV-1 infection (S4B Fig. and Table S4). Together, our data demonstrate that the host cellular R-loops that accumulate upon HIV-1 infection are widely distributed in both genic and non-genic regions and do not necessarily correlate with the expression levels of the genes harboring them.
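The Pearson correlation between gene expression and DRIPc-seq signal reported above can be illustrated with a short sketch. It is not the authors' analysis pipeline: the function name and the example values are hypothetical, and the code only shows how the coefficient r is computed from paired per-gene measurements.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values: expression level and DRIPc-seq signal.
expression = [1.0, 2.0, 3.0, 4.0]
drip_signal = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(expression, drip_signal)
```

In this toy input the two vectors are perfectly proportional, so r is 1 up to floating-point error. Values of r close to 0 would instead indicate little linear relationship between expression and R-loop signal.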

HIV-1 integration sites are enriched at systematically induced sequence-specific R-loop regions in a cell model

HIV-1 completes its infection by integrating its viral genome into the host’s through dynamic interaction with the host genome (31). Moreover, because HIV-1 infection induced R-loop accumulation at early hours post infection, when the HIV-1 genome is imported into the nucleus and integration may initiate (32–34), we hypothesized that host genomic R-loops play a role in HIV-1 integration, and possibly in integration site selection. To systematically and directly assess the relationship between host genomic R-loops and HIV-1 integration in a genome site-specific manner, we adapted and modified an elegantly designed episomal system that induces sequence-specific R-loops through DOX-inducible promoters (16). To most closely mimic the presence of R-loops in the host cellular genome, we subcloned the R-loop-forming portion of the mouse gene encoding AIRN (mAIRN) (17) or a non-R-loop-forming ECFP sequence with a DOX-inducible promoter into the piggyBac transposon vector and co-expressed the piggyBac transposase in HeLa cells. Both the R-loop-forming (mAIRN) and non-R-loop-forming (ECFP) sequences are non-human. Therefore, our cell model allows us to induce and quantify R-loop formation at a designated genomic region and to distinguish it from the endogenous R-loops on the cellular genome, which are not sequence-specific and whose induction cannot be controlled. Moreover, this system allows us to quantify R-loop-dependent site-specific HIV-1 integration events at the designated regions, which can also be distinguished from HIV-1 integration events at endogenous host genomic loci. We designated the pool of cells with the R-loop-forming sequence (mAIRN) inserted into the genome as the “pgR-rich (piggyBac R-loop rich)” cell line and the pool of cells with the non-R-loop-forming sequence (ECFP) inserted into the genome as the “pgR-poor (piggyBac R-loop poor)” cell line (Fig. 3A).

R-loop inducible cell line model directly addresses R-loop-mediated HIV-1 integration site selection.
(A) Summary of the experimental design for R-loop inducible cell lines, pgR-poor and pgR-rich. (B) Gene expression of ECFP (gray) and mAIRN (red), as measured using RT-qPCR in pgR-poor or pgR-rich cells. Where indicated, the cells were incubated with 1 µg/ml DOX for 24 h. Gene expression was normalized relative to β-actin. Data are presented as the mean ± SEM, n = 3. (C) DRIP-qPCR using the anti-S9.6 antibody against ECFP and mAIRN in pgR-poor or pgR-rich cells. Where indicated, the cells were incubated with 1 µg/ml DOX for 24 h. Pre-immunoprecipitated samples were untreated or treated with RNase H as indicated. Values are relative to those of DOX-treated (+) RNase H-untreated (−) pgR-poor cells. Data are presented as the mean ± SEM; statistical significance was assessed using two-way ANOVA (n = 2). (D) Bar graphs indicate luciferase activity at 48 hpi in pgR-poor or pgR-rich cells infected with 100 ng/p24 capsid antigen of luciferase reporter HIV-1 virus per 1 × 10⁵ cells/mL. Data are presented as the mean ± SEM; P values were calculated using one-way ANOVA (n = 6). (E) Box graph indicating the quantified HIV-1 integration site sequencing read count across pgR-poor and pgR-rich transposon sequences in untreated (–) or DOX-treated (+) pgR-poor or pgR-rich cell lines infected with 100 ng/p24 capsid antigen of luciferase reporter HIV-1 virus per 1 × 10⁵ cells/mL. Each bar corresponds to pooled datasets from three biologically independent experiments (n = 3). In each boxplot, the centerline denotes the median, the upper and lower box limits denote the upper and lower quartiles, and the whiskers denote the 1.5 × interquartile range. Statistical significance was assessed using a two-sided Mann–Whitney U test. (F and G) Heat maps representing HIV-1 integration frequency across the pgR-poor (F) or pgR-rich (G) transposon sequence in the untreated (–) or DOX-treated (+) pgR-poor (F) or pgR-rich (G) cell line.
Each rectangular box corresponds to the pooled integration frequency from three biologically independent experiments (n = 3) at the indicated position within the pgR-poor (F) or pgR-rich (G) transposon vector. Each light blue box represents the actual position of the R-loop-forming or non-R-loop-forming sequence (ECFP or mAIRN), and the yellow stars indicate the TRE promoter position within the vector.

A similar number of piggyBac transposon copies was successfully delivered to the genome of each cell line (S5A Fig.), and DOX treatment strongly induced the transcriptional activity of mAIRN or ECFP without affecting the transcription of endogenous loci in both cell lines (S5B and S5C Fig.). Although the transcription of mAIRN or ECFP was strongly induced upon DOX treatment, the activity did not exceed that of endogenous loci in either cell line (S5D and S5E Fig.). While the two cell lines showed comparable levels of DOX-inducible transcription activity at the designated sequences (Fig. 3B), only pgR-rich cells exhibited robust, RNase H-sensitive, stable R-loop formation upon DOX treatment (Fig. 3C, mAIRN). By contrast, R-loops formed only weakly in the pgR-poor cells, in which the non-R-loop-forming sequence (ECFP) was inserted into the genome (Fig. 3C, ECFP).

To examine whether the formation of ‘extra’ R-loops in the host genome influences HIV-1 infection of the host cells, we infected both cell lines with VSV-G-pseudotyped HIV-1-luciferase viruses and examined the luciferase activity. Interestingly, we found that pgR-rich cells showed significantly higher luciferase activity only when R-loops were induced by DOX treatment, whereas pgR-poor cells showed comparable luciferase activity regardless of transcription activation by DOX treatment (Fig. 3D). We conducted HIV-1 integration site sequencing in HIV-1-infected pgR-poor and pgR-rich cells to directly quantify site-specific integration events at sequence-specific R-loop regions. Remarkably, integration events were significantly higher in pgR-rich cells only when R-loops were induced by DOX treatment (Fig. 3E). However, HIV-1 integration frequency within the non-R-loop-forming sequence in pgR-poor cells remained very low, even with transcription activation by DOX treatment (Fig. 3E). HIV-1 integration frequency was enriched in the vicinity of R-loop-forming regions in the pgR-rich cell line upon DOX treatment, but this enrichment was not observed in pgR-poor cells, which do not form stable R-loops even after transcription activation by DOX treatment (Figs. 3F and 3G). This cell-based R-loop-inducing system, with independent control over transcription and R-loop formation, enabled the direct measurement of HIV-1 integration events at the defined R-loop regions, and the results indicate that host genomic R-loops are targeted by HIV-1 integration. Moreover, our data suggest that transcription activity itself is not sufficient for HIV-1 integration site determination, whereas the formation of R-loops accounts for HIV-1 integration site selection.

Host genomic R-loops are targeted by HIV-1 integration

We attempted to further validate the relationship between R-loops and HIV-1 integration site selection by global analysis of HIV-1 integration sites in endogenous genomic regions of HIV-1-infected host cells. We performed HIV-1 integration site sequencing in HIV-1-infected HeLa cells, CD4⁺ and Jurkat T cells and analyzed the sequencing data in combination with our DRIPc-seq data. We counted the number of successfully integrated proviruses in the R-loop regions (the combined genomic regions within 30-kb windows centered on DRIPc-seq peaks from 0, 3, 6, and 12 hpi) and compared it to that in non-R-loop-forming regions (the total genomic regions outside of the 30-kb windows centered on DRIPc-seq peaks). Notably, we found that approximately three to four times more integration events were detected in the R-loop regions than in other genomic regions without R-loops in HeLa cells, CD4⁺ and Jurkat T cells (Fig. 4A). Interestingly, the HIV-1 integration sites favored the center and nearby areas of the R-loop regions (Fig. 4B). We observed biases for HIV-1 integration in the HIV-1-induced R-loop-positive regions P2 and P3, which showed highly induced R-loop signals upon HIV-1 infection in DRIPc-seq analysis and DRIP-qPCR (Fig. 4C). By contrast, HIV-1 integration sites were not detected in the R-loop-negative regions N1 and N2 (Fig. 4D). Overall, our results from bioinformatics analysis using different types of naïve host cells infected with HIV-1 are consistent with the idea that the virus has a preference for targeting R-loops for integration (Fig. 3), and our data suggest that R-loops are an important component of the host genomic environment for HIV-1 integration site determination.
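The comparison described above, integration sites per Mb inside versus outside 30-kb windows centered on DRIPc-seq peaks, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names, the toy chromosome size, and the peak and integration site coordinates are all hypothetical. The only detail taken from the text is the 30-kb window width. Overlapping windows are merged so that no base pair is counted twice.

```python
from collections import defaultdict

WINDOW = 30_000  # window width centered on each DRIPc-seq peak, as stated in the text


def merged_windows(peaks, chrom_sizes, window=WINDOW):
    """Build windows centered on peak midpoints and merge overlapping ones.

    peaks: list of (chrom, start, end); chrom_sizes: dict chrom -> length in bp.
    Returns dict chrom -> sorted list of half-open (start, end) intervals.
    """
    half = window // 2
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        center = (start + end) // 2
        lo = max(0, center - half)
        hi = min(chrom_sizes[chrom], center + half)
        by_chrom[chrom].append((lo, hi))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for lo, hi in intervals[1:]:
            if lo <= out[-1][1]:  # overlaps or touches the previous window
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def sites_per_mb(sites, windows, chrom_sizes):
    """Return (density inside windows, density outside windows) in sites per Mb.

    Assumes both the window and non-window regions have non-zero length.
    """
    inside_bp = sum(hi - lo for ivs in windows.values() for lo, hi in ivs)
    outside_bp = sum(chrom_sizes.values()) - inside_bp
    inside = sum(
        1 for chrom, pos in sites
        if any(lo <= pos < hi for lo, hi in windows.get(chrom, []))
    )
    outside = len(sites) - inside
    return inside / (inside_bp / 1e6), outside / (outside_bp / 1e6)


# Hypothetical toy genome, peaks, and integration sites (illustration only).
chrom_sizes = {"chr1": 1_000_000}
peaks = [("chr1", 100_000, 101_000), ("chr1", 110_000, 111_000)]
sites = [("chr1", 90_000), ("chr1", 120_000), ("chr1", 500_000)]
windows = merged_windows(peaks, chrom_sizes)
inside_density, outside_density = sites_per_mb(sites, windows, chrom_sizes)
```

In this toy case the two windows overlap and merge into a single 40-kb region containing two of the three sites. Normalizing by region length matters because R-loop windows cover a much smaller portion of the genome than the remainder, so raw counts alone would not be comparable.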

HIV-1 targets host genomic R-loops for its viral cDNA integration.
(A) Bar graphs showing the quantified number of HIV-1 integration sites per Mb in the total regions of 30-kb windows centered on DRIPc-seq peaks from HIV-1-infected HeLa cells, primary CD4⁺ T cells and Jurkat cells (magenta) or in non-R-loop regions of the cellular genome (gray). (B) Proportion of integration sites within the 30-kb windows centered on R-loops (magenta solid lines) or randomized R-loops (gray dotted lines). Control comparisons between randomized integration sites with R-loops and randomized R-loops are indicated by black dotted lines and gray solid lines, respectively. (C and D) Superimpositions of HIV-1-induced R-loop-positive chromatin regions, P2 and P3 (C), and HIV-1-induced R-loop-negative chromatin regions, N1 and N2 (D), on DRIPc-seq (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi) and HIV-1 integration frequency (IF, black).

HIV-1 integrase physically interacts with R-loops on the host genome

The HIV-1 intasome tethers to the host genome for viral cDNA integration. Intasomes consist of HIV-1 viral cDNA and the virally encoded integrase protein. Because we observed that HIV-1 preferentially integrated into R-loops on the host genome, we hypothesized that the HIV-1 integrase protein could directly bind and be recruited to genomic R-loops. To test this hypothesis, we first investigated whether HIV-1 integrase proteins have physical binding affinity for nucleic acid substrates possessing an R-loop structure. Although HIV-1 integrase is a DNA- and RNA-binding protein (35, 36), its ability to bind three-stranded nucleic acid structures containing a DNA–RNA hybrid, such as R-loops, has not been investigated. We carried out an in vitro protein-nucleic acid binding assay by electrophoretic mobility shift assay (EMSA) with Sso7d-tagged recombinant HIV-1 integrase proteins and nucleic acid substrates of diverse structures, including an R-loop and a simple dsDNA duplex.

Interestingly, the nucleic acid substrate containing an R-loop structure bound to HIV-1 integrase proteins with greater affinity than the simple dsDNA duplex (Fig. 5A). Additionally, the nucleic acid structures that make up an R-loop, such as an RNA–DNA hybrid with exposed ssDNA (R:D+ssDNA) and an RNA–DNA hybrid (hybrid), also showed high binding affinity for integrase (S6A Fig. and Fig. 5A).

HIV-1 integrase proteins directly bind to host genomic R-loops.
(A) Representative gel images for EMSA of Sso7d-tagged HIV-1-integrase (E152Q) with R-loop and dsDNA; 10 nM nucleic acid substrate was incubated with Sso7d-tagged HIV-1-integrase (E152Q) at 0 nM, 20 nM, 50 nM, 100 nM, 200 nM, and 400 nM (left). Unbound fractions were quantified for EMSA of Sso7d-tagged HIV-1-integrase (E152Q) with different types of substrates (R-loop, dsDNA, R-loop, R:D+ssDNA and Hybrid). Data are presented as the mean ± SEM, n = 3 (right). (B) Summary of the experimental design for R-loop immunoprecipitation using S9.6 antibody in FLAG-tagged HIV-1 integrase protein-expressing HeLa cells. (C) Western blotting for HIV-1 integrase protein, H3, and LaminA/C of DNA–RNA hybrid immunoprecipitation using the S9.6 antibody. (D and E) HeLa gDNA input was either untreated (–) or treated (+) with RNase H before enrichment for DNA–RNA hybrids using the S9.6 antibody. gDNA–RNA hybrids were incubated with nuclear extracts depleted of DNA–RNA hybrids with RNase A followed by S9.6 immunoprecipitation. DNA–RNA hybrid dot blot (D) and western blot of DNA–RNA hybrid immunoprecipitation, probed with the indicated antibodies (E). (F) DNA–RNA hybrid dot blot of FLAG antibody-immunoprecipitated nucleic acid extracts. Where indicated, nucleic acid extracts were untreated (–) or treated (+) with RNase H before probing with the S9.6 antibodies. (G) Representative images of the proximity-ligation assay (PLA) between GFP and S9.6 antibodies in HIV-IN-EGFP virion-infected HeLa cells at 6 hpi. Cells were subjected to PLA (orange) and co-stained with DAPI (blue). PLA puncta in the nucleus are indicated by the yellow arrows. Quantification analysis of the number of PLA foci per nucleus (left). GFP_alone and S9.6_alone were used as single-antibody controls from HIV-IN-EGFP virion-infected HeLa cells (right). The mean value for each data point is indicated by the red line. P value was calculated using a two-tailed unpaired t-test (n > 50).

We validated the interaction between cellular genomic R-loops and HIV-1 integrase proteins by DNA–RNA hybrid immunoprecipitation using S9.6 antibodies in FLAG-tagged HIV-1 integrase-expressing HeLa cells (Fig. 5B). Under our experimental conditions, R-loops were reproducibly immunoprecipitated (S6B Fig.) and HIV-1 integrase proteins co-immunoprecipitated with R-loops (Fig. 5C). DNA–RNA hybrids also co-immunoprecipitated with the positive control H3 (37) but not with the negative control LaminA/C and Actin (37) (Fig. 5C). To verify the specificity of our co-immunoprecipitation results for R-loops and HIV-1 integrases, we performed DNA–RNA hybrid immunoprecipitation with RNase H treatment (S6C Fig.). The S9.6 signal of immunoprecipitated nucleic acids was highly sensitive to RNase H treatment of pre-immunoprecipitates (Fig. 5D). Accordingly, the blotting signal of the co-immunoprecipitated HIV-1 integrase and H3 proteins was significantly reduced upon RNase H treatment (Fig. 5E). We performed reciprocal immunoprecipitation using an anti-FLAG monoclonal antibody and detected immunoprecipitated R-loops using dot blot analysis with anti-S9.6. R-loops were immunoprecipitated by HIV-1 integrase, and the S9.6 signal of immunoprecipitated nucleic acids was highly sensitive to RNase H treatment (Fig. 5F and S6D Fig.). Subsequently, we attempted to observe the interaction between the R-loops and HIV-1 integrase using a proximity-ligation assay (PLA), in HIV-1-infected cells. We used two antibodies: one that binds to R-loops (anti-S9.6) and another one that binds to GFP-tagged HIV-1 integrase. We detected PLA signals in cells infected with HIV-IN-EGFP virions and in non-infected control cells. PLA signals in non-infected cells were comparable to those in S9.6-alone and GFP-alone single antibody-negative controls; however, PLA signals significantly increased upon HIV-1 infection (Fig. 5G and S6E Fig.). 
Our data suggest that HIV-1 frequently targets R-loop-rich regions for viral genome integration through physical binding of HIV-1 integrase proteins to R-loop structures on the host genome.

Discussion

In this study, we found that HIV-1 preferentially integrates into regions rich in R-loops, suggesting that R-loops are a novel host factor governing HIV-1 integration site selection. In our bioinformatics analysis, host cellular R-loops were induced by HIV-1 infection and widespread over host genomic regions. Using our R-loop-inducible cell models, R-loop formation, not necessarily transcription activity itself, was found to be important for HIV-1 integration site determination. In addition, HIV-1 integrase proteins favored physical binding with R-loops in vitro, and they interacted with host genomic R-loops in HIV-1-infected cells. These results demonstrated that HIV-1 exploits and frequently targets the host genomic R-loops for successful integration and infection.

Our data show that HIV-1 targets host genomic R-loops for viral genome integration and that its integrase proteins physically interact with genomic R-loops in vitro and in cells. This may be because R-loops have a unique nucleic acid conformation of B-form DNA and A-form RNA intermediates, which possesses an intrinsic preferential binding ability for the retroviral intasome (25–27). Another possible explanation for why HIV-1 integration shows a preference for host genomic R-loops is that R-loops may act in concert with the host factors governing HIV-1 integration site selection. Cellular R-loops are recognized and regulated by numerous cellular proteins (37, 38). Moreover, the correct genomic environment for HIV-1 integration site selection is established by host proteins (9). LEDGF/p75 (9, 13, 39) and CPSF6 (7, 9) are two decisive host factors that direct HIV-1 integration by interacting with integrase or trafficking the viral preintegration complex towards the nuclear interior (7, 9). In fact, these host factors have recently been identified as potential R-loop-binding proteins in DNA–RNA interactome analysis (37) and R-loop proximity proteomics (38), respectively. R-loops are tightly regulated by DNA damage response proteins (40), and DNA repair machineries play important roles in the HIV-1 integration process (31). For example, the Fanconi anemia pathway (41, 42), a well-known R-loop regulatory pathway, has recently been proposed as an HIV-1 integration regulatory factor exploited by HIV-1 (43). Taking into account these previous studies alongside our current findings, we propose R-loops as another pivotal host factor driving HIV-1 integration site determination and as a possible intermediate regulator of HIV-1 integration site selection by such host proteins.

Viruses often take advantage of various host factors, and targeting viral components that manipulate the host cellular environment can be an effective strategy for antiviral therapy. Our study has shown that host genomic R-loops accumulate significantly shortly after HIV-1 infection. Thus, it is possible that virion-associated HIV-1 proteins are responsible for inducing these R-loops. For instance, the HIV-1 accessory protein Vpr causes genomic damage (44) and transcriptomic changes during the early stages post infection (45), both of which can lead to in cis and in trans R-loop formation (15). Another HIV-1 accessory protein, Vif, counteracts the host antiviral factor APOBEC3 (46, 47), which was recently found to regulate cellular R-loop levels (48). Identifying the HIV-1 components responsible for inducing host cellular R-loops, and elucidating the mechanism by which they induce genome-wide R-loop formation and contribute to successful viral integration into selective genomic regions, represents an area for further research.

Although most HIV-1 integration occurs in genic regions (4, 6), HIV-1 proviruses are also found in non-genic regions (49) and understanding these “transcriptionally silent” proviruses is critical for developing strategies to completely eliminate HIV-1. In HIV-1 elite controllers, who suppress viral gene expression to undetectable levels, HIV-1 proviruses accumulate in heterochromatic regions (5). Moreover, proviruses with lower expression level can persist in the host genome even during antiretroviral therapy (4). However, the mechanism by which HIV-1 targets gene-silent regions for “invisible” integration remains unclear. Our study has revealed that R-loops are enriched in both genic and non-genic regions during HIV-1 infection, and that the virus preferentially targets these R-loops for integration. We propose that R-loops, particularly those enriched in non-genic regions, may represent the mechanism by which the virus achieves “invisible” and permanent infection.

Materials and methods

Cell culture

HeLa and HEK293T cells were cultured in Dulbecco’s modified Eagle’s medium (Gibco) supplemented with 10% (v/v) fetal bovine serum (FBS, Cytiva), antibiotic mixture (100 units/ml penicillin–streptomycin, Gibco), and 1% (v/v) GlutaMAX-I (Gibco). Jurkat cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium (ATCC) supplemented with 10% (v/v) FBS (Cytiva). Cells were incubated at 37°C and 5% CO₂.

Virus production and infection

VSV-G-pseudotyped HIV-1 virus stocks were prepared by performing standard polyethylenimine-mediated transfection of HEK293T monolayers with pNL4-3 ΔEnv EGFP (NIH AIDS Reagent Program, 11100) or pNL4-3.Luc.R-E (NIH AIDS Reagent Program, 3418) along with pVSV-G at a ratio of 5:1. HIV-IN-EGFP virions were produced by performing polyethylenimine-mediated transfection of HEK293T cells with 6 µg of pVpr-IN-EGFP, 6 µg of HIV-1 NL4-3 non-infectious molecular clone (pD64E; NIH AIDS Reagent Program, 10180), and 1 µg of pVSV-G. The cells were incubated for 4 h before the medium was replaced with fresh complete medium. Virion-containing supernatants were collected after 48 h, filtered through a 0.45-µm syringe filter, and pelleted using the Lenti-X Concentrator (631232; Clontech) according to the manufacturer’s instructions. The multiplicity of infection (MOI) of virus stocks was determined by transducing a known number of HeLa cells with a known amount of virus particles and then counting GFP-positive cells using flow cytometry. For the luciferase reporter HIV-1 virus, the HIV-1 p24 antigen content in the viral stock was quantified using the HIV1 p24 ELISA kit (Abcam, ab218268), according to the manufacturer’s instructions. For virus infection, HeLa cells were seeded at a density of 0.5–4 × 10⁵ cells/mL on the day before infection. The culture medium was replaced with fresh complete culture medium at 2 hpi. The infected cells were washed twice with PBS and harvested at the indicated time points. Jurkat cells were seeded at a density of 1 × 10⁶ cells/mL and inoculated with 300 ng of p24 capsid antigen. The plates were centrifuged at 1000 × g at 30°C for 1 h. The medium was replaced with fresh RPMI 2 h after infection.
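The GFP-based titration described above can be illustrated with a short calculation. This is a minimal sketch rather than the authors' procedure: the function names, the example numbers, and the optional Poisson correction are assumptions introduced here for illustration, since the methods state only that a known number of cells was transduced and GFP-positive cells were counted.

```python
import math


def transducing_units_per_ml(n_cells, fraction_gfp_positive, inoculum_ml,
                             poisson_correct=False):
    """Estimate the infectious titer (transducing units per mL).

    n_cells: number of cells present at the time of transduction.
    fraction_gfp_positive: fraction of GFP-positive cells (0 <= f < 1).
    inoculum_ml: volume of virus stock added, in mL.
    poisson_correct: if True, convert the infected fraction into the mean
        number of infection events per cell, -ln(1 - f). This correction is
        an assumption and is not stated in the methods.
    """
    if not 0 <= fraction_gfp_positive < 1:
        raise ValueError("fraction_gfp_positive must be in [0, 1)")
    if poisson_correct:
        events_per_cell = -math.log(1 - fraction_gfp_positive)
    else:
        events_per_cell = fraction_gfp_positive
    return n_cells * events_per_cell / inoculum_ml


def inoculum_for_moi(target_moi, n_cells, titer_tu_per_ml):
    """Volume of stock (mL) needed to infect n_cells at the target MOI."""
    return target_moi * n_cells / titer_tu_per_ml


# Hypothetical titration: 1e5 cells, 20% GFP-positive after adding 10 µL.
titer = transducing_units_per_ml(1e5, 0.20, 0.010)
# Volume needed to infect 2e5 cells at an MOI of 0.6.
volume_ml = inoculum_for_moi(0.6, 2e5, titer)
```

The uncorrected estimate is usually adequate at low infected fractions, where few cells receive more than one infection event.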

Primary cell isolation, culture, T cell activation, and infection

For CD4⁺ T cell isolation, human PBMCs (ST70025, STEMCELL Technologies) were mixed and incubated with MACS CD4 MicroBeads (130-045-101, Miltenyi Biotec) and FITC-conjugated mouse anti-CD4 (561005, BD Bioscience) according to the manufacturer’s instructions. The CD4⁺ T cells were then enriched using LS Columns (130-042-401, Miltenyi Biotec) and a MidiMACS Separator (130-042-302, Miltenyi Biotec). The efficiency of magnetic separation was analyzed using a FACSCanto II flow cytometer (BD Bioscience) and FlowJo software (FlowJo).

CD4⁺ T cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium (Gibco) supplemented with 10% (v/v) fetal bovine serum (FBS, Cytiva), antibiotic mixture (100 units/ml penicillin–streptomycin, Gibco), 1% (v/v) GlutaMAX-I (Gibco), and 20 ng/ml of IL-2 (PHC0026, Gibco), and were either left in a resting state or activated with Dynabeads Human T-Activator CD3/CD28 (1161D, Thermo Fisher Scientific) for 72 h. CD4⁺ T cell activation efficiency was assessed by staining cells with FITC-conjugated mouse anti-CD25 (340694, BD Bioscience) and APC-conjugated mouse anti-CD69 (130-114-046, Miltenyi Biotec) and analyzing them using a FACSCanto II flow cytometer (BD Bioscience) and FlowJo software (FlowJo).

Purified and activated CD4⁺ T cells were seeded at a density of 1 × 10⁶ cells/mL and inoculated with 600 ng of p24 capsid antigen in the presence of polybrene. The plates were centrifuged at 1000 × g at 30°C for 1 h. The medium was replaced with fresh RPMI 2 h after infection.

DRIP-qPCR

DRIP was performed as described for the construction of the DRIPc-seq library. After the elution of isolated complexes, nucleic acids were purified using the standard phenol-chloroform extraction method and used for qPCR. S6 Table presents details of the primer sequences used for DRIP-qPCR analysis.

RNA-seq library construction

For RNA-seq, HeLa cells were infected with VSV-G-pseudotyped HIV-1 NL4-3 ΔEnv EGFP virus at a MOI of 0.6 and harvested at 0, 3, 6, and 12 hpi. Sequencing was performed with biological replicates. Total RNA was extracted using TRIzol reagent (Invitrogen), according to the manufacturer’s instructions. An mRNA sequencing library was constructed using Illumina adaptors harboring p5 and p7 sequences and Rd1 SP and Rd2 SP sequences. Sequencing was performed using the HiSeq2500 system (Illumina).

Luciferase assay

HeLa cells infected with VSV-G-pseudotyped pNL4-3.Luc.R-E HIV-1 viruses were harvested at 48 hpi, and luminescence was measured using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions. Briefly, 250 μl of passive lysis buffer was used to lyse cells for each sample, 20 μl of the lysate was mixed with 100 μl of the Luciferase Assay Reagent II, and the luminescence of firefly luciferase was measured using a microplate luminometer (Berthold). The luminescence signals were normalized to total protein content, measured by BCA assay.

Quantitative real-time PCR (qPCR)

For RT (reverse transcription)-qPCR, 1 μg of RNA was reverse-transcribed using the ReverTra Ace qPCR RT Kit (TOYOBO) following the manufacturer’s instructions. For qPCR, DNA extracts were prepared using a DNA purification kit (Qiagen, 51106) according to the manufacturer’s instructions. Equivalent amounts of purified gDNA from each sample were analyzed using qPCR. qPCR was performed using TOPreal qPCR PreMIX (Enzynomics, RT500M). The reactions were performed in duplicate or triplicate for technical replicates. PCR was performed using the iCycler iQ real-time PCR detection system (Bio-Rad). All the primers used for qPCR are listed in S6 Table.

DRIPc-seq library construction

DRIP followed by library preparation, next-generation sequencing, and peak calling were performed as described earlier (28). Briefly, the corresponding cells were harvested and their gDNA was extracted. The extracted nucleic acids were fragmented using a restriction enzyme cocktail with BsrBI (NEB, R0102S), HindIII (NEB, R0136L), XbaI (NEB, R0145L), and EcoRI (NEB, R3101L) overnight at 37°C. Half of the fragmented nucleic acids were digested with RNase H (New England Biolabs) overnight at 37°C to serve as a negative control. The digested nucleic acids were cleaned using standard phenol-chloroform extraction and resuspended in DNase/RNase-free water. DNA–RNA hybrids were immunoprecipitated from total nucleic acids using mouse anti-DNA–RNA hybrid S9.6 (Kerafast, ENH001) in DRIP binding buffer and incubated overnight at 4°C. Dynabeads Protein A (Invitrogen, 10001D) was used to pull down the DNA–antibody complexes by incubation for 4 h at 4°C. The isolated complexes were washed twice with DRIP binding buffer before elution. For elution, the isolated complexes were incubated in an elution buffer containing proteinase K for 45 min at 55°C. Subsequently, nucleic acids were purified using the standard phenol-chloroform extraction method and subjected to DNase I (Takara, 2270B) treatment and reverse transcription for DRIPc-seq library construction. DRIPc-seq was performed in biological replicates. S5 Table shows details of the oligonucleotides used for DRIPc-seq library construction. DRIPc-seq libraries were analyzed using 150 bp paired-end sequencing on a HiSeqX Illumina instrument.

Immunofluorescence microscopy

For immunofluorescence assays of S9.6 nuclear signals, when indicated, the cells were pre-extracted with cold 0.5% NP-40 for 3 min on ice. Cells were fixed with 100% ice-cold methanol for 10 min on ice and then incubated with 100% ice-cold acetone for 1 min. The slides were washed three times with 1× PBS and either incubated with 60 U/mL RNase H (M0297S, NEB) at 37°C for 36 h or left untreated. The slides were subsequently briefly rinsed thrice with 2% BSA/0.05% Tween (in PBS) and incubated with mouse anti-DNA–RNA hybrid S9.6 (Kerafast, ENH001; 1:100) and rabbit anti-nucleolin (Abcam, ab22758; 1:300) in 2% BSA/0.05% Tween (in PBS) for 4 h at 4°C. The slides were then washed three times with 2% BSA/0.05% Tween (in PBS) and incubated with goat anti-rabbit AlexaFluor-488-conjugated (Invitrogen, A-11008) and goat anti-mouse AlexaFluor-568-conjugated (Molecular Probes, A11004) secondary antibodies (1:200) for 2 h at room temperature. The slides were then washed three times with 2% BSA/0.05% Tween (in PBS) and mounted using the ProLong Gold AntiFade reagent (Invitrogen). Images were obtained using a Nikon Eclipse Ti2 inverted microscope equipped with a 1.45 numerical aperture plan apochromat lambda 100× oil objective and a scientific complementary metal–oxide–semiconductor camera (Photometrics Prime 95B 25 mm). For each field of view, images were obtained with DAPI395, GFP488, and Alexa594 channels using the NIS-Elements software. For quantification analysis, binary masks of nuclei and nucleoli were generated using the ROI manager and auto local thresholding in the ImageJ software. The intensity of nuclear signals for DNA–RNA hybrids and nucleolin was then quantified. The final DNA–RNA hybrid signals in the nucleus were calculated by subtracting the nucleolin signals from the DNA–RNA hybrid signals.

pgR-rich and -poor cell line generation with piggyBac transposition

We adapted and modified an elegantly designed episomal system that induces defined R-loops with controlled transcription levels (16) by subcloning the R-loop-forming or non-R-loop-forming sequence into the piggyBac transposon vector. HeLa cells were seeded at a density of 5 × 10⁴ cells/ml in a 6-well plate. The next day, cells were transfected with 0.2 μg of Super PiggyBac Transposase Expression Vector (System Biosciences, PB210PA-1) and 0.2, 1, or 2 μg of transposon vectors carrying the appropriate subcloned “cargo” using Lipofectamine 3000 (Invitrogen) according to the manufacturer’s instructions. After 3 days, the cells were treated with 10 μg/ml blasticidin S (Gibco, A1113903) for selection. Cells that remained positive for integrants after more than 7 days were validated using immunoblotting or RT-qPCR following treatment with DOX. Jurkat cells were seeded at a density of 8 × 10⁵ cells/ml in a 6-well plate and transfected with 0.2 µg of transposase and 1 µg of the corresponding transposon vectors with Lipofectamine 3000, as for HeLa cells. After 3 days, the cells were treated with 10 μg/ml blasticidin S (Gibco, A1113903) for selection. For each passage, cells were cushioned onto Ficoll-Paque (Cytiva, 17144002) to separate live cells from dead cell debris. The cells over the cushion were washed with PBS and incubated in cell culture medium with 10 µg/ml of blasticidin for further selection for at least 14 days. Cells with positive integrants were validated by immunoblotting after treatment with DOX. Quantification of successfully integrated piggyBac transposons was performed using a piggyBac qPCR copy number kit (System Biosciences, PBC100A-1) according to the manufacturer’s instructions.

HIV-1 integration site sequencing library construction

HIV-1 integration site sequencing library construction was performed as described earlier (7, 9). Briefly, HeLa cells were infected with VSV-G-pseudotyped HIV-1 NL4-3 ΔEnv EGFP virus at an MOI of 0.6 and harvested 5 days post infection. gDNA was isolated using a DNA purification kit (Qiagen, 51106), according to the manufacturer’s instructions. gDNA (10 µg) was digested overnight at 37°C with 100 U each of the restriction endonucleases MseI (NEB, R0525L) and BglII (NEB, R0144L). Linker oligonucleotides, which were compatible for ligation with the MseI-generated DNA ends, were ligated with gDNA overnight at 12°C in reactions containing 1.5 μM ligated linker, 1 μg fragmented DNA, and 800 U T4 DNA ligase (NEB, M0202S). Viral LTR–host DNA junctions were amplified using semi-nested PCR with a unique linker-specific primer and LTR primers. The second round of PCR was carried out with primers binding to the LTR and the linkers for next-generation sequencing. Two PCRs were performed in parallel for the first round of PCR and five PCRs were performed in parallel for the second round of PCR to enhance library diversity. S7 Table presents details of the oligonucleotides used for HIV-1 integration site sequencing library construction. HIV-1 integration site sequencing was performed in biological replicates. Integration site libraries were analyzed using 150 bp paired-end sequencing on a HiSeqX Illumina instrument.

Recombinant Sso7d-IN protein purification

Sso7d-integrase active site mutant E152Q was expressed in Escherichia coli BL21-AI and purified essentially as previously described (50). Briefly, BL21-AI cells expressing Sso7d-IN (E152Q) were lysed in lysis buffer (20 mM HEPES pH 7.5, 2 mM 2-mercaptoethanol, 1 M NaCl, 10% (w/v) glycerol, 20 mM imidazole, 1 mg RNase A, and 1000 U DNase I), and the protein was purified by nickel affinity chromatography (Qiagen, 30210). Proteins were first loaded on a HeparinHP column (GE Healthcare) equilibrated with 20 mM Tris pH 8.0, 0.5 mM TCEP, 200 mM NaCl, and 10% glycerol for anion exchange chromatography prior to size exclusion chromatography. Proteins were eluted with a linear gradient of NaCl from 200 mM to 1 M. Eluted fractions were pooled and then separated on a Superdex-200 PC 10/300 GL column (GE Healthcare) equilibrated with 20 mM Tris pH 8.0, 0.5 mM TCEP, 500 mM NaCl, and 6% (w/v) glycerol. The purified protein was concentrated to 0.6 mg/ml using an Amicon centrifugal concentrator (EMD Millipore), flash-frozen in liquid nitrogen, and stored at −80°C.

Electrophoretic mobility shift assay for R-loop binding of Sso7d-IN

To test the binding affinity of Sso7d-tagged HIV-1 IN to different types of nucleic acid substrates, we prepared R-loop, dsDNA, RNA-DNA hybrid with exposed ssDNA (R:D+ssDNA), RNA-DNA hybrid (Hybrid), ssDNA, and ssRNA substrates by annealing different combinations of Cy3-labeled, Cy5-labeled, or non-labeled oligonucleotides following a previously described protocol (51). Nucleic acid substrate (10 nM) was incubated with Sso7d-IN at different concentrations in assembly buffer (20 mM HEPES pH 7.5, 5 mM CaCl₂, 8 mM 2-mercaptoethanol, 4 µM ZnCl₂, 100 mM NaCl, 25% (w/v) glycerol, and 50 mM 3-(benzyldimethylammonio)propanesulfonate (NDSB-256)) for 1 h at 30°C and then incubated for 15 min on ice. The reactions were run on 4.5% non-denaturing PAGE in 1× TBE, and the Cy3 or Cy5 fluorescence signal was imaged using a ChemiDoc MP imaging system (Bio-Rad). S8 Table presents details of the oligonucleotide sequences used for EMSA.
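The unbound-fraction quantification reported for the EMSA (mean ± SEM, n = 3) can be sketched as follows. This is an illustration under stated assumptions: the methods do not specify how band intensities were normalized, so normalizing the free-probe band to the protein-free lane is a choice made here, and all function names and intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def unbound_fraction(free_band_intensities):
    """Normalize free-probe band intensities to the 0 nM protein lane.

    The first entry is assumed to be the protein-free lane (an assumption
    made for this sketch, not a detail taken from the methods).
    """
    reference = free_band_intensities[0]
    return [intensity / reference for intensity in free_band_intensities]


def mean_sem(values):
    """Return the mean and the standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical free-probe intensities at 0, 100, and 400 nM integrase.
lanes = unbound_fraction([200, 100, 50])

# Hypothetical unbound fractions at one concentration from three replicates.
avg, sem = mean_sem([0.4, 0.5, 0.6])
```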

Co-immunoprecipitation of DNA–RNA hybrid

DNA–RNA hybrid immunoprecipitation was performed as described earlier (37). Briefly, non-crosslinked HeLa cells transfected with the pFlag-IN codon-optimized plasmid were lysed in 85 mM KCl, 5 mM PIPES (pH 8.0), and 0.5% NP-40 for 10 min on ice, and the lysates were then centrifuged at 750 × g for 5 min to pellet the nuclei. The pelleted nuclei were resuspended in sodium deoxycholate, SDS, and sodium lauroyl sarcosinate in RSB buffer and were sonicated for 10 min (Diagenode Bioruptor). Extracts were then diluted (1:4 in RSB + T buffer) and subjected to immunoprecipitation with the S9.6 antibody overnight at 4°C. Antibody-bound complexes were incubated with Protein A Dynabeads (Invitrogen) for 4 h at 4°C for immunoprecipitation. Normal mouse IgG antibodies (Santa Cruz, sc-2025) were used as negative controls. RNase A (Thermo Scientific, EN0531) was added during immunoprecipitation at 0.1 ng RNase A per µg gDNA. Beads were washed four times with RSB + T and twice with RSB, and eluted either in 2× LDS (Novex, NP0007) and 100 mM DTT for 10 min at 70°C (for western blot) or in 1% SDS and 0.1 M NaHCO₃ for 30 min at room temperature (for DNA–RNA hybrid dot blot).

For co-immunoprecipitation of DNA–RNA hybrids with RNase H treatment, gDNA containing RNA–DNA hybrids was isolated from HeLa cells transfected with a pFlag-IN codon-optimized plasmid using a QIAamp DNA Mini Kit (Qiagen, 51304). gDNA was sonicated for 10 min (Diagenode Bioruptor) and then treated with 5.5 U RNase H (NEB, M0297) per µg of DNA overnight at 37°C. A fraction of gDNA was stored as “nucleic acid input” for dot blot analysis. gDNA was cleaned using standard phenol-chloroform extraction, resuspended in DNase/RNase-free water, enriched for DNA–RNA hybrids using immunoprecipitation with the S9.6 antibody (overnight at 4°C), isolated with Protein A Dynabeads (Invitrogen; 4 h at 4°C), and washed thrice with RSB + T. The immunoprecipitated complexes were incubated for 2 h at 4°C with diluted nuclear extracts of HeLa cells transfected with the pFlag-IN codon-optimized plasmid. The protein-containing extracts were pre-treated with 0.1 mg/ml RNase A (Thermo Scientific, EN0531) for 1 h at 37°C to degrade all RNA–DNA hybrids, and excess RNase A was blocked by adding 200 U of SUPERase•In RNase Inhibitor (Invitrogen, AM2694) for immunoprecipitation. In addition, a 100 μL fraction of the diluted and RNase A pre-treated extracts was stored prior to immunoprecipitation as “protein input” for western blotting. Beads were washed four times with RSB + T and twice with RSB, and eluted either in 2× LDS (Novex, NP0007) and 100 mM DTT for 10 min at 70°C (for western blot) or in 1% SDS and 0.1 M NaHCO₃ for 30 min at room temperature (for DNA–RNA hybrid dot blot).

PLA

For PLA, HeLa cells were grown on coverslips and infected with HIV-IN-EGFP virions. At 6 hpi, cells were pre-extracted with cold 0.5% NP-40 for 3 min on ice. The cells were fixed with 4% paraformaldehyde in PBS for 15 min at 4°C. The cells were then blocked with 1× blocking solution (Merck, DUO92102) for 1 h at 37°C in a humidity chamber. After blocking, cells were incubated with the following primary antibodies overnight at 4°C for S9.6-HIV-1-IN_PLA: mouse anti-DNA–RNA hybrid S9.6 (1:250; Kerafast, ENH001) and rabbit anti-GFP (1:500; Abcam, ab6556). The following day, after washing twice with buffer A (Merck, DUO92102), cells were incubated with pre-mixed Duolink PLA PLUS (anti-mouse) and PLA MINUS (anti-rabbit) probes for 1 h at 37°C. The subsequent steps in the proximity ligation assay were performed using the Duolink PLA Fluorescence kit (Sigma) according to the manufacturer’s instructions. To obtain images, the mounted specimens were visually scanned and representative images were acquired using a Zeiss LSM 710 laser scanning confocal microscope (Carl Zeiss). The number of intranuclear PLA puncta was quantified using the ImageJ software. For each biological replicate and experiment, a PLA with a single antibody was performed as a negative control under the same conditions.

DRIPc-Seq data processing and peak calling

DRIPc-seq reads were quality-controlled using FastQC v0.11.9 (52), and sequencing adapters were trimmed using Trim Galore! v0.6.6 (53) based on Cutadapt v2.8 (54). Trimmed reads were aligned to the hg38 reference genome using bwa v0.7.17-r1188 (55). Read deduplication and peak calling were performed using MACS v2.2.7.1 (56). Because R-loops appear as both narrow and broad peaks in DRIPc-seq read alignments owing to their variable lengths, two independent “MACS2 callpeak” runs were performed for narrow and broad peak calling. The narrow and broad peaks were merged using Bedtools v2.26.0 (57). To increase the sensitivity of DRIPc-seq peak identification, peaks were called after pooling the two biological replicates of the DRIPc-seq sequencing data for each condition.
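The merging of narrow and broad peak calls can be pictured with a small pure-Python analogue of interval merging. This is only a sketch of the idea; the study used Bedtools for this step, and the function name, the tuple layout, and the toy coordinates below are assumptions introduced for illustration.

```python
from collections import defaultdict


def merge_peaks(*peak_sets):
    """Merge overlapping or directly adjacent intervals across peak sets.

    Each peak is a (chrom, start, end) tuple. Intervals on the same
    chromosome are merged when the next start is at or before the
    current end.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or touching: extend the current interval.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Toy narrow and broad peak calls (hypothetical coordinates).
narrow = [("chr1", 100, 200), ("chr1", 500, 600)]
broad = [("chr1", 150, 450), ("chr2", 10, 90)]
merged = merge_peaks(narrow, broad)
```

In this toy input the narrow peak at chr1:100-200 is absorbed into the overlapping broad peak, while the isolated narrow peak at chr1:500-600 is retained as its own region.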

Consensus R-loop peak calling

The R-loop peaks at 0, 3, 6, and 12 hpi were first merged using “bedtools merge” to create a universal set of R-loop peaks across time points (n = 46,542). Then, each of the universal R-loop peaks was tested for overlap with the R-loop peaks at 0, 3, 6, and 12 hpi using “bedtools intersect”. In all, 9,190, 21,403, 33,544, and 9,941 peaks overlapped with the 0, 3, 6, and 12 hpi R-loop peaks, respectively. For CD4⁺ T cells, we identified a universal R-loop set consisting of 3,928 R-loops, among which 737, 722, 1,796, and 2,766 peaks overlapped with the 0, 3, 6, and 12 hpi R-loop peaks, respectively.
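The two steps above (building a universal peak set, then counting how many universal peaks are supported at each time point) can be sketched in plain Python. This is an illustrative analogue of the “bedtools merge” and “bedtools intersect” calls, not the pipeline itself; the helper names and the toy peaks are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(interval) for interval in merged]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_supported(universal, peaks):
    """Count universal peaks overlapping at least one peak of a time point."""
    return sum(any(overlaps(u, p) for p in peaks) for u in universal)


# Toy peak calls per time point (hours post infection).
peaks_by_hpi = {
    0: [("chr1", 100, 200)],
    3: [("chr1", 150, 300), ("chr2", 50, 80)],
    6: [("chr1", 1000, 1200), ("chr2", 60, 120)],
    12: [("chr1", 1100, 1300)],
}

all_peaks = [p for peaks in peaks_by_hpi.values() for p in peaks]
universal = merge_intervals(all_peaks)
counts = {hpi: count_supported(universal, peaks)
          for hpi, peaks in peaks_by_hpi.items()}
```

The quadratic overlap check keeps the sketch short; a real analysis on tens of thousands of peaks would use sorted sweeps or an interval index.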

HIV-1 integration site sequencing data processing

Quality control of HIV-1 integration site-sequencing reads was performed using FastQC v0.11.9. To discard primers and linkers specific for integration site-sequencing from reads, we used Cutadapt v2.8 with the following options: “-u 49 -U 38 --minimum-length 36 --pair-filter any --action trim -q 0,0 -a linker -A TGCTAGAGATTTTCCACACTGACTGGGTCTGAGGG -A GGGTCTGAGGG --no-indels --overlap 12”. This allowed the first position of the read alignment to directly represent the genomic position of HIV-1 integration. Processed reads were aligned to the hg38 reference genome using bwa v0.7.17-r1188, and integration sites were identified using an in-house Python script. Genomic positions supported by more than five read alignments were regarded as HIV-1 integration sites. For Jurkat cells, we adopted integration site sequencing data of HIV-1-infected wild-type Jurkat cells from SRR12322252 (58).
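The support threshold for calling integration sites can be illustrated with a minimal sketch. It is not the in-house script referred to above; the function name, the use of (chromosome, position, strand) as the site key, and the toy alignments are assumptions made for illustration.

```python
from collections import Counter


def call_integration_sites(alignment_starts, min_reads_exclusive=5):
    """Return sites supported by more than `min_reads_exclusive` alignments.

    alignment_starts: iterable of (chrom, position, strand) tuples, one per
        read alignment, where position is the first aligned base and is
        taken to represent the integration position. Keying on strand as
        well as position is an assumption of this sketch.
    """
    support = Counter(alignment_starts)
    return sorted(site for site, n_reads in support.items()
                  if n_reads > min_reads_exclusive)


# Toy alignments: one site with six supporting reads, one with five.
alignments = [("chr1", 1000, "+")] * 6 + [("chr2", 500, "-")] * 5
sites = call_integration_sites(alignments)
```

Only the chr1 site passes here, because the threshold is strict: five supporting alignments are not enough.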

Co-localization analysis of R-loops and integration sites

Enrichment of integration sites near the R-loop peaks was tested using a randomized permutation test. Randomized R-loop peaks were generated using “bedtools shuffle” command, thus preserving the number and the length distribution of the R-loop peaks during the randomization process. Similarly, integration sites were randomized using the “bedtools shuffle” command. Randomization was performed 100 times. ENCODE blacklist regions (59) were excluded while shuffling the R-loops and integration sites to exclude inaccessible genomic regions from the analysis. For each of the observed (or randomized) integration sites, the closest observed (or randomized) R-loop peak and the corresponding genomic distance were identified using the “bedtools closest” command. The distribution of the genomic distances was displayed to show the local enrichment of integration sites in terms of the increased proportion of integration sites within the 30-kb window centered on R-loops compared to their randomized counterparts.
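The logic of the randomized permutation test can be sketched on a single toy chromosome. This is a simplified illustration, not the pipeline used in the study: it omits blacklist exclusion, treats the 30-kb window as a distance of at most 15 kb to the nearest peak (an interpretation made here), and all function names, coordinates, and the random seed are hypothetical.

```python
import random


def distance_to_closest(pos, peaks):
    """Distance from a position to the nearest (start, end) peak; 0 if inside."""
    return min(max(start - pos, 0, pos - end) for start, end in peaks)


def fraction_within(sites, peaks, half_window=15_000):
    """Fraction of sites lying within half_window bp of any peak."""
    near = sum(distance_to_closest(s, peaks) <= half_window for s in sites)
    return near / len(sites)


def shuffle_peaks(peaks, chrom_length, rng):
    """Place each peak at a random position, preserving its length."""
    shuffled = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_enrichment(sites, peaks, chrom_length, n_perm=100, seed=0):
    """Compare the observed proximity fraction with randomized counterparts."""
    rng = random.Random(seed)
    observed = fraction_within(sites, peaks)
    null = []
    for _ in range(n_perm):
        random_peaks = shuffle_peaks(peaks, chrom_length, rng)
        random_sites = [rng.randrange(chrom_length) for _ in sites]
        null.append(fraction_within(random_sites, random_peaks))
    return observed, null


peaks = [(100_000, 102_000), (500_000, 501_000)]
sites = [101_000, 110_000, 480_000, 900_000]
observed, null = permutation_enrichment(sites, peaks, chrom_length=1_000_000)
```

Comparing `observed` with the distribution in `null` gives a sense of whether integration sites sit closer to R-loop peaks than expected by chance.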

DNA plasmid construction and transfection

R-loop-forming mAIRN and non-R-loop-forming ECFP sequences were subcloned from the pSH26 and pSH36 plasmids, which were generously provided by Prof. Karlene A. Cimprich, into the piggyBac transposon vector, where the tet operator sequences were located upstream of the minimal CMV promoter. The pFlag-IN codon-optimized plasmid and pVpr-IN-EGFP were kindly provided by Prof. A. Engelman and Prof. Anna Cereseto, respectively. Lipofectamine 3000 (Invitrogen) transfection reagent was used for the transfection of all plasmids into cells, according to the manufacturer’s protocol.

DNA–RNA hybrid dot blotting

Total gDNA was extracted using the QIAamp DNA Mini Kit (Qiagen, 51304) according to the manufacturer’s instructions. gDNA (1.2 μg) was treated with 2 U RNase H (NEB, M0297) per µg of gDNA for 4 h at 37°C, with half of the sample left untreated but denatured. Half of the DNA sample was probed with the S9.6 antibody (1:1000), and the other half was probed with an anti-ssDNA antibody (MAB3034, Millipore, 1:10000).

Immunoblotting

Cells were lysed using RIPA buffer (50 mM Tris, 150 mM sodium chloride, 0.5% sodium deoxycholate, 0.1% SDS, and 1.0% NP-40) supplemented with 10 μM leupeptin (Sigma-Aldrich) and 1 mM phenylmethanesulfonyl fluoride (Sigma-Aldrich) and boiled at 98°C for 10 min with SDS sample buffer prior to SDS-PAGE. The primary antibodies used were mouse monoclonal anti-FLAG M2 (Sigma, F3165), monoclonal mouse anti-HSC70 (Abcam, ab2788), polyclonal rabbit anti-histone H3 (tri methyl K4) antibody (Abcam, ab8580), monoclonal mouse anti-HIV-1 Integrase (Santa Cruz, sc-69721), rabbit anti-LaminA/C antibody (Cell Signaling, 2032), and monoclonal mouse anti-Actin (Invitrogen, MA1-744). All primary antibodies were used at a dilution of 1:1000 for western blotting. Peroxidase-conjugated anti-mouse IgG (115-035-062) and anti-rabbit IgG (111-035-003; both Jackson Laboratories) were used as secondary antibodies at 1:5000 dilution. Signals were detected using the SuperSignal West Pico chemiluminescence kit (Thermo Fisher Scientific).

RNA-seq data processing

RNA-seq reads were quality-controlled and adapter-trimmed as in DRIPc-seq processing. To quantify the expression levels of protein-coding genes, processed reads were aligned to the hg38 reference genome with GENCODE v37 gene annotation (60) using STAR v2.7.3a (61). Gene expression quantification was performed using RSEM v1.3.1. To quantify the expression levels of transposable elements (TEs), we used TEtranscripts v2.2.1 (62).

Processed reads were first aligned to the hg38 reference genome using GENCODE v37 and RepeatMasker TE annotation using STAR v2.7.3a. In this case, STAR options were modified as follows to utilize multimapping reads in downstream analyses: “--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100 --outMultimapperOrder random --runRNGseed 77 --outSAMmultNmax 1 --outFilterType BySJout --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000”. Expression levels of TEs were quantified as read counts with the “TEcount” command.

Genome annotations

All bioinformatic analyses were performed using the hg38 reference genome and GENCODE v37 gene annotation. Promoters were defined as a 2-kb region centered at the transcription start sites of the APPRIS principal isoform of protein-coding genes. TTS regions were defined as the 2-kb region centered at the 3′ terminals of protein-coding transcripts. CpG island annotations were downloaded from the UCSC table browser. CpG shores were defined as 2-kb regions flanking CpG islands, excluding the regions overlapping with CpG islands. Similarly, CpG shelves were defined as 2-kb regions flanking the stretch of CpG islands and shores while excluding the regions overlapping with CpG islands and shores. Annotations for LINE, SINE, and LTR were extracted from the RepeatMasker track in the UCSC table browser.
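The window definitions above can be made concrete with a short sketch. It handles a single transcription start site and a single isolated CpG island; trimming of shores and shelves that would overlap neighboring islands, as described in the text, is not implemented. The function names and coordinates are hypothetical.

```python
def centered_window(position, width=2000):
    """Return a window of the given width centered on a position."""
    half = width // 2
    return (max(position - half, 0), position + half)


def cpg_shores_and_shelves(island, flank=2000):
    """Return shores and shelves flanking one isolated CpG island.

    Shores are the flank-sized regions directly adjacent to the island;
    shelves are the next flank-sized regions beyond the shores.
    """
    start, end = island
    shore_left_start = max(start - flank, 0)
    shores = [(shore_left_start, start), (end, end + flank)]
    shelves = [
        (max(start - 2 * flank, 0), shore_left_start),
        (end + flank, end + 2 * flank),
    ]
    return shores, shelves


# Promoter: 2-kb region centered on a hypothetical TSS at position 10,000.
promoter = centered_window(10_000)

# Shores and shelves around a hypothetical CpG island at 50,000-51,000.
shores, shelves = cpg_shores_and_shelves((50_000, 51_000))
```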

Identification of viral sequencing reads in DRIPc-seq

To identify sequencing reads originating from the viral genome, we aligned DRIPc-seq reads to a composite reference genome consisting of the human and HIV-1 genomes (GenBank accession number: AF324493.2) and computed the proportion of the reads mapped to the HIV-1 genome.
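The proportion described above amounts to a simple count over alignment reference names. The sketch below assumes the proportion is taken over mapped reads and that the viral contig in the composite reference is named after the accession; neither detail is stated in the text, and the function name and toy input are hypothetical.

```python
from collections import Counter


def viral_read_proportion(reference_names, viral_contig):
    """Proportion of mapped reads whose reference is the viral contig.

    reference_names: one entry per read, holding the name of the contig
        the read aligned to, or None for unmapped reads.
    """
    counts = Counter(name for name in reference_names if name is not None)
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        return 0.0
    return counts[viral_contig] / total_mapped


# Toy alignment summary: seven mapped reads, two of them viral.
refs = ["chr1", "chr1", "AF324493.2", None, "chr7",
        "AF324493.2", "chr12", "chrX"]
proportion = viral_read_proportion(refs, viral_contig="AF324493.2")
```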

Code availability

Bioinformatics pipelines and scripts used in this study are accessible from https://github.com/dohlee/hiv1-rloop.

Acknowledgements

We are grateful to Prof. Karlene A. Cimprich (Stanford University) for providing the pSH26 and pSH36 plasmids, Prof. A. Engelman (Harvard Medical School) for providing the pFlag-IN codon-optimized plasmid, and Prof. Anna Cereseto (University of Trento) for providing pVpr-IN-EGFP. The NL4-3 ΔEnv EGFP and pNL4-3.Luc.R-E viral plasmids were obtained through the NIH HIV Reagent Program, Division of AIDS, NIAID, NIH. We thank Dr. Sungchul Kim (IBS Center for RNA Research) and Seongjin An (Korea University) for their technical support in recombinant protein purification.

Author contributions

K.P. and K.A. designed experiments. K.P., J.J., and S.L. performed experiments. D.L. performed the bioinformatic and statistical analyses. K.P., D.L., K.A., and S.K. analyzed the data. K.P., D.L., and K.A. wrote the manuscript.

Funding

This work was supported by the Institute for Basic Science of the Ministry of Science Grant (IBS-R008-D1) and by National Research Foundation of Korea (NRF) grants funded by the Korea government (NRF-2020R1A2C3011298 and NRF-2020R1A5A1018081) (to K.A.). The funders had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.

Competing interests

The authors have declared that no competing interests exist.

Supplemental Information

Primary CD4⁺ T cell sorting strategies and GFP-HIV-1 infection.
(A) Gating strategy used to determine the efficiency of CD4⁺ T cell sorting from human PBMCs. Pre-sorted PBMCs were stained with FITC-conjugated anti-CD4 and subjected to positive CD4⁺ T cell sorting. The percentages of the FITC-stained cell population at each step of cell sorting are as indicated. (B) Gating strategy used to determine non-activated (Naïve) and activated cells (αCD3/28) with two markers, CD25 (FITC) and CD69 (APC), for each donor (upper panels, Donor 1; lower panels, Donor 2). (C) Gating strategy used to determine HIV-1 infectivity of CD4⁺ T cells from each donor infected with GFP reporter HIV-1 virus at 48 hpi. The percentages of the GFP-positive cell population are as indicated.

Genome browser screenshots over the HIV-1-induced R-loop-positive or -negative genomic regions.
(A–C) Genome browser screenshots over the P1 (A), P2 (B), and P3 (C) HIV-1-induced R-loop-positive chromosomal regions showing results from DRIPc-seq in HIV-1-infected HeLa cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi; black, input signals for each indicated sample) on the plus (+) or minus (-) DNA strand. Magenta dotted lines represent primer binding sites in qPCR following DRIP. (D and E) Genome browser screenshots over the N1 (D) and N2 (E) HIV-1-induced R-loop-negative chromosomal regions showing results from DRIPc-seq in HIV-1-infected HeLa cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi; black, input signals for each indicated sample) on the plus (+) or minus (-) DNA strand. Magenta dotted lines represent primer binding sites in qPCR following DRIP.

Host cellular R-loop induction by HIV-1 infection is host-genome specific.
(A) DRIP-qPCR using the anti-S9.6 antibody at P1, P2, P3, N1, and N2 in HIV-1-infected cells with MOI of 0.6 harvested at the indicated hpi (blue, 0 hpi; green, 6 hpi). Pre-immunoprecipitated materials were untreated (−) or treated (+) with RNase H, as indicated. Data are presented as the mean ± SEM; P-values were calculated using one-way ANOVA (n = 2). (B) Dot blot analysis of the R-loop in gDNA extracts from HIV-1-infected HeLa cells with MOI of 0.6 harvested at 6 hpi. The cells were treated with DMSO, 10 µM nevirapine (NVP), or 10 µM raltegravir (RAL) for 24 h before infection, as indicated. gDNAs were probed with anti-S9.6. gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-RNA/DNA signal). Fold induction was normalized to the value of cells harvested at 0 hpi by quantifying the dot intensity of the blots and calculating the ratio of the S9.6 signal to the total amount of gDNA (anti-ssDNA signal). (C) Representative images of the immunofluorescence assay of S9.6 nuclear signals in HIV-1-infected HeLa cells with MOI of 0.6 at 6 hpi. The cells were pre-extracted of cytoplasm and co-stained with anti-S9.6 (red), anti-nucleolin antibodies (green), and DAPI (blue). The cells were treated with DMSO, 10 µM nevirapine (NVP), or 10 µM raltegravir (RAL) for 24 h before infection, as indicated. Quantification of S9.6 signal intensity per nucleus after nucleolar signal subtraction is shown for the immunofluorescence assay. The mean value for each data point is indicated by the red line. Statistical significance was assessed using one-way ANOVA (n > 51). (D) Pie graphs indicating the percentage of DRIPc-seq reads aligned to the host cellular genome (aquamarine) or to the HIV-1 viral genome (gray), out of the total consensus DRIPc-seq peaks from HIV-1-infected HeLa cells, primary CD4⁺ T cells, and Jurkat cells.

R-loop induction by HIV-1 infection does not follow transcriptome changes in HeLa cells.
(A) Line graphs and heat maps representing expression levels of indicated repetitive elements (SINE, right; LINE, middle; LTR, left) at the indicated hpi of HIV-1 in HeLa cells. Data are presented as the mean expression levels of two biologically independent experiments. (B) Indicated gene expression as measured by RT-qPCR in 0 or 6 hpi harvested HIV-1-infected HeLa cells. Data represent mean ± SEM, n = 3, P values were calculated according to two-tailed Student’s t-test. P > 0.05; n.s, not significant.

Regulation of cellular R-loops by RNase H1 expression, or by transposon-transposase insertion of R-loop forming and non-R-loop forming sequences in HeLa cells.
(A) Copy number of piggyBac transposon inserts in each cell line constructed by transfecting the transposon vector and transposase-expressing vector. Cell lines used for further experiments are shaded gray (pgR-poor) or red (pgR-rich). (B and C) Fold induction of gene expression for the indicated genes as measured by RT-qPCR. Fold induction was calculated by dividing the gene expression level of DOX-treated (+) cells by that of DOX-untreated (-) cells in pgR-poor cells (B) or pgR-rich cells (C). Data represent mean ± SEM, n = 2, P values were calculated according to two-way ANOVA. P > 0.05; n.s, not significant. (D and E) Relative gene expression of the indicated genes as measured by RT-qPCR in DOX-treated (+) or DOX-untreated (-) pgR-poor cells (D) or pgR-rich cells (E). Data represent mean ± SEM, n = 2, P values were calculated according to two-way ANOVA. P > 0.05; n.s, not significant.

HIV-1 integrase proteins directly bind to host genomic R-loops.
(A) Representative gel images for EMSA of Sso7d-tagged HIV-1 integrase (E152Q) with different types of nucleic acid substrates (R:D+ssDNA and Hybrid). 100 nM nucleic acid substrate was incubated with Sso7d-tagged HIV-1 integrase (E152Q) at 0 nM, 20 nM, 50 nM, 100 nM, 200 nM, and 400 nM (n = 3). (B) Nucleic acid extracts from FLAG-HIV-1-integrase-expressing cells were immunoprecipitated using S9.6 antibody. gDNA was precipitated from the eluates of immunoprecipitation and subjected to DNA–RNA hybrid dot blotting. Where indicated, the gDNA extracts were either untreated (–) or treated (+) with RNase H after elution of immunoprecipitated materials. (C) Summary of the experimental design for R-loop immunoprecipitation using S9.6 antibody in FLAG-tagged HIV-1 integrase protein-expressing HeLa cells with pre-immunoprecipitation in vitro RNase H treatment. (D) Protein extracts from FLAG-HIV-1-integrase-expressing cells were immunoprecipitated using anti-FLAG antibody. Western blot of FLAG immunoprecipitation was probed with anti-FLAG or anti-H3 antibodies. (E) Representative images of the proximity-ligation assay (PLA) using a single antibody (anti-GFP or anti-S9.6) in HIV-IN-EGFP virion-infected HeLa cells at 6 hpi, as PLA signal negative controls. Cells were subjected to PLA (orange) and co-stained with DAPI (blue) (n > 50).

Chromosomal position and DRIPc-seq signal for referenced R-loop-positive and -negative regions in HIV-1-infected HeLa cells.

Chromosomal position and DRIPc-seq signal for referenced R-loop-positive and -negative regions in HIV-1-infected primary CD4⁺ T cells.

Chromosomal position and DRIPc-seq signal for referenced R-loop-positive and -negative regions in HIV-1-infected Jurkat cells.

RNA-seq analysis of relative gene expression levels of P1-3 and N1,2 R-loop regions.

Oligonucleotides used for DRIPc-seq library construction.

Oligonucleotides used for HIV-1 integration site sequencing library construction.

Oligonucleotides used for electrophoretic mobility shift assay substrate preparation.

References

1. Johnson W. E. (2019) Origins and evolutionary consequences of ancient endogenous retroviruses. Nat Rev Microbiol 17:355–370
2. Lusic M., Siliciano R. F. (2017) Nuclear landscape of HIV-1 infection and integration. Nat Rev Microbiol 15:69–82
3. Chen H. C., Martinez J. P., Zorita E., Meyerhans A., Filion G. J. (2017) Position effects influence HIV latency reversal. Nat Struct Mol Biol 24:47–54
4. Einkauf K. B., et al. (2022) Parallel analysis of transcription, integration, and sequence of single HIV-1 proviruses. Cell 185:266–282
5. Jiang C., et al. (2020) Distinct viral reservoirs in individuals with spontaneous control of HIV-1. Nature 585:261–267
6. Schroder A. R., et al. (2002) HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110:521–529
7. Achuthan V., et al. (2018) Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration. Cell Host Microbe 24:392–404
8. Ciuffi A., et al. (2005) A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med 11:1287–1289
9. Sowd G. A., et al. (2016) A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc Natl Acad Sci U S A 113:E1054–1063
10. Lucic B., et al. (2019) Spatially clustered loci with multiple enhancers are frequent targets of HIV-1 integration. Nat Commun 10
11. Marini B., et al. (2015) Nuclear architecture dictates HIV-1 integration site selection. Nature 521:227–231
12. Kvaratskhelia M., Sharma A., Larue R. C., Serrao E., Engelman A. (2014) Molecular mechanisms of retroviral integration site selection. Nucleic Acids Res 42:10209–10225
13. Cherepanov P., et al. (2003) HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in human cells. J Biol Chem 278:372–381
14. Niehrs C., Luke B. (2020) Regulatory R-loops as facilitators of gene expression and genome stability. Nat Rev Mol Cell Biol 21:167–178
15. Petermann E., Lan L., Zou L. (2022) Sources, resolution and physiological relevance of R-loops and RNA-DNA hybrids. Nat Rev Mol Cell Biol 23:521–540
16. Hamperl S., Bocek M. J., Saldivar J. C., Swigut T., Cimprich K. A. (2017) Transcription-Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA Damage Responses. Cell 170:774–786
17. Ginno P. A., Lott P. L., Christensen H. C., Korf I., Chedin F. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol Cell 45:814–825
18. Lim Y. W., Sanz L. A., Xu X., Hartono S. R., Chedin F. (2015) Genome-wide DNA hypomethylation and RNA:DNA hybrid accumulation in Aicardi-Goutieres syndrome. Elife 4
19. Arora R., et al. (2014) RNaseH1 regulates TERRA-telomeric DNA hybrids and telomere maintenance in ALT tumour cells. Nat Commun 5
20. Garcia-Muse T., Aguilera A. (2019) R Loops: From Physiological to Pathological Roles. Cell 179:604–618
21. Sanz L. A., et al. (2016) Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol Cell 63:167–178
22. Chedin F. (2016) Nascent Connections: R-Loops and Chromatin Patterning. Trends Genet 32:828–838
23. Lee C. Y., et al. (2020) R-loop induced G-quadruplex in non-template promotes transcription by successive R-loop formation. Nat Commun 11
24. Ajoge H. O., et al. (2022) G-Quadruplex DNA and Other Non-Canonical B-Form DNA Motifs Influence Productive and Latent HIV-1 Integration and Reactivation Potential. Viruses 14
25. Chedin F., Benham C. J. (2020) Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 295:4684–4695
26. Jozwik I. K., et al. (2022) B-to-A transition in target DNA during retroviral integration. Nucleic Acids Res 50:8898–8918
27. Ballandras-Colas A., et al. (2022) Multivalent interactions essential for lentiviral integrase function. Nat Commun 13
28. Sanz L. A., Chedin F. (2019) High-resolution, strand-specific R-loop mapping via S9.6-based DNA-RNA immunoprecipitation and high-throughput sequencing. Nat Protoc 14:1734–1755
29. Jones R. B., et al. (2013) LINE-1 retrotransposable element DNA accumulates in HIV-1-infected cells. J Virol 87:13307–13320
30. Srinivasachar Badarinarayan S., et al. (2020) HIV-1 infection activates endogenous retroviral promoters regulating antiviral gene expression. Nucleic Acids Res 48:10890–10908
31. Lesbats P., Engelman A. N., Cherepanov P. (2016) Retroviral DNA Integration. Chem Rev 116:12730–12757
32. Brussel A., Sonigo P. (2003) Analysis of early human immunodeficiency virus type 1 DNA synthesis by use of a new sensitive assay for quantifying integrated provirus. J Virol 77:10119–10124
33. Albanese A., Arosio D., Terreni M., Cereseto A. (2008) HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3
34. Dharan A., Bachmann N., Talley S., Zwikelmaier V., Campbell E. M. (2020) Nuclear pore blockade reveals that HIV-1 completes reverse transcription and uncoating in the nucleus. Nat Microbiol 5:1088–1095
35. Kessl J. J., et al. (2016) HIV-1 Integrase Binds the Viral RNA Genome and Is Essential during Virion Morphogenesis. Cell 166:1257–1268
36. van Gent D. C., Elgersma Y., Bolk M. W., Vink C., Plasterk R. H. (1991) DNA binding properties of the integrase proteins of human immunodeficiency viruses types 1 and 2. Nucleic Acids Res 19:3821–3827
37. Cristini A., Groh M., Kristiansen M. S., Gromak N. (2018) RNA/DNA Hybrid Interactome Identifies DXH9 as a Molecular Player in Transcriptional Termination and R-Loop-Associated DNA Damage. Cell Rep 23:1891–1905
38. Mosler T., et al. (2021) R-loop proximity proteomics identifies a role of DDX41 in transcription-associated genomic instability. Nat Commun 12
39. Schrijvers R., et al. (2012) LEDGF/p75-independent HIV-1 replication demonstrates a role for HRP-2 and remains sensitive to inhibition by LEDGINs. PLoS Pathog 8
40. Stirling P. C., Hieter P. (2017) Canonical DNA Repair Pathways Influence R-Loop-Driven Genome Instability. J Mol Biol 429:3132–3138
41. Garcia-Rubio M. L., et al. (2015) The Fanconi Anemia Pathway Protects Genome Integrity from R-loops. PLoS Genet 11
42. Giannini M., et al. (2020) TDP-43 mutations link Amyotrophic Lateral Sclerosis with R-loop homeostasis and R loop-mediated DNA damage. PLoS Genet 16
43. Fu S., et al. (2022) HIV-1 exploits the Fanconi anemia pathway for viral DNA integration. Cell Rep 39
44. Li D., Lopez A., Sandoval C., Nichols Doyle R., Fregoso O. I. (2020) HIV Vpr Modulates the Host DNA Damage Response at Two Independent Steps to Damage DNA and Repress Double-Strand DNA Break Repair. mBio 11
45. Bauby H., et al. (2021) HIV-1 Vpr Induces Widespread Transcriptomic Changes in CD4(+) T Cells Early Postinfection. mBio 12
46. Stopak K., de Noronha C., Yonemoto W., Greene W. C. (2003) HIV-1 Vif blocks the antiviral activity of APOBEC3G by impairing both its translation and intracellular stability. Mol Cell 12:591–601
47. Kmiec D., Kirchhoff F. (2024) Antiviral factors and their counteraction by HIV-1: many uncovered and more to be discovered. J Mol Cell Biol
48. McCann J. L., et al. (2021) R-loop homeostasis and cancer mutagenesis promoted by the DNA cytosine deaminase APOBEC3B. bioRxiv
49. Yukl S. A., et al. (2018) HIV latency in isolated patient CD4(+) T cells may be due to blocks in HIV transcriptional elongation, completion, and splicing. Sci Transl Med 10
50. Passos D. O., et al. (2017) Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science 355:89–92
51. Nguyen H. D., et al. (2017) Functions of Replication Protein A as a Sensor of R Loops and a Regulator of RNaseH1. Mol Cell 65:832–847
52. Andrews S. (2010) FastQC
53. Felix Krueger F. J., Ewels Phil, Afyounian Ebrahim, Schuster-Boeckler Benjamin (2021) FelixKrueger/TrimGalore (0.6.7). Zenodo
54. Martin M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17
55. Li H., Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
56. Zhang Y., et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9
57. Quinlan A. R., Hall I. M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842
58. Li W., et al. (2020) CPSF6-Dependent Targeting of Speckle-Associated Domains Distinguishes Primate from Nonprimate Lentiviral Integration. mBio 11
59. Amemiya H. M., Kundaje A., Boyle A. P. (2019) The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep 9
60. Frankish A., et al. (2021) Gencode 2021. Nucleic Acids Res 49:D916–D923
61. Dobin A., et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
62. Jin Y., Tam O. H., Paniagua E., Hammell M. (2015) TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31:3593–3599

Article and author information

Author information

Kiwon Park
Center for RNA Research, Institute for Basic Science, Seoul 08826, Republic of Korea, School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
- These authors contributed equally to this work.
Dohoon Lee
Bioinformatics Institute, Seoul National University, Seoul 08826, Republic of Korea, BK21 FOUR Intelligence Computing, Seoul National University, Seoul 08826, Republic of Korea
- These authors contributed equally to this work.
Jiseok Jeong
Center for RNA Research, Institute for Basic Science, Seoul 08826, Republic of Korea, School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
Sungwon Lee
Center for RNA Research, Institute for Basic Science, Seoul 08826, Republic of Korea, School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
Sun Kim
Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea
Kwangseog Ahn
Center for RNA Research, Institute for Basic Science, Seoul 08826, Republic of Korea, School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea, SNU Institute for Virus Research, Seoul National University, Seoul 08826, Republic of Korea
ORCID iD: 0000-0002-1015-245X
- Corresponding author Email: ksahn@snu.ac.kr (KA)

Version history

Preprint posted: March 6, 2024
Sent for peer review: March 6, 2024
Reviewed Preprint version 1: June 21, 2024
Reviewed Preprint version 2: November 11, 2024

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
John Schoggins
The University of Texas Southwestern Medical Center, Dallas, United States of America
Senior Editor
Detlef Weigel
Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public Review):

(1) Significance of findings and strength of evidence.

(a) The work presented in this manuscript is intended to support the authors' novel idea that HIV DNA integration strongly favors "triple-stranded" R-loops in DNA formed either during transcription of many, but not all, genes or by strand invasion of silent DNA by transcripts made elsewhere, and that HIV infection promotes R-loop formation mediated by incoming virions in the absence of reverse transcription. The authors were able to demonstrate a reverse transcription-independent increase in R-loop formation early during HIV infection, while also demonstrating increased integration into sequences that contain R-loop structures. Furthermore, this manuscript also identifies that R-loops are present in both transcriptionally active and silent regions of the genome and that HIV integrase interacts with R-loops. Although the work presented supports a correlation between R-loop formation and HIV DNA integration, it does not prove the authors' hypothesis that R-loops are directly targeted for integration. Direct experimentation, such as in vitro integration into defined DNA targets, will be required. Further, the authors provide no explanation as to how current sophisticated structural models of concerted retroviral DNA integration into both strands of double-stranded DNA targets can accommodate triple-stranded structures. Finally, there are serious technical concerns with the interpretation of the integration site analyses.

(2) Public review with guidance for readers around how to interpret the work, highlighting important findings but also mentioning caveats.

(a) Introduction: The authors provide an excellent introduction to R-loops but they base the rationale for this study on mis-citation of earlier studies regarding integration in transcriptionally silent regions of the genome. The "most favored locus" cited in the very old reference 6 comprises only 5 events and has not been reproduced in more recent, much larger datasets. For example, see the study of over 300,000 sites in freshly infected PBMC cited in https://doi.org/10.1371/journal.ppat.1009141, which shows a 15-fold preference for integration in expressed genes and no evidence of clustering of sites (as seen in expressed genes) in non-expressed DNA. Further, as far as I can tell, they present no examples in the Results section of R-loops in non-expressed DNA serving as integration targets.

(b) Figure 1: This figure presents models for HIV infection in both cell lines and primary human CD4+ T cells. R-loop formation was determined by DRIPc-seq, which uses an antibody specific for DNA-RNA hybrid structures to isolate and sequence these regions of the genome; RNase H treatment is used to show that no R-loops are detected when RNA-DNA hybrids are absent. In these models of in vitro and ex vivo infection, the authors show that R-loop formation increases following HIV infection between 6 hours post-infection and 12 hours post-infection, depending on the cell model. However, these figures lack a mock-infected control for each cell model to assess R-loop formation at the same time points. They would also benefit from a control showing that virus entry is necessary, such as omitting the VSV G protein donor.

Additionally, they use intracellular staining to confirm DRIPc-Seq results, by demonstrating an increase in R-loop formation at 6 hours post-infection in HeLa cells. It would have been more relevant to use primary T cells for this assay, but HeLa cells probably provided easier and clearer imaging.

(c) Figure 2: This figure shows that cells infected with HIV show more R-loops as well as longer sequences containing R-loop structures. Panel B shows that these R-loops were distributed throughout different genomic features, such as both genic and intergenic regions of the genome. However, the data are presented in such a way that it is impossible to determine the proportion of R-loops in each type of genomic feature. The reader has no way to tell, for example, the proportion of R-loops in genic vs intergenic DNA and how this value changes with time. Furthermore, increased R-loop formation due to HIV infection showed poor correlation with gene expression, suggesting that R-loops were not forming due to transcriptional activation, although the difference between 0 and the remaining time points is not apparent, nor is the meaning of the absurd p values.

(d) Figure 3: This figure shows the use of cell lines carrying R-loop inducible (mAIRN) or non-inducible (ECFP) genes to model the association of HIV integration with R-loop structures. The authors demonstrate the functional validation of R-loop induction in the cell line model. Additionally, there is a significant increase in HIV integration in the R-loop-forming vector sequence when R-loops are induced with doxycycline. This result shows a correlation between expression and integration that is much stronger in the R-loop-forming gene than in the unreferenced ECFP gene but does not prove that integration directly targets R-loops. It is possible, for example, that some features of the DNA sequence, such as base composition, affect both integration and R-loop formation independently. As described more fully below, there is also a serious concern regarding the method used to quantify the integration frequencies.

(e) Figure 4: This figure shows evidence of increased HIV integration within regions of the genome containing R-loops with an additional preference for integration within the R-loop and a decrease in frequency of integration further from the R-loop. Identifying a preference for R-loops is very intriguing but the authors also demonstrate that integration does occur when R-loops are not present. In addition, Panel A, which shows that regions of cell DNA that form R-loops have a higher frequency of integration sites than those that do not, should be controlled for the level of gene expression of the two types of region.

(f) Figure 5: In this figure, the authors demonstrate that HIV integrase binds to R-loops through a number of protein assays, but do not show that this binding is associated with enzymatic activity. EMSA of integrase identified increased binding to DNA-RNA over dsDNA. Additionally, precipitation of RNA-DNA hybrids pulled down HIV integrase. A proximity ligation assay detecting R-loops and HIV integrase showed co-localization within the nucleus of HeLa cells. HeLa cells were probably used due to their efficiency of transduction but are not a physiologically relevant cell type.

(g) Discussion: In the discussion, the authors address how their work relates to previous evidence of HIV integration by association of LEDGF/p75 and CPSF6. They also cite that LEDGF/p75 has possible R-loop binding capabilities. They also discuss what possible mechanisms are driving increases in R-loop formation during HIV infection, pointing to possible HIV accessory proteins. They also state that how HIV integrates in transcriptionally silent regions is still unknown, and they point out that they were able to show R-loops appear in many different regions of the genome, but they did not show that R-loops in transcriptionally inactive regions are integration targets. More seriously, they failed to make a connection between their work and the current understanding of the biochemical and structural mechanism of the integration reaction.

https://doi.org/10.7554/eLife.97348.1.sa2

Reviewer #2 (Public Review):

Retroviral integration in general, and HIV integration in particular, takes place in dsDNA, not in R-loops. Although HIV integration can occur in vitro on naked dsDNA, there is good evidence that, in an infected cell, integration occurs on DNA that is associated with nucleosomes. This review will be presented in two parts. First, a summary will be provided giving some of the reasons to be confident that integration occurs on dsDNA on nucleosomes. The second part will point out some of the obvious problems with the experimental data that are presented in the manuscript.

(1) The 2017 Dos Passos Science paper describes the structure of the HIV intasome. The structure makes it clear that the target for integration is dsDNA, not an R-loop, and there are very good reasons to think that structure is physiologically relevant. For example, there is data from the Cherepanov, Engelman, and Lyumkis labs to show that the HIV intasome is quite similar in its overall structure and organization to the structures of the intasomes of other retroviruses. Importantly, these structures explain the way integration creates a small duplication of the host sequences at the integration site. How do the authors propose that an R-loop can replace the dsDNA that was seen in these intasome structures?

(2) As noted above, concerted (two-ended) integration can occur in vitro on a naked dsDNA substrate. However, there is compelling evidence that, in cells, integration preferentially occurs on nucleosomes. Nucleosomes are not found in R loops. In an infected cell, the viral RNA genome of HIV is converted into DNA within the capsid/core which transits the nuclear pore before reverse transcription has been completed. Integration requires the uncoating of the capsid/core, which is linked to the completion of viral DNA synthesis in the nucleus. Two host factors are known to strongly influence integration site selection, CPSF6 and LEDGF. CPSF6 is involved in helping the capsid/core transit the nuclear pore and associate with nuclear speckles. LEDGF is involved in helping the preintegration complex (PIC) find an integration site after it has been released from the capsid/core, most commonly in the bodies of highly expressed genes. In the absence of an interaction of CPSF6 with the core, integration occurs primarily in the lamin-associated domains (LADs). Genes in LADs are usually not expressed or are expressed at low levels. Depending on the cell type, integration in the absence of CPSF6 can be less efficient than normal integration, but that could well be due to a lack of LEDGF (which is associated with expressed genes) in the LADs. In the absence of an interaction of IN with LEDGF (and in cells with low levels of HRP2) integration is less efficient and the obvious preference for integration in highly expressed genes is reduced. Importantly, LEDGF is known to bind histone marks, and will therefore be preferentially associated with nucleosomes, not R-loops. LEDGF fusions, in which the chromatin binding portion of the protein is replaced, can be used to redirect where HIV integrates, and that technique has been used to map the locations of proteins on chromatin. 
Importantly, LEDGF fusions in which the chromatin binding component of LEDGF is replaced with a module that recognizes specific histone marks direct integration to those marks, confirming integration occurs efficiently on nucleosomes in cells. It is worth noting that it is possible to redirect integration to portions of the host genome that are poorly expressed, which, when taken with the data on integration into LADs (integration in the absence of a CPSF6 interaction) shows that there are circumstances in which there is reasonably efficient integration of HIV DNA in portions of the genome in which there are few if any R-loops.

(3) Given that HIV DNA is known to preferentially integrate into expressed genes and that R-loops must necessarily involve expressed RNA, it is not surprising that there is a correlation between HIV integration and regions of the genome to which R-loops have been mapped. However, it is important to remember that correlation does not necessarily imply causation.

If we consider some of the problems in the experiments that are described in the manuscript:

(1) In an infected individual, cells are almost always infected by a single virion and the infecting virion is not accompanied by large numbers of damaged or defective virions. This is a key consideration: the experiments behind the claim that infection by HIV affects R-loop formation in cells were done with a VSVg vector under conditions in which there appear to have been about 6000 virions per cell. Although most of the virions prepared in vitro are defective in some way, that does not mean that a large fraction of the defective virions cannot fuse with cells. In normal in vivo infections, HIV has evolved in ways that avoid signaling its presence to the infected cell. To cite an example, carrying out reverse transcription in the capsid/core prevents the host cell from detecting (free) viral DNA in the cytoplasm. The fact that the large effect on R-loop formation which the authors report still occurs in infections done in the absence of reverse transcription strengthens the probability that the effects are due to the massive amounts of virions present, and perhaps to the presence of VSVg, which is quite toxic. To have physiological relevance, the infections would need to be carried out with virions that contain HIV even under circumstances in which there is at most one virion per cell.

(2) Using the Sso7d version of HIV IN in the in vitro binding assays raises some questions, but that is not the real problem. The important question is not what HIV IN protein binds to, or how, but where and how an intasome binds. An intasome is formed from a combination of IN bound to the ends of viral DNA. In the absence of viral DNA ends, IN does not have the same structure/organization as it has in an intasome. Moreover, HIV IN (even Sso7d, which was modified to improve its behavior) is notoriously sticky and hard to work with. If viral DNA had been included in the experiment, intasomes would need to be prepared and purified for a proper binding experiment. To make matters worse, there are multiple forms of multimeric HIV IN and it is not clear how many HIV INs are present in the PICs that actually carry out integration in an infected cell.

(3) As an extension of comment 2, the proper association of an HIV intasome/PIC with the host genome requires LEDGF and the appropriate nucleic acid targets need to be chromatinized.

(4) Expressing any form of IN, by itself, in cells to look for what IN associates with is not a valid experiment. A major factor that helps to determine both where integration takes place and the sites chosen for integration is the transport of the viral DNA and IN into the nucleus in the capsid core. However, even if we ignore that important part of the problem, the IN that the authors expressed in HeLa cells won't be bound to the viral DNA ends (see comment 2), even if the fusion protein would be able to form an intasome. As such, the IN that is expressed free in cells will not form a proper intasome/PIC and cannot be expected to bind where/how an intasome/PIC would bind.

(5) As in comment 1, for the PLA experiments presented in Figure 5 to work, the number of virions used per cell (which differs from the MOI measured by the number of cells that express a viral marker) must have been high, which is likely to have affected the cells and the results of the experiment. However, there is the additional question of whether the IN-GFP fusion is functional. The fact that the functional intasome is a complex multimer suggests that this could be a problem. There is an additional problem, even if IN-GFP is fully functional. During a normal infection, the capsid core will have delivered copies of IN (and, in the experiments reported here, the IN-GFP fusion) into the nucleus that are not part of the intasome. These "free" copies of IN (here IN-GFP) are not likely to go to the same sites as an intasome, making this experiment problematic (comment 4).

(6) In the Introduction, the authors state that the site of integration affects the probability that the resulting provirus will be expressed. Although this idea is widely believed in the field, the actual data supporting it are, at best, weak. See, for example, the data from the Bushman lab showing that the distribution of integration sites is the same in cells in which the integrated proviruses are, and are not, expressed. However, given what the authors claim in the introduction, they should be more careful in interpreting enzyme expression levels (luciferase) as a measure of integration efficiency in experiments in which they claim proviruses are integrated in different places.

(7) Using restriction enzymes to create an integration site library introduces biases that derive from the uneven distribution of the recognition sites for the restriction enzymes.
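The bias described in comment 7 can be illustrated with a minimal sketch. It assumes a simplified model in which an integration site is recovered only if the nearest downstream recognition site lies within an amplifiable distance window. The recognition sequence `TTAA`, the window bounds, and the toy sequence are illustrative assumptions, not the enzymes or parameters used by the authors.

```python
import bisect


def site_positions(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def recoverable_fraction(seq, site, min_len, max_len):
    """Fraction of positions whose distance to the next downstream
    recognition site falls within [min_len, max_len] (hypothetical model)."""
    cuts = site_positions(seq, site)
    recovered = 0
    for pos in range(len(seq)):
        i = bisect.bisect_right(cuts, pos)  # first cut strictly downstream
        if i < len(cuts) and min_len <= cuts[i] - pos <= max_len:
            recovered += 1
    return recovered / len(seq) if seq else 0.0


# Toy genome: a site-dense region followed by a region with no sites.
toy = "GGTTAACC" * 5 + "GC" * 20
dense = recoverable_fraction(toy[:40], "TTAA", 1, 10)
sparse = recoverable_fraction(toy[40:], "TTAA", 1, 10)
print(f"dense region: {dense:.2f}, site-free region: {sparse:.2f}")
```

In this toy case, integrations in the site-free region are never recovered, so an apparent depletion there would reflect the library method rather than integration preference.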

https://doi.org/10.7554/eLife.97348.1.sa1

Reviewer #3 (Public Review):

In this manuscript, Park and colleagues describe a series of experiments that investigate the role of R-loops in HIV-1 genome integration. The authors show that during HIV-1 infection, R-loops accumulate on the host genome. Using a synthetic R-loop prone gene construct, they show that HIV-1 integration sites target sites with high R-loop levels. They further show that integration sites on the endogenous host genome are correlated with sites prone to R-loops. Using biochemical approaches, as well as in vivo co-IP and proximity ligation experiments, the authors show that HIV-1 integrase physically interacts with R-loop structures.

My primary concern with the paper is with the interpretations the authors make about their genome-wide analyses. I think that including some additional analyses of the genome-wide data, as well as some textual changes can help make these interpretations more congruent with what the data demonstrate. Here are a few specific comments and questions:

(1) I think Figure 1 makes a good case for the conclusion that R-loops are more easily detected in HIV-1-infected cells by multiple approaches (all using the S9.6 antibody). The authors show that their signals are RNase H sensitive, which is a critical control. For the DRIPc-Seq, I think including an analysis of biological replicates would greatly strengthen the manuscript. The authors state in the methods that the DRIPc pulldown experiments were done in biological replicates for each condition. Are the increases in DRIPc peaks similar across biological replicates? Are genomic locations of HIV-1-dependent peaks similar across biological replicates? Measuring and reporting the biological variation between replicate experiments is crucial for making conclusions about increases in R-loop peak frequency. This is partially alleviated by the locus-specific data in Figure S3A. However, a better understanding of how the genome-wide data varies across biological replicates will greatly enhance the quality of Figure 1.

(2) I think that the conclusion that R-loops "accumulate" in infected cells is acceptable, given the data presented. However, in line 134 the authors state that "HIV-1 infection induced host genomic R-loop formation". I suggest being very specific about the observation. Accumulation can happen by (a) inducing a higher frequency of the occurrence of individual R-loops and/or (b) stabilizing existing R-loops. I'm not convinced the authors present enough evidence to claim one over the other. It is altogether possible that HIV-1 infection stabilizes R-loops such that they are more persistent (perhaps by interactions with integrase?), and therefore more easily detected. I think rephrasing the conclusions to include this possibility would alleviate my concerns.

(3) A technical problem with using the S9.6 antibody for the detection of R-loops via microscopy is that it cross-reacts with double-stranded RNA. This has been addressed by the work of Chedin and colleagues (as well as others). It is absolutely essential to treat these samples with an RNA:RNA hybrid-specific RNase, which the authors did not include, as far as their methods section states. Therefore, it is difficult to interpret all of the immunofluorescence experiments that depend on S9.6 binding.

(4) Given that there is no clear correlation between expression levels and R-loop peak detection, combined with the data that show increased detection of R-loop frequency in non-genic regions, I think it will be important to show that the R-loop forming regions are indeed transcribed above background levels. This will help alleviate possible concerns that there are technical errors in R-loop peak detection.

(5) In Figures 4C and D the hashed lines are not defined. It is also interesting that the integration sites do not line up with R-loop peaks. This does not necessarily directly refute the conclusions (especially given the scale of the genomic region displayed), but should be addressed in the manuscript. Additionally, it would greatly improve Figure 4 to have some idea about the biological variation across replicates of the data presented in Figure 4A.

(6) The authors do not adequately describe the Integrase mutant that they use in their biochemical experiments in Figure 5A. Could this impact the activity of the protein in such a way that interferes with the interpretation of the experiment? The mutant is not used in subsequent experiments for Figure 5 and so even though the data are consistent with each other (and the conclusion that Integrase interacts with R-loops) a more thorough explanation of why that mutant was used and how it impacts the biochemical activity of the protein will help the interpretation of the data presented in Figure 5.
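One way to report the replicate concordance requested in comments 1 and 5 is a base-pair Jaccard index between the peak sets of two replicates. The sketch below is only an illustration under stated assumptions: peaks are half-open `(start, end)` intervals on a single chromosome, and the function names and example coordinates are hypothetical rather than taken from the authors' pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Total number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard_bp(peaks_a, peaks_b):
    """Base-pair Jaccard index: shared bp divided by bp covered by either set."""
    a = covered_bp(peaks_a)
    b = covered_bp(peaks_b)
    union = covered_bp(list(peaks_a) + list(peaks_b))
    intersection = a + b - union  # inclusion-exclusion
    return intersection / union if union else 0.0


# Hypothetical peak calls from two biological replicates.
rep1 = [(100, 200), (500, 600)]
rep2 = [(150, 250), (900, 1000)]
print(f"replicate Jaccard index: {jaccard_bp(rep1, rep2):.3f}")
```

A value near 1 would indicate that the replicates call largely the same regions; a low value would suggest that apparent gains in peak number may partly reflect replicate-to-replicate variation.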

https://doi.org/10.7554/eLife.97348.1.sa0

