Abstract
Human Immunodeficiency Virus (HIV) persists as a leading global health issue. A significant knowledge gap exists in our understanding of long-range interactions of the HIV-1 RNA genome. To bridge this gap, we introduce HiCapR, which combines psoralen crosslinking, RNA proximity ligation, and post-library hybridization capture to map HIV RNA:RNA interactions.
Leveraging HiCapR, we confirm the presence of stem structures in key regions, such as the 5’-UTR and RRE stems, and of dimer sites in the 5’-UTR, which is responsible for HIV packaging. Importantly, we reveal multiple previously unknown homodimers along the HIV genome, which may have important implications for viral RNA splicing and packaging. We also uncover a wealth of previously undescribed long-range interactions, particularly within the 5’-UTR in infected cells.
Intriguingly, our findings indicate a pronounced reduction in long-range RNA:RNA interactions in virions, signifying a transition from a relatively loose state with abundant interactions in infected cells to a condensed structure within virions. We also demonstrate the presence of stable genomic domains within virions that are instrumental in the dimerization process. These domains are preserved throughout the packaging process.
Our findings shed light on the functional significance of RNA organization, including stable and persistent genomic domains, homodimerization, and long-range RNA:RNA interactions, in the splicing, packaging, and assembly of HIV.
Highlights
HiCapR is a new proximity ligation method for mapping RNA structures and homodimers in the HIV genome with sufficient reliability and efficiency.
Multiple homodimers were discovered along the genome, with potential implications for splicing and packaging processes.
Long-range RNA:RNA interactions are abundant in infected cells but significantly reduced in virions.
Stable genomic domains including homodimer sites persist in virions and are involved in dimerization.
Main
Human Immunodeficiency Virus (HIV) remains a significant global health concern despite advances in antiretroviral therapy and recent reports that a total of seven AIDS patients have been completely “cured” 1–3. Research on HIV-1 RNA structure and function has intensified, focusing on assembly, release, and maturation processes, using NMR4, cryo-EM5 and chemical probing6–8. However, our knowledge of the global architecture and the long-range interactions within the HIV-1 RNA genome remains limited, primarily due to the complexity and dynamic nature of these structures.
The HIV genome, comprising two copies of positive-sense RNA, has been extensively studied to understand its regulatory elements, such as the 5’-UTR, crucial for dimerization and selective packaging into viral particles9. Additionally, interactions involving the nucleocapsid protein (NC) and the dimerization signal (Ψ) play pivotal roles in the assembly and maturation of HIV virions10. Furthermore, the Rev Response Element (RRE) in the HIV genome, responsible for nucleocytoplasmic transport of viral RNA, exhibits structural variability impacting virus replication rates11,12. While current techniques like SHAPE provide insights into local RNA structures7,8,13, studying long-range RNA interactions within HIV under physiological conditions remains a challenge.
Proximity ligation-based methods have been shown to be particularly useful for studying long-range RNA interactions, as they can overcome the RNA length limitations of traditional techniques. These methods have been applied to the study of various viruses, including influenza14,15, Zika16–18, and SARS-CoV-219–22, and have led to the identification of numerous RNA structures that are closely related to the virus lifecycle. We and others have recently shown that RNA proximity ligation data also contains information about RNA homodimerization events23,24. Nevertheless, as of now, there are no proximity ligation studies of long-range interactions or dimeric interactions of the HIV RNA genome.
In this study, we modified our previous protocol22 by integrating a hybridization-based capture of HIV-1 sequences after library construction, resulting in a method we call High throughput Capture of RNA interactions (HiCapR). This advancement enabled the comprehensive capture and analysis of the complete HIV RNA genome. By applying HiCapR to both infected cells and virions, we uncover a distinct genomic compression pattern in virions, highlighting the critical role of global genome folding in HIV packaging and assembly. Our findings reveal not only persistent genomic domains within virions that facilitate whole-genome dimerization, and possibly even transgenerational inheritance, but also a significant reduction of long-range RNA:RNA interactions in virions compared to infected cells. These insights provide a new perspective on the functional implications of HIV RNA structure in splicing and packaging processes and may lead to the identification of new targets for antiviral intervention.
Results
1. Capturing in vivo HIV genome RNA structure in infected cells and in virions
In a previous study, we characterized the genome structure of SARS-CoV-2 22, which was sufficiently abundant in cells, obviating the need for enrichment. To effectively capture and analyze RNA-RNA interactions of the full-length, low-abundance HIV RNA in infected cells, we developed an improved version of the protocol, which we call High throughput Capture of RNA interactions (HiCapR). The principle of hybridization-based capture after NGS library construction is well established and widely employed in capture Hi-C methods25–28 and in RNA proximity ligation 17,19. Applying this strategy, we compared the NL4-3 and GX2005002 strains 29, which represent the prevalent strains in Europe and Southeast Asia, respectively. The overall design of this study is presented in Figure 1A, and the detailed methods are provided in the Methods section. Supplementary Figure S1 illustrates the bioinformatic pipeline used to detect RNA-RNA interactions as well as homodimers, which are discussed later in this paper.
Approximately 16 million raw reads were obtained per sequencing library. Chimeric reads were identified using the hyb pipeline as previously described22, with a database comprising human spliced mRNAs, noncoding RNAs30 and the HIV genome sequence as reference (see Supplementary Table 1). The alignment rate exceeded 90%, and over 95% of the reads contained HIV sequences, indicating efficient capture of HIV RNA transcripts. On average, we obtained a depth of coverage of approximately 11,500x across the genome (Supplementary Table 1). The uniform distribution of reads across the HIV genome post-capture (Supplementary Figure S2) indicates that HiCapR captures the entire HIV RNA genome with minimal bias.
The overall duplication rate remained below 50%, with a lower duplication rate in the ligation group than in the non-ligation group (Supplementary Table 1), possibly due to the increased library complexity resulting from RNA proximity ligation.
In proximity ligation assays, the chimeric rate serves as a crucial indicator of ligation efficiency. We observed an approximate 1% chimeric rate in the non-ligation controls, and around 9% of chimeric reads in the ligation group, aligning with the widely reported range in the literature. The proportion of 5’-3’ and 3’-5’ chimeras was nearly equal in long-range (>1kb) interactions, whereas proximal (<100-nt) interactions were enriched for 3’-5’ chimeras, resulting from the similarity of proximal 5’-3’ chimeras to nonchimeric sequencing reads, in agreement with previous studies19,22 (supplementary table 1).
We performed a Pearson correlation analysis and clustering of contact read counts from various samples, including ligated (+L) and non-ligated control samples (-L). Our results revealed a high degree of correlation (r>0.99) between biological replicates, indicating high reproducibility. Additionally, we observed a significant similarity between virions in the supernatant and infected cells from the same viral strain (Supplementary Figure S3). Using a similar approach to that described in our previous paper, we generated a contact matrix for the global HIV-1 genome, revealing clear local and long-range interactions (Figure 1B and Supplementary Figure S4).
In summary, HiCapR demonstrated high reproducibility, low bias, reliability and efficiency in capturing HIV transcripts. Therefore, it can be applied as a robust tool for analyzing the RNA structure of the HIV genome, and potentially of other similar RNA viruses.
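The replicate comparison described above can be illustrated with a minimal sketch. It assumes contact counts are stored as a sparse map from bin pairs to read counts; the function names, the binning, and the toy counts are hypothetical and are not taken from the study's pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

def flatten_counts(contacts, n_bins):
    """Turn a sparse {(i, j): count} map (i <= j) into a dense upper-triangle vector."""
    return [contacts.get((i, j), 0) for i in range(n_bins) for j in range(i, n_bins)]

# Hypothetical toy counts for two replicates, binned into 3 genome bins.
rep1 = {(0, 0): 120, (0, 1): 30, (1, 2): 12, (0, 2): 4}
rep2 = {(0, 0): 110, (0, 1): 33, (1, 2): 10, (0, 2): 5}
r = pearson(flatten_counts(rep1, 3), flatten_counts(rep2, 3))
print(f"replicate correlation r = {r:.4f}")
```

Flattening only the upper triangle avoids counting each symmetric contact twice, which would not change r but would double the vector length.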
2. Local RNA structure heterogeneity, dynamics and robustness in HIV 5’-UTR
We aimed to characterize specific local structures and their dynamics within the HIV genome. The 5’-UTR is among the most extensively investigated regions, as its structures are closely associated with crucial life processes of HIV.
Chimera analysis supports the presence of key structural elements, including TAR, polyA, SL1, SL2, and SL3, as well as polyA-SL1 in the monomeric conformation, in the HIV genome (Figure 2A and Supplementary Figure S5). Despite the pivotal role of the 5’-UTR in replication, packaging, transcription and translation, we noted that the 5’-UTR sequence of HIV is not highly conserved. A comparison between the NL4-3 and GX2005002 strains revealed notable insertions and deletions in the U5 region, along with point mutations in the loop of the SL1 core region, which is critical for HIV dimerization (Figure 2B). These findings raise questions about the conservation of the structural integrity of this region. We therefore applied the comradesFold algorithm 17 to our HiCapR data to generate 1000 structure predictions of the 5’-UTR (extending 100 nt downstream of the AUG start codon) for each sample, and then performed multidimensional scaling (MDS) analysis after aligning the viral genome coordinates of both strains (Figure 2C). This analysis revealed that the reported dimer and monomer structures clustered together in NL4-3 but formed distinct clusters in GX2005002 (Figure 2C). Based on these folded structures, we calculated a base pairing probability for each base pair, defined as the number of folded structures supporting that base pair divided by the total number of structures. Visualizing this base pairing probability as a heatmap identifies the most stable base pairs in the 5’-UTR of HIV. We observed a consistent presence of key structural elements such as polyA, TAR, SL1, SL2, and SL3 in both NL4-3 and GX2005002 strains (Figure 2D), suggesting robustness in the overall structure despite sequence variations and alternative RNA conformations.
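The base pairing probability defined above (structures supporting a pair divided by the total number of structures) can be sketched as follows. This is an illustrative example only: it assumes the predicted structures are available as dot-bracket strings without pseudoknots, and the function names and the four-structure toy ensemble are hypothetical.

```python
from collections import Counter

def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def base_pair_probabilities(structures):
    """Fraction of structures in the ensemble that contain each base pair."""
    counts = Counter()
    for s in structures:
        counts.update(pairs_from_dot_bracket(s))
    n = len(structures)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical ensemble of four predicted structures for a 6-nt toy sequence.
ensemble = ["((..))", "((..))", "(....)", ".(..)."]
probs = base_pair_probabilities(ensemble)
for pair, p in sorted(probs.items()):
    print(pair, p)
```

With 1000 predictions per sample, the resulting dictionary can be written into a square matrix and drawn as a heatmap.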
3. Novel homodimerization sites and their implications in RNA splicing and viral assembly
Proximity ligation techniques enable unambiguous detection of RNA homodimers from “overlapping” chimeric reads23,24 (Figure 3A). To ensure the accuracy of our results, we implemented a rigorous data filtering process to select chimeras formed exclusively through dimerization, minimizing background interference caused by RNA interactions or self-ligation (Supplementary Figure S1 and Methods). We quantified the total number of reads for dimeric chimeras, 3’-5’ chimeras, and 5’-3’ chimeras respectively in each sample (Supplementary Figure S6) and plotted contact matrices made from non-overlapping chimeras and dimeric chimeras (Supplementary Figure S7). The non-overlapping chimeras reveal short-range intramolecular RNA-RNA interactions, while dimeric chimeras capture homodimer formation in the HIV genome.
The strongest homodimerization signal was found in the 5’-UTR region, in line with previous studies. The 5’-UTR region is well known for its role in triggering HIV dimerization. Previous literature has highlighted the importance of SL1 (DIS) and SL3 (Ψ) in the 5’-UTR8,31,32. Our data show that the SL1, SL2, and SL3 regions are all supported by dimeric chimeras (Supplementary Figure S7). We derived a Dimerization Score by counting the reads that support base pairing of homodimeric chimeras (Supplementary Figure S8), analogous to the COMRADES Score described previously 17,19,22. The base pairing of homodimers in the SL1 region of the HIV genome is consistent with previous data from NMR and cryo-EM studies 33,34. Additionally, we observed that nucleotide variations within the SL1 region do not alter the base pairing pattern of dimers between the NL4-3 and GX2005002 strains (Figure 3B), which agrees with the structural robustness revealed by the permutation and structure prediction results.
Strikingly, in addition to the known homodimers within the SL1 and SL3 regions, we observed homodimerization distributed along the entire length of the HIV genome (Supplementary Figure S9). Homodimers were present in both infected cells and virions of both strains, with approximately 20 dimerization peaks along the genome that are conserved between the NL4-3 and GX2005002 strains (Supplementary Table 2).
We investigated whether dimers could be predicted by the strength of base pairing of intermolecular loop-loop interactions by performing a systematic molecular hybridization analysis on the genomes of two HIV-1 strains, NL4-3 and GX2005002. Our findings showed no correlation between the predicted folding energy and the abundance of measured dimeric chimeras (Supplementary Figure S9A), suggesting that the formation of HIV homodimers is not solely a consequence of local base pairing propensities. Instead, it implies that additional factors, such as the binding of specific proteins, may significantly influence the dimerization process, potentially playing a more decisive role in stabilizing or facilitating the formation of these homodimers.
The identification of multiple dimers in HIV RNA beyond the 5’-UTR region raises intriguing questions regarding their potential roles within the HIV genome. To address this, we analyzed the sequences within dimer peaks and identified a prevalent AG-rich motif, present in nearly every peak. This motif closely resembles the RNA binding motif of Serine/Arginine-Rich Splicing Factors (SRSFs) (Supplementary Figure S9B, S9C and S10), which are essential RNA-binding proteins involved in pre-mRNA splicing and alternative splicing regulation35 and serve as the main type of HIV-1 splicing factors36. This result suggests a potential role of dimerization in RNA splicing processes within the HIV genome.
5’-3’ discontinuous reads were identified in non-ligation control data, enabling the inference of splicing junction sites. These inferred sites showed strong concordance with canonical splicing sites and recent nanopore sequencing data37 (Figure 3B). Notably, almost every junction site, including splicing donor and acceptor sites, exhibited dimer peaks around it. Additionally, the majority of dimeric chimeras covered these junction sites and were enriched downstream of acceptor sites (Supplementary Figure S10), suggesting that dimerization takes place on unspliced genomic RNA. Interestingly, dimers surrounding splicing regulatory elements (as summarized in 36) form stable base pairing supported by numerous chimeras (Supplementary Figure S10). This observation underscores the close proximity and potential functional relationship between dimerization events and RNA splicing at these critical sites within the HIV genome, suggesting a potential role of dimerization in splicing or in assisting the transport of unspliced RNA out of the nucleus.
A previous CLIP-seq analysis of the RNA binding specificity of the Gag protein in both cells and virions showed that Gag preferentially binds multiple elements, such as psi and RRE, in HIV-infected cells38. Interestingly, significant dimer peaks were also identified in these regions (Supplementary Figure S9A). This suggests a potential connection between Gag binding and dimerization of the HIV genome. By plotting contact matrices derived from non-overlapping and dimeric chimeras around the RRE region, we were able to identify two distinct dimer peaks flanking the RRE region (Supplementary Figure S11A).
To validate these novel homodimers, we synthesized extended RRE RNA fragments with two dimer sites using in vitro transcription and confirmed their ability to form a dimeric conformation through annealing and non-denaturing gel electrophoresis (Supplementary Figure S12A), as well as Agilent Tapestation 4200 capillary electrophoresis (Supplementary Figure S12B) and Bio-Layer Interferometry (BLI) technology (Figure 3C).
4. Interplay between dimeric peaks and 3D genome organization
The local interaction surrounding the RRE forms an extended "arch" structure that encompasses this element, as illustrated in Supplementary Figure S13. This unique architecture may contribute to stabilizing the core RRE conformation, providing a structural basis for the rev-RRE interaction and HIV RNA transport. Additionally, we observed substantial local interaction signals around dimer sites in the 5’-UTR, as depicted in Supplementary Figure S7. To provide a comprehensive overview of the local interaction landscape around dimer sites, we generated meta matrices by computing local interactions (from non-overlapping chimeras) at dimer sites and the flanking regions located 1/2 upstream and 1/2 downstream (Supplementary Figure S12). This analysis unveiled significant local interactions around dimer sites, suggesting the involvement of stable local structures in the formation of dimers between two copies of the HIV genomes.
We observed that the meta matrices around dimer sites are similar to genomic domains, which have been previously described in SARS-CoV-2 22. This observation prompted us to investigate the potential existence of genomic domains in the HIV RNA genome. Using similar methods39, we calculated insulation scores for two strains of the virus in both cells and virions, as depicted in Figure 4A.
In this way, we identified 31 and 33 genomic domains in infected cells and 36 and 32 domains in virions for the NL4-3 and GX2005002 strains, respectively. Insulation scores were highly correlated between cells and virions, and domain boundaries were consistent (Figure 4B). More importantly, boundary strengths did not differ significantly between cells and virions (Figure 4C), suggesting conserved, stable and persistent genomic domains across the analyzed stages.
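As an illustration of how insulation scores can delimit domains, the sketch below slides a square window along the diagonal of a binned contact matrix and calls boundaries at local minima. It is a simplified, hypothetical example: the window size, the lack of log-ratio normalization, and the toy matrix are assumptions and do not reproduce the parameters used in this study.

```python
def insulation_scores(matrix, window):
    """Mean contact between the `window` bins upstream and downstream of each bin edge.

    scores[i] summarizes contacts crossing the edge between bin i-1 and bin i;
    positions too close to the matrix ends are left as None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores

def domain_boundaries(scores):
    """Indices where the insulation score is a strict local minimum."""
    found = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            found.append(i)
    return found

# Hypothetical 8-bin matrix with two domains (bins 0-3 and 4-7):
# strong contacts within a domain, weak contacts between domains.
toy = [[10 if (a < 4) == (b < 4) else 1 for b in range(8)] for a in range(8)]
scores = insulation_scores(toy, window=2)
print(scores)
print(domain_boundaries(scores))
```

Low scores mark positions that few contacts cross, so the segments between consecutive minima correspond to candidate domains.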
Interestingly, we found that dimer sites are often located within genomic domains, and overlaying dimer coverage onto domains and insulation score curves revealed a striking concordance between dimer sites and genomic domains (Supplementary Figure S14). These findings suggest an interplay between the dimerization signals and global 3D organization of the HIV genome, providing insights into the complex mechanisms of HIV-1 replication.
5. Dynamic changes in global and long-range interactions throughout HIV-1 life cycle
We noticed extensive long-range interactions within the HIV-1 genomes of both strains (NL4-3 and GX2005002) (Figure 1B and Supplementary Figure S4).
To quantitatively analyze these interactions, we utilized DESeq2, following a similar approach to previous studies. Our results showed that both strains exhibit a substantial number of long-range interactions in infected cells, while these interactions are significantly reduced in virions (Figure 5A). Furthermore, histograms of enriched interaction pairs revealed a significant decrease in long-range interactions exceeding 2500 nt in virions compared to cells (Supplementary Figure S15A). The contact probability decay curves (P(s) curves) also demonstrated a faster decay of genomic interactions in virions compared to cells, for both strains of the HIV-1 virus (Supplementary Figure S15B).
Specifically, we employed viewpoint analysis to examine interactions involving the 5’-UTR. The results, depicted in Figure 5B, reveal multiple peaks of 5’-UTR interactions across the HIV-1 genome. For the majority of these peaks, the signals are lower in virions than in cells. However, cyclization interactions between the 5’-UTR and 3’-UTR at the genome ends are maintained in virions (Figure 5C). Intriguingly, we also identified multiple interaction peaks located near crucial elements, such as the frameshift element (FSE) (Supplementary Figure S16A), the rev start codon (Supplementary Figure S16B), the RRE, and the nuclear export signal (NES) (Supplementary Figure S16C), indicating complex roles of the 5’-UTR in the HIV life cycle. Some of these interactions are strain specific: the interaction between the 5’-UTR and the FSE is specific to the NL4-3 strain, while the interaction between the 5’-UTR and the 8.5K region (near the NES) is highly enriched in the GX2005002 strain (Supplementary Figure S16C). As mentioned earlier, the majority of interactions decrease in the virion, with the notable exception of interactions between the 5’-UTR and the 6K region (near the D4 splicing donor site), which significantly increase in the virion (Supplementary Figure S16B). The enhanced interactions in virions are consistent across both strains.
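A contact probability decay curve of the kind referred to above can be sketched as the average contact count per bin pair at each genomic separation. The example is a minimal illustration under assumed inputs: the sparse contact map, the bin count, and the toy cell and virion values are hypothetical.

```python
def contact_decay(contacts, n_bins):
    """Mean contact count per bin pair as a function of separation s (in bins).

    `contacts` maps (i, j) bin pairs to read counts; there are n_bins - s
    possible pairs at separation s, which is used as the denominator.
    """
    totals = [0.0] * n_bins
    for (i, j), count in contacts.items():
        totals[abs(i - j)] += count
    return [totals[s] / (n_bins - s) for s in range(n_bins)]

# Hypothetical toy data over 4 bins: identical short-range contacts,
# but fewer long-range contacts in the virion sample.
cell = {(0, 1): 8, (1, 2): 8, (2, 3): 8, (0, 3): 4}
virion = {(0, 1): 8, (1, 2): 8, (2, 3): 8, (0, 3): 1}
cell_curve = contact_decay(cell, 4)
virion_curve = contact_decay(virion, 4)
print(cell_curve)
print(virion_curve)
```

A steeper fall of the curve at large separations corresponds to a relative loss of long-range contacts.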
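The viewpoint analysis can be pictured as collapsing the contact matrix onto a single profile: every contact with one end inside the viewpoint (here the 5’-UTR) is assigned to the bin of its other end. The sketch below is a hypothetical illustration; the bin indices, the contact map, and the function name are assumptions rather than the study's implementation.

```python
def viewpoint_profile(contacts, n_bins, view_start, view_end):
    """Sum contacts between a viewpoint (bins view_start..view_end, inclusive)
    and every bin outside it; contacts internal to the viewpoint are ignored."""
    profile = [0] * n_bins
    for (i, j), count in contacts.items():
        i_in = view_start <= i <= view_end
        j_in = view_start <= j <= view_end
        if i_in and not j_in:
            profile[j] += count
        elif j_in and not i_in:
            profile[i] += count
    return profile

# Hypothetical 10-bin genome with the viewpoint covering bins 0-1.
toy_contacts = {(0, 5): 7, (1, 9): 3, (0, 1): 20, (4, 6): 2}
profile = viewpoint_profile(toy_contacts, 10, 0, 1)
print(profile)
```

Peaks in such a profile mark regions contacted by the viewpoint, and profiles from cells and virions can be overlaid after normalizing for sequencing depth.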
These results indicate that the HIV genome undergoes systematic remodeling during virus packaging, with a general loss of long-range interactions but maintenance of specific interactions, including genome cyclization. The exploration of long-range interactions and their dynamics has provided us with an unprecedented understanding of the structural organization of the HIV genome. The significance of these interactions and their impact on the HIV life cycle and pathogenesis warrants further investigation in future research.
Discussion
1. HiCapR: a reliable and efficient method for exploring structures of viral RNA genomes
In this study, we have introduced the HiCapR protocol, which combines the SPLASH method with the capture of HIV RNA-derived libraries. This integration of RNA extraction, fragmentation, proximity ligation, library construction, and subsequent capture of target RNA involved in interactions enables the investigation of various viral genomic structures. This protocol offers a streamlined and efficient approach for studying the structure and dynamics of complete HIV RNA genomes.
HiCapR uses a similar hybridization-based capture strategy as COMRADES17,19 except that in HiCapR, this capture step occurs after library construction, whereas in COMRADES, it takes place prior to proximity ligation. The post-library-capture principle employed in HiCapR is a well-established approach utilized in other methods such as capture Hi-C or capture-C40–43, and it offers additional flexibility relative to previous protocols. Our results demonstrate remarkable sensitivity and reliability of the HiCapR technique in capturing low-abundance HIV RNA structure, providing an unprecedented resolution and comprehensive approach to elucidate the intricate RNA structures within the HIV-1 genome. This novel proximity ligation method transcends the limitations of traditional techniques, offering a robust framework for dissecting the complex architecture of viral RNA.
2. New insights into HIV RNA local and long-range interactions
The 5’-UTR is crucial for various stages in the HIV life cycle. Our analysis of its structures across different strains and conditions revealed consistent canonical stem-loops and interactions, despite sequence variations and alternative conformations, highlighting the structural robustness and functional significance of the HIV 5’-UTR.
In addition to its local folding, the 5’-UTR engages in extensive and dynamic long-range interactions, notably with the 3’-UTR, suggesting genome cyclization akin to other viruses (Figure 4). This phenomenon, also observed in Zika, influenza, and SARS-CoV-2 15,17,19,22, hints at broader biological implications warranting further exploration. Additionally, the 5’-UTR interacts with key elements in the HIV genome, such as the NES, the RRE, and the rev start codon. Given the Rev protein’s roles in RNA transport and nuclear export44, the 5’-UTR may also contribute to these processes.
Previous studies have elucidated the folding principles of Zika virus, SARS-CoV-2, and influenza virus, shedding light on the mechanisms underlying viral replication, pathogenesis, and host interactions 15,17,20,22,45. One of the key findings of this study is that the HIV-1 genome is organized in a complex three-dimensional structure that enables long-range interactions between distant regions of the genome. Our data also shed light on the dynamics of these long-range interactions in two strains, NL4-3 and GX2005002: they are largely reduced in virions compared to cells, suggesting a critical role in virus assembly and release.
The loss of long-range interactions in the packaged HIV-1 genome contrasts with observations in SARS-CoV-2 22 and suggests distinct folding processes between these viruses. This disparity underscores the unique characteristics of each virus and its genomic structure. We speculate that the HIV-1 genome may adopt a rod-like structure within viral particles, unlike the spherical compression seen in SARS-CoV-2 46. Further investigation is needed to unravel the mechanisms and biological significance of this compression in the HIV-1 genome.
3. Emerging Insights into HIV-1 Dimerization through newly identified dimer sites
Our detailed analysis has consistently identified approximately 20 candidate dimer peaks both within and beyond the 5’-untranslated region (5’-UTR) across the NL4-3 and GX2005002 strains of HIV. Intriguingly, these peaks show a notable enrichment in the vicinity of splicing sites, frequently featuring a sequence motif that bears a strong resemblance to the RNA binding signature of the Serine/Arginine-Rich Splicing Factors (SRSF). This enrichment pattern and the presence of the SRSF-like motif at these sites suggest a potential regulatory role of these dimers in the complex landscape of HIV RNA splicing. The pervasive presence of homodimers flanking almost every splicing site introduces the intriguing hypothesis that these dimers could be actively participating in the regulation of alternative splicing or facilitating the export of unspliced genomic RNA from the nucleus to the cytoplasm. This hypothesis warrants rigorous investigation in subsequent studies to elucidate the mechanistic underpinnings of these observations. Collectively, these findings hint at a significant interplay between RNA dimerization and splicing processes within the HIV genome, which could have profound implications for our understanding of viral gene expression and replication strategies.
Moreover, significant dimer signals were observed in the RRE flanking sequences, adding complexity to its functional role, potentially impacting nucleocytoplasmic transport, Gag binding, and virus packaging processes. Given previous studies highlighting Gag binding to RRE and the crucial role of RRE in anchoring Gag synthesis3,38,47, it is further hypothesized that HIV-1 homodimers may play a role in HIV splicing or assist in transporting unspliced full-length RNA out of the nucleus for genome packaging.
Interestingly, the observation that candidate dimer sites are located within genomic domains suggests that dimerization may be influenced by the local RNA folding environment (Supplementary Figure S12). It is possible that the dense local interactions around these sites facilitate dimerization by bringing the two regions of the genome into close proximity. Alternatively, dimerization may play a role in shaping the local structural environment by promoting or even initiating the formation of genomic domains. Further research is needed to fully understand the relationship between RNA genomic domains and dimer sites in the HIV-1 genome. However, the identification of these candidate dimer sites within genomic domains provides a starting point for investigating the role of local RNA interactions in HIV-1 replication and dimerization. These findings may have implications for the development of novel antiviral therapies that target the dimerization process and may provide new avenues for future research in this field.
In summary, our study provides a comprehensive analysis of the HIV-1 genome using HiCapR, revealing potential dimer sites, long-range interactions, particularly within the 5’-UTR, and persistent structural domains. These findings significantly advance our understanding of the structure, dynamics and robustness of HIV RNA and their involvement in splicing, assembly and packaging processes, potentially leading to the development of novel antiviral therapies.
Methods
Cell culture and virus infection
MT4 cells (RRID:CVCL_2632) were cultured in RPMI 1640 medium (Gibco, USA) supplemented with 10% fetal calf serum (Gibco, USA), 100 U/ml penicillin, and 100 μg/ml streptomycin. The cells were maintained at 37°C with 5% CO2 and saturated humidity. To initiate infection, cells were exposed to HIV-1 NL4-3 (HIV-1 strain of subtype B) or HIV-1 GX2005002 (a primary HIV-1 strain of CRF01_AE, which is one of the main strains in China, accession: GU564222) at a multiplicity of infection (MOI) of 0.15. Parallel control groups consisting of uninfected cells were also established. Both the cells and the cell supernatant were collected 48 hours post infection for subsequent experiments.
HiCapR method
First, extracted crosslinked RNA was treated as in the simplified SPLASH protocol 22.
Briefly, 500 ng of each sample was fragmented using RNase III (Ambion) in a 20 μl mixture for 10 minutes at 37°C. The fragmented RNA was then purified using 40 μl of MagicPure RNA Beads (TransGen). Each RNA sample was subsequently divided into two halves, with one half used for proximity ligation and crosslink reversal (C, V samples). The proximity ligation process was then carried out using the following conditions: 200 ng of fragmented RNA, 1 unit/μl RNA ligase 1 (New England Biolabs), 1× RNA ligase buffer, 1mM ATP, 1 unit/μl Superase-in (Invitrogen), with a final volume of 200 μl. The reactions were incubated for 16 hours at 16°C and were stopped by cleaning with the miRNeasy kit (Qiagen). To reverse the crosslinking, the RNA was irradiated on ice with 254 nm UltraViolet C radiation for 5 minutes using a CL-1000 crosslinker (UVP). For the non-ligated controls, crosslink reversal was performed immediately after crosslinking, without proximity ligation (these controls were labeled as "-L").
The proximity-ligated RNA was then subjected to library construction using the SMARTer Stranded Total RNA-Seq Kit v2 (Clontech).
The cDNA libraries were enriched for HIV fragments using the TargetSeq One Hyb & Wash kit (iGeneTech) with the T548XV1 probe panel, designed based on the HIV genome. The probe sequences are provided as supplementary data.
Data preprocessing and chimeric reads identification
The data preprocessing and identification of chimeric reads were conducted as previously described. The reference genome used for NL4-3 strain was a combination of NL4-3 genome sequence (https://www.ncbi.nlm.nih.gov/nuccore/AF003887) and human spliced mRNAs and noncoding RNAs described in 30, while for GX2005002 strain, the reference genome used was a combination of GX2005002 genome sequence (https://www.ncbi.nlm.nih.gov/nuccore/KP178420) and human spliced mRNAs and noncoding RNAs as above.
First, we used PEAR to merge overlapping read pairs:
pear -e -j 32 -f sample_R2.fastq.gz -r sample_R1.fastq.gz -o Sample.PEAR
Note that for the Clontech 634413 kit, the sense strand of the RNA is in the R2 read. Next, we used fastp to filter low-quality reads and trim adapters:
fastp -i sample.PEAR.fastq -o sample.PEAR.fastp.fastq -h fastp.html -w 4 -a AGATCGGAAGAGCGTCG
Finally, chimeric reads were detected using the hyb pipeline with default settings:
hyb detect in=sample.PEAR.fastp.fastq db=$DB qc=none
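The chimeras reported by hyb can be loaded for downstream filtering with a few lines of Python. The sketch below is illustrative only and is not the authors' code. It assumes the tab-separated .hyb layout described in the hyb documentation (read identifier, read sequence, folding energy, then six mapping fields for each of the two fragments); the names HybRecord and parse_hyb_line, and the example record, are hypothetical.

```python
from typing import NamedTuple


class HybRecord(NamedTuple):
    """Reference coordinates of the two arms of one chimeric read."""
    read_id: str
    ref1: str
    s1: int
    e1: int
    ref2: str
    s2: int
    e2: int


def parse_hyb_line(line: str) -> HybRecord:
    """Parse one .hyb line, assuming the 15-column layout (an assumption).

    Columns (1-based): 1 read id, 2 sequence, 3 energy,
    4-9 fragment 1 (name, read start/end, reference start/end, score),
    10-15 fragment 2 (same order).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    return HybRecord(
        read_id=fields[0],
        ref1=fields[3], s1=int(fields[6]), e1=int(fields[7]),
        ref2=fields[9], s2=int(fields[12]), e2=int(fields[13]),
    )


# Made-up example record, for illustration only.
example = "\t".join([
    "read_1", "ACGUACGU", "-12.3",
    "HIV", "1", "30", "100", "130", "1e-10",
    "HIV", "31", "60", "120", "150", "1e-9",
])
record = parse_hyb_line(example)
```

Only the reference coordinates of each arm are kept here, since these are what the overlap and dimer calculations in the next section operate on.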
Chimeras for RNA-RNA interaction and dimer identification
We identified dimers based on the methodology described in 23. Specifically, we used the hyb pipeline and filtered the chimeras in the .hyb files. The degree of overlap between the two arms of a chimeric read was calculated as L = 1 + min(e1, e2) - max(s1, s2), where s1 and s2 are the start positions and e1 and e2 the end positions of the two arms. To filter out possible circular RNAs, which may be produced by end-to-end ligation of a single RNA, we applied the following condition: (e1 < e2) OR (s1 < s2). A schematic diagram of dimer chimeras is provided in Supplementary Figure S1.
We defined the dimer range as the interval from max(s1, s2) to min(e1, e2), that is, the region shared by both arms. Based on this range, we then calculated the coverage of dimer chimeras.
Conversely, only non-overlapping chimeras (L < 0) were included in the RNA-RNA interaction contact matrices, to reduce the possibility that chimeras counted as RNA-RNA interactions originate from homodimers.
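The overlap rule, the circular-RNA condition, and the dimer range described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: coordinates are taken to be 1-based and inclusive, chimeras with L > 0 that satisfy the condition are treated as dimer candidates, and the names Chimera, classify, and dimer_coverage are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Chimera(NamedTuple):
    """Start and end of each arm on the genome (1-based, inclusive; assumed)."""
    s1: int
    e1: int
    s2: int
    e2: int


def overlap_length(c: Chimera) -> int:
    """L = 1 + min(e1, e2) - max(s1, s2); positive when the arms overlap."""
    return 1 + min(c.e1, c.e2) - max(c.s1, c.s2)


def passes_circular_filter(c: Chimera) -> bool:
    """Condition quoted in the text: (e1 < e2) OR (s1 < s2)."""
    return c.e1 < c.e2 or c.s1 < c.s2


def classify(c: Chimera) -> str:
    """Label a chimera; the L > 0 threshold for dimers is an assumption."""
    length = overlap_length(c)
    if length > 0:
        return "dimer" if passes_circular_filter(c) else "discarded"
    if length < 0:
        return "interaction"  # non-overlapping, used for contact matrices
    return "discarded"        # L == 0 is not assigned by the text


def dimer_range(c: Chimera) -> tuple[int, int]:
    """Region shared by both arms: max(s1, s2) to min(e1, e2)."""
    return max(c.s1, c.s2), min(c.e1, c.e2)


def dimer_coverage(chimeras: list[Chimera]) -> Counter:
    """Per-position count of dimer chimeras covering each base."""
    coverage: Counter = Counter()
    for c in chimeras:
        if classify(c) == "dimer":
            start, end = dimer_range(c)
            coverage.update(range(start, end + 1))
    return coverage


chimeras = [
    Chimera(100, 130, 120, 150),  # overlapping arms, passes the condition
    Chimera(100, 130, 400, 430),  # distant arms, long-range interaction
    Chimera(120, 150, 100, 130),  # overlapping, fails the condition
]
labels = [classify(c) for c in chimeras]
coverage = dimer_coverage(chimeras)
```

In this toy input the first chimera overlaps by 11 nt and contributes to coverage over positions 120 to 130, the second is routed to the interaction matrix, and the third is dropped by the circular-RNA condition.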
Dimer score and dimer base pairing
The dimer score (DS) was defined as the number of chimeric reads supporting a potential dimer base-pairing event. To visualize each candidate dimer, we used the hybrid-min command from the UNAFold package to hybridize two copies of the candidate region in silico. The paired bases were then colored according to their dimer score.
Local structure folding and MDS plot
Initial local structure folding was performed similarly to a previous report 17. Base pairing scores were first calculated using the comradesMakeConstraints function in the COMRADES package (https://github.com/gkudla/comrades):
comradesMakeConstraints -i sample_rm_overlap.hyb -f Genome_fasta -b 1 -e genome_length
We then folded the RNA structures 1000 times following the COMRADES manual. Subsequently, we used multidimensional scaling (MDS) to represent the pairwise distances between the folded RNA structures.
Calculation of base pairing probability for local structure
For each cluster of interest in the MDS plot, we constructed a base pairing probability matrix to reveal the consensus stems among its structures. The probability of each base pair was calculated as the proportion of folded structures in the cluster that contain that base pair.
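The proportion-based probability described above can be illustrated with a short Python sketch. It assumes that the folded structures of a cluster are available as dot-bracket strings of equal length without pseudoknots; the function names and the toy structures are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the base pairs (i, j), 1-based with i < j, of a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_probabilities(structures: list[str]) -> dict[tuple[int, int], float]:
    """Fraction of structures in the cluster that contain each base pair."""
    if not structures:
        raise ValueError("at least one structure is required")
    counts: Counter = Counter()
    for structure in structures:
        counts.update(pairs_from_dot_bracket(structure))
    total = len(structures)
    return {pair: n / total for pair, n in counts.items()}


# Toy cluster of four folded structures over a 6-nt region.
cluster = ["((..))", "((..))", "(....)", "......"]
probabilities = base_pair_probabilities(cluster)
```

Base pairs with a probability close to 1 are shared by nearly all structures in the cluster and therefore mark consensus stems; the sparse dictionary can be expanded into a full matrix for plotting.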
Dimer validation
RNA preparation
Purified NL4-3 and GX2005002 5’-UTR and extended RRE PCR products (200 ng) were utilized as templates for RNA in vitro transcription with T7 RNA polymerase (Vazyme). The reaction was then incubated at 37 °C for 16 hours, followed by DNase I treatment for 30 minutes at 37 °C. The RNA was subsequently purified using MagicPure RNA Beads (TransGen) through gel purification.
Native agarose gel electrophoresis
RNA (600 ng) was heated to 95°C and then slowly cooled to room temperature in either high salt buffer (50 mM Tris-HCl pH 7.5, 140 mM KCl, 10 mM NaCl) or low salt buffer (high salt buffer diluted 1:10 with water). Samples were loaded with native loading dye (0.17% Bromophenol Blue and 40% (vol/vol) sucrose) on a 2% agarose gel prepared with 1× tris-borate magnesium (TBM) buffer (89 mM Tris base, 89 mM boric acid and 2 mM MgCl2) and fractionated at 100 V for 85 min at room temperature.
Agilent 4200 TapeStation Capillary electrophoresis
We prepared the RNA samples as described above and loaded them onto the TapeStation RNA screentape without the heating and denaturing step. Subsequently, we initiated the device for electrophoresis analysis, which ran automatically without the need for manual intervention. After the electrophoresis analysis was completed, the system generated electropherograms and relevant data regarding the RNA samples, including their size distribution and concentration. This method allowed us to quickly and accurately evaluate the conformational features of the RNA.
Biomolecular Binding Kinetics Assays
Biomolecular binding kinetics assays were performed on the Octet R8 platform (Sartorius) following a standardized protocol. The assay involved preparing analytes at varying concentrations, monitoring the association and dissociation phases in real time, and analyzing the resulting data to determine the kinetic parameter kd. Quality control measures were implemented to ensure the reliability and reproducibility of the binding kinetics measurements.
Data availability
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive 48 at the National Genomics Data Center 49, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA016024), and are publicly accessible at https://ngdc.cncb.ac.cn/gsa.
Code availability
Custom code for the analysis performed in this study is publicly available via BioCode at https://ngdc.cncb.ac.cn/biocode/tools/BT007456.
Acknowledgements
This work was funded by a National Key Technologies R&D Program (2018YFA0900801) awarded to Y.Z.
The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
References
- 1. Seventh patient ‘cured’ of HIV: why scientists are excited. Nature 632:235–236
- 2. HIV-1 assembly, release and maturation. Nat Rev Microbiol 13:484–496. https://doi.org/10.1038/nrmicro3490
- 3. HIV-1: To Splice or Not to Splice, That Is the Question. Viruses 13
- 4. RNA structure. Structure of the HIV-1 RNA packaging signal. Science 348:917–921. https://doi.org/10.1126/science.aaa9266
- 5. Structure of the 30 kDa HIV-1 RNA dimerization signal by a hybrid Cryo-EM, NMR, and molecular dynamics approach. Structure 26:490–498
- 6. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460:711–716
- 7. Determination of RNA structural diversity and its role in HIV-1 RNA splicing. Nature 582:438–442. https://doi.org/10.1038/s41586-020-2253-5
- 8. Short- and long-range interactions in the HIV-1 5’ UTR regulate genome dimerization and packaging. Nat Struct Mol Biol 29:306–319. https://doi.org/10.1038/s41594-022-00746-2
- 9. 5’-Cap sequestration is an essential determinant of HIV-1 genome packaging. Proc Natl Acad Sci U S A 118. https://doi.org/10.1073/pnas.2112475118
- 10. Visualizing the translation and packaging of HIV-1 full-length RNA. Proceedings of the National Academy of Sciences 117:6145–6155
- 11. The HIV-1 Rev response element (RRE) adopts alternative conformations that promote different rates of virus replication. Nucleic Acids Res 43:4676–4686. https://doi.org/10.1093/nar/gkv313
- 12. Rev-RRE Functional Activity Differs Substantially Among Primary HIV-1 Isolates. AIDS Res Hum Retroviruses 32:923–934. https://doi.org/10.1089/AID.2016.0047
- 13. Alternative RNA structures formed during transcription depend on elongation rate and modify RNA processing. Molecular cell 81:1789–1801
- 14. Mapping of Influenza Virus RNA-RNA Interactions Reveals a Flexible Network. Cell Rep 31. https://doi.org/10.1016/j.celrep.2020.107823
- 15. In vitro vRNA-vRNA interactions in the H1N1 influenza A virus genome. Microbiol Immunol 64:202–209. https://doi.org/10.1111/1348-0421.12766
- 16. Integrative Analysis of Zika Virus Genome RNA Structure Reveals Critical Determinants of Viral Infectivity. Cell Host Microbe 24:875–886. https://doi.org/10.1016/j.chom.2018.10.011
- 17. COMRADES determines in vivo RNA structures and interactions. Nat Methods 15:785–788. https://doi.org/10.1038/s41592-018-0121-0
- 18. Structure mapping of dengue and Zika viruses reveals functional long-range interactions. Nature communications 10. https://doi.org/10.1038/s41467-019-09391-8
- 19. The Short- and Long-Range RNA-RNA Interactome of SARS-CoV-2. Mol Cell. https://doi.org/10.1016/j.molcel.2020.11.004
- 20. The architecture of the SARS-CoV-2 RNA genome inside virion. Nature communications 12. https://doi.org/10.1038/s41467-021-22785-x
- 21. Comprehensive mapping of SARS-CoV-2 interactions in vivo reveals functional virus-host interactions. Nature communications 12. https://doi.org/10.1038/s41467-021-25357-1
- 22. In vivo structure and dynamics of the SARS-CoV-2 RNA genome. Nature communications 12. https://doi.org/10.1038/s41467-021-25999-1
- 23. Global mapping of RNA homodimers in living cells. Genome Res. https://doi.org/10.1101/gr.275900.121
- 24. Classification and clustering of RNA crosslink-ligation data reveal complex structures and homodimers. Genome Res 32:968–985. https://doi.org/10.1101/gr.275979.121
- 25. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nature communications 6
- 26. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome research 24:1854–1868
- 27. Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution. Genome Biol 16. https://doi.org/10.1186/s13059-015-0727-9
- 28. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nature genetics 47:598–606
- 29. Comparison of susceptibility of HIV-1 variants to antiretroviral drugs by genotypic and recombinant virus phenotypic analyses. International Journal of Infectious Diseases 37:86–92
- 30. Hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data. Methods 65:263–273. https://doi.org/10.1016/j.ymeth.2013.10.015
- 31. Three-dimensional RNA structure of the major HIV-1 packaging signal region. Structure 21:951–962
- 32. The Life-Cycle of the HIV-1 Gag-RNA Complex. Viruses 8. https://doi.org/10.3390/v8090248
- 33. Identification of the initial nucleocapsid recognition element in the HIV-1 RNA packaging signal. Proc Natl Acad Sci U S A 117:17737–17746. https://doi.org/10.1073/pnas.2008519117
- 34. Structure of the 30 kDa HIV-1 RNA Dimerization Signal by a Hybrid Cryo-EM, NMR, and Molecular Dynamics Approach. Structure 26:490–498. https://doi.org/10.1016/j.str.2018.01.001
- 35. The RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome biology 13:1–17
- 36. Behind the scenes of HIV-1 replication: Alternative splicing as the dependency factor on the quiet. Virology 516:176–188. https://doi.org/10.1016/j.virol.2018.01.011
- 37. Dynamic nanopore long-read sequencing analysis of HIV-1 splicing events during the early steps of infection. Retrovirology 17. https://doi.org/10.1186/s12977-020-00533-1
- 38. Global changes in the RNA binding specificity of HIV-1 gag regulate virion genesis. Cell 159:1096–1109. https://doi.org/10.1016/j.cell.2014.09.057
- 39. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523:240–244. https://doi.org/10.1038/nature14450
- 40. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nature communications 6
- 41. Capture Hi-C identifies putative target genes at 33 breast cancer risk loci. Nature communications 9
- 42. Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer. Nature genetics 50
- 43. Genome-scale Capture C promoter interactions implicate effector genes at GWAS loci for bone mineral density. Nature communications 10
- 44. HIV genome-wide protein associations: a review of 30 years of research. Microbiology and Molecular Biology Reviews 80:679–731
- 45. The structure of the influenza A virus genome. Nature Microbiology 4:1781–1789. https://doi.org/10.1038/s41564-019-0513-7
- 46. Molecular Architecture of the SARS-CoV-2 Virus. Cell 183:730–738. https://doi.org/10.1016/j.cell.2020.09.018
- 47. Subcellular Localization of HIV-1 gag-pol mRNAs Regulates Sites of Virion Assembly. J Virol 91. https://doi.org/10.1128/JVI.02315-16
- 48. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics, Proteomics and Bioinformatics 19:578–583
- 49. China national center for bioinformation in 2022. Nucleic Acids Research 50:D27–D38
Copyright
© 2024, Zhang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
Views, downloads and citations are aggregated across all versions of this paper published by eLife.