circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing

  1. Zelin Liu
  2. Changyu Tao
  3. Shiwei Li
  4. Minghao Du
  5. Yongtai Bai
  6. Xueyan Hu
  7. Yu Li
  8. Jian Chen
  9. Ence Yang  Is a corresponding author
  1. Institute of Systems Biomedicine, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China , NHC Key Laboratory of Medical Immunology (Peking University), China
  2. Department of Human Anatomy, Histology & Embryology, School of Basic Medical Sciences, Peking University Health Science Center, China
  3. Department of Radiation Medicine, School of Basic Medical Sciences, Peking University Health Science Center, China
  4. Department of Microbiology & Infectious Disease Center, School of Basic Medical Science Peking University Health Science Center, China
  5. Chinese Institute for Brain Research, China

Abstract

Circular RNAs (circRNAs) act through multiple mechanisms via their sequence features to fine-tune gene expression networks. Due to overlapping sequences with linear cognates, identifying internal sequences of circRNAs remains a challenge, which hinders a comprehensive understanding of circRNA functions and mechanisms. Here, based on rolling circular reverse transcription and nanopore sequencing, we developed circFL-seq, a full-length circRNA sequencing method, to profile circRNA at the isoform level. With a customized computational pipeline to directly identify full-length sequences from rolling circular reads, we reconstructed 77,606 high-quality circRNAs from seven human cell lines and two human tissues. circFL-seq benefits from rolling circles and long-read sequencing, and the results showed more than tenfold enrichment of circRNA reads and advantages for both detection and quantification at the isoform level compared to those for short-read RNA sequencing. The concordance of the RT-qPCR and circFL-seq results for the identification of differential alternative splicing suggested wide application prospects for functional studies of internal variants in circRNAs. Moreover, the detection of fusion circRNAs at the omics scale may further expand the application of circFL-seq. Taken together, the accurate identification and quantification of full-length circRNAs make circFL-seq a potential tool for large-scale screening of functional circRNAs.

Introduction

Circular RNAs (circRNAs), a class of covalently closed RNA molecules formed via back-splicing (BS) or lariat precursors, are involved in various biological processes and pathogenesis by fine-tuning the eukaryotic gene regulatory network (Kristensen et al., 2019; Chen, 2020). CircRNAs directly or indirectly regulate target gene expression via diverse mechanisms, which are largely determined by circRNA sequence features. For example, circRNAs may complementarily bind to miRNAs as sponges (Hansen et al., 2013; Memczak et al., 2013); circRNAs may interact with proteins as scaffolds (Du et al., 2017) or structural components (Li et al., 2015) based on sequence motifs; and several circRNAs are also able to translate into peptides (Zhang et al., 2018; Gao et al., 2021 ) through internal ribosome entry sites. Thus, full-length sequences of circRNAs have become the foundation for ascertaining their biological functions in transcriptional plasticity and complexity.

By detecting the back-splicing junctions (BSJs) of circRNAs with deep sequencing, short-read RNA sequencing discriminates circRNAs with low expression (as low as 1% polyadenylated RNA; Salzman et al., 2013) from their linear cognates. The full-length sequences of short circRNAs (<500 nt) can be inferred from a patchwork of BSJs and short fragments via bioinformatic approaches (Zheng et al., 2019; Wu et al., 2019b). However, a full understanding of circRNA isoforms is impossible by using short reads. Single-molecule long-read sequencing has shown methodological advances in identifying circRNAs at the isoform level. Pacific Biosciences (PacBio) sequencing has been applied to PCR products for target full-length circRNA sequences in a low-throughput and high-cost way (You et al., 2015). Very recently, several Oxford Nanopore Technology (ONT)-based methods have also been employed in genome-wide full-length circRNA reconstruction (Rahimi et al., 2019; Xin et al., 2021a; Zhang et al., 2021a).

Here, we developed a high-throughput circRNA sequencing method, termed circFL-seq, with rolling circular reverse transcription (RCRT) (Boss and Arenz, 2020; Das et al., 2019) and nanopore long-read sequencing for the identification of circRNA at the isoform level. With a customized computational pipeline, we identified 77,606 high-quality full-length circRNA isoforms from seven cell lines and two human tissues. We validated circFL-seq for full-length circRNA detection and quantification by comparison to annotated circRNAs, RNA-seq, isoCirc, CIRI-long, and RT-qPCR results. By providing full-length circRNA sequences, circFL-seq allowed the study of sequence features, alternative splicing (AS), and differential expression at the isoform level. In addition, circFL-seq showed the ability to detect fusion circRNAs (f-circRNAs), which were further experimentally validated. Taken together, circFL-seq benefits from RCRT and long-read sequencing and has shown advantages for identifying high-quality full-length circRNAs at a low cost.

Results

Sequencing full-length circRNA with circFL-seq

We developed the circFL-seq approach that utilized RCRT with long-read sequencing for full-length circRNA profiling (Figure 1A). First, circRNA was enriched by rRNA depletion, poly(A) tailing (to increase the efficiency of RNase R Xiao and Wilusz, 2019), and RNase R treatment. Then, first-strand cDNA was synthesized with random primers (P1-N6) by RCRT for circRNA and regular reverse transcription (RT) for linear residuals. After tailing poly(A) of the first strand and synthesizing the second strand with an anchor primer (P2-T24), the double-strand cDNA was amplified with P1 and P2 to construct the final circFL-seq library. To benchmark full-length circRNA sequences of the circFL-seq library, eight known circRNAs were selected for PCR validation in the HEK293T library. Rolling circles of target circRNA (Figure 1B; Figure 1—source data 1) and the full-length sequences obtained with Sanger sequencing (Figure 1—figure supplement 1) indicated that the circRNAs were successfully amplified. Finally, the circFL-seq library was sequenced by long-read sequencing on the ONT platform.

Figure 1 with 1 supplement see all
Diagram of circFL-seq workflow.

(A) Experimental operation of circFL-seq consisted of circRNA enrichment, library construction, and nanopore sequencing. (B) PCR validation of rolling circle products from the circFL-seq cDNA library. The yellow and green lines indicate the positions of the PCR primers. The upward triangle, downward triangle, and circle symbols denote the 0-circle, 1-circle, and 2-circle cDNA products. (C) Computational pipeline of circFL-seq. circFL-seq clean reads were directly used in RG mode or were self-corrected for consensus sequences in cRG mode to reconstruct full-length circRNAs. circRNA, circular RNA.

Figure 1—source data 1

Original figures of gels.

This file includes figures with uncropped gels.

https://cdn.elifesciences.org/articles/69457/elife-69457-fig1-data1-v2.zip

Based on the nature of RCRT products in circFL-seq libraries, full-length sequences of circRNAs were reconstructed and structurally annotated with a customized computational pipeline. For the reference guide (RG) mode, clean reads (qscore≥7) were aligned to the reference genome to identify potential back-spliced junctions (BSJs) by the sign of chiastic overlapping segments (Figure 1C). Then, reads were realigned to pseudo-references, generated by concatenating two sequences of potential BSJ regions, to accurately localize circRNA BSJs. In addition, full-length structures were adjusted with multiple alignments from rolling circular segments to improve the construction. To complement the RG mode for low-quality reads, the cRG mode identified consensus sequences (CS) from clean reads with two or more rolling circles, and triply duplicated CS were used as query sequences in RG mode to improve circRNA detection. The strand origin of circRNAs, especially non-canonical BSJs, was predicted by the primer sequences P1, P2, and P2-T24.

CircRNA profiling of eight libraries

To assess the performance of circFL-seq, we applied circFL-seq to eight libraries of six human cell lines (two replicates from HeLa cells, two replicates from SKOV3 cells, one from MCF7 cells, one from VCaP cells, one from SH-SY5Y cells, and one from HEK293T cells) and produced 30 M clean reads sequenced by using one PromethION Flow Cell (Figure 2—figure supplement 1A). We detected 197,252 isoforms of 162,409 circRNA BSJs from 1.3 M reads that contained full-length circRNA sequences (Figure 2A; Figure 2—figure supplement 1B; Figure 2—figure supplement 2A; Supplementary file 1), 2% of which were additionally detected by cRG mode, which refined low-quality reads with CS (Figure 2—figure supplement 2B-D). Quantification of replicate samples at both the BSJ and isoform levels was highly consistent (Pearson’s r>0.93; Figure 2A and B; Figure 2—figure supplement 3).

Figure 2 with 8 supplements see all
Analysis of full-length circRNA in eight samples.

(A) Stacked bar plot represents the number of full-length circRNA isoforms detected by RG and cRG for six cell lines. (B) Expression correlation matrix for circRNA BSJs and isoforms of all samples. The color scale corresponds to Pearson’s correlation coefficient. (C) Stacked bar plot represents the number of circRNA isoforms with read counts≥5 from known or novel BSJs based on the circRNA database. (D) Boxplot showing the length distribution per isoform for circRNA isoforms with read counts≥5 in all samples. Box lefts or rights are lower or upper quartiles, the bar is the median, and whiskers are the median±1.5×interquartile range. (E) Stacked bar plot showing the fraction of exon numbers per isoform for circRNA isoforms with read counts≥5 in all samples. (F) Boxplot showing the length distribution per exon for circRNA isoforms with read counts≥5 in all samples. Box bottoms or tops are lower or upper quartiles, the bar is the median, and the whiskers are the median±1.5×interquartile range. (G) Diagram of four types of alternative splicing (AS) events in circRNA: exon skipping (ES), alternative 3′ splice site (A3SS), alternative 5′ splice site (A5SS), and intron retention (IR). (H) Plot showing the coverage of full-length circRNA reads in the position of CDR1as for circFL-seq data of six cell lines (replicate data were merged). Structures of the two isoforms of CDR1as are shown at the bottom. (I–K) AS events (one ES and one IR) of circ-TMEM138 detected by circFL-seq (I), agarose gel electrophoresis (J), and Sanger sequencing (K). Red/blue arcs are forward/reverse primers for validation of back-splicing junctions (BSJs) and forward splicing junctions (FSJs). Asterisks denote FSJs. Downward triangles denote BSJs. BSJ, back-splicing junction; circRNA, circular RNA; RG, reference guide.

Based on BSJs and boundary exons identified by circFL-seq, we classified circRNAs into seven types: exonic, intronic, novel splicing site (NSS), intergenic, novel UTR, antisense, and read-through (Figure 2—figure supplement 4). For high-quality circRNA isoforms (read counts≥5), the exonic type accounted for 73.2% of the identified circRNA species, while only 0.1% were from read-through (Figure 2C). Most exonic types (99.8%) could be verified in at least one database (Glažar et al., 2014; Chen et al., 2016; Dong et al., 2018; Liu et al., 2019; Vo et al., 2019; Huang et al., 2021; Figure 2C), which could be attributed to their higher expression levels (Figure 2—figure supplement 5). The median length of each type varied from 402 nt of the intergenic type to 618 nt of read-throughs (Figure 2D), indicating that a notable proportion of circRNAs>500 nt cannot be reconstructed by RNA-seq. Read-through circRNAs contained more exons (Figure 2E), while the exon length of single-exon circRNA was significantly longer than that of multiple-exon circRNA (p<2.2×10–16; Figure 2F).

From 65,656 isoforms of 35,251 high-quality circRNA BSJs (read counts≥5), we identified 23,267 internal AS events (Supplementary file 2), including 44.3% exon skipping (ES), 28.1% alternative 3′ splicing site (A3SS), 24.1% alternative 5′ splicing site (A5SS), and 3.5% intron retention (IR) events (Figure 2G). Specifically, IR events increased circRNA length by 289 nt on average (from 363 to 652 nt), which may influence their cellular localization (Huang et al., 2018). circFL-seq accurately detected the full length of two isoforms (1485 and 1301 nt) generated by the IR event of CDR1as (Hansen et al., 2011; Figure 2H). We further experimentally verified 13 AS events (5 of ES, 3 of A3SS, 1 of A5SS, and 4 of IR) of 23 isoforms from 11 high-quality circRNAs in the HeLa cell line. With divergent primers to amplify both BSJs and alternative spliced forward splicing junctions (FSJs), all but an A3SS event were validated by Sanger sequencing (Figure 2I–K; Figure 2—figure supplements 68).

Comparison with RNA-seq, isoCirc, and CIRI-long for circRNA detection

Based on read-spanning BSJs, short-read sequencing has long been used for genome-wide characterization of circRNAs. Thus, for comparison, eight RNA-seq libraries (150 bp×2) with the same circRNA enrichment method for the same six cell lines (Pearson’s r>0.90 for replicates, Figure 3—figure supplement 1) were generated and sequenced by the Illumina HiSeq X Ten platform. A total of 95,371 circRNA BSJs were detected by CIRI2 (Gao et al., 2018; Supplementary file 3). BSJ reads accounted for 0.15–0.32% of all RNA-seq reads, an amount 10 times lower than that of full-length circRNA reads with circFL-seq (2.2–8.5%). Compared to known BSJs that have been annotated in the database, the proportion of overlapping BSJs identified by RNA-seq seemed to be larger than that identified by circFL-seq (76.7% vs. 40.3%) (Figure 3—figure supplement 2). However, when focusing on high-quality BSJs (read counts≥5), the proportion obtained with circFL-seq (78.9%) was similar to that obtained with RNA-seq, indicating that more reads are required to confidently identify circRNAs by ONT (Figure 3—figure supplement 3). As expected, the full-length construction by RNA-seq was highly dependent on circRNA length (Figure 3—figure supplement 4). Approximately 96.3% of full-length circRNAs reconstructed by RNA-seq were less than 500 nt in length, while 44.4% of circRNA isoforms detected by circFL-seq were more than 500 nt in length.

A recent method, isoCirc, identified full-length circRNAs by rolling circle amplification (RCA) followed by nanopore sequencing. With a single circFL-seq library of the HEK293 cell line sequenced by one MinION Flow Cell, we compared the features of the RCRT strategy and RCA strategy. circFL-seq produced more full-length circRNA reads per 109 raw bases than isoCirc (11,791 vs. 5820) (Figure 3—figure supplement 5A-C) but identified fewer circRNA isoforms (Figure 3—figure supplement 5D). However, 75.3% of circFL-seq-detected BSJs were annotated in the database, while the percentage of isoCirc was 43.8% (Supplementary file 4). The difference might be caused by the higher read coverage for confident circRNAs of circFL-seq (Figure 3—figure supplement 5E) or by the higher sensitivity of circRNA detection of isoCirc. Although the ground truth of all BSJs was unknown, we found that circRNAs with higher read counts were more likely to be included in the database for both methods (Figure 3—figure supplement 5F-I). Thus, we evaluated the precision of the two methods by top expressed circRNAs, which could eliminate the effect of the high noise of nanopore reads and the disturbance of low-expressed circRNAs without annotation in the database. All the top 100 expressed BSJs identified by circFL-seq and RNA-seq (collected from isoCirc study) were annotated in databases. In contrast, 22 of the top 100 BSJs identified by isoCirc were not supported in the databases or the RNA-seq and circFL-seq results (Figure 3—figure supplement 5M). Specifically, two products among the 22 BSJs were from histone genes with linear RNAs resistant to RNase R degradation (Xiao and Wilusz, 2019). We also failed to experimentally validate the two potential circular products with divergent primers. At the isoform level in the HEK293 cell line, 87.0% of the common BSJs (10,702/12,307) between circFL-seq and isoCirc identified at least one identical isoform. Inconsistent isoforms usually had lower read counts with both circFL-seq and isoCirc (Figure 3—figure supplement 5N,O), suggesting that accurate construction of full-length circRNA requires higher read coverage. In addition, circFL-seq identified more than two times ES, A3SS, and A5SS but similar numbers of IR events to isoCirc (Supplementary file 4), which may be due to the more full-length reads of circFL-seq.

We then compared circFL-seq to isoCirc in two normal human tissues (brain and testis) with high sequencing depth. isoCirc built multiple libraries for each sample and sequenced on multiple MinION Flow Cells to obtain a high sequencing depth, while circFL-seq built one library for each tissue and sequenced the two libraries on one PromethION Flow Cell to obtain a comparable sequencing depth. isoCirc and circFL-seq detected 79,312 and 34,046 known BSJs, respectively. Known BSJs detected by isoCirc were more likely (38,846 vs. 2511) to be expressed at low levels (read counts=1), indicating that more libraries were more sensitive for detecting circRNAs, overcoming bias and artifacts related to sampling, RNase R treatment, cDNA synthesis, and amplification.

Both circFL-seq and CIRI-long employed an RCRT strategy for library construction. Although the two methods were applied in different species (human vs. mouse), the computational pipelines were compatible and could be compared with each other. We employed the computational pipeline of circFL-seq and CIRI-long to analyze HEK293 cells library of circFL-seq and a mouse brain library of CIRI-long, respectively. circFL-seq detected 27,869 BSJs from HEK293 cells (75.3% known in the database) and 18,396 BSJs from the mouse brain (68.6% known in the database), while CIRI-long detected 15,242 BSJs from HEK293 cells (76.9% known in the database) and 9258 BSJs from the mouse brain (69.9% known in the database) (Supplementary file 5). Thus, the computational pipeline of circFL-seq was more sensitive with similar precision for BSJ detection. For AS events, circFL-seq identified more than 4.5 times the ES events, 12.4 times the A3SS events, 8.5 times the A5SS events, and 1.5 times the IR events, suggesting a better sensitivity of circFL-seq in AS event detection, although ground-truth circRNA isoforms were required for performance evaluation (Supplementary file 5).

Evaluation of quantification of full-length circRNAs

With the benefit of high read coverage, circFL-seq is able to quantify circRNA expression levels. For BSJs identified by circFL-seq and RNA-seq, the amounts between the two methods were significantly correlated (Pearson’s r=0.41–0.68, Figure 3A; Figure 3—figure supplement 6). We then detected differentially expressed circRNAs (DECs) between HeLa and SKOV3 cell lines as a test. For the highly expressed BSJs (read counts≥10 in at least two samples and detected by both methods), the fold changes in HeLa to SKOV3 cells were highly concordant with the RNA-seq results (Pearson’s r=0.78, Figure 3B). By using DESeq2 (Love et al., 2014), circFL-seq detected 89 DECs, 58 of which were also identified by RNA-seq. We next selected 16 circRNAs with a wide range of fold changes (7 downregulated, 5 stable, and 4 upregulated in the HeLa cell line) to validate the DECs via RT-qPCR. The consistent circFL-seq and RT-qPCR results for both total RNA (Figure 3—figure supplement 7A-C) and RNase R-treated (Figure 3C and D) samples further supported the capabilities of BSJ quantification.

Figure 3 with 7 supplements see all
Quantification of circRNA at the BSJ and isoform levels.

(A) Expression correlation matrix of circRNA BSJ quantified by circFL-seq and RNA-seq for six cell lines. The numbers in the matrix represent Pearson’s correlation coefficients. (B) Comparison of differentially expressed circRNA (DEC) detection between circFL-seq and RNA-seq. Top panel: Venn diagram showing the number of DECs detected by circFL-seq (green), RNA-seq (purple), and both methods (orange). Bottom panel: scatter plot showing the correlation of fold change (log base 2) for HeLa and SKOV3 cells between circFL-seq and RNA-seq. (C) Scatter plot showing the correlation of the expression levels of 16 circRNA BSJs for HeLa (left) and SKOV3 (right) cells between circFL-seq and RT-qPCR. (D) Scatter plot showing the correlation of fold changes (log base 2) of the 16 BSJs for HeLa and SKOV3 cells between circFL-seq and RT-qPCR. (E) Plot showing the adjusted coverage of full-length circRNA reads and RNA-seq reads in the position of circRNA from PLOD2. The circular structures of the two circRNA isoforms are shown in the lower panel. (F) Scatter plot showing the correlation of the transcript ratio of 18 circRNA isoforms from nine circRNA BSJs (each BSJ has two isoforms) for HeLa (left) and SKOV3 (right) cells between circFL-seq and RT-qPCR. The relative expression of target BSJs/isoforms quantified by RT-qPCR was determined with RNase R-treated samples and GAPDH from total RNA without RNase R treatment as a reference. (G) Scatter plot showing the correlation of the differential ratio (∆ratio) of the 18 isoforms for HeLa and SKOV3 cells between circFL-seq and RT-qPCR. The shaded areas denote 95 % confidence intervals. BSJ, back-splicing junction; circRNA, circular RNA.

We next evaluated the quantification performance of circFL-seq for circRNA AS. From 87 BSJs that had at least two isoforms detected by both HeLa and SKOV3 cell lines, 193 isoforms were used to quantify the internal variants by the ratio of transcripts (target isoforms to total isoforms from the same BSJ). A total of 90 transcripts had more than 0.1 ratio differences between HeLa and SKOV3 cell lines. For example, circ-PLOD2 showed a higher ratio of transcripts without ES in the HeLa cell line (0.83 to 0.45) (Figure 3E). Although RNA-seq data also detected differential read coverage at the skipped exon of PLOD2, short reads were unable to discriminate different sources of ES events, that is, linear or circular isoforms. We selected 18 isoforms of 9 circRNA BSJs from a wide range of ratio differences to perform experimental validation. Both the transcript ratios and ratio differences were consistent with the RT-qPCR results (Figure 3F and G; Figure 3—figure supplement 7D-F), supporting the advantage of circFL-seq for quantification at the isoform level.

circFL-seq reveals fusion circRNAs with long-read sequencing

F-circRNA is a specific type of circRNA containing two fusion junctions (one junction may come from gene fusion and the other may be produced by BS) and has been found to play important roles in cancer (Guarnerio et al., 2016). Although short reads of RNA-seq may detect both fusion junctions separately, they usually fail to detect f-circRNA for the short read that is not able to combine two fusions in one fragment. circFL-seq was able to detect f-circRNA with rolling circular reads separately mapped to two loci in different chromosomes or a >1 Mbp distance in the same chromosome (Figure 4A). With circFL-seq data, we identified six high-quality f-circRNA isoforms (read count≥5) affiliated with two fusion genes in the MCF7 cell line. For the five isoforms fused by GBF1 and MACROD2 (Figure 4B), we selected two major isoforms (circ_290 nt for 35% and circ_408 nt for 26%) to perform full-length validation. As expected, more than one product was observed (Figure 4C), and the rolling circles of full-length sequences were validated by Sanger sequencing (Figure 4D, Figure 4—figure supplement 1). For comparison, we also used linear RNA from the MCF7 cell line and RNase R-treated RNA from the HeLa cell line as templates. Intriguingly, we validated the fusion junction of E2–E4 with linear RNA as a template (Figure 4C), suggesting the existence of linear RNA production from the fusion gene. The decreased expression of junctions E2–E4 after RNase R treatment also supported the linear production, which further suggested that the fusion direction is GBF1 to MACROD2 (Figure 4E). In addition, the f-circRNA fused by two antisense genes (PRICKLE2-AS1 and PTPRT-AS1) was experimentally validated (Figure 4F; Figure 4—figure supplement 2). All the major fusion junctions of GBF1/MACROD2 and E3-E1 of PRICKLE2-AS1/PTPRT-AS1 were validated by RNA-seq of the MCF7 cell line (Figure 4G).

Figure 4 with 2 supplements see all
Detection and validation of fusion circRNA (f-circRNA) in the MCF7 cell line.

(A) Diagram of identification of f-circRNA with circFL-seq data. (B) Diagram of five high-quality f-circRNA isoforms (read counts≥5) fused by GBF1 and MACROD2. The transcript ratio represents the fractions of the isoforms. (C–E) Validation of f-circRNA junctions from GBF1/MACROD2 by agarose gel electrophoresis (C), Sanger sequencing (D), and RT-qPCR (E). (C) Agarose gel electrophoresis showing the RT-PCR products of f-circRNA junctions with RNase R-treated MCF7 and HeLa RNA and poly(A) selected MCF7 RNA as a template. (F) Agarose gel electrophoresis showing the RT-PCR products of f-circRNA junctions from PRICKLE2-AS1/PTPRT-AS1. (G) Information on five f-circRNA junctions detected by circFL-seq, RNA-seq, and RT-qPCR.

Discussion

In this study, we established a full-length circRNA sequencing method, circFL-seq, by nanopore sequencing. Compared to short-read sequencing, which is limited to reconstructing short circRNAs (<500 nt), long-read sequencing with circFL-seq has shown advances in comprehensively identifying full-length circRNAs of all sizes (64–2334 nt in our data and more than 40% identified isoforms >500 nt). Very recently, isoCirc (Xin et al., 2021a) and CIRI-long (Zhang et al., 2021a) also employed nanopore sequencing to identify full-length circRNAs by utilizing rolling circles. circFL-seq and CIRI-long employ the RCRT strategy, while isoCirc employs the RCA strategy. Benefiting from RCA with a circular cDNA template from ligation of the RT product, isoCirc produces longer reads (up to 50 kb) containing more rolling circles than RCRT-based methods (ca. 1 kb) and thus has an advantage in error correction during CS identification. As a trade-off, the cost of each full-length read is much higher in isoCirc than in circFL-seq and CIRI-long. In addition, the unique ligation of cDNA in isoCirc may introduce false positives for circRNA detection by ligating cDNA from residual linear RNA or truncated cDNA of circRNA, which is hardly totally recognized and removed by computational analysis. In contrast to isoCirc, both the circFL-seq and CIRI-long-based methods employ RCRT to produce rolling circles during first-strand synthesis. However, circFL-seq synthesizes second-strand cDNA with an anchor primer, while CIRI-long uses template switching. Thus, these two methods are more resistant to the interference of residual linear RNA and generate shorter full-length reads of circRNA than isoCirc. Although decreasing the sequencing cost of nanopore, shorter full-length reads have to perform error correction among reads in addition to intra reads. Thus, RCRT-based methods are not as sensitive as isoCirc with the same number of full-length reads. The computational pipelines of circFL-seq and CIRI-long are compatible given their very similar experimental protocols. Both of these methods calculate CS from tandem repeats of reads to refine low-quality sequences, followed by localization of CS. However, CS may ignore reads that are less than two full circles. In contrast, the RG mode of circFL-seq identified 56% more full-length circRNAs with fewer than two circles in our data, which helps to improve the sensitivity of circRNA detection. In addition, circFL-seq leverages primer sequences to adjust the strand origin of circRNA, especially for BSJ with a non-canonical splicing motif. For details, we summarize the three methods in Supplementary file 6.

Previous short-read RNA sequencing was used to quantify circRNAs based on BSJ-spanning reads, while circFL-seq with full-length reads was employed to finely quantify circRNAs at the isoform level. Thus, circFL-seq is able to identify differential internal AS events, helping to unravel the function of sequence variants. In our comparisons, circFL-seq was more sensitive than isoCirc and CIRI-long for AS detection, although the performance required full evaluation by ground-truth circRNA isoforms. Another novel application of full-length circRNA sequencing is to identify f-circRNA, which has been found to play important roles in cancer pathogenesis (Guarnerio et al., 2016) and has become a promising biomarker for liquid biopsy (Tan et al., 2018a). Current methods (Guarnerio et al., 2016; Tan et al., 2018a; Tan et al., 2018b; Wu et al., 2019a) validated the proposed f-circRNA by designing divergent primers on known gene fusion junctions and thus were inefficient and possibly ignored f-circRNA from unknown gene fusions. With cancer cell line data, circFL-seq showed its ability to high-throughput identify f-circRNA, which will promote our knowledge of f-circRNA.

Overall, circFL-seq is able to identify and quantify circRNAs at the isoform level and is more sensitive in circRNA AS event detection and f-circRNA detection. In addition, circFL-seq has been applied in a large-scale sequencer, PromethION, with pooling multiple libraries, which will further decrease the sequencing cost to twice that of RNA-seq. Thus, our developed circFL-seq process is an effective and affordable high-throughput full-length circRNA sequencing method for screening functional circRNAs at the omics scale.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Cell line (Homo sapiens)HeLaJiadong Wang LaboratoryRRID:CVCL_0030
Cell line (H. sapiens)SKOV3Jiadong Wang LaboratoryRRID:CVCL_0532
Cell line (H. sapiens)MCF7Jiadong Wang LaboratoryRRID:CVCL_0031
Cell line (H. sapiens)HEK293TJiadong Wang LaboratoryRRID:CVCL_0063
Cell line (H. sapiens)SH-SY5YJian Chen LaboratoryRRID:CVCL_0019
Cell line (H. sapiens)VCaPiCell BioscienceRRID:CVCL_2235
Cell line (H. sapiens)HEK293iCell BioscienceRRID:CVCL_0045
Commercial assay or kitTotal RNA of human brainClontechCat. #: 636530
Commercial assay or kitTotal RNA of human testisClontechCat. #: 636533

Cell culture and RNA isolation

Request a detailed protocol

The human cell lines HeLa, SKOV3, MCF7, VCaP, SH-SY5Y, HEK293T, and HEK293 were used in this study. HeLa, SKOV3, MCF7, and HEK293T cell lines were obtained from Jiadong Wang Laboratory (Xu et al., 2020). SH-SY5Y cell line was obtained from Jian Chen Laboratory. VCaP and HEK293 cell lines were purchased from iCell Bioscience. Cell lines were cultured in DMEM (Invitrogen) supplemented with 10% fetal bovine serum (YEASEN) and 1% penicillin/streptomycin (Solarbio) at 37°C with 5% CO2. Cell lines were collected at 80–90% confluency. All cell lines were authenticated with STR profiling and tested negative for mycoplasma contamination. Total RNA was extracted by the FastPure Cell/Tissue Total RNA Isolation Kit (Vazyme) according to the manufacturer’s instructions. Total RNA from the human brain (#636530) and testis (#636533) was purchased from Clontech.

circRNA enrichment for library construction of RNA-seq and circFL-seq

rRNA depletion

Request a detailed protocol

Similar to a previous method (Morlan et al., 2012), nonoverlapping synthetic DNA probes of complementary sequences (Supplementary file 7) of 18S and 28S rRNA at a final concentration of 1 μM as well as 5S and 5.8S rRNA, 12S and 16S mtrRNA, ETS, and ITS at a final concentration of 0.1 μM were pooled. About 1 μl of DNA probe was mixed with 2 μg of total RNA and 2 μl of hybridization buffer (750 mM Tris-HCl, 750 mM NaCl) at a final reaction volume of 15 μl. The mixture was heated to 95°C for 2 min, slowly cooled to 22°C (–0.1°C/s), and incubated for an additional 5 min at 22°C. Also, 2 μl of thermostable RNase H (NEB) was added along with 2 μl of 10× RNase H buffer at a final reaction volume of 20 μl, incubated at 50°C for 30 min, and placed on ice. DNA probes were removed by 2.5 μl DNase I (NEB) with 3 μl 10× DNase I buffer at a final volume of 30 μl, incubated at 37°C for 30 min, and placed on ice.

Poly(A) tailing

Request a detailed protocol

The reaction above was then mixed with 4 μl ATP (10 mM), 4 μl 10× Poly(A) Polymerase Reaction Buffer and 1 μl Poly(A) Polymerase (NEB) for poly(A) tailing and incubated at 37°C for 10 min. Purified RNA was isolated by 2.5× RNA Clean Beads (Vazyme) according to the manufacturer’s instructions.

RNase R treatment

Request a detailed protocol

For circFL-seq, RNA was incubated at 37°C for 30 min and 70°C for 10 min in a 10 μl reaction that contained 1 U RNase R (Lucigen) and 1 μl 10× RNase R buffer. For RNA-seq, RNA was incubated under the same conditions but in a 20 μl reaction containing 2 U RNase R and 2 μl 10× RNase R buffer.

Full-length circRNA cDNA preparation for circFL-seq

Reverse transcription

Request a detailed protocol

After enrichment of circRNAs, collected RNAs were reverse transcribed into first cDNA strands in a 20 μl reaction by P1-N6 (5′-GTCGACGGCGCGCCGGATCCATANNNNNN-3′) with HiScript III reverse transcriptase (Vazyme) for 5 min at 25°C, 50 min at 50°C, 2 min at 70°C, and 5 s at 85°C, followed by purification with 0.75× DNA Clean Beads (Vazyme) according to the manufacturer’s instructions.

Poly(A) tailing

Request a detailed protocol

Then, poly(A) tails were added at the 3′ ends in a 20 μl reaction by terminal deoxynucleotidyl transferase (Invitrogen) with final dATP and ddATP concentrations of 2.5 mM and 25 μM, respectively, followed by purification with 0.75× DNA Clean Beads.

Second-strand synthesis

Request a detailed protocol

Next, second-strand cDNAs were synthesized using P2-T24 (5′-ATATCTCGAGGGCGCGCCGGATCCTTTTTTTTTTTTTTTTTTTTTTTT-3′) by I-5 High-Fidelity DNA polymerase (MCLAB) at 98°C for 2 min, 50°C for 2 min, and 72°C for 5 min.

Amplification

Request a detailed protocol

Then, cDNAs were equally split into four 50 μl PCRs with primers P1 (5′-GTCGACGGCGCGCCGGATCCATA-3′) and P2 (5′-ATATCTCGAGGGCGCGCCGGATCC-3′) and amplified by 20 cycles of 98°C for 10 s, 67°C for 15 s, and 72°C for 75 s, followed by 0.5× DNA Clean Bead purification. Approximately 10–50 ng purified cDNAs were further amplified by 8–10 cycles in a set of four 50 μl PCRs with P1 and P2, followed by purification with 0.5× DNA Clean Beads.

Nanopore library construction and sequencing

View detailed protocol

A DNA library for barcoding and ligation sequencing was prepared following protocols EXP-NBD104 and SQK-LSK109. Briefly, 0.5–1 μg of circFL-seq cDNA was repaired and dA-tailed by NEBNext FFPE DNA Repair Mix (NEB) and NEBNext Ultra II End repair/dA-tailing Module (NEB), followed by purification with DNA Clean Beads. For multiplexing, repaired and end-prepped DNA was barcoded with Native Barcode by NEB Blunt/TA Ligase Master Mix (NEB), followed by purification with DNA Clean Beads. Barcoded samples were pooled together in equimolar amounts. Single samples without barcodes or pooled barcoded samples (700 ng) were ligated to an ONT Adapter with NEBNext Quick Ligation Module (NEB), followed by purification with DNA Clean Beads. The DNA library was mixed with sequencing buffer, and beads were loaded onto a PromethION or MinION R9.4 Flow Cell and run on a PromethION (performed by Grandomics) or MinION sequencer, respectively.

cDNA library preparation and sequencing of RNA-seq

Request a detailed protocol

After circRNA enrichment, RNA was isolated by 2.5× RNA Clean Beads (Vazyme). cDNA libraries were constructed and barcoded by the VAHTS Universal V6 RNA-seq Library Prep Kit for Illumina (Vazyme) and VAHTS RNA Adapters set1/set2 for Illumina (Vazyme) according to the manufacturer’s instructions for the strand-specific rRNA depletion library. The cDNA library was fragmented, and an insert size of approximately 300 bp was selected. Eight libraries were pooled together and sequenced on the Illumina HiSeq X Ten platform of Annoroad Gene Technology with a paired-end read length of 150 bp.

Computational analysis of full-length circRNAs with circFL-seq

Raw sequenced data in fast5 format files were transformed to FASTQ format files (kept reads with qscore≥7.0) and were demultiplexed by guppy (4.2.2). We employed porechop (v0.2.4) to trim barcode and circFL-seq primers (P1 and P2) and split chimeric reads to obtain clean reads for each sample. Then, RG mode (https://github.com/yangence/circfull, Liu et al., 2021) was employed to detect circRNAs with clean reads. cRG mode was used to correct circFL-seq reads by de novo self-correction to calculate CS, and RG mode was rerun with a query sequence of three copies of CS. With primer sequences at both ends of sequenced reads, circFL-seq detected the strand origin of reads and adjusted circRNA results with strand information. Finally, we filtered out low-quality full-length circRNAs after integrating circRNA results.

RG mode

Request a detailed protocol

Clean reads were mapped to the human genome (hg19) by minimap2 (Li, 2018) (v2.12) with the parameters ‘-ax splice -p 0.5’ or ‘-ax splice -p 0.5 -uf’ for transcript-strand reads. Aligned reads with chiastic overlapping segments were recognized as candidate circRNA reads (CCRs). CCRs were classified into three types: normal, fusion on the same chromosome, and fusion on different chromosomes. The boundaries of the chiastic segment of the CCRs were detected as potential BSJs. FSJs were determined according to the skipped region from the reference. For each read, a pseudo reference sequence was generated by concatenating two sequences from 150 nt upstream to 150 nt downstream of the BSJ region. Then, CCRs were realigned against the pseudo references, and accurate sites of BSJs were determined by integrating multiple aligned BSJs, gene annotation (GENCODE v19), and canonical splicing motifs (GT/AG). FSJs of CCRs were also corrected based on the integration of multiple aligned FSJs, gene annotation, and FSJs from other CCRs with the same BSJ. Next, the full-length circRNA was constructed based on the FSJ(s) and BSJ(s). Incorrect construction of full-length circRNAs caused by mistaken alignment against tandem repeat sequences of the reference genome was identified by TideHunter (v1.0) with parameters ‘-f 2 –c 1.2 l’, and these circRNAs were filtered out.

The de novo self-correction

Request a detailed protocol

The CS of clean reads was detected by TideHunter (v1.0) with the parameters ‘-f 2 c 1.5 p 30 l’. Following evaluation by Tandem Repeats Finder (v4.09) with parameters ‘2 5 7 80 5 5 2000 h -ngs’, CS was removed if containing internal tandem repeats, defined by an alignment score >40 or length of internal tandem repeats longer than half of the CS.

cRG mode

Request a detailed protocol

To locate the genomic region of CS, pseudo query sequences from three combined monomers of a CS were created. The position of full-length circRNA was detected with RG mode.

Identification of the strand origin of clean reads

Request a detailed protocol

CCRs with FSJs detected were first selected to determine the strand origin of sequenced reads, that is, first or second strand, based on the GT/AG motif of the FSJ and mapped strand. For each 100 nt flanking end of the raw read, the maximum identical number of bases to primers P1, P2, T24, and their reverse complementary sequences was calculated by the Smith-Waterman algorithm. With these numbers of identical bases as predictor variables and strand origin as the target variable, a random forest classifier was trained on 75% selected CCRs to predict the strand origin of all clean reads. The performance of the classifier was evaluated on the remaining 25% CCRs by accuracy and AUC value for each sample (Supplementary file 8). CircRNA with ambiguous strand direction was adjusted by rerunning RG mode with stranded reads.

Filtration of low quality circRNA

Request a detailed protocol

After RG detection supplemented with cRG detection, low-quality circRNAs with an unsplicing ratio of BSJ<0.1 were filtered out. CircRNA isoforms with reliable BSJs and FSJs were retained if more than half of the circRNA reads were perfectly aligned on ±4 bp of junctions. CircRNA BSJs were filtered out if both BSJs were located in ±30 bp of repeat elements.

Reverse transcription for PCR and RT-qPCR validation

Request a detailed protocol

For PCR validation, sample RNA was reverse transcribed to cDNA products for 5 min at 25°C, 50 min at 50°C, 2 min at 70°C, and 5 s at 85°C with random hexamers by HiScript III. For RT-qPCR, sample RNA was reverse transcribed to cDNA products for 15 min at 37°C and 5 s at 85°C with random hexamers and oligo(T) according to the manufacturer’s instructions.

Validation of full-length sequences of circRNA

Request a detailed protocol

A pair of divergent PCR primers was designed and included 6–8 additional G bases at the 5′ end (Supplementary file 9) to reduce the accumulation of short PCR products. With cDNA products of the RT or circFL-seq library as template, PCR amplification in a 25 μl reaction of Takara Ex Taq Hot Start was carried out by the following program: 98°C for 10 s followed by 3 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 90 s, then 30 cycles of 98°C for 10 s and 72°C for 90 s, with a final extension at 72°C for 60 s. The PCR products were analyzed on 1.2% agarose gels (TSINGKE), and the rolling circle bands were cut out and extracted, followed by TA cloning to a pEASY-T1 cloning vector (TransGen). Then, the clones with inserts were Sanger sequenced.

Validation of AS in circRNAs

Request a detailed protocol

Total RNA of the HeLa cell line (1 μg) was treated with RNase R (4 U) in a 10 μl reaction and then reverse transcribed to the cDNA products. For each circRNA isoform from the same BSJ, divergent primers (Supplementary file 10) targeting but not across splicing junctions were used to validate the AS junction site in a PCR volume of 25 μl. The PCR products were analyzed on 1.2% agarose gels, and the target bands were cut out and extracted, followed by TA cloning and Sanger sequencing.

Quantification of circRNA expression by RT-qPCR

Request a detailed protocol

Total RNA of HeLa and SKOV3 cell lines (1 μg) w/wo RNase R treatment (4 U) was reverse transcribed for RT-qPCR. For specific circRNAs, primers (Supplementary file 11) across a BSJ with cDNA as template in a 20 μl reaction were set according to the manufacturer’s instructions for ChamQ Universal SYBR qPCR Master Mix (Vazyme). For specific circRNA isoforms, we designed primers (Supplementary file 12) for both specific BSJs and alternative FSJs in the reaction. For the quantification of internal AS events, two major circular isoforms 1 and 2 from the same BSJ were selected for the evaluation of the transcript ratio with the following formula: 2CTisoform1,22CTisoform1+2CTisoform2 . Primers targeting GAPDH as a reference gene were used. Thermal cycling was carried out on an Applied Biosystems 7500 Fast system at 95°C for 5 min, followed by 40 cycles of 95°C for 10 s and 60°C for 24 s.

Validation of f-circRNA

Request a detailed protocol

Total RNA of the MCF7 cell line (1 μg) w/wo RNase R treatment (4 U) was reverse transcribed for PCR. We validated full-length f-circRNA with primers (Supplementary file 13) for the two fusion junctions by RCRT and Sanger sequencing. Linear RNA was isolated to determine the origin of fusion junctions from BS or gene fusion. Because of the uncertainty of the poly(A) tail in linear RNA, total RNA was first treated with poly(A) tailing (NEB), and then linear RNA was selected by the poly(A) mRNA magnetic isolation module (NEB) according to the manufacturer’s instructions. PCR was performed with the same primers of both fusion junctions of f-circRNA. For quantification of the fusion junction of f-circRNA, total RNA of the MCF7 cell line (1 μg) w/wo RNase R treatment (4 U) was reverse transcribed for RT-qPCR. Primers (Supplementary file 14) of target fusion junctions were designed for RT-qPCR.

CircRNA analysis from RNA-seq

Request a detailed protocol

RNA-seq data were aligned to the human reference genome (hg19) by BWA (Li and Durbin, 2009) (v0.7.17-r1188). CircRNA BSJs were detected and quantified by CIRI2 with gene annotation (GENCODE v19). Full-length circRNA structures were constructed by CIRI-AS (Gao et al., 2016), CIRI-full (Zheng et al., 2019), and CIRI-vis (Zheng and Zhao, 2020). For fusion junctions, we directly searched the ±10 bp junction site in RNA-seq reads.

Analysis of isoCirc and CIRI-long data, isoCirc nanopore sequencing data were downloaded from the Sequence Read Archive (SRA: SRP235284). CircRNAs were analyzed with the isoCirc computational pipeline (https://github.com/Xinglab/isoCirc, Xin et al., 2021b) with the human reference genome (hg19) and gene annotation (GENCODE v19). CIRI-long nanopore sequencing data were downloaded from Genome Sequence Archive (GSA: CRA003317). CircRNAs were analyzed with the CIRI-long computational pipeline (https://github.com/bioinfo-biols/CIRI-long, Zhang et al., 2021b) with the mouse reference genome (mm10) and gene annotation (GENCODE vM23).

Identification of differentially expressed circRNA

Request a detailed protocol

CircRNA BSJs with at least 10 read counts in two or more RNA-seq samples and two or more circFL-seq samples of either the HeLa or SKOV3 cell line were kept to identify differentially expressed BSJs. DESeq2 with ‘mean’ fitType was employed to analyze differential expression between HeLa and SKOV3 cells. Differential BSJs at an FDR<0.05 were recognized as DECs.

Data availability

Request a detailed protocol

The circFL-seq and RNA-seq data produced by this study have been deposited in SRA (PRJNA722575). Information on circRNAs detected by circFL-seq is available in the figshare repository (https://doi.org/10.6084/m9.figshare.14265650.v1). The computational software of circFL-seq can be accessed from https://github.com/yangence/circfull, (Liu et al., 2021) copy archived at swh:1:rev:2e8be5c116e9c226661c63e35223f2120272ccfc.

Data availability

The circFL-seq and RNA-seq data produced by this study have been deposited in SRA (PRJNA722575). The information of circRNAs detected by circFL-seq is available in the figshare repository (https://doi.org/10.6084/m9.figshare.14265650.v1). The computational software circfull can be accessed from https://github.com/yangence/circfull (copy archived at https://archive.softwareheritage.org/swh:1:rev:2e8be5c116e9c226661c63e35223f2120272ccfc).

The following data sets were generated
    1. Liu ZL
    (2021) NCBI BioProject
    ID PRJNA722575. circFL-seq, a full-length circRNA sequencing method.
The following previously published data sets were used
    1. Xin R
    2. Gao Y
    3. Gao Y
    4. Wang R
    5. Kadash-Edmondson KE
    6. Xing Y
    (2021) NCBI Gene Expression Omnibus
    ID GSE141693. isoCirc catalogs full-length circular RNA isoforms in human transcriptomes.
    1. Zhang J
    2. Hou L
    3. Zuo Z
    4. Ji P
    5. Zhang X
    6. Xue Y
    7. Zhao F
    (2021) NGDC Genome Sequence Archive
    ID CRA003317. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long.

References

Decision letter

  1. Gene W Yeo
    Reviewing Editor; University of California, San Diego, United States
  2. James L Manley
    Senior Editor; Columbia University, United States
  3. Fangqing Zhao
    Reviewer; Beijing Institute of Life Science, Chinese Academy of Sciences, China

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

Yang and colleagues report circFL-seq a method for sequencing full length circular RNAs with rolling circle RT and nanopore sequencing. While two other methods have recently been published, this manuscript does add to a growing literature on long read sequencing of circular RNAs.

Decision letter after peer review:

Thank you for submitting your article "circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and James Manley as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Fangqing Zhao (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. Reviewers all agree that a major weakness of the present manuscript is with the comparison to existing methods. The authors should compare circFL-seq to CIRI-long, and reviewers agree that a three way comparison to isoCirc is informative. The authors should accompany their comparisons with a discussion about strengths and weaknesses of each of the methods.

2. The current claims and conclusions that circFL-seq is the superior method is not well supported by their data in this version of the manuscript. The authors should provide fair critiques of the other method (see comments about ligation). The authors need to point out potential limitations. For example, Is CircFL-seq biased to detecting only highly expressed transcripts?

3. In comparison to isoCirc, circFL-seq identified fewer circRNA isoforms with higher read coverage of the detected circRNAs, which may be a PCR artifact. The authors need to address this concern. In addition, in two published studies, isoCirc and CIRI-long have also used nanopore sequencing to characterize circRNA isoforms and alternative splicing events. However, both studies have reported a relatively higher percentage of retained introns (isoCirc: Figure 4b, CIRI-long: Supplementary Figure 13) compared to the number of 3.5% of intron retention events in line 139. The authors should clarify the reason behind this difference.

4. The authors found six f-circ derived from GBF1-MACROD2 fusion and validated their junctions using Sanger sequencing. Besides, the authors also used short-read RNA-seq to validate the linear fusion junctions. What's the ratio of linear and circular transcript derived from these gene fusion loci? Is there any possibility that these f-circRNAs are derived from trans-splicing events? Considering that short-read RNA-seq data cannot effectively distinguish circular and linear transcripts, the authors may try to search for nanopore reads spanning the fusion region, which can provide direct evidence for these gene fusion events.

5. Because of the high error rate of nanopore sequencing, the authors should compare the error rate of CS sequence before and after cRG correction to elucidate the ability to correct sequencing errors with the cRG mode.

6. The authors trained a random forest classifier to predict the strand origin of circular reads. How many CCRs were used as the training set, and how's the performance of the random forest classifier? The authors should provide more data about this step.

General comments:

1. The Figures and Supplemental Figures were not labeled with figure numbers, making it extremely difficult to read (especially some figures span across more than one page). The authors should label the figure number more clearly.

2. Some figures are not clearly labeled. For example, in Figure 2j, what does each lane represent? Also, in Figure 3e, it is not clear what the y-axis is. The authors state that it is read coverage, but how come the value is from 0 to 1? In Figure 3d, the authors state that it is the correlation between HeLa and SKOV3, but it was not clear what axis is HeLa and SKOV3, respectively. Again, what does the shaded area mean in these figures (figure 3c, 3d, 3f, and 3g)? The authors should check through the figure label more carefully and correct them accordingly.

Reviewer #1 (Recommendations for the authors):

Overall, this is a nice piece of work that can be published after revision. Below are my specific comments for the authors:

1. As noted in my public review, the authors should carry out a two-way comparison between circFL-seq and CIRI-long, as well as a three-way comparison between circFL-seq, CIRI-long, and isoCirc. Given that circFL-seq and CIRI-long are both based on RCRT while isoCirc is based on RCA, it would be interesting to see if circFL-seq and CIRI-long produce more concordant results in terms of the discovery and quantitation of full-length circular RNAs, given the similarity in their experimental strategies.

2. In both Introduction and Discussion, the authors noted that the use of ligation in isoCirc may lead to false discoveries of circular RNAs. While this statement is technically correct from an experimental standpoint, such false discoveries can be recognized and removed computationally – therefore this is not a fair critique of isoCirc. In fact, the isoCirc computational pipeline is designed to remove such artifacts, using stringent requirements for alignment quality and presence of canonical splice site motifs in all forward and back splice junctions within full-length circular RNA transcripts.

3. Line 65-67: "Thus, an accurate but affordable method to detect full-length circRNA remains to be developed for wide application in screening functional circRNAs at the omics scale". This statement leaves the impression that such a method currently does not exist, which is not a fair representation of the current literature with the recent publications of isoCirc and CIRI-long.

4. The authors reported that compared to isoCirc, circFL-seq produced more full-length circular RNA reads at the same library depth but identified fewer circular RNA isoforms. The authors appeared to present this finding as a positive feature of circFL-seq. For example, in line 265-267, the authors stated that "as a trade-off, isoCirc produced fewer reads with the same sequencing depth, which raised sequencing costs and weakened its ability to detect and accurately quantify high-quality circRNAs". There are multiple issues with this statement. In terms of the precision of circular RNA quantitation (e.g. as evaluated based on comparison among nanopore replicates as well as comparison to short-read RNA-seq data), the metrics presented for circFL-seq are in fact quite comparable to the metrics presented in the isoCirc paper, so there is no evidence that circFL-seq provides a better quantitation of circular RNAs. Moreover, given that the ground-truth is not known, an alternative interpretation to this observation is that circFL-seq may be biased towards highly expressed circular RNAs, and may lack the ability to discover moderately and lowly expressed circular RNAs. Overall, in comparing different methods, the authors should aim to provide an impartial discussion about the strengths and weaknesses of individual methods, and avoid over-interpreting the data in favor of their own method.

5. The discussion about CDR1as is interesting. Can CIRI-long detect this circular RNA?

6. The authors should also cite the BioRxiv preprint by Rahimi et al., (https://doi.org/10.1101/567164), which is another method for nanopore sequencing of circular RNAs.

7. Overall, the manuscript is easy to read and follow, but it could benefit from a thorough editing by a language editor.

Reviewer #2 (Recommendations for the authors):

1. In comparison to isoCirc, circFL-seq identified fewer circRNA isoforms with higher read coverage of the detected circRNAs. This raises a concern that the outcome may result from that RCRT captures circRNA less efficiently than RCA, resulting in fewer circRNA molecules are captured in circFL-seq. The higher read coverage may simply come from sequencing the same circRNA molecule from the PCR amplification artifacts. This may also explain why circFL-seq cannot detect circRNAs with low read count or lowly expressed circRNAs. In this case, the authors cannot use back splice junction (BSJ) detection saturation as an indicator to compare the required read-depth between isoCirc and circFL-seq. Also, given the concern above, the "high read coverage" does not necessarily mean "high quality" nor "high accuracy" as claimed by the authors. The authors should address this concern before claiming on the benefits of high read coverage.

2. The advantages of circFL-seq over other existing technologies are not well-supported. For example, the authors claim that RCRT has lower residual linear RNA contamination than RCA, but the authors do not provide any data or evidence supporting the claim. Also, the authors claim that the circFL-seq gives higher circRNA read coverage; hence it is beneficial for circRNA quantification. However, the real "benefits" over other technologies (RNA-seq and isoCirc) for circRNA quantification are not clear since the RNA-seq (and isoCirc) quantification is significantly correlated with circFL-seq as demonstrated by the authors.

3. In the manuscript, the results are often comparable with known database and existing technologies when the authors focus on the "high quality" circRNAs only (circFL-seq read counts >= 5) which also have high expression level. The fact suggests that circFL-seq result is trust-worthy on "high quality" only. It also suggests that circFL-seq may fail to detect the lowly expressed circRNAs. The authors should note and discuss these limitations.

General comments:

1. The Figures and Supplemental Figures were not labeled with figure numbers, making it extremely difficult to read (especially some figures span across more than one page). The authors should label the figure number more clearly.

2. Some figures are not clearly labeled. For example, in Figure 2j, what does each lane represent? Also, in Figure 3e, it is not clear what the y-axis is. The authors state that it is read coverage, but how come the value is from 0 to 1? In Figure 3d, the authors state that it is the correlation between HeLa and SKOV3, but it was not clear what axis is HeLa and SKOV3, respectively. Again, what does the shaded area mean in these figures (figure 3c, 3d, 3f, and 3g)? The authors should check through the figure label more carefully and correct them accordingly.

Specific comments:

1. In the Abstract section, the authors claim that "… the detection of cancer-related fusion circRNAs…". However, the authors did not provide any data or literature suggesting that the fusion circRNAs they identified by circFL-seq are really "cancer-related" or biologically meaningful. The claim needs to be revised or proved.

2. In Figure2—figure supplement 1a, the full-length circRNA reads contribute very little percentage of the total clean reads (~2-5%). How come the RCRT method generate so little full-length circRNA reads? The authors should comment on this and discuss it.

3.In Figure 3—figure supplement 3, the author claim that more reads are required for ONT to confidently identify circRNAs. Doesn't this compromise the cost-efficient claim of circFL-seq made by the authors earlier? The authors should comment on this and discuss it.

4. In Figure 3—figure supplement 6a-d, how the saturation curves are calculated? It seems like isoCirc has much lower sequencing depth (~0.3 M) than circFL-seq. How do the authors compare the BSJ saturation in this case?

5. When comparing the validity of circFL-seq and isoCirc, why do the authors focus on top 100 expressed BSJ only? A better comparison should be the total BSJ in circFL-seq and isoCirc.

6. The authors should not use the same absolute circFL-seq BSJ read counts to define high-quality BSJs in isoCirc for the following reasons: (i) the read counts >= 5 in circFL-seq is arbitrary, there is no evidence suggesting that the isoCirc should use the same read counts to define high-quality BSJs. (ii) Since isoCirc captures more circRNAs, a lower BSJ read counts per circRNA is expected given the same sequencing depth. In both cases, lower BSJ read counts in isoCirc does not necessarily mean the BSJ is not "high-quality". Thus, the authors should not use absolute circFL-seq BSJ read counts as an indicator for the BSJ quality in isoCirc.

7. In the PLOD2 circFL-seq and RNA-seq example shown by the authors, the authors suggest that the circPLOD2 has lower exon skipping event than its parent linear RNA in HeLa cells. However, given that the BSJ is exactly the same between exon-skipped and non-exon-skipped circPLOD2, how does the back-splicing mechanism distinguish different parent linear RNA isoforms that selectively back-splices the non-exon-skipped linear RNA to generate specific circPLOD2 isoform in HeLa cells?

8. Are the f-circ detected by circFL-seq generated by RNA fusion or genomic fusion? Although the RNA-seq suggests a genomic fusion, it does not completely eliminate the possibility of a genomic fusion. A genomic PCR followed by Sanger sequencing should be performed to validate the fusion junction of the genome.

Reviewer #3 (Recommendations for the authors):

Considering that there are two recently published circRNA reconstruction tools based on nanopore sequencing, the authors should comprehensively compare their method with these two tools, and carefully discuss the advantages and disadvantages of these methods.

Specific comments:

1. In the Discussion section (line 260-262), the authors compared circFL-seq with the recently published CIRI-long method. Both circFL-seq and CIRI-long use a similar rolling circle reverse transcription strategy to amplify circRNAs. The authors may discuss the difference and (dis)advantages between their method and previous methods (isoCirc and CIRI-long).

2. In section "Comparison with RNA-seq and isoCirc for circRNA detection", the authors compared circFL-seq with the isoCirc method and found that circFL-seq produced more circular reads but identified fewer circRNA isoforms, which is an interesting result. Does it mean that isoCirc has a better sensitivity or higher false discovery rate in detecting lowly expressed circRNAs? The authors should include more comparison (e.g. venn diagram between three sets under different BSJ thresholds) between circFL-seq, isoCirc, and public circRNA database (e.g. circAtlas [PMID: 32345360, PMID: 30893614]) to demonstrate the advantages of their method.

3. The authors found six f-circ derived from GBF1-MACROD2 fusion and validated their junctions using Sanger sequencing. Besides, the authors also used short-read RNA-seq to validate the linear fusion junctions. What's the ratio of linear and circular transcript derived from these gene fusion loci? Is there any possibility that these f-circRNAs are derived from trans-splicing events? Considering that short-read RNA-seq data cannot effectively distinguish circular and linear transcripts, the authors may try to search for nanopore reads spanning the fusion region, which can provide direct evidence for these gene fusion events.

4. Because of the high error rate of nanopore sequencing, the authors should compare the error rate of CS sequence before and after cRG correction to elucidate the ability to correct sequencing errors with the cRG mode.

5. The authors trained a random forest classifier to predict the strand origin of circular reads. How many CCRs were used as the training set, and how's the performance of the random forest classifier? The authors should provide more data about this step.

6. In section "Evaluation of quantification of full-length circRNAs", it would be nice if the authors could compare the quantification results between their method on nanopore reads and previous method (e.g. CIRIquant) on short-read RNA-seq data.

7. The authors used different names (circFL-seq, circfull) to denote their sequencing and data analysis methods. It would be better if they can unify the name, say, circFL-seq, which may avoid misunderstanding.

https://doi.org/10.7554/eLife.69457.sa1

Author response

Essential revisions:

1. Reviewers all agree that a major weakness of the present manuscript is with the comparison to existing methods. The authors should compare circFL-seq to CIRI-long, and reviewers agree that a three way comparison to isoCirc is informative. The authors should accompany their comparisons with a discussion about strengths and weaknesses of each of the methods.

We agree that comparison among the three methods will facilitate selecting proper method for different purposes. However, detection of full-length circRNAs is affected by various confounding factors (as summarized in Author response table 1), including input of total RNA, species of samples, libraries per sample, sequencing platform and libraries per Flow Cell. As these factors were not consistent among the three methods developed at almost the same time, we compared these methods in principle.

Author response table 1
Summary of confounding factors among three methods.
detailscircFL-seqCIRI-longisoCirc
input of total RNA (μg)2120
speciesHumanmousehuman
samples7 cell linesbrain and testisbrain1 cell line12 tissues
platformONT PromethION, MinIONONT MinIONONT MinION
libraries per sampleone/twomultiplemultiple
libraries per Flow Cell (sequencing depth)one/multiplemultipleone

All the three methods produced rolling circles of circRNA for full-length sequencing by ONT. However, circFL-seq and CIRI-long employed RCRT strategy, while isoCirc with RCA strategy. Benefiting from RCA with circular cDNA template from ligation of reverse transcription (RT) product, isoCirc produced longer reads (up to 50 kb) containing more rolling circles than RCRT based methods (ca. 1 kb), and thus had an advantage in error correction during consensus sequences identification. As a trade-off, the cost of each full-length read is much higher in isoCirc than circFL-seq and CIRI-long. In addition, the unique ligation of cDNA in isoCirc may introduce false positive for circRNA detection by ligating cDNA from residual linear RNA. In contrast to isoCirc, both of circFL-seq and CIRI-long based methods employed RCRT to produce rolling circles during first-strand synthesis. The difference was that circFL-seq synthesized second-strand cDNA with an anchor primer while CIRI-long used template switching. Thus, these two methods are more resistant to the interference of residual linear RNA and generate shorter full-length reads of circRNA than isoCirc. Although decreasing sequencing cost of nanopore, shorter full-length reads have to perform error correction among reads besides intra-reads. Thus, RCRT based methods are not as sensitive as isoCirc with the same amount of full-length reads.

Along with detection of circRNA full-length sequences, all the three methods were also able to detect alternative splicing events of circRNAs. Both circFL-seq and isoCirc detected AS events of circRNA in human samples. For HEK293 cell line sequenced by one MinION, circFL-seq could be more precise than isoCirc for BSJ detection (75.3% vs 43.8%) with known circRNAs annotated in databases for comparison, and identified more than two times of ES, A3SS, and A5SS but similar amount of IR events than isoCirc. (Supplementary file 4). Thus, benefiting from more full-length circRNA reads, circFL-seq is more sensitive for AS events. However, circFL-seq may fail to detect some IR events. We found that the reads of isoCirc-only IR showed high error rate in the intron region (Author response image 1) which could due to the strong second structure. The RT step and cDNA ligation of isoCirc could avoid or remedy the early stop of RT in the strong RNA structure of intron.

Author response image 1
Alignment schematic of nanopore reads of isoCirc-only IR events.

Although circFL-seq and CIRI-long were applied in different species (human vs. mouse), the computational pipeline was compatible and could be compared with each other for the very similar experimental protocols. We employed the computational pipeline of circFL-seq and CIRI-long to analyze HEK293 cell line of circFL-seq and mouse brain data of CIRI-long, respectively. We found the computational pipeline of circFL-seq is much more sensitive with similar precision for BSJ detection. For AS events, circFL-seq is also more sensitivity, especially for ES, A3SS and A5SS events. (Supplementary file 5).

We added above results and discussions in our revised manuscript and Supplementary file 4-6.

2. The current claims and conclusions that circFL-seq is the superior method is not well supported by their data in this version of the manuscript. The authors should provide fair critiques of the other method (see comments about ligation). The authors need to point out potential limitations. For example, Is CircFL-seq biased to detecting only highly expressed transcripts?

In the comparison with isoCirc on one MinION data of HEK293 cell line, circFL-seq is more precise with all read counts (both highly and lowly expressed transcripts). Thus, circFL-seq was not biased to only highly expressed transcripts (Author response table 2). The shortness of circFL-seq compared to isoCirc is that the RT step could early stop resulting in lack of IR detection from strong second structure of intron (see Author response image 1).

Author response table 2
Comparison of isoCirc and circFL-seq for BSJ detection in HEK293 cell line.
total circRNA BSJs
# read counts for BSJ1234>4all
isoCirc HEK293 SRR1061205032,2043,5721,2376201,77739,410
isoCirc HEK293 SRR1061205134,4933,9161,3616871,91542,372
isoCirc HEK293 SRR1061205244,5865,2701,7078602,63555,058
isoCirc HEK293 SRR1061205339,4845,2741,8711,0222,97050,621
isoCirc HEK293 SRR1061205440,9285,2591,8971,0713,00952,164
isoCirc HEK293 SRR1061205530,6473,7791,3877102,02438,547
isoCirc HEK293 all158,87523,3028,8215,13326,782222,913
circFL-seq HEK29313,9064,8892,8301,5254,71927,869
known circRNA BSJs annotated in database
# read counts for BSJ1234>4all
isoCirc HEK293 SRR1061205010,4582,5281,0805711,62016,257
isoCirc HEK293 SRR1061205110,8892,6631,1946151,75117,112
isoCirc HEK293 SRR1061205212,7273,4471,4517732,39620,794
isoCirc HEK293 SRR1061205314,8283,8931,6659442,71124,041
isoCirc HEK293 SRR1061205415,0783,8341,6789712,75024,311
isoCirc HEK293 SRR1061205512,5342,9691,2646601,86019,287
isoCirc HEK293 all28,91712,0886,8214,41125,30177,538
circFL-seq HEK2938,8363,8212,3771,3654,58920,988
% known circRNA BSJs
# read counts for BSJ1234>4all
isoCirc HEK293 SRR1061205032.570.887.392.191.241.3
isoCirc HEK293 SRR1061205131.668.087.789.591.440.4
isoCirc HEK293 SRR1061205228.565.485.089.990.937.8
isoCirc HEK293 SRR1061205337.673.889.092.491.347.5
isoCirc HEK293 SRR1061205436.872.988.590.791.446.6
isoCirc HEK293 SRR1061205540.978.691.193.091.950.0
isoCirc HEK293 all18.251.977.385.994.534.8
circFL-seq HEK29363.578.284.089.597.275.3

3. In comparison to isoCirc, circFL-seq identified fewer circRNA isoforms with higher read coverage of the detected circRNAs, which may be a PCR artifact. The authors need to address this concern. In addition, in two published studies, isoCirc and CIRI-long have also used nanopore sequencing to characterize circRNA isoforms and alternative splicing events. However, both studies have reported a relatively higher percentage of retained introns (isoCirc: Figure 4b, CIRI-long: Supplementary Figure 13) compared to the number of 3.5% of intron retention events in line 139. The authors should clarify the reason behind this difference.

Sorry for the misunderstanding. Compared to isoCirc, circFL-seq detected fewer circRNA BSJs. However, less than half of isoCirc detected BSJs were not annotated in database, while 75.3% of circFL-seq detected BSJs were annotated in databse (Author response table 2). Thus, circFL-seq produced more full-length reads for true positive circRNAs. We rephrased our manuscript.

Thank you for the inspiring question. We conducted a comprehensive investigation about AS detection among the three methods. Actually, the total amounts of IR events are comparable among the three methods. However, circFL-seq identified more ES, A3SS, and A5SS, resulting in the lower IR proportion. We added these data in the manuscript and Supplementary file 4,5.

4. The authors found six f-circ derived from GBF1-MACROD2 fusion and validated their junctions using Sanger sequencing. Besides, the authors also used short-read RNA-seq to validate the linear fusion junctions. What's the ratio of linear and circular transcript derived from these gene fusion loci? Is there any possibility that these f-circRNAs are derived from trans-splicing events? Considering that short-read RNA-seq data cannot effectively distinguish circular and linear transcripts, the authors may try to search for nanopore reads spanning the fusion region, which can provide direct evidence for these gene fusion events.

Based on the nanopore reads counts of f-circ and RT-qPCR results, we inferred that the ratio of circular and linear transcript of gene fusion loci was 9.9%. To verify these f-circ were circular transcripts rather than linear transcripts, we detected these f-circ in RNase R treated samples and validated by Sanger sequencing (Figure 4C-E in manuscript). In addition, we also found nanopore reads that contained more than 4 full circles of f-circ. Together, these evidences supported the bona fide fusion circRNAs from GBF1-MACROD2.

We thank the reviewers for the interesting question that the f-circ were derived from genomic fusion of GBF1 and MACROD2 or trans-splicing events between GBF1 and MACROD2. Given that there were at least five f-circ isoforms of GBF1-MACROD2, we speculated that these f-circ were derived from genomic fusion rather than trans-splicing. However, length of the adjacent introns (83,935 bp for GBF1 and 544,981 bp for MACROD2) of the fusion site were too large to detect genomic fusion by PCR. Long-read sequencing of genomic DNA could be helpful to identify the fusion junction and confirm the origin in the future.

A nanopore read with four circles of f-circ from GBF1-MACROD2 E1-E2-E4

TCAAACGAAATGCCCGATGGAGCACCACTTTCCACTGGATGAAGAACGGGATCACAAACGCATAGTTTCGGTCATCTAAAGGAGGTTTTAAACAGTATAACAGTGGATGGCTGTATTCATAGAGCAGCCGGCCCCTGTTTGCTAGCTGAATGTCGTAACCTGAATGAAAACAGTGATACTGGACATGCAAAAATCACATGTGGCTATGACCTTCCTGCGAAATGTTTGCCAAGATGGTGGATAAGAATATTTACATCATTCAGGGGAGATTAACATTGTGGTTGGGGCCATCAAACGAAATGCCCGATGGAGCACCCATACACCACTGGATGAGAACGCCCCGGGATCCTCTGCTGCATAGTTTCGGTCATCTAAAGGAGGTTTTAAACAGTATAACAGTGGATGGCTGTATTCATAGAGCAGCCGGCCCCTGTTTGCTAGCTGAATGTCGTAACCTGAATGGCTGTGATACTGGACATGCAAAAATCACATGTGGCTATGACCTTCCTGCAAAATGTTTGCCAAGATGGTGGATAAGAATATTTACACATCATTCAAGGGGAGATTGCTGTGTGGTTGGGGCCATCAAACGAAATGCCCGATGGAGCACCCGCACACACTGGATGAAGAACGGGATCCTGCTGCATAGTTTCGGTCATCTAAAGGAGGTTTAAACAGTATAACAGTGGATGGCTGTATTCATAGAGCAGCCGGCCCCTGTTTGCTAGCTGAATGTCGTAACCTGAATGGCTGTGATACTGGACATGCAAAAATCACATGTGGCTATGACCTTCCTGCAAAATGTTTGCCAAGATGGTGGATAAGAATATTTACATCATTCAAGGAGATTAACATTGTGGTTGGGGCCATCAAACGAAATGCTTGACGGAACACCCATACACCACTGGATGAAGAACGCAGGATCCTCTGCTGCATAGTTTCGGTCATCTAAAGGAGGTTTTAAACAGTATAACAGTGGATGGCTGTATTCATAGAGCAGCCGGCCCCCTGTTGCTAGCTGAATGTCGTAACCTGAATGGCTGTGATACTGGACATGCAAAAATCACATGTGGCTATGACCTTCCTGCAAAATGTTTGCCAAGATTGGTGGATAAGAATATTTACATCATTCCGGGAGATTAACATTGTGGTTGGGGCCATCAAACGAAATGCCCGATGGAGCACCCGCGCCTACTGGATGAGCAAGGATCCTCTGCTGCATAGTTTCGGTCATCTAAAGGAGGTTTTAAACAGTATAACAGTGGATGGCTGTATTCATAGAGCAGCCGGCCCCTGTTTGCTAGCTGAATGTCGTAACCTGAATGGCTGTGATACTGGACATGCAAAAATCACATGTGTATGACCTTCCTGCAAAATGTTTCTGCCAGATGGTATGGATC

5. Because of the high error rate of nanopore sequencing, the authors should compare the error rate of CS sequence before and after cRG correction to elucidate the ability to correct sequencing errors with the cRG mode.

Thanks for the advice. With the suggested analysis, we found that cRG correction significantly reduced the error rate of both mismatch and indels. We have supplemented the information in our manuscript and Figure 2—figure supplement 2C, D.

6. The authors trained a random forest classifier to predict the strand origin of circular reads. How many CCRs were used as the training set, and how's the performance of the random forest classifier? The authors should provide more data about this step.

Thanks for the suggestion. We have supplemented details of the random forest classifier including the sample size for training and performance for predication (accuracy, AUC value). We have supplemented the information in our manuscript and Supplementary file 8.

General comments:

1. The Figures and Supplemental Figures were not labeled with figure numbers, making it extremely difficult to read (especially some figures span across more than one page). The authors should label the figure number more clearly.

Sorry for the inconvenience. We have revised and labeled all figures and supplemental figures.

2. Some figures are not clearly labeled. For example, in Figure 2j, what does each lane represent? Also, in Figure 3e, it is not clear what the y-axis is. The authors state that it is read coverage, but how come the value is from 0 to 1? In Figure 3d, the authors state that it is the correlation between HeLa and SKOV3, but it was not clear what axis is HeLa and SKOV3, respectively. Again, what does the shaded area mean in these figures (figure 3c, 3d, 3f, and 3g)? The authors should check through the figure label more carefully and correct them accordingly.

Thank you for your careful review. We have labeled the lane name in Figure 2j. We have revised ‘read coverage’ to ‘adjusted coverage’ in Figure 3e, which was a ratio value normalized by max read coverage. In Figure 3d, the x-axis is the fold change of expression level between HeLa and SKOV3, the y-axis is the -∆∆CT of RT-qPCR between HeLa and SKOV3. The shaded areas in Figure 3c, 3d, 3f, 3g were 95% confidence intervals. Besides revisions in figures, we have added more details in figure legend (marked as G2).

Reviewer #1 (Recommendations for the authors):

Overall, this is a nice piece of work that can be published after revision. Below are my specific comments for the authors:

1. As noted in my public review, the authors should carry out a two-way comparison between circFL-seq and CIRI-long, as well as a three-way comparison between circFL-seq, CIRI-long, and isoCirc. Given that circFL-seq and CIRI-long are both based on RCRT while isoCirc is based on RCA, it would be interesting to see if circFL-seq and CIRI-long produce more concordant results in terms of the discovery and quantitation of full-length circular RNAs, given the similarity in their experimental strategies.

Thank you for your suggestion. We have performed the three-way comparison.

2. In both Introduction and Discussion, the authors noted that the use of ligation in isoCirc may lead to false discoveries of circular RNAs. While this statement is technically correct from an experimental standpoint, such false discoveries can be recognized and removed computationally – therefore this is not a fair critique of isoCirc. In fact, the isoCirc computational pipeline is designed to remove such artifacts, using stringent requirements for alignment quality and presence of canonical splice site motifs in all forward and back splice junctions within full-length circular RNA transcripts.

We agreed that false discoveries can be recognized and removed computationally. However, not all of those false discoveries were identified by the computational pipeline of isoCirc. We focused on the false discoveries of top expressed circRNAs to eliminate the effect of high noise of nanopore reads and disturbance of low expressed circRNAs. As clarified in our manuscript, when focusing on the top 100 expressed circRNA BSJs identified by each method, BSJs from both circFL-seq and RNA-seq (collected from isoCirc study) were annotated in databases, while 22 of the top 100 BSJs identified by isoCirc lacked evidence support from the database, RNA-seq, or circFL-seq results. Specifically, two products were from histone genes whose linear RNAs are resistant to RNase R degradation. In addition, we also failed to experimentally validate the two potential circular products with divergent primers. Thus, there are abundant false discoveries which were not removed by the computational pipeline of isoCirc. We have made this description clearer in our manuscript (marked as Q1.2).

3. Line 65-67: "Thus, an accurate but affordable method to detect full-length circRNA remains to be developed for wide application in screening functional circRNAs at the omics scale". This statement leaves the impression that such a method currently does not exist, which is not a fair representation of the current literature with the recent publications of isoCirc and CIRI-long.

We have revised this statement.

4. The authors reported that compared to isoCirc, circFL-seq produced more full-length circular RNA reads at the same library depth but identified fewer circular RNA isoforms. The authors appeared to present this finding as a positive feature of circFL-seq. For example, in line 265-267, the authors stated that "as a trade-off, isoCirc produced fewer reads with the same sequencing depth, which raised sequencing costs and weakened its ability to detect and accurately quantify high-quality circRNAs". There are multiple issues with this statement. In terms of the precision of circular RNA quantitation (e.g. as evaluated based on comparison among nanopore replicates as well as comparison to short-read RNA-seq data), the metrics presented for circFL-seq are in fact quite comparable to the metrics presented in the isoCirc paper, so there is no evidence that circFL-seq provides a better quantitation of circular RNAs. Moreover, given that the ground-truth is not known, an alternative interpretation to this observation is that circFL-seq may be biased towards highly expressed circular RNAs, and may lack the ability to discover moderately and lowly expressed circular RNAs. Overall, in comparing different methods, the authors should aim to provide an impartial discussion about the strengths and weaknesses of individual methods, and avoid over-interpreting the data in favor of their own method.

Because circFL-seq is more sensitive and more precise for BSJ detection, circFL-seq could also provide better quantitation. However, due to the advantage of quantification performance is not obvious, we have removed the advantage descriptions of circFL-seq for circRNA quantification.

4. The discussion about CDR1as is interesting. Can CIRI-long detect this circular RNA?

Thank you for your suggestion. Because CIRI-long only published mouse brain data, we employed the computational pipeline of CIRI-long on our circFL-seq data of HEK293 cell line. CIRI-long also successfully identified CDR1as. We also employed the computational pipeline of circFL-seq on isoCirc data but failed to detect CDR1as. Thus, we speculated that isoCirc data lack the sequenced reads of CDR1as, the reason of which could due to the experimental protocol. Because some circRNAs, including CDR1as, are sensitivity to RNase R (Linda Szabo1 and Julia Salzman, 2016, Nature Reviews Genetics), we hypothesized that digestion of linear transcripts by RNase R might require to be optimized in the protocol of isoCirc. We have removed the discussions about CDR1as in Discussion section.

5. The authors should also cite the BioRxiv preprint by Rahimi et al., (https://doi.org/10.1101/567164), which is another method for nanopore sequencing of circular RNAs.

We have cited the preprint paper in the introduction section as ref 13.

6. Overall, the manuscript is easy to read and follow, but it could benefit from a thorough editing by a language editor.

We have made a language editing by American Journal Experts (AJE).

Reviewer #2 (Recommendations for the authors):

Major concerns:

1. In comparison to isoCirc, circFL-seq identified fewer circRNA isoforms with higher read coverage of the detected circRNAs. This raises a concern that the outcome may result from that RCRT captures circRNA less efficiently than RCA, resulting in fewer circRNA molecules are captured in circFL-seq. The higher read coverage may simply come from sequencing the same circRNA molecule from the PCR amplification artifacts. This may also explain why circFL-seq cannot detect circRNAs with low read count or lowly expressed circRNAs. In this case, the authors cannot use back splice junction (BSJ) detection saturation as an indicator to compare the required read-depth between isoCirc and circFL-seq. Also, given the concern above, the "high read coverage" does not necessarily mean "high quality" nor "high accuracy" as claimed by the authors. The authors should address this concern before claiming on the benefits of high read coverage.

Thank you for the suggestion. We agreed that saturation curve was not appropriate and thus deleted related data from our revised manuscript (marked as Q2.1).

Comprehensive comparison between RCA and RCRT has been discussed above. And the worry about bias of high expressed circRNA has also been discussed above.

2. The advantages of circFL-seq over other existing technologies are not well-supported. For example, the authors claim that RCRT has lower residual linear RNA contamination than RCA, but the authors do not provide any data or evidence supporting the claim. Also, the authors claim that the circFL-seq gives higher circRNA read coverage; hence it is beneficial for circRNA quantification. However, the real "benefits" over other technologies (RNA-seq and isoCirc) for circRNA quantification are not clear since the RNA-seq (and isoCirc) quantification is significantly correlated with circFL-seq as demonstrated by the authors.

Sorry for the misunderstanding. We have performed the three-way comparison.

3. In the manuscript, the results are often comparable with known database and existing technologies when the authors focus on the "high quality" circRNAs only (circFL-seq read counts >= 5) which also have high expression level. The fact suggests that circFL-seq result is trust-worthy on "high quality" only. It also suggests that circFL-seq may fail to detect the lowly expressed circRNAs. The authors should note and discuss these limitations.

Sorry for confusing. The term “high quality” we used here was not equivalent to high expression level. With higher sequencing depth, lowly expressed circRNAs would become “high quality”. For lowly expressed circRNAs, circFL-seq was not more biased than isoCirc (Author response table 2).

General comments:

1. The Figures and Supplemental Figures were not labeled with figure numbers, making it extremely difficult to read (especially some figures span across more than one page). The authors should label the figure number more clearly.

2. Some figures are not clearly labeled. For example, in Figure 2j, what does each lane represent? Also, in Figure 3e, it is not clear what the y-axis is. The authors state that it is read coverage, but how come the value is from 0 to 1? In Figure 3d, the authors state that it is the correlation between HeLa and SKOV3, but it was not clear what axis is HeLa and SKOV3, respectively. Again, what does the shaded area mean in these figures (figure 3c, 3d, 3f, and 3g)? The authors should check through the figure label more carefully and correct them accordingly.

Sorry for the inconvenience. We have revised and labeled all figures and supplemental figures.

Specific comments:

1. In the Abstract section, the authors claim that "… the detection of cancer-related fusion circRNAs…". However, the authors did not provide any data or literature suggesting that the fusion circRNAs they identified by circFL-seq are really "cancer-related" or biologically meaningful. The claim needs to be revised or proved.

We have removed the phrase “cancer-related” in our revised manuscript.

2. In Figure2—figure supplement 1a, the full-length circRNA reads contribute very little percentage of the total clean reads (~2-5%). How come the RCRT method generate so little full-length circRNA reads? The authors should comment on this and discuss it.

In our manuscript, we have shown the percentage of BSJs reads from RNA-seq is only 0.15%–0.32%, an amount ten times lower than that of full-length circRNA reads with circFL-seq. In addition, the percentage isoCirc ranged between 3.5% and 4.0%, which is comparable with circFL-seq.

3.In Figure 3—figure supplement 3, the author claim that more reads are required for ONT to confidently identify circRNAs. Doesn't this compromise the cost-efficient claim of circFL-seq made by the authors earlier? The authors should comment on this and discuss it.

Sorry for the misunderstanding. circFL-seq is developed for full-length circRNA sequencing. In the detection of BSJs, the cost of circFL-seq is much higher than RNA-seq. For the cost-efficient question in the full-length level, we want to compare circFL-seq to isoCirc. Due to the high noise of ONT reads, both circFL-seq and isoCirc have shown circRNAs with more read counts is more reliable. Because circFL-seq have the advantage to produce more reads. Thus, circFL-seq is a cost-efficient method for full-length circRNA sequencing.

4. In Figure 3—figure supplement 6a-d, how the saturation curves are calculated? It seems like isoCirc has much lower sequencing depth (~0.3 M) than circFL-seq. How do the authors compare the BSJ saturation in this case?

We have revised our discussion about the figure.

5. When comparing the validity of circFL-seq and isoCirc, why do the authors focus on top 100 expressed BSJ only? A better comparison should be the total BSJ in circFL-seq and isoCirc.

In this section, we want to find evidence of false discoveries from cDNA ligation of isoCirc. We focused on the false discoveries of top expressed circRNAs to eliminate the effect of high noise of nanopore reads and disturbance of low expressed circRNAs.

6. The authors should not use the same absolute circFL-seq BSJ read counts to define high-quality BSJs in isoCirc for the following reasons: (i) the read counts >= 5 in circFL-seq is arbitrary, there is no evidence suggesting that the isoCirc should use the same read counts to define high-quality BSJs. (ii) Since isoCirc captures more circRNAs, a lower BSJ read counts per circRNA is expected given the same sequencing depth. In both cases, lower BSJ read counts in isoCirc does not necessarily mean the BSJ is not "high-quality". Thus, the authors should not use absolute circFL-seq BSJ read counts as an indicator for the BSJ quality in isoCirc.

Sorry for confusing. We agreed that an absolute cutoff value was not fair for methods comparison. Actually, we collected all full-length circRNA (read counts >= 1) for comparison. The details of updated results were described above.

7. In the PLOD2 circFL-seq and RNA-seq example shown by the authors, the authors suggest that the circPLOD2 has lower exon skipping event than its parent linear RNA in HeLa cells. However, given that the BSJ is exactly the same between exon-skipped and non-exon-skipped circPLOD2, how does the back-splicing mechanism distinguish different parent linear RNA isoforms that selectively back-splices the non-exon-skipped linear RNA to generate specific circPLOD2 isoform in HeLa cells?

Sorry for the misunderstanding. We actually compared the exon skipping between two circular isoforms instead of between circRNA and its linear cognate.

8. Are the f-circ detected by circFL-seq generated by RNA fusion or genomic fusion? Although the RNA-seq suggests a genomic fusion, it does not completely eliminate the possibility of a genomic fusion. A genomic PCR followed by Sanger sequencing should be performed to validate the fusion junction of the genome.

Thank you for your suggestion. Details was described above.

Reviewer #3 (Recommendations for the authors):

Considering that there are two recently published circRNA reconstruction tools based on nanopore sequencing, the authors should comprehensively compare their method with these two tools, and carefully discuss the advantages and disadvantages of these methods.

Specific comments:

1. In the Discussion section (line 260-262), the authors compared circFL-seq with the recently published CIRI-long method. Both circFL-seq and CIRI-long use a similar rolling circle reverse transcription strategy to amplify circRNAs. The authors may discuss the difference and (dis)advantages between their method and previous methods (isoCirc and CIRI-long).

We have discussed the differences of three methods and the limitation of circFL-seq.

2. In section "Comparison with RNA-seq and isoCirc for circRNA detection", the authors compared circFL-seq with the isoCirc method and found that circFL-seq produced more circular reads but identified fewer circRNA isoforms, which is an interesting result. Does it mean that isoCirc has a better sensitivity or higher false discovery rate in detecting lowly expressed circRNAs? The authors should include more comparison (e.g. venn diagram between three sets under different BSJ thresholds) between circFL-seq, isoCirc, and public circRNA database (e.g. circAtlas [PMID: 32345360, PMID: 30893614]) to demonstrate the advantages of their method.

A more detailed comparison was conducted.

3. The authors found six f-circ derived from GBF1-MACROD2 fusion and validated their junctions using Sanger sequencing. Besides, the authors also used short-read RNA-seq to validate the linear fusion junctions. What's the ratio of linear and circular transcript derived from these gene fusion loci? Is there any possibility that these f-circRNAs are derived from trans-splicing events? Considering that short-read RNA-seq data cannot effectively distinguish circular and linear transcripts, the authors may try to search for nanopore reads spanning the fusion region, which can provide direct evidence for these gene fusion events.

We have discussed the question in above.

4. Because of the high error rate of nanopore sequencing, the authors should compare the error rate of CS sequence before and after cRG correction to elucidate the ability to correct sequencing errors with the cRG mode.

We have discussed the question above.

5. The authors trained a random forest classifier to predict the strand origin of circular reads. How many CCRs were used as the training set, and how's the performance of the random forest classifier? The authors should provide more data about this step.

We have discussed the question above.

6. In section "Evaluation of quantification of full-length circRNAs", it would be nice if the authors could compare the quantification results between their method on nanopore reads and previous method (e.g. CIRIquant) on short-read RNA-seq data.

Thank you for suggestion. In the section, we have compared the quantification results between circFL-seq and RNA-seq (quantified by CIRI2), including Pearson correlation of expression level and DEC detection, which have shown comparable ability of BSJs quantification.

7. The authors used different names (circFL-seq, circfull) to denote their sequencing and data analysis methods. It would be better if they can unify the name, say, circFL-seq, which may avoid misunderstanding.

Thank you for suggestion. We have unified the name to circFL-seq in our manuscript.

https://doi.org/10.7554/eLife.69457.sa2

Article and author information

Author details

  1. Zelin Liu

    Institute of Systems Biomedicine, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China , NHC Key Laboratory of Medical Immunology (Peking University), Beijing, China
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3516-3999
  2. Changyu Tao

    Department of Human Anatomy, Histology & Embryology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
  3. Shiwei Li

    Department of Radiation Medicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
    Contribution
    Resources, Validation
    Competing interests
    No competing interests declared
  4. Minghao Du

    Department of Microbiology & Infectious Disease Center, School of Basic Medical Science Peking University Health Science Center, Beijing, China
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  5. Yongtai Bai

    Department of Radiation Medicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
    Contribution
    Resources, Validation
    Competing interests
    No competing interests declared
  6. Xueyan Hu

    Institute of Systems Biomedicine, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China , NHC Key Laboratory of Medical Immunology (Peking University), Beijing, China
    Contribution
    Resources, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Yu Li

    Chinese Institute for Brain Research, Beijing, China
    Contribution
    Resources
    Competing interests
    No competing interests declared
  8. Jian Chen

    Chinese Institute for Brain Research, Beijing, China
    Contribution
    Resources, Supervision
    Competing interests
    No competing interests declared
  9. Ence Yang

    1. Institute of Systems Biomedicine, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China , NHC Key Laboratory of Medical Immunology (Peking University), Beijing, China
    2. Department of Microbiology & Infectious Disease Center, School of Basic Medical Science Peking University Health Science Center, Beijing, China
    3. Chinese Institute for Brain Research, Beijing, China
    Contribution
    Conceptualization, Funding acquisition, Project administration, Supervision, Writing - original draft
    For correspondence
    yangence@pku.edu.cn
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9526-2737

Funding

Beijing Municipal Science and Technology Commission (7212065)

  • Ence Yang

Chinese Academy of Sciences (2020-NKX-XM-01)

  • Ence Yang

Beijing Municipal Science and Technology Commission (Z181100001518005)

  • Ence Yang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The work was supported by grants from the Beijing Municipal Science and Technology Commission of China (7212065), the Chinese Institute for Brain Research, Beijing (2020-NKX-XM-01), and the Beijing Municipal Science and Technology Commission of China (Z181100001518005).

Senior Editor

  1. James L Manley, Columbia University, United States

Reviewing Editor

  1. Gene W Yeo, University of California, San Diego, United States

Reviewer

  1. Fangqing Zhao, Beijing Institute of Life Science, Chinese Academy of Sciences, China

Publication history

  1. Received: April 15, 2021
  2. Preprint posted: July 5, 2021 (view preprint)
  3. Accepted: October 13, 2021
  4. Accepted Manuscript published: October 14, 2021 (version 1)
  5. Version of Record published: October 26, 2021 (version 2)

Copyright

© 2021, Liu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,513
    Page views
  • 227
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Zelin Liu
  2. Changyu Tao
  3. Shiwei Li
  4. Minghao Du
  5. Yongtai Bai
  6. Xueyan Hu
  7. Yu Li
  8. Jian Chen
  9. Ence Yang
(2021)
circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing
eLife 10:e69457.
https://doi.org/10.7554/eLife.69457

Further reading

    1. Chromosomes and Gene Expression
    2. Microbiology and Infectious Disease
    Bretta Hixson et al.
    Tools and Resources Updated

    Mosquitoes transmit numerous pathogens, but large gaps remain in our understanding of their physiology. To facilitate explorations of mosquito biology, we have created Aegypti-Atlas (http://aegyptiatlas.buchonlab.com/), an online resource hosting RNAseq profiles of Ae. aegypti body parts (head, thorax, abdomen, gut, Malpighian tubules, ovaries), gut regions (crop, proventriculus, anterior and posterior midgut, hindgut), and a gut time course of blood meal digestion. Using Aegypti-Atlas, we provide insights into regionalization of gut function, blood feeding response, and immune defenses. We find that the anterior and posterior midgut possess digestive specializations which are preserved in the blood-fed state. Blood feeding initiates the sequential induction and repression/depletion of multiple cohorts of peptidases. With respect to defense, immune signaling components, but not recognition or effector molecules, show enrichment in ovaries. Basal expression of antimicrobial peptides is dominated by holotricin and gambicin, which are expressed in carcass and digestive tissues, respectively, in a mutually exclusive manner. In the midgut, gambicin and other effectors are almost exclusively expressed in the anterior regions, while the posterior midgut exhibits hallmarks of immune tolerance. Finally, in a cross-species comparison between Ae. aegypti and Anopheles gambiae midguts, we observe that regional digestive and immune specializations are conserved, indicating that our dataset may be broadly relevant to multiple mosquito species. We demonstrate that the expression of orthologous genes is highly correlated, with the exception of a ‘species signature’ comprising a few highly/disparately expressed genes. With this work, we show the potential of Aegypti-Atlas to unlock a more complete understanding of mosquito biology.

    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Bethany Sump et al.
    Research Article

    For some inducible genes, the rate and molecular mechanism of transcriptional activation depends on the prior experiences of the cell. This phenomenon, called epigenetic transcriptional memory, accelerates reactivation and requires both changes in chromatin structure and recruitment of poised RNA Polymerase II (RNAPII) to the promoter. Memory of inositol starvation in budding yeast involves a positive feedback loop between transcription factor-dependent interaction with the nuclear pore complex and histone H3 lysine 4 dimethylation (H3K4me2). While H3K4me2 is essential for recruitment of RNAPII and faster reactivation, RNAPII is not required for H3K4me2. Unlike RNAPII-dependent H3K4me2 associated with transcription, RNAPII-independent H3K4me2 requires Nup100, SET3C, the Leo1 subunit of the Paf1 complex and, upon degradation of an essential transcription factor, is inherited through multiple cell cycles. The writer of this mark (COMPASS) physically interacts with the potential reader (SET3C), suggesting a molecular mechanism for the spreading and re-incorporation of H3K4me2 following DNA replication.