Evolutionary dynamics of circular RNAs in primates

Abstract

Many primate genes produce circular RNAs (circRNAs). However, the extent of circRNA conservation between closely related species remains unclear. By comparing tissue-specific transcriptomes across over 70 million years of primate evolution, we find that within 3 million years circRNA expression profiles diverged such that they are more related to species identity than organ type. However, our analysis also revealed a subset of circRNAs with conserved neural expression across tens of millions of years of evolution. By comparing these to species-specific circRNAs, we identified that the downstream intron of the conserved circRNAs displays a dramatic lengthening during evolution due to the insertion of novel retrotransposons. Our work provides comparative analyses of the mechanisms by which circRNAs generate increased transcriptomic complexity in primates.

Introduction

An important question in biology is how the complexity of biological systems has expanded while the number of protein-coding genes has remained mostly stable. Through decades of research, it has been shown that increased biological complexity has arisen in part by the dynamic generation of unique cell-specific transcriptomes, and as a consequence of the highly versatile programs of gene expression (Brawand et al., 2011; Cardoso-Moreira et al., 2019). However, studies of tissues across distant animal lineages have shown that gene expression is highly conserved between the same tissues in different species (Barbosa-Morais et al., 2012; Brawand et al., 2011; Cardoso-Moreira et al., 2019; Merkin et al., 2012; Reyes et al., 2013). Hence, gene expression alone is unlikely to explain the heterogeneous expansion in complexity (as defined by the number of cell types) across vertebrate evolution. Instead, it is becoming increasingly evident that the plethora of post-transcriptional mechanisms (Cheetham et al., 2020; Fiszbein et al., 2019; Gueroussov et al., 2017; Ha et al., 2018; Ha et al., 2021; Mattick, 2018) capable of greatly expanding transcriptomic diversity also underlies these advances.

Among these, an intriguing class produced by pre-mRNA processing is the circular RNAs (circRNAs) (Zhang et al., 2013; Memczak et al., 2013; Li et al., 2018b; Gokool et al., 2020a). These RNAs can regulate protein localization (Liu et al., 2019), miRNA functionality (Piwecka et al., 2017), and a range of other processes (Li et al., 2018a; Gokool et al., 2020a), enabling increased regulatory complexity, especially in the immune and nervous systems (Gokool et al., 2020b; Li et al., 2017; Liu et al., 2019; Piwecka et al., 2017). CircRNAs form by back-splicing whereby an exon’s 3′-splice site is ligated to an upstream 5′-splice site forming a closed circRNA transcript (Barrett et al., 2015; Starke et al., 2015). Back-splicing occurs both co- and post-transcriptionally and is facilitated by inverted repeat elements that promote complementarity between adjacent introns, favoring circRNA formation over linear splicing (Ivanov et al., 2015; Jeck et al., 2013; Liang and Wilusz, 2014; Zhang et al., 2014). These RNA-RNA interactions can be facilitated by RNA-binding proteins, such as Quaking (Conn et al., 2015), that help stabilize the hairpin structure promoting circRNA formation.

The production of circRNAs can also arise due to the perturbed expression of trans-factors and the inhibition of the core splicing machinery (Aktaş et al., 2017; Liang et al., 2017). These spuriously produced circRNAs are maintained as their circular shape protects them from the activity of cellular exonucleases (Gokool et al., 2020b). In contrast, the variable usage of cis-regulatory elements in exons and flanking introns can be selected to promote circRNA expression in a cell-type-, condition-, or species-specific manner (Irimia and Blencowe, 2012; Nilsen and Graveley, 2010). Changes in circRNA expression may therefore represent either a major source of species- and lineage-specific differences or error-prone mis-splicing. To provide insight into this quandary, here we describe a genome-wide analysis of circRNAs across physiologically equivalent organs from primate species spanning 70 million years of evolution. Our analysis uncovers extensive evidence of species-specific circRNAs that display no evidence of conservation even across relatively short evolutionary time periods. However, we also identify a small subset of circRNAs that are conserved across tens of millions of years and display increased inclusion rates across evolutionary time. Our analysis comparing conserved circRNAs to species-specific circRNAs reveals that the conserved circRNAs are flanked by newly inserted transposons that correlate with circRNA genesis and extend the intron downstream of the circRNA. Overall, our results identify evidence of circRNA conservation within closely related species and identify a recurring mechanism that correlates with circRNA genesis, facilitating the expansion of transcriptomic complexity of primate cells.

Results

A core subset of circRNAs show conserved expression signatures but most are species-specific

To address the outstanding questions about the conservation and functional importance of circRNAs, we collected transcriptomic (RNA-seq) data (Peng et al., 2015; Pipes et al., 2013) from across nine tissues from eight primate species, consisting of three old-world monkeys, two hominoids, two new-world monkeys, and one prosimian (Supplementary file 1). These species were chosen on the basis of the quality of their genomes and their close evolutionary relationships, enabling the evaluation of transcriptome changes between species separated by <3 million years to >70 million years (see Figure 1A). For each species, we considered all primate-conserved internal exons as potential origins of back-spliced junctions (BSJs) with no restrictions on backward exon combination. Only canonical and annotated splice sites were used in the analysis. RNA-seq reads were mapped to exon-exon junctions (EEJs) to determine ‘percent spliced in’ (PSI) for all circRNAs with respect to the linear transcript. We also calculated PSI values for linear splicing of each internal exon and transcript per million (TPM) values to estimate gene expression. Orthology relationships between genes and exons were established to enable direct cross-species comparisons.
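As an illustration of the PSI measure described above, the following minimal sketch computes a circRNA inclusion value from junction read counts. It is not Whippet's actual estimator: the function name `circ_psi`, the simple ratio of back-spliced to total junction reads, and the default read-support cutoff are assumptions made for illustration only.

```python
def circ_psi(bsj_reads, linear_reads, min_reads=5):
    """Return a circRNA PSI on a 0-100 scale, or None if support is too low.

    bsj_reads    -- reads spanning the back-spliced junction (circular isoform)
    linear_reads -- reads spanning the competing linear exon-exon junctions
    min_reads    -- hypothetical minimum BSJ read support (assumed cutoff)
    """
    total = bsj_reads + linear_reads
    if bsj_reads < min_reads or total == 0:
        return None  # not enough evidence to report an inclusion value
    return 100.0 * bsj_reads / total


if __name__ == "__main__":
    print(circ_psi(12, 48))  # 20.0
    print(circ_psi(3, 50))   # None (below the assumed read-support cutoff)
```

Expressing circRNA abundance relative to the linear transcript, rather than as an absolute count, is what allows inclusion to be compared between genes with very different expression levels.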

Figure 1. Circular RNA (circRNA) expression signatures are conserved in some tissues.

(A) Phylogenetic tree of analyzed species with distance from human in millions of years (MYA) (divergence time according to TimeTree http://www.timetree.org/). Tissue datasets used in analysis identified on right with white squares denoting lack of dataset. (B) Clustering of samples based on expression values (transcripts per million). The variance of expression values was calculated, and the top 1000 most variable genes were used to calculate Pearson’s correlation (n = 1000 genes in 88 samples). Red colors indicate high correlation between samples, and blue describes low correlation. Vertical and horizontal adjacent heatmaps describe tissues (see A for key). (C) Barplot showing conservation of circRNAs based on back-spliced junction and based on occurrence within orthologous genes. (D) Clustering of conserved circRNAs based on percent spliced in (PSI) values. Clustered using Pearson’s correlation as in (B) (n = 149). Vertical and horizontal adjacent heatmaps describe tissues (inner heatmap; see A for key) and species (outer heatmap).

The circRNA analysis was done using Whippet because, according to our benchmarking results (see Materials and methods for details), it is an accurate and fast circRNA quantification tool. Our analysis of both simulated and collected RNA-seq data found that Whippet has a low false positive rate (<2%, see Materials and methods for details), which is in line with other methods (Szabo et al., 2015; Gokool et al., 2020a), a high rate of circRNA identification even at low read depths (~90%; Figure 1—figure supplement 3C), and is faster (~69 min) with less computational overhead (<3 GB of memory on a single core) than the other highly cited circRNA algorithms that we compared it with (CIRCexplorer3 [Ma et al., 2019], CIRIquant [Zhang et al., 2020], and find_circ [Memczak et al., 2013]; Figure 1—figure supplement 3A and B).

We initially explored the expression relationships within our datasets using hierarchical clustering and Pearson’s correlations to determine the gene expression relationships between orthologous genes (see Materials and methods). In agreement with previous results (Brawand et al., 2011; Merkin et al., 2012; Barbosa-Morais et al., 2012; Reyes et al., 2013) from analysis across vertebrate species, a clear pattern emerged of tissue-specific conservation of gene expression (Figure 1B). This pattern suggests that most tissues possess a tissue-specific gene expression signature such that, for example, a liver-specific gene in chimp will likely also be liver-specific in lemur. In contrast to previous observations in vertebrates (Merkin et al., 2012), there are no clear species-specific exceptions to these patterns likely reflecting the closer evolutionary relationships studied.
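The clustering input described above (variance-based gene selection followed by pairwise Pearson correlation between samples) can be sketched as follows. This is a toy illustration using only the Python standard library; the gene names, sample order, and TPM values are invented, and the hierarchical clustering step itself is omitted.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_variable_genes(expr, n):
    """Return the n genes with the highest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:n]


def sample_correlation(expr, genes):
    """Sample-by-sample Pearson correlation matrix over the chosen genes."""
    n_samples = len(next(iter(expr.values())))
    cols = [[expr[g][i] for g in genes] for i in range(n_samples)]
    return [[pearson(a, b) for b in cols] for a in cols]


if __name__ == "__main__":
    # Hypothetical TPM values; columns are human liver, chimp liver,
    # human brain, chimp brain.
    expr = {
        "ALB": [900, 850, 5, 8],
        "SNAP25": [2, 4, 700, 650],
        "ACTB": [300, 310, 295, 305],
        "GAPDH": [500, 500, 500, 500],
    }
    genes = top_variable_genes(expr, 3)
    corr = sample_correlation(expr, genes)
    print(genes)
    print(corr[0][1] > corr[0][2])  # liver-liver exceeds liver-brain
```

In this toy matrix the two liver samples correlate more strongly with each other than with brain from the same species, which is the tissue-dominated pattern reported for gene expression.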

To understand circRNA relationships between species, we performed an analogous pairwise clustering analysis using circRNA inclusion values. Replicates from the same tissue invariably clustered together. However, in contrast to gene expression, circRNA expression is segregated by species (Figure 1—figure supplement 1A). This suggests that despite all the exons studied being conserved across primates the majority of circRNAs showed species-specific expression with no orthologous circRNAs in other species (Figure 1C, ~67% are species-specific, n = 11,201). To evaluate the expression patterns of circRNA orthologs, we identified circRNAs with matched BSJs (see Materials and methods) conserved across ~45 million years of evolution. In this analysis, more complex patterns of circRNA conservation emerged with tissue-dominated clustering observed across all types of brain samples (Figure 1D) in line with previous observations (Rybak-Wolf et al., 2015; Venø et al., 2015; You et al., 2015). In contrast, for all other tissues circRNAs showed primarily species-specific clustering.

We next assessed if these changes may be explained by gene expression changes in the host gene. A comparison of genes containing conserved and species-specific circRNAs did not show any significant differences (Figure 2—figure supplement 4A and B, p=0.584 Wilcoxon rank-sum test), suggesting that differences between these subgroups are not driven by gene expression differences. We next evaluated if tissue-specific changes observed in the conserved circRNAs were due to tissue-specific gene expression or alternative splicing. Interestingly, genes containing conserved circRNAs displayed neither neural-specific gene expression (Figure 1—figure supplement 1B) nor neural-specific alternative splicing changes (Figure 1—figure supplement 1C). This suggests that circRNA conservation and expression are independent of these regulatory layers.

We next investigated the genes containing circRNAs. Many orthologous genes consistently express circRNAs even if the precise BSJ is not conserved (Figure 1C). This phenomenon persisted across species with a median of 10 circRNAs detected per gene across tissues (Figure 1—figure supplement 1D). However, this circRNA production only occurred in a limited number of expressed genes (20.4% of orthologous expressed genes). This suggests that certain genomic areas are circRNA factories that are prone to produce large numbers of lowly expressed circRNAs.

These observations suggest that a core set of circRNAs show conserved tissue-specific patterns across neural tissues. However, the great prevalence of circRNAs showing species-specific expression indicates that the cis-regulatory or trans-regulatory environments may differ between even very closely related species to promote the species-specific production of circRNAs.

Features of conserved circRNAs

Our analysis (Figure 2A) reveals clear subsets of several hundred circRNAs exhibiting highly conserved circRNA expression. The circRNA ERC1 and many other examples from our data (Figure 2B, Supplementary file 2, and Figure 2—figure supplement 1A) demonstrate that circRNA expression can be conserved for tens of millions of years.

Figure 2. Features of conserved circular RNAs (circRNAs).

(A) Schematic overview of identification of back-spliced junctions (BSJ) between species. (B) Percent spliced in (PSI) values for conserved circRNAs (top) CACNA1C_chr12:2504436–2512984 and (bottom) ERC1_chr12:1180540–1204512 across tissues and species analyzed. PSI values were only calculated for circRNAs supported by more than five reads. Gene name is indicated in top right-hand corner. (C) Violin plot describing relative expression levels of conserved and species-specific circRNAs. Violin plots show probability densities of the data with internal boxplot. Boxplot displays the interquartile range as a solid box, 1.5 times the interquartile range as vertical thin lines and the median as a horizontal line. p-Value calculated using Wilcoxon rank-sum test (p<0.187). TpM: transcripts per million. (D) Cumulative distribution plot of change in PSI values across all conserved (yellow) and species-specific (gray) circRNAs. A cumulative distribution plot describes the proportion of data (y-axis) less than or equal to a specified value (x-axis). Cumulative distribution F(x), cumulative distribution function. p-Value calculated using Wilcoxon rank-sum test (p<3.38 × 10^–74). (E) Cumulative distribution plots of circRNAs with different levels of conservation, as defined by consistent observation of BSJ across species indicated. See (D) for description of cumulative distribution plot. (F) Barplot describing number of exons per circRNA for conserved and species-specific circRNAs. Exons are defined by Ensembl and must show evidence of expression (PSI >5 and >5 reads support) in tissue analyzed. (G) Barplot describing uniqueness of start (5′-splice site) and end (3′-splice site) for conserved and species-specific circRNAs. p-Values calculated from Fisher’s exact test (p<4.08 × 10^–64; unique start and end; also see Figure 2—figure supplement 3).

To assess the phylogenetic distribution of circRNAs across primates, we grouped them by PSI values, requiring PSI ≥ 5 and support from at least five reads. Out of the approximately 56,000 internal exons with clear orthologs across primates, we identified a large set of circRNAs with ‘species-specific’ expression, as well as a set of ~773 ‘conserved circRNAs’ that shared expression across at least human, chimp, and baboon (Figure 2—figure supplement 1B and C). Using our transcriptomic data, we found that a circRNA identified in human was approximately five times more likely to be identified in baboon than in lemur, in line with the closer phylogenetic relationship of human to baboon than human to lemur.
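The grouping described above can be sketched as a simple rule over per-species detection calls. This is an illustrative sketch rather than the pipeline used in the study: the data layout, the function names, and the handling of circRNAs that fall between the two named groups (labeled `intermediate` here) are assumptions.

```python
# Species in which a circRNA must be detected to be called conserved,
# following the "at least human, chimp, and baboon" criterion in the text.
REQUIRED_FOR_CONSERVED = {"human", "chimp", "baboon"}


def is_detected(obs, min_psi=5.0, min_reads=5):
    """A circRNA counts as detected when PSI >= 5 with at least five reads."""
    return obs["psi"] >= min_psi and obs["reads"] >= min_reads


def classify(per_species):
    """Classify one orthologous BSJ from {species: {"psi": ..., "reads": ...}}."""
    present = {sp for sp, obs in per_species.items() if is_detected(obs)}
    if REQUIRED_FOR_CONSERVED <= present:
        return "conserved"
    if len(present) == 1:
        return "species-specific"
    if not present:
        return "not detected"
    return "intermediate"  # hypothetical label for partially shared circRNAs


if __name__ == "__main__":
    # Hypothetical observations for a single orthologous back-spliced junction.
    example = {
        "human": {"psi": 22.0, "reads": 14},
        "chimp": {"psi": 18.5, "reads": 9},
        "baboon": {"psi": 11.0, "reads": 6},
        "lemur": {"psi": 0.0, "reads": 0},
    }
    print(classify(example))  # conserved
```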

To validate the quality of our identified circRNAs, we initially overlapped our data with circRNAs previously reported in circAtlas (Wu et al., 2020). This analysis found that 99.5% of the conserved circRNAs and 97.03% of species-specific circRNAs have been previously reported. Additionally, we verified our circRNA dataset using RNase R data (see Materials and methods for details). This analysis of human data validated 82.7% of the conserved circRNAs (648 conserved circRNAs), despite these datasets not being from matched tissue samples (Figure 1—figure supplement 2A; see Materials and methods for details). To validate the conservation of our neuronal circRNAs, we next analyzed RNase R samples from different macaque brain regions. This analysis identified ~89% of the conserved circRNAs (324 conserved circRNAs) (Figure 1—figure supplement 2F; see Materials and methods for details).

Initial analysis of conserved circRNAs revealed enrichment within neural tissues with over 70% showing consistent tissue expression across 30 million years of evolution (Supplementary file 2), in line with previous observations (Rybak-Wolf et al., 2015; Venø et al., 2015; You et al., 2015). Analysis of expression levels revealed no clear trends for increased expression of conserved circRNAs (Figure 2—figure supplement 2A, p<0.187, Wilcoxon rank-sum test vs. species-specific); however, these circRNAs did display increased inclusion rates, that is, increased circRNA expression relative to the linear isoform (Figure 2—figure supplement 2B, p=3.38 × 10^–74, Wilcoxon rank-sum test vs. species-specific). Furthermore, this inclusion (or circularization) increased with the conservation age of the circRNA (Figure 2E, p=8.07 × 10^–19, Wilcoxon rank-sum test of hominoids vs. species-specific [human-specific]; p=2.14 × 10^–06, Wilcoxon rank-sum test of hominoids vs. shared until new-world monkeys). This suggests that over time these circRNAs are increasingly influencing the transcriptomic abundance of the linear isoform and the protein abundance of the gene.

Analysis of the exonic structure of conserved circRNAs showed that conserved circRNAs contain fewer exons (Figure 2F, Figure 2—figure supplement 4C, p=2.23 × 10^–20, Wilcoxon rank-sum test) with a significant enrichment for circRNAs containing 2–3 exons (p=4.17 × 10^–08, Fisher’s exact test), which is in line with observations from previous studies (Ragan et al., 2019). Conserved circRNAs also rarely overlap with other circRNAs (Figure 2G, p=4.08 × 10^–64, Fisher’s exact test; see Materials and methods), displaying back-splicing at unique 5′- and 3′-splice sites. This indicates a tight control of the number of exons within a circRNA and the BSJs used.
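To make the contingency-table comparison above concrete, the sketch below implements a two-sided Fisher's exact test for a 2×2 table from the hypergeometric distribution. The counts in the usage example are invented purely for illustration and are not the counts behind Figure 2G; the function name is likewise hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, conserved vs. species-specific circRNAs and
    columns unique vs. shared splice sites. The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * tolerance)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 in group one vs. 1 of 4 in group two.
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With real counts in the hundreds or thousands, a statistics library would normally be used instead; the hand-written version is shown only to expose the calculation.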

Conserved circRNAs have extensive downstream introns and are flanked by inverted repeat elements

To investigate the role of cis-regulatory elements within conserved circRNAs, we analyzed almost 150 features associated with circRNA formation including a multitude of trans- and cis-regulatory factors and all major groups of transposons (see Materials and methods and Supplementary file 3). To evaluate the influence of these features on defining conserved circRNAs, we used two background datasets (see Supplementary file 2 and Materials and methods). The first is a background set of randomly combined alternative (10 < PSI < 90) exons extracted from genes containing conserved circRNAs (background set). The second is the group of ‘species-specific circRNAs’ defined previously.

Using logistic regression combined with a genetic algorithm for model selection taking into account multicollinearity (see Materials and methods), we initially sought to determine the relative contribution of this diverse range of features in defining conserved circRNAs. After initially training our model on a subset of conserved and background circRNAs (80%), we next assessed its performance on the remaining 20% of circRNAs and observed a high average true-positive rate of 86.7% (AUC, area under the receiver operating characteristic [ROC] curve; Figure 3—figure supplement 1A) for a model including 24 variables selected by feature analysis. This identifies a core set of 24 cis- and trans-regulatory features enriched within the conserved formation of circRNAs compared to our background set of introns (Figure 3A and B). This includes multiple features previously associated with conserved circRNAs, such as inverted repeat Alu elements (Jeck et al., 2013; Zhang et al., 2014), as well as exon and intron length (Ashwal-Fluss et al., 2014; Ivanov et al., 2015; Jeck et al., 2013; Liang et al., 2017).
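The evaluation scheme described above (an 80/20 split followed by an AUC on the held-out set) can be sketched as below. The model fitting and the genetic-algorithm feature selection are not reproduced; the function names, the fixed random seed, and the example scores are assumptions for illustration.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1) is scored
    higher than a randomly chosen negative (label 0); ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = split_train_test(range(10))
    print(len(train), len(test))  # 8 2
    # Hypothetical held-out labels (1 = conserved) and model scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```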

Figure 3. Characterization of cis- and trans-regulatory features of conserved circular RNAs (circRNAs).

(A) Barplot describing feature importance for logistic regression model of conserved circRNAs compared to background. Colors represent positive or negative influence. Transparency reflects log10(p-value of z-statistic). Error bars represent standard error. ‘_1’ is relative to first exon of circRNA and ‘_2’ is relative to final exon of circRNA. ss3: 3´-splice site; ss5: 5´-splice site; Alt3ss: alternative 3´-splice sites. Inverted repeats are repetitive elements on opposite strands in introns adjacent to circRNAs. See Supplementary file 3 for details of features. (B) Barplot describing feature importance for logistic regression model of conserved circRNAs compared to species-specific circRNAs. See (A) for plot interpretation and descriptions. (C) Cumulative distribution plots describing (left; p<1.39 × 10^–09) 5´-splice site strength at final exon of circRNAs and (right; p<1.37 × 10^–05) distribution of nucleosomes on intron downstream of circRNA. p-Values calculated by Wilcoxon rank-sum test and corrected for multitesting (Bonferroni). See Figure 2D for interpretation of cumulative distribution plot. (D) Pyramid plot showing the mean fraction of circRNAs with selected inverted repeat retrotransposon elements in adjacent introns.

We next used the same approach to determine features differentiating conserved and species-specific circRNAs. As expected, our model distinguished these categories less efficiently but was still able to achieve a true-positive rate of 65.4% (Figure 3—figure supplement 1B) driven by 12 features. Notable among these features was the depletion of nucleosomes in the downstream intron of the circRNA (Figure 3—figure supplement 1D, p=1.57 × 10^–03, Bonferroni-corrected Wilcoxon rank-sum test [BH-Wilcox] vs. species-specific) and the presence of a more defined 3′-splice site at the final exon (p=2.04 × 10^–03, BH-Wilcox vs. species-specific). Introns adjacent to conserved circRNAs also exhibited a significant enrichment for repeat elements (Figure 3D, all p<1 × 10^–5, BH-Wilcox vs. species-specific), in particular inverted-repeat L1 and AluJ retrotransposons (Figure 3D, L1: p<1.22 × 10^–23; AluJ: p<1.48 × 10^–18, BH-Wilcox). A further key distinguishing feature of interest was intron length. Conserved circRNAs exhibited shorter introns downstream of the first exon and an extended intron downstream of the final exon (Figure 4A and B). In species-specific circRNAs, this adjacent downstream intron has a median length of 4624 nucleotides, whilst in conserved circRNAs the median is more than twice as long at 9923 nucleotides (Figure 4B, p<1.07 × 10^–35, BH-Wilcox). Finally, when comparing the major drivers of both models, we noticed over 90% (11/12) of features overlapped between the models. This suggests that conserved circRNAs are an extreme continuum of species-specific circRNAs. Therefore, understanding the processes contributing to circRNA conservation may also provide insight into the genesis of circRNAs across species.

Figure 4. Conserved circular RNA (circRNA) downstream intron expanded during primate evolution.

(A) Scatterplot of downstream intron length for conserved and species-specific circRNAs. (B) Boxplot describing lengths of intron immediately downstream of circRNA for conserved and species-specific circRNAs (see Figure 2C for description of boxplots). p-Values calculated by Wilcoxon rank-sum test and corrected for multitesting (Bonferroni). nt: nucleotide (C) Cumulative distribution plot of change of length of orthologous downstream introns of conserved, species-specific and background circRNAs from lemur to human (see Figure 2D for description of cumulative distribution plots). p-Values calculated by Wilcoxon rank-sum test and corrected for multitesting (Bonferroni). (D) Cumulative distribution plot of length of novel repeat elements within the orthologous downstream introns of conserved, species-specific and background circRNAs from lemur to human (see Figure 2D for description of cumulative distribution plots). p-Values calculated by Wilcoxon rank-sum test and corrected for multitesting (Bonferroni). (E) Pyramid plot of the proportion of repeat elements inserted into the downstream introns of conserved, species-specific and background circRNAs from lemur to human. *p<0.05; **p<0.005, ***p<1 × 10^–5. p-Values calculated by Wilcoxon rank-sum test and corrected for multitesting (Bonferroni). (F) A schematic model of the results describing impact of our observations on circRNA formation. Boxes represent exons, straight lines are introns, repeat elements are red, arced lines represent back-spliced junction, and dashed lines represent RNA-RNA duplex.

Insertion of young transposons increases downstream intron length in conserved circRNAs

To investigate the evolutionary origins of the switch of conserved circRNAs from absence in prosimians and new-world monkeys to conservation within hominoids and old-world monkeys, we investigated the changes in intronic length for the orthologous introns between human (hominoids) and lemur (prosimians). Relative to their orthologous lemur introns, the human introns downstream of all identified circRNAs show an almost fourfold expansion compared to the background dataset of introns within circRNA-containing genes (Figure 4C, p<3.84 × 10^–23, Wilcoxon rank-sum) and the upstream adjacent intron (Figure 4—figure supplement 1A, p<1.02 × 10^–10, Wilcoxon rank-sum). This difference is even greater in conserved circRNAs, which display an almost twofold greater lengthening than species-specific circRNAs (or eightfold over background; Figure 4C, p<3.84 × 10^–06, Wilcoxon rank-sum). These observations suggest that the expansion of the intron downstream of the circRNA may increase the proportion of back-splicing events, increasing the likelihood of circRNA conservation.
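The fold-change comparison above can be illustrated with a small calculation of median intron expansion from lemur to human in each group. The intron lengths below are invented so that the medians reproduce the approximate fourfold, twofold, and eightfold relationships stated in the text; they are not data from the study, and the use of a median of per-intron ratios is an assumption.

```python
from statistics import median


def median_expansion(intron_pairs):
    """Median human/lemur length ratio over (lemur_nt, human_nt) intron pairs."""
    return median(human / lemur for lemur, human in intron_pairs)


if __name__ == "__main__":
    # Hypothetical orthologous downstream intron lengths in nucleotides.
    background = [(1000, 1000), (2000, 2200), (1500, 1350)]
    species_specific = [(1000, 4000), (2000, 7000), (1500, 7500)]
    conserved = [(1000, 8000), (2000, 18000), (1500, 10500)]

    bg = median_expansion(background)
    ss = median_expansion(species_specific)
    cons = median_expansion(conserved)

    print(ss / bg)    # 4.0: species-specific over background
    print(cons / ss)  # 2.0: conserved over species-specific
    print(cons / bg)  # 8.0: conserved over background (4 x 2)
```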

To investigate the drivers of this intronic expansion, we aligned the lemur and human introns to identify regions novel to humans. This analysis revealed the insertion of novel transposons at almost double the frequency in introns associated with conserved circRNAs (Figure 4D, p<5.48 × 10^–06, Wilcoxon rank-sum). Further evaluation of the retrotransposons revealed that this increase in length is driven by the novel insertion of AluJ and L1 elements (Figure 4E, AluJ: p<0.018; L1: p<1.73 × 10^–04, Wilcoxon rank-sum). This retrotransposition is potentially facilitated by the depletion of nucleosome occupancy in these introns compared to other human introns (Figure 3B, p<1.15 × 10^–07, BH-Wilcox). Together, this argues for a role of young transposons in creating longer intronic regions, which increase the time taken for RNA polymerase II to reach the next canonical splice site and therefore increase the likelihood of back-splicing.

Discussion

The evolution of circRNAs has been previously studied across extensive evolutionary time revealing poor conservation for the majority of circRNAs (Rybak-Wolf et al., 2015; Venø et al., 2015). Our approach is unique as it focuses on the conservation of circRNAs in very closely related species, enabling us to account for the rapid evolution of these RNAs. This increased resolution allowed us to compare conserved versus non-conserved circRNAs, enabling us to reveal two disparate facts about circRNA expression. Firstly, we observe extensive variation in the production of the vast majority of circRNAs between species, with circRNAs often expressed within the same orthologous genes even if the BSJ is not conserved. Conversely, we identify a core set of over 700 circRNAs that are conserved across millions of years of evolution. These circRNAs have higher inclusion rates and show increased inclusion across evolutionary age. Both groups are related in the cis- and trans-regulatory features that correlate with circRNA formation, such as evidence of recent transposon insertion and extended adjacent introns (Ashwal-Fluss et al., 2014; Ivanov et al., 2015; Jeck et al., 2013; Liang et al., 2017). In line with previous work, the majority of circRNAs identified arose from the same gene locus (alternative circularization) (Burd et al., 2010; Jeck et al., 2013; Salzman et al., 2012; Zhang et al., 2014); however, we identify that this phenomenon is largely limited to species-specific circRNAs and disappears in the conserved group. Similarly, we identify that the adjacent introns of circRNAs are significantly longer with inverted Alu repeats (Ashwal-Fluss et al., 2014; Ivanov et al., 2015; Jeck et al., 2013; Liang et al., 2017); however, only in the conserved group do we observe a bias towards lengthening of the downstream adjacent intron with inverted L1 repeats dominating.
Finally, in contrast to previous work, we do not find that conserved circRNAs are more strongly expressed but instead that conserved circRNAs have greater relative expression compared to the linear transcript, with this ratio increasing with the evolutionary age of the circRNA. This decreased diversity of conserved circRNA production and increased relative expression is in line with data from linear splicing (Baek and Green, 2005; Barbosa-Morais et al., 2012; Gueroussov et al., 2017; Irimia et al., 2009; Merkin et al., 2012) and suggests circRNA selection is occurring. However, an important limitation of our approach is our usage of annotated splice sites, which limits our conclusions to exonic circRNAs from canonical splice sites.

A host of endogenous mechanisms dampen the impact of retrotransposons within gene bodies. For example, the formation of Alu exons is suppressed by the nuclear ribonucleoprotein HNRNPC (Zarnack et al., 2013) and the nuclear helicase DHX9 binds to inverted repeat Alu elements to suppress circRNA formation (Aktaş et al., 2017). Over time though, in selected examples, these inclusions can promote novel functionality (Attig et al., 2016; Attig et al., 2018; Avgan et al., 2019; Shen et al., 2011), enabling the creation of tissue-specific exons (Attig et al., 2018), miRNAs (Gu et al., 2009; Spengler et al., 2014), and promoter regions (Li et al., 2018a; Zhang et al., 2019). Our results suggest that circRNAs are undergoing a similar selection race with the recent insertion of multiple retrotransposons promoting increased circRNA production that in some cases stabilizes over time. It is important to note though that the production of a large number of circRNAs in itself can be functional (Liu et al., 2019). For example, in the immune system a wide diversity of circRNAs are produced to sequester specific RNA-binding proteins. These proteins are released upon viral infection to inhibit translation of viral RNA (Liu et al., 2019). A major challenge for the field in the coming years will be determining the contribution of noise versus function for each of these groups.

The investigation of mechanisms controlling circRNA production is a rapidly expanding field (Li et al., 2018b). Our results support a kinetic model (Schor et al., 2013) for circRNA formation whereby trans-factors promote spliceosome recruitment to the final exon and the very long downstream introns extend the time window for back-splicing to occur. This is facilitated by inverted repeats increasing the proximity of the 3′-splice site to the upstream 5′-splice site (see Figure 4F). The extension of the final intron therefore increases the likelihood of circRNA formation in time and space. Spatially, the introduction of new retrotransposons facilitates RNA-RNA duplex formation (Ivanov et al., 2015; Jeck et al., 2013; Li et al., 2017; Liang and Wilusz, 2014) that orientates the splice sites in close proximity; temporally, the increase in intron length expands the time window for such an event to occur (Veloso et al., 2014), which acts independently of the rate of RNA polymerase II across the gene body (Zhang et al., 2016). This model conforms with the previous observations of enrichment of inverted repeat Alu elements and of long introns surrounding circRNAs (Ashwal-Fluss et al., 2014; Dong et al., 2017; Ivanov et al., 2015; Jeck et al., 2013; Liang and Wilusz, 2014; Rybak-Wolf et al., 2015; Zhang et al., 2014).

The conservation of circRNAs we observe could therefore just be a result of increasing the probability for such an event to occur rather than evidence of functionality. However, circRNAs represent an extreme example of a trend in post-transcriptional regulation whereby low leaky expression creates a pool of possible novel substrates (Avgan et al., 2019; Barbosa-Morais et al., 2012; Fiszbein et al., 2019; Mattick, 2018; Merkin et al., 2012; Reyes et al., 2013), increasing the likelihood for unique functionality to arise (Gueroussov et al., 2017; Guo et al., 2020). For circRNAs, this can be aided by single-nucleotide changes that enable trans-acting factors, such as Quaking, ADAR, or NF90/110, to facilitate circRNA formation (Conn et al., 2015; Ivanov et al., 2015; Li et al., 2017).

In conclusion, our evolutionary analysis identifies that the noisy production of circRNAs is driven by the insertion of novel transposons in adjacent downstream introns that can over time stabilize to produce conserved circRNAs. This provides a pool of evolutionary potential that could contribute to the evolutionary rewiring of the cell.

Author details

Gabriela Santos-Rodriguez
Irina Voineagu
Robert J Weatheritt (for correspondence)
